Workload Performance Architect (Multiple Locations)
Posted on: September 24, 2022
AI is redefining the computing paradigm. The computational demands
of this new paradigm are incommensurable with existing software
and hardware criteria. The best AI solutions require unifying the
innovations in the software programming model, compiler technology,
heterogeneous computation platform, networking technology, and
semiconductor process and packaging technology. Tenstorrent drives
the innovations through holistic views of each technological
component in software and hardware to unify them to create the best
As a performance architect in the dynamic and motivated Tenstorrent
Platform Architecture team, you will work in a cross-functional
team on ML software stacks, HPC and general purpose workloads,
graph compiler, cache coherence protocols, superscalar CPU,
fabric/interconnection, networking, and DPU.
Collaborate with the software and platform architecture teams to
understand hardware requirements for AI accelerator compiler, OS,
video/image/voice processing, security, networking, and
virtualization technology. Identify the application performance
bottlenecks and functional requirements.
Perform full-stack workload characterization and performance
analysis for AI, HPC, and CPU general-purpose applications.
Identify representative benchmarks for the workloads. Perform
data-driven analysis based on software profiling, performance model
simulation, or analytical models to evaluate software and
architecture solutions against PPA (power, performance, and area)
targets.
Set CPU architecture direction based on the data analysis and work
with a cross-functional team to achieve the best hardware/software
solutions to meet PPA goals.
Characterize real-world workloads and conduct end-to-end system
performance analysis and workload decomposition to gather
requirements for SoC solutions. Generate representative CPU,
accelerator, and SoC traces for the performance model to study PPA
impacts and guide architecture decisions.
Work with Tenstorrent's graph compiler team and LLVM/GCC open
source community to drive AI/CPU performance improvements. Identify
compiler optimizations and align the architecture and compiler
teams on implementing the improvements.
Drive analysis and correlation of performance features both pre- and
post-silicon.
Experience and qualifications:
BS/MS/PhD in EE/ECE/CE/CS
Strong background in CPU ISA, u-architecture research, and
Understanding of SoC fabric, coherency protocols, memory technology,
and accelerator technology is a plus.
Familiarity with program tracing flows (SimPoint, SMART, ...) to
capture traces for applications.
Strong understanding of ML/AI algorithms, GCC and LLVM compilers,
and OS kernels.
Proficient in C/C++ programming. Experience in the development of
highly efficient C/C++ performance models.
We have a presence in Toronto, Austin, Santa Clara, Portland, and
Raleigh. We are open to remote candidates on a case-by-case basis.
Tenstorrent offers a highly competitive compensation package and
benefits, and we are an equal opportunity employer.