Quality & Reliability Engineer, Trainium Manufacturing, Quality & Reliability
Company: Amazon
Location: Austin
Posted on: April 3, 2026
Job Description:
The Trainium Manufacturing, Quality and Reliability (MQR) team
is part of AWS Annapurna Labs. The team focuses on Machine Learning
products and designs cutting-edge AI platforms for the world’s
largest Cloud Services provider. As a Senior Reliability Engineer,
you will engage with an experienced cross-disciplinary staff to
conceive and design infrastructure technologies. You will work
closely with an internal interdisciplinary team and outside partners
to drive key aspects of product definition, execution, and test in
manufacturing. A
successful candidate will be responsive, flexible and able to
succeed within an open collaborative peer environment.

You will:
* Be responsible for the test validation of future technologies.
* Drive manufacturing process improvements to address reliability
  issues and concerns.
* Have a fundamental understanding of reliability
  statistics/reliability tests and/or a solid understanding of
  computer systems to influence design for reliability.
* Lead the identification and validation of product/component risks,
  work with design teams to mitigate them, and define the test
  methodology and test coverage to assure product reliability.
* Deep-dive into technologies aligned with the product roadmap.
* Provide technical leadership and mentor engineers.
* Perform reliability prediction of failure mechanisms, products
  under development, and products in the field.
* Work with multiple vendors and ODMs to standardize component
  manufacturing and reliability expectations.

Key job responsibilities
* Define reliability tests to be implemented during manufacturing.
* Drive manufacturing process improvements to address reliability
  issues and concerns.
* Perform reliability prediction of failure mechanisms, products
  under development, and products in the field.
* Work with multiple vendors and ODMs to standardize component
  manufacturing and reliability expectations.

About the team
Annapurna Labs is a wholly
owned subsidiary of AWS, focused on developing custom silicon and
servers including the Nitro, Graviton, Inferentia, and Trainium
families of processors. Machine Learning Annapurna (MLA) functions
as a vertically integrated team including software, firmware,
hardware, and silicon design in a single organization. We are the
Training Servers and Systems organization under MLA focused on
Hardware Development, Software Development, Fleet Ops Systems, and
Manufacturing, Quality, and Reliability. This position is in the
Manufacturing, Quality and Reliability team.

- Bachelor's or Master's degree in Electrical/Mechanical Engineering,
  Physics or a related field with a focus on Reliability, or
  equivalent experience
- 7 years of reliability engineering work experience with server
  compute platforms or on high-tech hardware
- Experience working in a fast-paced environment similar to a
  high-tech start-up
- Reliability modeling and materials characterization experience
- Master's Degree or PhD in Reliability Engineering or related field
- Demonstrated ability to uncover systemic issues prior to new
  product introduction
- Working understanding of server subcomponents (CPU, GPU, memory,
  HDD, SSD, motherboard, thermal system, peripherals, etc.)
- Analytical, test plan and test procedure development experience
  related to server compute platforms or with high-tech hardware

Amazon is an equal opportunity
employer and does not discriminate on the basis of protected
veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include:
work safely and cooperatively with other employees, supervisors,
and staff; adhere to standards of excellence despite stressful
conditions; communicate effectively and respectfully with
employees, supervisors, and staff to ensure exceptional customer
service; and follow all federal, state, and local laws and Company
policies. Criminal history may have a direct, adverse, and negative
relationship with some of the material job duties of this position.
These include the duties and responsibilities listed above, as well
as the abilities to adhere to company policies, exercise sound
judgment, effectively manage stress and work safely and
respectfully with others, exhibit trustworthiness and
professionalism, and safeguard business operations and the
Company’s reputation. Pursuant to the Los Angeles County Fair
Chance Ordinance, we will consider for employment qualified
applicants with arrest and conviction records.

Our inclusive
culture empowers Amazonians to deliver the best results for our
customers. If you have a disability and need a workplace
accommodation or adjustment during the application and hiring
process, including support for the interview or onboarding process,
please visit
https://amazon.jobs/content/en/how-we-hire/accommodations for more
information. If the country/region you’re applying in isn’t listed,
please contact your Recruiting Partner.

The base salary range for
this position is listed below. Your Amazon package will include
sign-on payments and restricted stock units (RSUs). Final
compensation will be determined based on factors including
experience, qualifications, and location. Amazon also offers
comprehensive benefits including health insurance (medical, dental,
vision, prescription, Basic Life & AD&D insurance and option
for Supplemental life plans, EAP, Mental Health Support, Medical
Advice Line, Flexible Spending Accounts, Adoption and Surrogacy
Reimbursement coverage), 401(k) matching, paid time off, and
parental leave. Learn more about our benefits at
https://amazon.jobs/en/benefits .

USA, CA, Cupertino - 157,300.00 - 212,800.00 USD annually
USA, TX, Austin - 136,000.00 - 184,000.00 USD annually
USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually