Chaos & Reliability Engineer 2 (remote)
Posted on: May 16, 2022
Are you interested in using the hottest tech to deploy
containers and manage large clusters of machines where all an
enterprise's critical apps run? Do you take joy in figuring out how
to make applications more reliable?
Nordstrom is investing in how we design and deploy software. We are
obsessed with creating a joyful and automatic DevOps experience for
our users, consistent across clouds and data centers. As a Chaos
and Resiliency Engineer, you will work closely with Application
Development teams and Platform teams, guiding them in optimizing
their applications and platforms to deliver best-in-class
reliability to Nordstrom's applications. Our Load and Resiliency
tests are business-critical components supporting our customers'
experience. In addition, the CRE team is responsible for providing
capabilities that engineering teams can use to ensure their
software is scalable, highly available, and resilient.
A day in the life...
- Design, maintain, and optimize the Go-based Load Testing Engine for
  Nordstrom applications.
- Research, develop, and integrate chaos and reliability tests into
  the CI pipeline.
- Work with development teams to integrate their systems into routine
  load/reliability testing.
- Contribute code back to open-source projects to fix bugs and add
  features needed by our team.
- Build sensors to monitor load/reliability tests; troubleshoot and
  fix the tests when they break.
- Share on-call responsibilities and take part in system architecture
  reviews.
You own this if you have...
- Proficiency with software development in a well-known language
  (Go, Python, or Rust strongly preferred).
- Experience with Chaos Engineering tools such as Gremlin, Litmus, or
  Chaos Toolkit.
- Experience managing and scaling workloads on Kubernetes or another
  container orchestration platform (e.g., OpenShift).
- Familiarity with networking, APIs, Kafka, and distributed systems
  is a plus.
- Ability to debug, optimize, and automate routine tasks.
- Experience with Dispatcher/Worker architecture and with secure
  vault concepts and tools.
- Strong interest in SRE topics like SLOs, resiliency, scaling, and
  performance.
- Strong interest in Chaos Engineering topics like failure modes,
  fault analysis, alerting, and synthetic failures.
- Eager to learn and soak in new
We've got you covered... Our employees are our most important asset,
and that's reflected in our benefits. Nordstrom is proud to offer a
variety of benefits to support employees and their families, including:
- Medical/Vision, Dental, Retirement and Paid Time Away
- Life Insurance and Disability
- Merchandise Discount and EAP Resources
A few more important points... The job posting highlights the most
critical responsibilities and requirements of the job. It's not
all-inclusive. There may be additional duties, responsibilities, and
qualifications for this job. Nordstrom will consider qualified
applicants with criminal histories in a manner consistent with all
Keywords: Nordstrom, Austin , Chaos & Reliability Engineer 2 (remote), Engineering , Austin, Texas