Staff DevOps Engineer
Company: webAI
Location: Austin
Posted on: April 1, 2026
Job Description:
About Us:
webAI is pioneering the future of artificial
intelligence by establishing the first distributed AI
infrastructure dedicated to personalized AI. We recognize the
evolving demands of a data-driven society for scalability and
flexibility, and we firmly believe that the future of AI lies in
distributed processing at the edge, bringing computation closer to
the source of data generation. Our mission is to build a future
where a company's valuable data and intellectual property remain
entirely private, enabling the deployment of large-scale AI models
directly on standard consumer hardware without compromising the
information embedded within those models. We are developing an
end-to-end platform that is secure, scalable, and fully under the
control of our users, empowering enterprises with AI that
understands their unique business. We are a team driven by truth,
ownership, tenacity, and humility, and we seek individuals who
resonate with these core values and are passionate about shaping
the next generation of AI.

About the Role:
We are seeking a Staff
DevOps Engineer to architect, build, and scale secure
infrastructure for deploying AI workloads across cloud and edge
environments. This is a high-impact, staff-level individual
contributor role where you will drive infrastructure strategy, lead
technical initiatives, and serve as the subject matter expert on
cloud architecture, security best practices, and platform
reliability. You will design scalable, automated infrastructure
solutions that enable our AI platform to operate efficiently across
diverse deployment scenarios, from public cloud to on-premises and
edge computing environments. This role requires deep technical
expertise, architectural thinking, and the ability to translate
complex requirements into production-ready infrastructure
automation.

Responsibilities:
Design and architect secure, scalable
cloud and edge infrastructure for deploying AI workloads across
multi-cloud (AWS, Azure, GCP) and hybrid environments

Build and maintain production-grade Infrastructure as Code (IaC)
using Terraform, Ansible, or Pulumi, managing 100 resources with
GitOps workflows and automated validation

Design and operate production Kubernetes clusters optimized for
AI/ML workloads with GPU support, implementing container security,
multi-tenancy, and resource optimization

Implement secure CI/CD pipelines with integrated security controls
(SAST, DAST, vulnerability scanning, secrets management) and
automated deployment workflows for containerized AI models

Lead MLOps infrastructure initiatives including model deployment
pipelines, versioning, feature stores, experiment tracking, and
monitoring for model performance and drift

Design comprehensive observability and monitoring using Prometheus,
Grafana, ELK, or Datadog with distributed tracing, APM, and
real-time alerting aligned to SLIs/SLOs

Implement security best practices including least-privilege access,
encryption at rest/in transit, network segmentation, and automated
compliance validation

Lead incident response and reliability initiatives, participate in
on-call rotation, conduct post-mortems, and drive continuous
improvement for system reliability

Architect disaster recovery and business continuity strategies with
automated backup, failover, and recovery processes

Develop reusable infrastructure modules and templates to accelerate
environment provisioning and standardize deployment patterns across
teams

Mentor mid-level and senior engineers on cloud architecture, DevOps
best practices, and platform reliability through design reviews and
technical guidance

Drive technical documentation and knowledge sharing including
runbooks, architecture decision records (ADRs), and infrastructure
standards

Qualifications:
7 years of hands-on experience in DevOps,
Site Reliability Engineering, or Infrastructure Engineering with
a proven track record of architecting production systems

Expert-level proficiency with Docker, Kubernetes (CKA/CKAD
preferred), and cloud-native technologies in production
environments

5 years implementing Infrastructure as Code with Terraform,
Ansible, or Pulumi, managing large-scale (50) cloud resources

Deep experience with cloud platforms (AWS, Azure, or GCP) including
compute, networking, storage, and managed services

Proven experience building and scaling CI/CD pipelines with
integrated security controls (GitHub Actions, GitLab CI, Jenkins,
ArgoCD)

Strong programming skills in Python (preferred for automation),
Bash, or Go for infrastructure tooling and automation

Production experience with observability and monitoring tools:
Prometheus, Grafana, ELK, CloudWatch, Datadog, or similar

Experience with MLOps workflows: model deployment automation,
versioning, and lifecycle management

Demonstrated experience with GitOps methodologies and declarative
infrastructure management

Strong understanding of security best practices: encryption,
secrets management, identity and access management (IAM), network
security

Excellent written and verbal communication skills for technical
documentation and cross-functional collaboration

Preferred Skills:
Experience
architecting multi-cloud or hybrid cloud environments with
portability and interoperability considerations

Hands-on experience deploying large language models (LLMs) or
transformer models at scale with model serving infrastructure

Expertise in Zero Trust architecture and modern security patterns
for cloud-native applications

Experience with service mesh technologies (Istio, Linkerd) for
microservices communication and observability

Strong understanding of AI/ML infrastructure: feature stores, model
registries, A/B testing infrastructure, and model monitoring

Experience with edge computing deployments and distributed system
architectures

Cost optimization expertise: FinOps practices, resource
rightsizing, and cloud cost management

Experience mentoring or leading technical initiatives across
engineering teams

Certifications: CKA, CKAD, Terraform Associate, AWS Solutions
Architect, Azure Administrator, or GCP Professional Cloud Architect

Core Values:
We at webAI are committed to living out the core
values we have put in place as the foundation on which we operate
as a team. We seek individuals who exemplify the following:

Truth - Emphasizing transparency and honesty in every interaction
and decision.

Ownership - Taking full responsibility for one’s actions and
decisions, demonstrating commitment to the success of our clients.

Tenacity - Persisting in the face of challenges and setbacks,
continually striving for excellence and improvement.

Humility - Maintaining a respectful and learning-oriented mindset,
acknowledging the strengths and contributions of others.

Benefits:
Competitive salary and performance-based incentives.
Comprehensive health, dental, and vision benefits package.
401k Match (US-based only)
$200/month Health and Wellness Stipend
$400/year Continuing Education Credit
$500/year Function Health subscription (US-based only)
Free parking for in-office employees
Unlimited Approved PTO
Parental Leave for Eligible Employees
Supplemental Life Insurance

webAI is an Equal Opportunity Employer and does not discriminate
against any employee or applicant on the basis of age, ancestry,
color, family or medical care leave, gender identity or expression,
genetic information, marital status, medical condition, national
origin, physical or mental disability, protected veteran status,
race, religion, sex (including pregnancy), sexual orientation, or
any other characteristic protected by applicable laws, regulations
and ordinances. We adhere to these principles in all aspects of
employment, including recruitment, hiring, training, compensation,
promotion, benefits, social and recreational programs, and
discipline. In addition, it is the policy of webAI to provide
reasonable accommodation to qualified employees who have protected
disabilities to the extent required by applicable laws, regulations
and ordinances where a particular employee works.
Keywords: webAI, Austin, Staff DevOps Engineer, IT / Software / Systems, Austin, Texas