Site Reliability Engineer (SRE)
Company: IBM
Location: Austin
Posted on: March 19, 2023
Job Description:
Introduction
At IBM, work is more than a job - it's a calling: To
build. To design. To code. To consult. To think along with clients
and sell. To make markets. To invent. To collaborate. Not just to
do something better, but to attempt things you've never thought
possible. Are you ready to lead in this new era of technology and
solve some of the world's most challenging problems? If so, let's
talk.
Your Role and Responsibilities
As a Storage Platform
Engineer/SRE, you will be part of the Cirrus Hybrid Cloud storage
support team responsible for ensuring the architectural integrity
and successful delivery of a scalable storage platform for the IBM
CIO Organization. In this role you will focus on the management of
storage for Cirrus Hybrid Cloud. This entails working on all
aspects of disk management and monitoring, including disk
initialization, fault monitoring and handling, and reporting. You
will be tasked with solving intriguing problems while partnering
with other team members, customers, and vendors. To find success,
you will need a strong Linux development background and a passion
for learning and continuous improvement.
What you'll do
- Management, maintenance, and support of various data storage
solutions, especially in the Red Hat OpenShift Virtualization (OSV)
environment
- Plan, coordinate, and perform software, hardware, and microcode
upgrades of SAN/NAS/TSM components as needed
- Incorporate storage replication into Disaster Recovery (DR)
solutions
- Operate in an agile manner and under strict change control
- Engage in technical-level discussions about data center
solutions and storage integration
- Design, implement, and manage integrations between internal and
external solutions, as well as storage-related monitoring
solutions
- Handle storage provisioning tasks such as end-user requests,
volume creation, mapping, and ensuring storage availability at the
operating system level, as well as storage performance analysis,
troubleshooting, and storage capacity management and planning
- Provision VMware and physical LUNs, apply patches, upgrade
software, manage performance and capacity planning, and ensure data
is secured
- Monitor storage arrays and associated networking for errors,
hardware failures, optimization opportunities, capacity, and
performance to ensure and restore normal operations
- Maintain the Fibre Channel switches and SAN configurations,
especially Brocade-compatible ones, by keeping software and
firmware in line with the organization's current needs
- Provide after-hours on-call support and implementation on a
rotating basis
- Think and act like a Site Reliability Engineer (SRE) as the
environment relates to storage and depends on storage
Most of our
teams are located in Atlanta, GA; Austin, TX; Boston Metro, MA; New
York, NY; Raleigh, NC; Armonk, NY; and Southbury, CT. However,
since the nature of our work is hybrid, we welcome applications
from other locations.
Required Technical and Professional Expertise
- OpenLDAP administration
- Apache Server administration
- Samba (aka: SMB) administration
- AIX administration
- WebSphere Load Balancer administration
Preferred Technical and Professional Expertise
- BS degree in computer science or similar technical degree
- Ability to drive innovation and operational excellence
- Administration or usage of cloud native delivery solutions such
as ArgoCD or Flux
Keywords: IBM, Austin, Site Reliability Engineer (SRE), Engineering, Austin, Texas