Senior Site Reliability Engineer (Storage)
Company: IBM
Location: Austin
Posted on: January 24, 2023
Job Description:
Introduction
At IBM, work is more than a job - it's a calling: To
build. To design. To code. To consult. To think along with clients
and sell. To make markets. To invent. To collaborate. Not just to
do something better, but to attempt things you've never thought
possible. Are you ready to lead in this new era of technology and
solve some of the world's most challenging problems? If so, let's
talk.
Your Role and Responsibilities
As a Storage Platform
Engineer/SRE, you will be part of the Cirrus Hybrid Cloud storage
support team responsible for ensuring the architectural integrity
and successful delivery of a scalable storage platform for the IBM
CIO Organization. In this role you will focus on the management of
storage for Cirrus Hybrid Cloud. This entails working on all
aspects of disk management and monitoring, including disk
initialization, fault monitoring and handling, and reporting. You
will be tasked with solving intriguing problems while partnering
with other team members, customers, and vendors. To find success,
you will need a strong Linux development background and a passion
for learning and continuous improvement.
What you'll do
- Management, maintenance, and support of various data storage
solutions, especially in the Red Hat OpenShift Virtualization (OSV)
environment
- Plan, coordinate, and perform software, hardware, and microcode
upgrades of SAN/NAS/TSM components as needed
- Incorporate storage replication into Disaster Recovery (DR)
solutions
- Operate in an agile manner and under strict change control
- Engage in technical-level discussions around data center
solutions and storage integration
- Design, implement, and manage integrations between internal and
external solutions, as well as storage-related monitoring
solutions
- Handle storage provisioning tasks, for example, end user
requests, creation of volumes, mapping and ensuring storage
availability at the operating system level, storage performance
analysis, troubleshooting, and storage capacity management and
planning
- Provision VMware and physical LUNs, apply patches, upgrade
software, manage performance and capacity planning, and ensure data
is secured
- Monitor for errors, hardware failures, optimization, capacity,
and performance of storage arrays and associated networking to
ensure and restore normal operations
- Maintain the Fibre Channel switches and SAN configurations,
especially Brocade-compatible ones, by ensuring that software and
firmware keep pace with the latest needs of the organization
- Provide on-call support and implementation after-hours on a
rotating basis
- Think and act like a Site Reliability Engineer (SRE) as the
environment relates to storage and depends on storage
Most of our
teams are located in Atlanta, GA; Austin, TX; Boston Metro, MA; New
York, NY; Raleigh, NC; Armonk, NY; and Southbury, CT. However,
since the nature of our work is hybrid, we welcome applications
from other locations.
Required Technical and Professional Expertise
- Extensive experience in the management of highly available
enterprise-class Storage Area Networks (SAN) and Network Attached
Storage (NAS), including Fibre Channel (FC) and Internet Protocol
(IP) networking and connected storage arrays.
- Experience with one of the following: IBM Spectrum Scale,
Spectrum Protect, CP4D, DataStage, SKLM, NFS, CIFS, data mirroring
or DR solutions
- A strong understanding of diverse infrastructure platforms and
infrastructure concepts
- Solid scripting and automation abilities leveraging Ansible or
similar tools
- Excellent communication skills to work with clients/customers,
third parties such as support and supply vendors, internal
stakeholders, and the team
- Ability to provide on-call support and implementation
after-hours on a rotating basis
Preferred Technical and Professional Expertise
- BS degree in computer science or similar technical degree
- Ability to drive innovation and operational excellence
- Administration or usage of cloud native delivery solutions such
as ArgoCD or Flux
- Strong leadership ability to drive technical deployments and
steady state operations for a global team
- Deep technical specialization in a required field - Spectrum
Scale, Spectrum Protect, IBM Storage, Brocade SAN fabrics, or
SKLM
- Ability to technically represent the team with executives and
manage critical outage events
Keywords: IBM, Austin, Senior Site Reliability Engineer (Storage), Professions, Austin, Texas