Senior Site Reliability Engineer, Data Platform Infrastructure SRE
Apple
Job Overview

Job Description
Summary
At Apple, we believe that innovation flourishes in an environment where ideas are challenged, collaboration is encouraged and technology is pushed to its limits. This environment is only possible when diverse minds come together, bringing unique perspectives and experiences. Our people and their ideas inspire innovation in everything we do. Imagine what you could accomplish here! Join Apple and help us make the world a better place.
As a principal contributor in our Apple Data Platform SRE organization, you will apply SRE principles as you mentor and partner with our engineers and partner teams, ensuring petabyte-scale analytics infrastructure runs reliably and efficiently. This role focuses on managing bare-metal and cloud-based infrastructure, leveraging and extending our infrastructure-as-code tooling, analyzing and optimizing performance, helping to plan and execute long-term fleet management logistics and capacity planning, and ultimately maintaining operational excellence across the distributed data platforms that power analytics across Apple. This role includes production on-call responsibilities.
Description
Apple Service Engineering (ASE) teams build and scale the platforms and infrastructure behind many of Apple's services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple's software developers build the products that our customers love. We are looking for a passionate and dedicated Senior Site Reliability Engineer, Data Platform Infrastructure SRE to provide technical leadership on our team and help ensure our customers have the highest quality Apple Services experience. The Apple Data Platform (ADP) Compute SRE team is responsible for the core infrastructure, including our legacy bare-metal platforms and modern cloud-based infrastructure stack. We partner with both peer SRE teams and several of our world-class software and product engineering teams to support infrastructure reliability, multi-year parallel migrations for Apple properties, as well as the automation, tooling, incident, and process management necessary to ensure smooth 24x7 operations for ADP customers.
Responsibilities
- Principal/Lead SRE for ADP Compute SRE team, mentoring other ICs, and partnering with other senior engineers to ensure service architecture, tooling, design, and implementations are of the highest quality.
- Identifying efficiency improvements in technical operations and training partner teams, leading by example.
- Managing infrastructure fleet capacity, growth, and hardware lifecycle with innovative tooling, effective planning, and proactive check-ins with our leadership and program management teams.
- Providing technical leadership for Hadoop and Kubernetes infrastructure, tooling (including infrastructure-as-code), and services.
- Programming in Python and Golang, supported by Generative AI tooling, to accelerate development of mission-critical automation and tools.
- Collaborating proactively and presenting effectively to communicate ideas and represent the deliverables and needs of the SRE team to ASE leadership.
- Production on-call and incident management responsibilities.
Minimum Qualifications
- BS/MS in Computer Science or equivalent
- 2+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale
- 5+ years of experience in management or technical leadership roles
- History of end-to-end project management and delivery
- Demonstrable programming skills to both develop software/tools and lead code reviews
- Experience managing Hadoop and Kubernetes infrastructure and related services, or equivalent experience
- Advanced knowledge of Linux, Networking, and Containers
Preferred Qualifications
- 15+ years of experience in SRE or related work managing infrastructure at scale
- Experience with scale testing, disaster recovery, and capacity planning
- Ability to define the technical roadmap for infrastructure and drive cross-functional alignment on architectural standards and best practices
Key skills/competencies
- Site Reliability Engineering (SRE)
- Distributed Systems
- Infrastructure-as-Code
- Capacity Planning
- Hadoop
- Kubernetes
- Linux
- Networking
- Containers
- Python/Golang
How to Get Hired at Apple
- Research Apple's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight SRE, infrastructure management, and technical leadership experience with keywords specific to Apple's Data Platform.
- Showcase your projects: Demonstrate experience with Hadoop, Kubernetes, Python, Golang, and large-scale distributed systems.
- Prepare for technical interviews: Expect deep dives into Linux, networking, containers, and SRE principles at scale.
- Emphasize collaboration and leadership: Be ready to discuss how you mentor, drive projects, and foster operational excellence within teams.