Software Engineering Manager, Site Reliability Engineering, Google Cloud

Job Description
At Google, Site Reliability Engineering (SRE) integrates software and systems engineering to construct and operate massively distributed, fault-tolerant systems at scale. This role is crucial for ensuring the reliability and uptime of Google's internal and external services, meeting user needs, and driving continuous improvement. SREs also closely monitor system capacity and performance to proactively address potential issues.
A significant portion of our work involves optimizing existing systems, developing robust infrastructure, and eliminating manual effort through comprehensive automation. As part of the SRE team, you will tackle unique and complex scaling challenges inherent to Google, leveraging your expertise in coding, algorithms, complexity analysis, and large-scale system design.
The SRE culture thrives on intellectual curiosity, problem-solving, and openness. We foster a diverse environment where individuals from varied backgrounds collaborate, innovate, and take calculated risks in a blame-free setting. We champion self-direction for impactful projects while providing essential support and mentorship for continuous learning and professional growth.
As an Engineering Manager, you will lead a dedicated team responsible for products on a global scale. Your role involves providing pivotal technical leadership on key projects, empowering your team members, and fostering their development to achieve similar leadership and execution excellence.
The Technical Infrastructure team underpins everything users experience online, building and maintaining the foundational architecture. From managing our data centers to developing Google's next-generation platforms, we enable Google's entire product portfolio. We pride ourselves on being the engineers' engineers, often dissecting systems to rebuild them for optimal performance. We ensure our networks operate flawlessly, providing users with the fastest and best possible experience.
Minimum Qualifications
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 8 years of experience with software development in one or more programming languages.
- 3 years of experience managing people or teams.
- 3 years of experience leading projects.
- 3 years of experience designing, analyzing, and troubleshooting distributed systems.
Preferred Qualifications
- Experience working in computing, distributed systems, storage, or networking.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Ability to debug and optimize code and to automate routine tasks.
- Excellent problem-solving skills and strong verbal and written communication skills.
Responsibilities
- Lead a team of Software/Systems Engineers on projects for users and be directly responsible for uptime.
- Own end-to-end availability and performance of key services and build automation to prevent problem recurrence. Automate response to all non-exceptional service conditions.
- Lead by example, mentor the team and establish credibility through quality technical execution.
- Manage on-call rotations across continents, using a follow-the-sun model.
- Design, write and deliver software to improve the availability, scalability, latency and efficiency of Google's services.
Key skills/competencies
- Site Reliability Engineering (SRE)
- Distributed Systems
- Software Development
- Automation
- Technical Leadership
- People Management
- System Design
- Scalability
- Troubleshooting
- Google Cloud Platform (GCP)
How to Get Hired at Google
- Research Google's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to align your application.
- Tailor your resume for SRE: Customize your resume to highlight experience in distributed systems, automation, software development, and leadership relevant to Site Reliability Engineering at Google.
- Demonstrate technical depth: Prepare to showcase strong problem-solving skills, coding proficiency, and expertise in large-scale system design during Google's rigorous technical interviews.
- Practice behavioral questions: Be ready to discuss your leadership style, conflict resolution, project management, and how you foster team growth, specifically for a Google management role.
- Network effectively: Connect with current Google employees, especially within Site Reliability Engineering, for insights and potential referrals that can make your application more visible.