Intermediate Site Reliability Engineer, Tenant Services
GitLab
Job Description
An Overview of the Intermediate Site Reliability Engineer, Tenant Services Role at GitLab
GitLab, an open-core software company, is at the forefront of AI-powered DevSecOps platforms, serving over 100,000 organizations globally. Our mission centers on empowering everyone to contribute to and co-create the software that shapes our world, accelerating human progress. By integrating AI into daily workflows, we drive efficiency, innovation, and impact, fostering a high-performance culture where careers flourish and every voice is valued.
As an Intermediate Site Reliability Engineer, Tenant Services at GitLab, you will be crucial in ensuring the smooth operation of GitLab.com and other production systems for millions of users. This role combines pragmatic operations with strong software engineering practices, focusing on systems layers (operating systems, storage, networking), edge services, and Kubernetes workloads. You will design and operate highly scalable, reliable, and secure infrastructure that supports one of the largest single-tenancy open-source SaaS sites on the Internet. Working within the Infrastructure organization, you will automate toil, enhance availability and performance, and participate in a globally distributed on-call rotation during local daytime hours. Your contributions will help Tenant Services safeguard and scale customer data, driving automation to meet enterprise-level expectations for reliability and availability as GitLab continues to grow.
What You'll Do
- Design and implement highly scalable infrastructure for GitLab.com, supporting current and future growth requirements.
- Collaborate with cross-functional teams within the Infrastructure organization to plan and deliver projects that define GitLab’s platform direction.
- Operate and enhance edge services and Kubernetes workloads, establishing yourself as a subject matter expert in the infrastructure department.
- Participate in a global on-call rotation during local daytime hours, effectively responding to production incidents and contributing to constructive incident reviews.
- Reduce operational toil by automating tasks and developing tools to improve overall reliability, availability, and scalability.
- Apply infrastructure as code and configuration management practices to ensure consistent management of cloud resources and environments.
- Write and maintain production-quality code, preferably in Go or Ruby, to augment our systems and automation toolchain.
What You'll Bring
- Proven background working with the Kubernetes ecosystem, including tools like Helm, and experience running production workloads.
- Practical experience operating cloud infrastructure on platforms such as Google Cloud Platform (GCP) or Amazon Web Services (AWS), with a focus on networking, hosted Kubernetes services, and scaling.
- Hands-on experience with infrastructure as code and configuration management tools such as Ansible or Chef.
- Strong programming skills in a modern language, ideally Go or Ruby, applied to solving automation and reliability challenges.
- Ability to clearly define complex problems, develop long-term solutions beyond quick fixes, and continuously improve systems.
- A consistent focus on reducing manual effort through automation and thoughtful system design.
- An independent, proactive working style with a bias for action and comfort operating as a “manager of one” in a distributed, asynchronous environment.
- Excellent written and verbal communication skills. Candidates with transferable experience from related reliability, infrastructure, or platform roles are welcome.
About The Tenant Services Team
The Tenant Services team at GitLab is dedicated to safeguarding and securing customer data stored by the GitLab application, while also establishing clear guidelines for data access. This team operates the largest GitLab instance, one of the largest single-tenancy open-source SaaS sites globally. This scale presents daily reliability and availability challenges that directly affect users. As an all-remote, globally distributed team, Tenant Services collaborates asynchronously across time zones and relies heavily on automation to meet stringent enterprise expectations for reliability, availability, and data protection while continuing to scale. Further details on the team's operations can be found on our Team Handbook page.
How GitLab Will Support You
- Comprehensive benefits covering health, finances, and overall well-being.
- Flexible Paid Time Off (PTO) policy.
- Access to Team Member Resource Groups.
- Equity Compensation & Employee Stock Purchase Plan.
- Dedicated Growth and Development Fund.
- Generous Parental Leave.
- Support for home office setup.
Key Skills/Competencies
- Site Reliability Engineering (SRE)
- Kubernetes
- Google Cloud Platform (GCP)
- Amazon Web Services (AWS)
- Infrastructure as Code (IaC)
- Automation
- Go Programming Language
- Ruby Programming Language
- Distributed Systems
- Incident Management
How to Get Hired at GitLab
- Research GitLab's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor. Understand their all-remote, asynchronous work model and "manager of one" philosophy.
- Tailor your resume for SRE: Highlight experience with Kubernetes, GCP/AWS, infrastructure as code (Ansible/Chef), and programming skills in Go or Ruby, specifically for automation and reliability challenges. Quantify your impact on system scalability and uptime.
- Showcase technical expertise: Prepare to discuss your experience designing and operating highly scalable, reliable, and secure infrastructure. Emphasize incident response, root cause analysis, and proactive toil reduction through automation.
- Demonstrate problem-solving and communication: Be ready to articulate how you define complex problems, propose long-term solutions, and collaborate effectively with globally distributed, cross-functional teams, leveraging strong written and verbal communication.
- Understand GitLab's unique scale: Familiarize yourself with the challenges of running one of the largest single-tenancy open-source SaaS sites. Highlight how your experience aligns with safeguarding and scaling customer data in a high-availability environment.