
Staff Site Reliability Engineer, Fabric
MongoDB · United States
- Hybrid
- Full-time
- $180,000 / year
Job highlights
- Build and maintain critical network infrastructure.
- Ensure secure, efficient system communication.
- Leverage expertise in networking and distributed systems.
- Work with cloud platforms (AWS, Azure, GCP).
- Participate in on-call rotation for network issues.
About the role
About the Team
Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization. Among these are our multi-cloud-provider Kubernetes infrastructure, deployment machinery, and observability and alerting systems.
The Fabric team manages the infrastructure that enables secure communication between systems and from the public internet. Their responsibilities encompass network architecture, service mesh, and edge load balancing, ensuring customer data remains safe in transit. The team plays a crucial role in developing and maintaining the reliable and globally connected multi-cloud network that supports MongoDB products.
This role can be based in our NYC HQ, in one of our smaller Austin, Palo Alto, or San Francisco offices, or fully remote from anywhere in North America. Office-based employees work under a hybrid arrangement.
Role Overview
We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to join the Fabric team. This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services. As an SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and reliable.
The Ideal Candidate Should
- Have 10+ years of experience working on software and operating distributed systems, with deep expertise in networking fundamentals and a good understanding of how the internet works, e.g. TCP/IP (including IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles
- Possess a customer-focused mindset, driving improvements that benefit end-users
- Value efficiency in processes and operations, and display a strong preference for automation over manual processes (“allergic to ops work”)
- Be intimately familiar with modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP, e.g. VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs
- Have a strong knowledge of service mesh and load-balancing concepts, and be eager to implement these in a multi-cloud environment
Expectations
- Participate in the development of a reliable and resilient multi-cloud globally-connected network that is crucial for MongoDB’s services
- Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
- Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability
About MongoDB
MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We have redefined the data platform for the AI era, enabling builders to create, transform, and disrupt industries with software. MongoDB’s unified data platform, the most widely available, globally distributed data platform on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Our cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud data platform and is available across AWS, Google Cloud, and Microsoft Azure.
With offices worldwide and over 67,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we’re powering the next era of software.
Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It’s what makes us MongoDB.
To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
Key skills/competency
- Site Reliability Engineering
- Network Architecture
- Distributed Systems
- Cloud Infrastructure (AWS, Azure, GCP)
- Kubernetes
- Service Mesh
- Load Balancing
- Automation
- TCP/IP, DNS, TLS/mTLS
- BGP, SDN
Skills & topics
- Site Reliability Engineer
- SRE
- Fabric
- Networking
- Distributed Systems
- Cloud Infrastructure
- Kubernetes
- Service Mesh
- Load Balancing
- Automation
- AWS
- Azure
- GCP
- TCP/IP
- BGP
- SDN
- MongoDB
How to get hired
- Tailor your resume: Highlight your extensive experience in distributed systems and networking fundamentals, specifically mentioning technologies like TCP/IP, DNS, TLS/mTLS, BGP, and SDN. Quantify your achievements in automating operations and managing cloud infrastructure (AWS, Azure, GCP).
- Craft a compelling cover letter: Emphasize your customer-focused mindset and passion for automation. Clearly articulate how your skills align with the Fabric team's responsibilities in network architecture, service mesh, and load balancing.
- Prepare for technical interviews: Expect deep dives into networking protocols, distributed systems design, and cloud networking primitives. Be ready to discuss your experience with Kubernetes, service mesh, and your approach to troubleshooting complex infrastructure issues.
- Showcase your collaborative spirit: Demonstrate your ability to work with service-owning teams, provide guidance, and contribute effectively to a 24/7 on-call rotation.
Frequently asked questions
- What specific networking protocols are most important for the Staff Site Reliability Engineer Fabric role at MongoDB?
- For the Staff Site Reliability Engineer Fabric role at MongoDB, deep expertise in networking fundamentals is crucial. This includes TCP/IP (with a strong understanding of IPv6), DNS, TLS/mTLS for secure communication, BGP for routing, understanding tunnels and overlays, and Software-Defined Networking (SDN) principles. Familiarity with these ensures robust and secure system communication.
- How does MongoDB support remote employees for this Staff Site Reliability Engineer Fabric position?
- The Staff Site Reliability Engineer Fabric role can be fully remote from anywhere in North America. It can also be based in the NYC HQ or the Austin, Palo Alto, or San Francisco offices, where employees work under a hybrid arrangement.
- What is the expected level of experience for a Staff Site Reliability Engineer Fabric at MongoDB?
- MongoDB is looking for a Staff Site Reliability Engineer with a minimum of 10 years of experience. This experience should encompass working with software and operating distributed systems, with a particular emphasis on deep networking expertise and a strong understanding of internet protocols and cloud infrastructure.
- Can you describe the on-call responsibilities for the Staff Site Reliability Engineer Fabric role?
- The Staff Site Reliability Engineer Fabric role includes participation in a 24/7 on-call rotation. The goal is to swiftly resolve issues related to network architecture and service-to-service connectivity, keeping disruption minimal and availability high for MongoDB's services.
- What cloud platforms are most relevant for the Staff Site Reliability Engineer Fabric role at MongoDB?
- The Staff Site Reliability Engineer Fabric role requires intimate familiarity with modern cloud-based infrastructure, specifically the network design primitives of at least one of AWS, Azure, or GCP. This includes concepts such as VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs.
- What does MongoDB mean by 'allergic to ops work' for the Staff Site Reliability Engineer Fabric role?
- MongoDB values efficiency in processes and operations. Being 'allergic to ops work' means the ideal Staff Site Reliability Engineer Fabric candidate strongly prefers automating repetitive operational tasks over performing them by hand.
