
Senior Site Reliability Engineer
Mastercard · Salt Lake City, UT
- Hybrid
- Full-time
- $163,000 / year
Job highlights
- Lead reliability for a new enterprise application.
- Own system reliability from design to production.
- Drive automation and improve developer experience.
- Manage observability and incident response.
- Utilize AI tools for development efficiency.
About the role
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Overview
Commerce Media is hiring a Senior Site Reliability Engineer to lead the reliability, scalability, and production operations of a greenfield application within our enterprise platform. This is a high-impact, individual contributor role with end-to-end ownership of system reliability—from design influence through production operations. You will partner across engineering and platform teams to ensure services are resilient, observable, and production-ready from day one.
About The Role
Design Influence & Production Readiness
- Drive reliability-focused design in partnership with engineering and platform teams
- Lead architecture and launch readiness reviews, including capacity planning and failure-mode and risk analysis
- Define and enforce non-functional requirements (availability, latency, resilience)
Production Ownership & Incident Leadership
- Own production reliability and service health
- Act as incident commander, leading triage, mitigation, and communication
- Lead blameless post-mortems with clear, actionable follow-ups
- Proactively identify and reduce operational risk across the system
Observability & SLO Management
- Define and manage SLIs, SLOs, and error budgets
- Design and operate monitoring and alerting using Prometheus, Grafana, OpenSearch/Elasticsearch, and Opsgenie
- Build dashboards aligned to user impact and system health
Automation, Scalability & Platform Enablement
- Drive automation-first operations to scale systems sustainably
- Enhance CI/CD pipelines (GitHub Actions) with deployment gating and validation
- Identify and resolve performance and reliability bottlenecks
- Improve developer experience through operational tooling and best practices
Technology Environment
- Kubernetes, Docker
- GitHub Actions (CI/CD)
- Prometheus, Grafana (observability)
- OpenSearch / Elasticsearch (logging/search)
- Opsgenie (incident management)
- AWS or equivalent cloud platforms
- Preferred: Spring Boot and/or Golang services
All About You
- Years of professional experience operating distributed systems at scale in production
- Strong expertise in Kubernetes and containerized environments, observability (metrics, logging, tracing), and the Spring Boot and/or Golang ecosystems
- Hands-on across application, infrastructure, and release pipelines
- Demonstrated ownership of service reliability, incident response, and operational strategy
- Ability to influence system design through technical leadership and data-driven decisions
- Pragmatic mindset—balancing automation, trade-offs, and system evolution
- Experience navigating enterprise environments while maintaining delivery velocity
- Leverages AI tools (e.g., Copilot, ChatGPT, Claude) to accelerate design, coding, and testing, and to improve code quality and operational outcomes
- Integrates AI into workflows such as architecture reviews, code generation, testing, and documentation
- Applies strong judgment in production-critical, low-latency environments
Key skills/competencies
- Site Reliability Engineering
- Kubernetes
- Observability
- Incident Management
- Automation
- CI/CD
- Cloud Platforms
- Distributed Systems
- System Design
- Problem Solving
Skills & topics
- Site Reliability Engineer
- SRE
- Kubernetes
- Docker
- AWS
- Cloud
- Observability
- Prometheus
- Grafana
- CI/CD
- GitHub Actions
- OpenSearch
- Elasticsearch
- Opsgenie
- Spring Boot
- Golang
- Distributed Systems
- Production Operations
- Incident Management
- Scalability
- Automation
- System Design
- Technical Leadership
- Enterprise Environment
How to get hired
- Tailor your resume: Highlight experience with Kubernetes, observability, and incident management relevant to Mastercard's Senior Site Reliability Engineer role.
- Showcase your expertise: Emphasize your track record in operating large-scale distributed systems and driving reliability in production environments.
- Demonstrate technical leadership: Provide examples of influencing system design and leading incident response efforts.
- Prepare for technical interviews: Be ready to discuss system design, Kubernetes, cloud platforms, and CI/CD pipelines.
- Understand the culture: Research Mastercard's commitment to innovation, security, and building a sustainable economy.
Frequently asked questions
- What are the key technologies for the Senior Site Reliability Engineer role at Mastercard?
- The Senior Site Reliability Engineer role at Mastercard heavily utilizes Kubernetes, Docker, GitHub Actions for CI/CD, Prometheus and Grafana for observability, and OpenSearch/Elasticsearch for logging. Experience with AWS or similar cloud platforms is essential, with a preference for Spring Boot and/or Golang services. Familiarity with Opsgenie for incident management is also important.
- How important is experience with AI tools for this Mastercard SRE position?
- Mastercard encourages the use of AI tools like Copilot, ChatGPT, and Claude for this Senior Site Reliability Engineer role. Candidates are expected to leverage these tools to accelerate design, coding, and testing, improve code quality, and integrate AI into workflows such as architecture reviews, code generation, and documentation.
- What is the expected experience level for a Senior Site Reliability Engineer at Mastercard?
- For the Senior Site Reliability Engineer position at Mastercard, candidates should have several years of professional experience operating distributed systems at scale in production. Strong expertise in Kubernetes, containerized environments, observability tools (metrics, logging, tracing), and the Spring Boot/Golang ecosystems is required.
- What is the salary range for a Senior Site Reliability Engineer at Mastercard?
- For the Senior Site Reliability Engineer position in Utah, Mastercard offers a salary range of $96,000 to $163,000 USD per year. The offer within this range can vary based on factors such as location, job-related knowledge, skills, and experience.
- Does Mastercard offer benefits for full-time employees?
- Yes, Mastercard offers a comprehensive benefits package for full-time employees, including medical, dental, vision, and life insurance, flexible spending accounts, a 401(k) with a company match, paid leave (parental, bereavement, vacation, and sick time), and more. Specific details are provided during the offer process.
- How does Mastercard approach incident response for its Senior Site Reliability Engineers?
- Senior Site Reliability Engineers at Mastercard are expected to own production reliability and act as incident commanders. This involves leading triage, mitigation, and communication during incidents, as well as conducting blameless post-mortems with actionable follow-ups to prevent future occurrences.
