
Senior Site Reliability Champion
Vanguard · Dallas, TX
- On site
- Full-time
- $150,000 / year
Job highlights
- Lead SRE role ensuring application resiliency and stability.
- Define and implement enterprise resiliency standards.
- Automate incident response and pioneer AI diagnostics.
- Collaborate to strengthen critical systems.
- Shape next-generation client experiences.
About the role
The Site Reliability Engineer (SRE) for Global Technology Operations (GTO) is a strategic technical leader responsible for ensuring the resiliency and stability of the critical applications our crew and clients rely on every day. This role combines deep hands‑on engineering expertise with enterprise‑level influence. You will help define what resiliency means at Vanguard and partner across teams to design, test, and strengthen some of our most critical systems. In addition, you will automate incident response capabilities and pioneer AI‑enhanced diagnostics and analysis to improve detection, response, and recovery. You will work alongside a collaborative, technically focused team where your innovations in resiliency engineering directly shape Vanguard’s next generation of reliable, client‑centric experiences.
Core Responsibilities:
- Evaluate applications, platforms, and vendors to assess resiliency, reliability, and operational risk.
- Design and implement processes that enforce enterprise resiliency and reliability standards.
- Lead blameless post‑incident reviews for high‑severity incidents or incidents spanning multiple complex product families.
- Partner with product and platform teams to proactively identify and remediate reliability risks before they impact clients.
- Develop, communicate, and evangelize new standards, tools, and frameworks across subdivisions, ensuring consistent adoption.
- Troubleshoot complex production issues and implement durable solutions that prevent recurrence.
- Participate in a periodic on‑call rotation to support production stability.
- Evaluate and onboard resiliency and reliability tooling.
- Actively participate in reliability engineering and resilience communities of practice, contributing to shared learning and enterprise consistency.
- Contribute to strategic initiatives that advance Vanguard’s operational maturity and resiliency posture.
Qualifications | Technical Skills:
- Observability Platforms: Experience with modern observability and monitoring tools, such as Splunk, Honeycomb, CloudWatch, Dynatrace, or AppDynamics.
- Reliability Metrics: Strong understanding of SLIs, SLOs, and SLAs, including dashboarding and reporting practices.
- Monitoring & Alerting: Experience with alert design, anomaly detection, predictive alerting, and synthetic monitoring using structured methodologies.
- Automation & Resilience Engineering: Experience with automation and resilience practices such as Python-based automation, RPA platforms (e.g., Blue Prism, UiPath), chaos engineering, and failure analysis techniques (e.g., FMEA).
Special Factors
- Sponsorship: Vanguard is not offering visa sponsorship for this position.
About Vanguard
At Vanguard, we don't just have a mission; we're on a mission. To work for the long-term financial wellbeing of our clients. To lead through products and services that transform our clients' lives. To learn and develop our skills as individuals and as a team. From Malvern to Melbourne, our mission drives us forward and inspires us to be our best.
How We Work
Vanguard has implemented a hybrid working model for the majority of our crew members, designed to capture the benefits of enhanced flexibility while enabling in-person learning, collaboration, and connection. We believe our mission-driven and highly collaborative culture is a critical enabler to support long-term client outcomes and enrich the employee experience.
Key skills and competencies
- Site Reliability Engineering
- Resiliency and Stability
- Application Monitoring
- Incident Response Automation
- AI-Enhanced Diagnostics
- Observability Platforms
- Reliability Metrics (SLIs, SLOs, SLAs)
- Automation (Python, RPA)
- Chaos Engineering
- Production Issue Troubleshooting
Skills & topics
- Site Reliability Engineering
- SRE
- Resiliency
- Reliability
- Observability
- Monitoring
- Alerting
- Automation
- Python
- CloudWatch
- Splunk
- SLI
- SLO
- SLA
- Chaos Engineering
- Incident Response
- Production Support
- Global Technology Operations
- GTO
- Vanguard
How to get hired
- Tailor your resume: Highlight SRE experience, observability tools, and automation skills relevant to Vanguard's needs.
- Showcase impact: Quantify achievements in improving system reliability, reducing incidents, and implementing automation.
- Prepare for technical deep-dives: Be ready to discuss SLIs, SLOs, SLAs, and experience with monitoring and alerting tools.
- Emphasize collaboration: Demonstrate experience leading blameless post-mortems and partnering with product teams.
- Research Vanguard's values: Understand their commitment to client financial well-being and their hybrid work model.
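One way to prepare for the SLI and SLO discussion mentioned above is to work through a small error-budget calculation. The sketch below is illustrative only: the `SloWindow` structure, the function names, and the request counts are hypothetical and do not come from Vanguard or the posting. It assumes a simple request-based availability SLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical structure)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability_sli(window: SloWindow) -> float:
    """SLI: fraction of requests that succeeded in the window."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - window.slo_target) * window.total_requests
    if allowed_failures == 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0 if window.failed_requests == 0 else float("-inf")
    return 1 - window.failed_requests / allowed_failures


if __name__ == "__main__":
    # Made-up numbers: 1,000,000 requests, 250 failures, 99.9% target.
    window = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"SLI: {availability_sli(window):.5f}")
    print(f"Error budget remaining: {error_budget_remaining(window):.2%}")
```

With these made-up numbers, a 99.9% target allows about 1,000 failed requests, so 250 failures spend roughly a quarter of the budget and leave about three quarters. Being able to explain that arithmetic, and what a team should do as the budget runs low, is the kind of reasoning a technical deep-dive on reliability metrics tends to probe.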
Frequently asked questions
- What is the work arrangement for the Senior Site Reliability Engineer role at Vanguard?
- The Senior Site Reliability Engineer role at Vanguard follows a hybrid working model. This model balances the benefits of flexibility with in-person learning, collaboration, and connection, which are considered crucial for supporting client outcomes and enriching the employee experience.
- Does Vanguard offer visa sponsorship for the Senior Site Reliability Engineer position?
- No, Vanguard is not offering visa sponsorship for this Senior Site Reliability Engineer position. Candidates must be authorized to work in the location where the job is based without requiring sponsorship.
- What are the key technical skills required for a Senior Site Reliability Engineer at Vanguard?
- Key technical skills for this role include experience with observability platforms (Splunk, Honeycomb, CloudWatch, Dynatrace, AppDynamics), a strong understanding of reliability metrics (SLIs, SLOs, SLAs), expertise in monitoring and alerting, and proficiency in automation and resilience engineering practices like Python-based automation, RPA, chaos engineering, and failure analysis techniques.
- How does Vanguard define the role of a Site Reliability Engineer?
- Vanguard defines the Site Reliability Engineer role as a strategic technical leader focused on ensuring the resiliency and stability of critical applications. It involves deep hands-on engineering expertise, enterprise-level influence, defining resiliency standards, automating incident response, and pioneering AI-enhanced diagnostics.
- What is Vanguard's mission statement?
- Vanguard's mission is to work for the long-term financial wellbeing of their clients, to lead through products and services that transform clients' lives, and to foster continuous learning and development for individuals and the team.
- What are the core responsibilities of a Senior Site Reliability Engineer at Vanguard?
- Core responsibilities include evaluating applications for risk, designing and implementing resiliency standards, leading blameless post-incident reviews, partnering with teams to identify and remediate reliability risks, developing and evangelizing new standards, troubleshooting production issues, and participating in on-call rotations.
- How can I stand out when applying for the Senior Site Reliability Engineer role at Vanguard?
- To stand out, tailor your resume to highlight specific SRE achievements, quantify your impact on system reliability and incident reduction, and be prepared to discuss your experience with resilience engineering practices and tools in detail during the interview process. Demonstrating an understanding of Vanguard's mission and values is also beneficial.
