Technical Program Manager, Safeguards Infrastructure and Evals
Anthropic
Job Description
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About The Role
Safeguards Engineering builds and operates the infrastructure that keeps Anthropic's AI systems safe in production — the classifiers, detection pipelines, evaluation platforms, and monitoring systems that sit between our models and the real world. That infrastructure needs to be not just correct, but reliable: when a safety-critical pipeline goes down or degrades, the consequences can be serious, and they can be invisible until someone looks closely.
As a Technical Program Manager, Safeguards Infrastructure and Evals, you'll own the operational health and forward momentum of this stack. Your primary responsibility is driving reliability — owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out. Alongside that ongoing operational rhythm, you'll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.
This role sits at the intersection of operations and program management. It requires genuine technical depth — you need to understand how these systems work well enough to triage effectively, judge what's actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them. But the core of the job is keeping the machine running well and the work moving.
What You'll Do
- Own the Safeguards Engineering ops review - Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made. This is the heartbeat of how Safeguards Eng stays ahead of operational risk.
- Drive incident tracking and post-mortem execution - When incidents happen — and in this space, they happen regularly — you'll make sure they get followed through properly. That means tracking incidents across the organization (including those owned by partner teams like Inference), ensuring post-mortems get written, and most critically, making sure the action items that come out of them actually get done. Closing the loop on post-mortem actions is one of the highest-leverage things this role does.
- Establish and maintain SLOs with partner teams - Work with Safeguards Engineering teams and key partners — particularly Inference and Cloud Inference — to define service-level objectives for safety-critical pipelines. Then build the tracking and reporting that makes it possible to tell whether those SLOs are being met, and surface it when they're not.
- Maintain runbook quality and incident-ownership clarity - Safety-critical systems need clear playbooks for when things go wrong. Partner with engineering leads to keep runbooks accurate, actionable, and up to date — and ensure that ownership of incidents (including for areas like account-banning false positives and CSAM detection) is unambiguous so that nothing falls through the cracks during an active incident.
- Drive platform migrations and infrastructure projects - Own the program management for the larger infrastructure work on the roadmap: migrating infrastructure from one platform to another, moving between incident platforms and between cloud monitoring systems, and other migrations as they come. These are cross-team efforts with real dependencies; your job is to keep them sequenced, on track, and connected to the teams that need them.
- Coordinate evals platform improvements - Partner with the evals engineering team to drive improvements to the evaluation platform — including self-serve capabilities and the broader eval factory infrastructure. Help scope the work, track dependencies on other Safeguards systems, and make sure the evals platform is keeping pace with the team's needs.
You Might Be a Good Fit If You
- Have solid technical program management experience, particularly in operational or infrastructure-heavy environments — you're comfortable owning a mix of ongoing operational cadences and discrete project work simultaneously.
- Understand how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what's going wrong and why — you don't need to write the code, but you need to follow the technical thread.
- Are energized by closing loops. Post-mortem action items that never get done, SLOs that no one checks, runbooks that go stale — these things bother you, and you know how to build the processes and follow-ups that fix them.
- Can work effectively across team boundaries — comfortable coordinating with partner teams (like Inference) where you don't have direct authority, and skilled at keeping shared work moving through influence and clear communication.
- Thrive in environments where the work shifts between "keep the lights on" and "build something new" — and can context-switch between incident follow-ups and longer-horizon platform projects without dropping either.
- Have experience with or strong interest in AI safety — you understand why the reliability of a safety-critical pipeline is a different kind of problem than the reliability of a product feature, and that distinction motivates you.
Strong Candidates May Also
- Have experience with SRE practices, incident management frameworks, or on-call operations at scale.
- Have worked on or with evaluation infrastructure for ML systems — understanding how evals get designed, run, and interpreted.
- Have experience driving infrastructure migrations in complex, multi-team environments — particularly where the migration touches operational systems that can't go offline.
- Be familiar with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents) and the operational culture around them.
Compensation and Logistics
The annual compensation range for this role is $290,000 to $365,000 USD. We require at least a Bachelor's degree in a related field or equivalent experience. Currently, we expect all staff to be in one of our offices at least 25% of the time, typically in San Francisco. We do sponsor visas and encourage diverse candidates to apply.
How We're Different
At Anthropic, we believe in big science for AI research, working as a single cohesive team on large-scale efforts to build steerable, trustworthy AI. We value impact, view AI research as an empirical science, and work in a highly collaborative way with frequent discussions. We encourage you to read our recent research to understand our directions. Anthropic is a public benefit corporation headquartered in San Francisco, offering competitive compensation, benefits, and flexible working hours.
Key skills and competencies
- Technical Program Management
- Incident Management
- ML System Operations
- Infrastructure Reliability
- Evaluation Platforms
- Cross-functional Coordination
- SLO Definition & Monitoring
- Post-Mortem Analysis
- Runbook Management
- AI Safety Principles
How to Get Hired at Anthropic
- Research Anthropic's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand their unique approach to AI safety.
- Tailor your resume: Customize your resume to highlight technical program management, infrastructure reliability, and AI safety experience relevant to Anthropic's mission.
- Showcase technical depth: Prepare to discuss your understanding of ML systems, incident triage, and operational challenges during your Anthropic interviews.
- Emphasize cross-functional skills: Provide concrete examples of coordinating complex projects with partner teams and driving consensus within a collaborative environment.
- Articulate AI safety passion: Be ready to clearly articulate your motivation and passion for building safe, beneficial AI systems aligned with Anthropic's core mission.