Research Engineer, Model Evaluations
Anthropic

Job Description
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. Our growing team of researchers, engineers, policy experts, and business leaders works together toward that goal.
About The Role
As a Research Engineer on Model Evaluations at Anthropic, you will lead the design and implementation of the evaluation platform that informs our understanding and improvement of model capabilities and safety. The role combines strategic vision with hands-on engineering; you will work closely with training teams, alignment researchers, and safety teams.
Responsibilities
- Design evaluation methodologies for assessing model capabilities.
- Architect and build a scalable evaluation platform.
- Implement high-throughput evaluation pipelines in production.
- Analyze results to identify patterns and opportunities for model improvement.
- Partner with research teams to build domain-specific evaluations.
- Develop rapid iteration infrastructure including automated and human-in-the-loop systems.
- Set best practices for evaluation development across the organization.
- Mentor team members and coordinate evaluation efforts during training runs.
- Contribute to research publications and external communications.
You May Be a Good Fit If You
- Have experience with evaluation systems for machine learning models.
- Demonstrate technical leadership and complex project experience.
- Are skilled in systems engineering and experimental design.
- Possess strong Python programming and distributed computing skills.
- Can bridge research and engineering effectively.
- Are results-oriented in a fast-paced setting.
- Communicate technical concepts clearly to diverse stakeholders.
- Care about AI safety and societal impacts.
- Have experience in statistical analysis of large-scale experimental data.
Strong Candidates May Also Have
- Experience with production evaluation during model training.
- Familiarity with safety evaluation frameworks and red teaming methodologies.
- Background in psychometrics or experimental psychology.
- Experience with reinforcement learning evaluation or multi-agent systems.
- Contributions to open-source evaluation benchmarks.
- Knowledge of prompt engineering and evaluation design.
- Experience managing evaluation infrastructure at scale.
- Published research in machine learning evaluation or benchmarking.
Compensation and Logistics
The annual salary for this role is between $300,000 and $405,000 USD. A Bachelor's degree in a related field or equivalent experience is required. This role follows a location-based hybrid policy with a minimum of 25% office presence. Visa sponsorship is provided in most cases.
How We're Different
Anthropic works as a cohesive team on major research efforts focused on safe and beneficial AI systems. We emphasize impactful research that balances scientific rigor with practical implementation and prioritize clear communication throughout our team.
Key Skills and Competencies
- Evaluation Systems
- Model Capabilities
- AI Safety
- Python
- Distributed Computing
- Experimental Design
- Infrastructure
- Research Collaboration
- Data Analysis
- Technical Leadership