19 hours ago

Prompt Engineer, Agent Prompts and Evals

Anthropic

On Site
Full Time
$360,000
San Francisco, CA

Job Overview

Job Title: Prompt Engineer, Agent Prompts and Evals
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $360,000
Location: San Francisco, CA

Job Description

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role: Prompt Engineer, Agent Prompts and Evals

We’re looking for prompt and context engineers to join our product engineering team to help build AI-first products, features, and evaluations. Your mission will be to bridge the gap between model capabilities and real product experience, working with product teams to build consistent, safe, and beneficial user experiences across all product surfaces.

You will be deeply involved in new product feature and model releases at Anthropic, combining engineering expertise with an understanding of frontier AI applications and model quality. You’ll become an expert on Claude’s behavioral quirks and capabilities and apply that knowledge to deliver the best possible user experience across models and domains. You’ll be the first resource for product teams working on Claude’s AI infrastructure: system prompts, tool prompts, skills, and evaluations.

This role requires someone who cares deeply about making Claude the best it can be and can balance that with supporting a wide variety of concurrent projects across many product teams.

Key Responsibilities

  • Prompt Engineering Excellence: Design, test, and optimize system prompts and feature-specific prompts that shape Claude’s behavior across consumer and API products.
  • Evaluation Development: Build and maintain comprehensive evaluation suites that ensure model quality and consistency across product launches and updates.
  • Cross-functional Collaboration: Partner closely with product teams, research teams, and safeguards to ensure new features meet quality and safety standards.
  • Model Launch Support: Play a critical role in model releases, ensuring smooth rollouts and catching regressions before they impact users.
  • Infrastructure Contribution: Help build and improve the frameworks and tools that allow teams to develop and test prompts and features with confidence.
  • Knowledge Transfer: Mentor product engineers on prompt engineering best practices and help teams build their first evaluations.
  • Rapid Iteration: Work in a fast-paced environment where model capabilities advance daily, requiring quick adaptation and creative problem-solving.

Required Qualifications

  • 5+ years of software engineering experience with Python or similar languages.
  • Demonstrated experience with LLMs and prompt engineering (through work, research, or significant personal projects).
  • Strong understanding of evaluation methodologies and metrics for AI systems.
  • Excellent written and verbal communication skills – you’ll need to explain complex model behaviors to diverse stakeholders.
  • Ability to manage multiple concurrent projects and prioritize effectively.
  • Experience with version control, CI/CD, and modern software development practices.

Preferred Qualifications

  • Experience with Claude or other frontier AI models in production settings.
  • Background in machine learning, NLP, or related fields.
  • Experience with A/B testing and experimentation frameworks (e.g., Statsig).
  • Familiarity with AI safety and alignment considerations.
  • Experience building tools and infrastructure for ML/AI workflows.
  • Track record of improving AI system performance through systematic evaluation and iteration.

You Might Thrive in This Role If You…

  • Get excited about the nuances of how language models behave and love finding creative ways to improve their outputs.
  • Enjoy being at the intersection of research and product, translating cutting-edge capabilities into user value.
  • Are comfortable with ambiguity and can define success metrics for novel AI features.
  • Have a strong sense of ownership and drive projects from conception to production.
  • Are passionate about building AI systems that are helpful, harmless, and honest.
  • Thrive in collaborative environments and enjoy teaching others.

Logistics

The annual compensation range for this role is $320,000 to $405,000 USD. We require at least a Bachelor's degree in a related field or equivalent experience. Currently, we expect all staff to be in one of our offices at least 25% of the time, following a location-based hybrid policy. We do sponsor visas, and if an offer is extended we will make every reasonable effort to obtain one. We encourage applications from all backgrounds, regardless of whether you meet every qualification, as diversity is important for AI system development. Be cautious of recruitment scams; legitimate Anthropic recruiters use @anthropic.com email addresses and will never ask for money or banking information.

How We're Different

At Anthropic, we believe in high-impact AI research, working as a single cohesive team on large-scale efforts focused on steerable, trustworthy AI. We view AI research as an empirical science and value impact over work on smaller, more specific puzzles. Our environment is highly collaborative, with frequent research discussions, and we value strong communication skills. Our research continues directions our team worked on prior to Anthropic, including GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space.

Key Skills/Competencies

  • Prompt Engineering
  • Large Language Models (LLMs)
  • AI System Evaluation
  • Python Programming
  • Cross-functional Collaboration
  • Product Development
  • AI Infrastructure
  • Model Quality Assurance
  • AI Safety and Alignment
  • Version Control & CI/CD

Tags:

Prompt Engineer
Prompt design
Model evaluation
AI product development
Cross-functional collaboration
Feature optimization
System prompts
Tool prompts
AI safety
Rapid iteration
Python
LLMs
AI infrastructure
Version control
CI/CD
Machine learning
NLP
A/B testing
Experimentation
Frontier AI

How to Get Hired at Anthropic

  • Research Anthropic's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume: Highlight Python, LLM, and prompt engineering experience with keywords from the job description.
  • Showcase project impact: Detail specific contributions to AI system quality, safety, and user experience.
  • Prepare for technical depth: Expect questions on prompt design, evaluation methodologies, and model behavior.
  • Demonstrate alignment: Be ready to discuss AI safety, interpretability, and ethical considerations in interviews.
