Conversational AI Systems Evaluator
Mercor

Job Description
About Mercor
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: Conversational AI Systems Evaluator
This is a remote contract role, available full-time or part-time, with compensation ranging from $45–$80 per hour.
Role Responsibilities
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
- Conduct fact-checking using trusted public sources and authoritative references.
- Execute code and validate outputs using appropriate tools to ensure accuracy.
- Annotate model responses by identifying strengths, areas of improvement, and factual or conceptual inaccuracies.
- Assess code quality, readability, algorithmic soundness, and explanation quality.
- Ensure model responses align with expected conversational behavior and system guidelines.
- Apply consistent evaluation standards by following clear taxonomies, benchmarks, and detailed evaluation guidelines.
Qualifications
Must-Have
- BS, MS, or PhD in Computer Science or a closely related field
- Significant real-world experience in software engineering or related technical roles
- Expertise in at least one relevant programming language (e.g., Python, Java, C++, JavaScript, Go, Rust)
- Ability to solve HackerRank or LeetCode Medium- and Hard-level problems independently
- Experience contributing to well-known open-source projects, including merged pull requests
- Significant experience using LLMs while coding, with an understanding of their strengths and failure modes
- Strong attention to detail and comfort evaluating complex technical reasoning, identifying subtle bugs or logical flaws
Preferred
- Prior experience with RLHF, model evaluation, or data annotation work
- Track record in competitive programming
- Experience reviewing code in production environments
- Familiarity with multiple programming paradigms or ecosystems
- Experience explaining complex technical concepts to non-expert audiences
Application Process
The application takes approximately 20–30 minutes to complete:
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
For detailed information on the interview process and platform, please visit: talent.docs.mercor.com/welcome/welcome. For any assistance, reach out to: support@mercor.com. Our team reviews applications daily; ensure you complete all steps, including the AI interview, for consideration.
Key Skills & Competencies
- LLM Evaluation
- Code Quality Assessment
- Algorithmic Soundness
- Fact-Checking
- Data Annotation
- Technical Reasoning
- Bug Identification
- Programming Languages
- Open-Source Contributions
- HackerRank/LeetCode
How to Get Hired at Mercor
- Research Mercor's mission: Study their focus on connecting elite AI talent with leading research labs, and understand their vision.
- Tailor resume for AI evaluation: Highlight your experience with LLMs, programming languages, and complex technical reasoning, aligning with the Conversational AI Systems Evaluator role.
- Prepare for AI interview: Practice articulating your thought process for solving LeetCode-style problems and explaining complex technical concepts clearly.
- Showcase LeetCode expertise: Be ready to demonstrate your ability to solve HackerRank or LeetCode Medium and Hard-level problems independently during technical assessments.
- Demonstrate open-source contributions: Emphasize any significant contributions to open-source projects, especially merged pull requests, to show real-world coding impact.