Conversational AI Systems Evaluator

Mercor

Hybrid
Part Time
$145,600
Job Overview

Job Title: Conversational AI Systems Evaluator
Job Type: Part Time
Category: Commerce
Experience: 5 Years
Degree: Master's
Offered Salary: $145,600
Location: Hybrid

Job Description

About Mercor

Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Conversational AI Systems Evaluator

This is a contract position available full-time or part-time, performed remotely, with compensation ranging from $45–$80 per hour.

Role Responsibilities

  • Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
  • Conduct fact-checking using trusted public sources and authoritative references.
  • Execute code and validate outputs using appropriate tools to ensure accuracy.
  • Annotate model responses by identifying strengths, areas of improvement, and factual or conceptual inaccuracies.
  • Assess code quality, readability, algorithmic soundness, and explanation quality.
  • Ensure model responses align with expected conversational behavior and system guidelines.
  • Apply consistent evaluation standards by following clear taxonomies, benchmarks, and detailed evaluation guidelines.
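The responsibilities above combine rubric-based annotation with executing code to verify a model's claims. As a minimal sketch (the rubric dimensions, record schema, and `run_snippet` helper here are hypothetical illustrations, not Mercor's actual tooling), an evaluator's workflow might look like:

```python
import subprocess
import sys
from dataclasses import dataclass, field

# Hypothetical rubric dimensions; real taxonomies would come from the
# project's detailed evaluation guidelines.
RUBRIC = ("accuracy", "reasoning", "clarity", "completeness")

@dataclass
class Annotation:
    """One evaluator's record for a single model response (illustrative schema)."""
    response_id: str
    scores: dict                      # dimension -> 1..5
    notes: list = field(default_factory=list)

def run_snippet(code: str, timeout: int = 5) -> str:
    """Execute a candidate code snippet in a subprocess and capture stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()

# Fact-check a model's claimed output by actually running the code it produced.
claimed_output = "120"
actual_output = run_snippet("import math; print(math.factorial(5))")

ann = Annotation(response_id="resp-001", scores={dim: 5 for dim in RUBRIC})
if actual_output != claimed_output:
    ann.scores["accuracy"] = 1
    ann.notes.append(f"Claimed {claimed_output!r}, but execution produced {actual_output!r}")

print(actual_output)           # 120
print(ann.scores["accuracy"])  # 5 (the claim matched execution)
```

In practice, executing the model's code rather than eyeballing it is what catches the subtle bugs and factual inaccuracies this role is asked to flag.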

Qualifications

Must-Have
  • BS, MS, or PhD in Computer Science or a closely related field
  • Significant real-world experience in software engineering or related technical roles
  • Expertise in at least one relevant programming language (e.g., Python, Java, C++, JavaScript, Go, Rust)
  • Ability to solve HackerRank or LeetCode Medium- and Hard-level problems independently
  • Experience contributing to well-known open-source projects, including merged pull requests
  • Significant experience using LLMs while coding and understanding their strengths and failure modes
  • Strong attention to detail and comfort evaluating complex technical reasoning, identifying subtle bugs or logical flaws
Preferred
  • Prior experience with RLHF, model evaluation, or data annotation work
  • Track record in competitive programming
  • Experience reviewing code in production environments
  • Familiarity with multiple programming paradigms or ecosystems
  • Experience explaining complex technical concepts to non-expert audiences

Application Process

The application takes approximately 20–30 minutes to complete:

  • Upload resume
  • AI interview based on your resume
  • Submit form

Resources & Support

For detailed information on the interview process and platform, visit talent.docs.mercor.com/welcome/welcome. For assistance, contact support@mercor.com. Our team reviews applications daily; be sure to complete all steps, including the AI interview, to be considered.

Key Skills & Competencies

  • LLM Evaluation
  • Code Quality Assessment
  • Algorithmic Soundness
  • Fact-Checking
  • Data Annotation
  • Technical Reasoning
  • Bug Identification
  • Programming Languages
  • Open-Source Contributions
  • HackerRank/LeetCode

Tags:

Conversational AI Systems Evaluator
LLM evaluation
code quality
algorithmic soundness
fact-checking
data annotation
technical reasoning
bugs
logical flaws
guidelines
conversational behavior
Python
Java
C++
JavaScript
Go
Rust
LLMs
AI
HackerRank
LeetCode

How to Get Hired at Mercor

  • Research Mercor's mission: Study their focus on connecting elite AI talent with leading research labs, and understand their vision.
  • Tailor resume for AI evaluation: Highlight your experience with LLMs, programming languages, and complex technical reasoning, aligning with the Conversational AI Systems Evaluator role.
  • Prepare for AI interview: Practice articulating your thought process for solving LeetCode-style problems and explaining complex technical concepts clearly.
  • Showcase LeetCode expertise: Be ready to demonstrate your ability to solve HackerRank or LeetCode Medium and Hard-level problems independently during technical assessments.
  • Demonstrate open-source contributions: Emphasize any significant contributions to open-source projects, especially merged pull requests, to show real-world coding impact.
