13 days ago

AI Agent Testing Specialist

Braintrust

Hybrid
Full Time
$90,000

Job Overview

Job Title: AI Agent Testing Specialist
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $90,000
Location: Hybrid

Job Description

Please submit your resume in English and indicate your level of English.

At Braintrust, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What We Do

The Mindrift platform connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

About the Role

As an AI Agent Testing Specialist, you will design realistic, structured evaluation scenarios for LLM-based agents. Your focus will be on creating test cases that simulate tasks normally performed by humans and on defining the gold-standard behavior against which agent actions are compared.

You will work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. This role requires a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Responsibilities

  • Design structured test scenarios based on real-world tasks.
  • Define the golden path and acceptable agent behavior.
  • Annotate task steps, expected outputs, and edge cases.
  • Collaborate with developers to test and refine scenarios.
  • Review agent outputs and adjust tests accordingly.
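
As a rough illustration of the responsibilities above, a scenario might be written as a structured record with a golden path and then compared against what an agent actually did. This is a minimal sketch, not the format used on the platform: the scenario fields (`id`, `task`, `golden_path`), the action names, and the `score_run` helper are all hypothetical.

```python
import json

# Hypothetical scenario definition; the field names are illustrative assumptions.
SCENARIO_JSON = """
{
  "id": "refund-001",
  "task": "Refund a customer's duplicate order",
  "golden_path": ["open_order", "verify_duplicate", "issue_refund", "notify_customer"]
}
"""


def score_run(golden_path, agent_actions):
    """Fraction of golden-path steps found in the agent's actions, matched greedily in order."""
    if not golden_path:
        return 1.0
    matched = 0
    position = 0  # only search actions that come after the last matched step
    for step in golden_path:
        try:
            found = agent_actions.index(step, position)
        except ValueError:
            continue  # step is missing (or out of order); it earns no credit
        matched += 1
        position = found + 1
    return matched / len(golden_path)


scenario = json.loads(SCENARIO_JSON)
# Hypothetical agent trace: it skipped verification and took an extra detour.
agent_actions = ["open_order", "search_help_center", "issue_refund", "notify_customer"]
score = score_run(scenario["golden_path"], agent_actions)
print(f"{scenario['id']}: {score:.2f}")
```

Greedy in-order matching is only one possible scoring rule; a scenario could just as well weight steps differently or allow several acceptable paths.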

How To Get Started

Simply apply to this post, qualify, and contribute to projects aligned with your skills on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor's and/or Master’s Degree in Computer Science, Software Engineering, Data Science, AI/ML, Computational Linguistics, or related fields.
  • Background in QA, software testing, data analysis, or NLP annotation.
  • Strong understanding of test design principles including reproducibility, coverage, and edge cases.
  • Excellent written communication skills in English.
  • Familiarity with structured formats like JSON and YAML for scenario description.
  • Ability to define expected agent behaviors and scoring logic.
  • Basic experience with Python and JavaScript.
  • Curious mindset and willingness to work with AI-generated content and prompts.

Nice to Have

  • Experience in writing manual or automated test cases.
  • Familiarity with LLM capabilities and typical failure modes.
  • Understanding of scoring metrics such as precision, recall, and reward functions.
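
For the scoring metrics mentioned above, precision and recall can be computed over the actions an agent took. The sketch below assumes actions are compared as unordered sets of hypothetical action names; a real evaluation could define a match differently.

```python
def precision_recall(expected, actual):
    """Compute precision and recall of an agent's actions against the expected ones."""
    expected_set, actual_set = set(expected), set(actual)
    true_positives = len(expected_set & actual_set)
    # Precision: share of the agent's actions that were expected.
    precision = true_positives / len(actual_set) if actual_set else 0.0
    # Recall: share of the expected actions that the agent performed.
    recall = true_positives / len(expected_set) if expected_set else 0.0
    return precision, recall


# Hypothetical expected actions and agent trace.
expected = ["open_order", "verify_duplicate", "issue_refund", "notify_customer"]
actual = ["open_order", "search_help_center", "issue_refund", "notify_customer"]
precision, recall = precision_recall(expected, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```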

Benefits

This freelance role is fully remote, allowing you to work on your own schedule from anywhere in the world. Gain valuable experience in advanced AI projects and contribute to shaping the future of model behavior evaluation.

Key skills/competencies

  • Test Design
  • QA
  • NLP
  • Python
  • JavaScript
  • Scenario Testing
  • Data Analysis
  • Analytical
  • Structured Formats
  • Agent Evaluation

How to Get Hired at Braintrust

  • Customize your resume: Highlight relevant IT and testing skills.
  • Tailor application: Align your experience with test design.
  • Showcase projects: Include AI or scenario evaluation examples.
  • Prepare for interviews: Review technical test cases and methodologies.
