AI Agent Testing Specialist
Braintrust
Job Overview

Job Description
Please submit your resume in English and indicate your level of English.
At Braintrust, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
What We Do
The Mindrift platform connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.
About the Role
As an AI Agent Testing Specialist, you will design realistic, structured evaluation scenarios for LLM-based agents. You will focus on creating test cases that simulate tasks normally performed by humans, and on defining the gold-standard behavior that agent actions are compared against.
You will make sure each scenario is clearly defined, has well-specified scoring criteria, and is easy to execute and reuse. This role requires a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Responsibilities
- Design structured test scenarios based on real-world tasks.
- Define the golden path and acceptable agent behavior.
- Annotate task steps, expected outputs, and edge cases.
- Collaborate with developers to test and refine scenarios.
- Review agent outputs and adjust tests accordingly.
How To Get Started
Apply to this post, complete the qualification step, and then contribute to projects that match your skills on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.
Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science, AI/ML, Computational Linguistics, or a related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Strong understanding of test design principles including reproducibility, coverage, and edge cases.
- Excellent written communication skills in English.
- Familiarity with structured formats such as JSON and YAML for describing scenarios.
- Ability to define expected agent behaviors and scoring logic.
- Basic experience with Python and JavaScript.
- Curious mindset and willingness to work with AI-generated content and prompts.
Nice to Have
- Experience in writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics such as precision, recall, and reward functions.
Benefits
This freelance role is fully remote, allowing you to work on your own schedule from anywhere in the world. Gain valuable experience in advanced AI projects and contribute to shaping the future of model behavior evaluation.
Key skills/competency
- Test Design
- QA
- NLP
- Python
- JavaScript
- Scenario Testing
- Data Analysis
- Analytical Thinking
- Structured Formats
- Agent Evaluation
How to Get Hired at Braintrust
- Customize your resume: Highlight relevant IT and testing skills.
- Tailor application: Align your experience with test design.
- Showcase projects: Include AI or scenario evaluation examples.
- Prepare for interviews: Review technical test cases and methodologies.