Multimodal GenAI Evaluation Analyst

Braintrust

Hybrid
Full Time
$45,760

Job Overview

Job Title: Multimodal GenAI Evaluation Analyst
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $45,760
Location: Hybrid


Job Description

Position Overview

iMerit seeks detail-oriented and analytically minded Multimodal GenAI Evaluation Analysts to perform highly nuanced evaluations of AI system outputs across different modalities: text, image, video, and multimodal interactions. Analysts will assess the accuracy, appropriateness, quality, clarity, and cultural alignment of model outputs against complex guidelines, ensuring that results align with project standards and real-world use cases. These evaluations will directly inform the development and fine-tuning of advanced large language models (LLMs), vision models (LVMs), and multimodal AI systems.

Role Responsibilities for the Multimodal GenAI Evaluation Analyst

  • Evaluate outputs generated by LLMs across multiple modalities (text, image captions, video descriptions, and multimodal prompts).
  • Assess quality against project-specific criteria such as correctness, coherence, completeness, style, cultural appropriateness, and safety.
  • Identify subtle errors, hallucinations, or biases in AI responses.
  • Apply domain expertise and logical reasoning to resolve ambiguous or unclear outputs.
  • Provide detailed written feedback, tagging, and scoring of outputs to ensure consistency across the evaluation team.
  • Escalate unclear cases and contribute to refining evaluation guidelines.
  • Collaborate with Project Managers and Quality Leads to meet accuracy, reliability, and turnaround benchmarks.

Skills & Competencies

  • Strong critical reading, observational, and evaluative skills across different modalities.
  • Ability to articulate nuanced judgments with precision and clarity.
  • Excellent English comprehension (CEFR B2 or above); additional languages a plus.
  • Familiarity with LLMs, generative AI, and multimodal systems.
  • Strong attention to detail and ability to apply guidelines consistently.
  • Awareness of cultural and linguistic nuances, including potential bias and harm in AI outputs.
  • Comfort with evolving workflows, rapid feedback cycles, and complex quality frameworks.

Requirements

  • Bachelor's degree, diploma, or equivalent educational qualification.
  • 1+ years of experience in data annotation, LLM evaluation, content moderation, or related AI/ML domains.
  • Demonstrated experience working with data annotation tools and software platforms.
  • Strong understanding of language and multimodal communication (instruction following in image generation, fact-checking, narrative coherence in video, etc.).
  • Ability to adapt quickly to changing project directions and fast-paced work environments.
  • Previous experience creating or annotating complex data specifically for Large Language Model (LLM) training.
  • Prior exposure to generative AI, prompt engineering, or LLM fine-tuning workflows is a plus.
  • Comfort working in environments where incidental exposure to NSFW or otherwise sensitive content may occur.

What We Offer

  • Opportunities to shape the evaluation standards for next-generation multimodal AI systems.
  • Innovative and supportive global working environment.
  • Competitive compensation and flexible remote working arrangements.
  • Continuous learning and growth in applied AI evaluation.

Commitment

  • Minimum 20 hours per week (flexible schedule).
  • Opportunity to work more hours if desired.

Key Skills & Competencies

  • AI Evaluation
  • Multimodal AI
  • Large Language Models (LLMs)
  • Data Annotation
  • Quality Assurance
  • Bias Detection
  • Generative AI
  • Content Moderation
  • Critical Thinking
  • Prompt Engineering

Tags:

Multimodal GenAI Evaluation Analyst
AI evaluation
data annotation
content moderation
quality assurance
LLM assessment
multimodal analysis
bias detection
cultural alignment
feedback
Large Language Models
LLMs
Generative AI
Multimodal AI
Vision Models
LVMs
AI systems
data annotation tools
software platforms

How to Get Hired at iMerit

  • Research iMerit's culture: Study their mission, values, recent projects, and impact on AI development, especially regarding ethical AI and data quality.
  • Tailor your resume for AI evaluation: Highlight experience in data annotation, LLM evaluation, content moderation, and your familiarity with generative AI and multimodal systems. Emphasize attention to detail.
  • Prepare for the iMerit assessment: Expect a 15-30 minute platform assessment testing your evaluation skills; practice critical reading and logical reasoning.
  • Showcase your evaluation expertise: Be ready for a quality test after 10 hours of work, demonstrating consistent application of complex guidelines and articulate feedback.
  • Demonstrate adaptability and precision: In interviews, discuss experiences adapting to evolving workflows and providing nuanced judgments in fast-paced AI environments.
