Member of Technical Staff, LLM Evaluation

Microsoft AI

On Site
Full Time
$250,000
Redmond, WA

Job Overview

Job Title: Member of Technical Staff, LLM Evaluation
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master's
Offered Salary: $250,000
Location: Redmond, WA

Job Description

Member of Technical Staff, LLM Evaluation

As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting-edge methodologies to help us evaluate how well Copilot performs in real-world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion but also the affective aspects of the experience. You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies that provide real-time signals on Copilot performance. We're looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.

Responsibilities

  • Apply your expertise to measure Copilot's performance, identify failure modes, and develop novel mitigation strategies using techniques such as data mining, prompt engineering, LLM-as-a-judge, and classifier training.
  • Solve problems creatively, navigate complexity with clarity, and independently shape direction and deliver results even when the path isn't obvious.
  • Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes.
  • Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems.
  • Maintain a user-oriented perspective by understanding user needs, validating approaches through user research, and serving as a trusted advisor on AI matters.
  • Track advances in research, identify relevant state-of-the-art techniques, and adapt algorithms to drive innovation in production systems serving millions of users.

Qualifications

Required Qualifications:

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 7+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 10+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR equivalent experience.

Preferred Qualifications:

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 8+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 10+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 12+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) OR equivalent experience.
  • Experience prompting and working with large language models.
  • Experience writing production-quality Python code.
  • Demonstrated interest in Responsible AI.

Key skills/competencies

  • LLM Evaluation
  • Machine Learning
  • Natural Language Processing
  • Data Science
  • Prompt Engineering
  • Statistical Techniques
  • Automated Testing
  • User Research
  • Algorithm Adaptation
  • Problem Solving

How to Get Hired at Microsoft AI

  • Research Microsoft AI's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume: Customize your resume to highlight experience in LLM evaluation, machine learning, and natural language processing, aligning with Microsoft AI's focus.
  • Showcase relevant projects: Prepare to discuss specific projects where you developed evaluation methodologies, trained classifiers, or worked with large language models.
  • Master technical and behavioral interviews: Practice data science, ML evaluation, and problem-solving questions, demonstrating a growth mindset and collaborative spirit.
  • Network effectively: Connect with current Microsoft AI employees on LinkedIn to gain insights and potentially learn about internal opportunities.
