4 hours ago

AI Engineer, Quality

RemoteHunter

Hybrid
Full Time
$200,000

Job Overview

Job Title: AI Engineer, Quality
Job Type: Full Time
Offered Salary: $200,000
Location: Hybrid

Job Description

About the Company

The organization operates in the assurance and audit sector, focusing on cybersecurity, privacy, and financial audit within global commerce and capital markets. It addresses the challenge of enabling trust between businesses by automating and streamlining the work of audit and assurance practitioners through specialized software. The company supports over 50 of the top 100 accounting and consulting firms, working in a market valued at over $100 billion. It functions as a remote-first company with a base in San Francisco, backed by notable investors in technology and venture capital. The team values diversity and a supportive culture with a strong emphasis on collaboration and mutual growth.

About the Role

The AI Engineer, Quality will develop and maintain the evaluation infrastructure that ensures AI agents perform reliably at enterprise scale. This role focuses exclusively on elevating evaluations as a core engineering practice by creating unified platforms, automated testing pipelines, and feedback mechanisms to assess new AI models quickly across critical workflows. The position contributes to maintaining high standards of quality and reliability for AI-driven audit workflows, influencing both product development and customer satisfaction. It requires close collaboration with machine learning engineers and subject matter experts and emphasizes in-person teamwork at the San Francisco office.

Responsibilities

  • Design and build a unified evaluation platform for agentic systems and audit workflows
  • Develop observability tools to track agent behavior, execution, and failures in production
  • Manage evaluation infrastructure, including integration with LangSmith and LangGraph
  • Translate customer needs into actionable agent behaviors and workflows
  • Integrate LLMs, tools, retrieval systems, and logic into reliable agent experiences
  • Create automated pipelines for rapid model evaluation across critical workflows
  • Design evaluation frameworks measuring effectiveness, consistency, latency, and cost
  • Implement monitoring systems to detect quality regressions before customer impact
  • Use AI-driven methods for designing, building, testing, and iterating evaluations
  • Collaborate with SMEs and ML engineers to curate evaluation datasets from production data
  • Develop prompts, retrieval pipelines, and orchestration systems for scalable performance
  • Define and document evaluation standards and best practices for the engineering team
  • Promote evaluation-driven development and facilitate evaluation processes for the team
  • Partner with product and ML teams to embed evaluation requirements from project start
  • Own large product areas related to evaluation infrastructure and quality assurance

Qualifications

  • Multiple years of experience delivering production software in complex systems
  • Proficiency with TypeScript, React, Python, and Postgres
  • Experience building and deploying LLM-powered features in production environments
  • Experience implementing evaluation frameworks for model outputs and agent behaviors
  • Strong understanding of AI evaluation as an integral engineering function
  • Ability to make data-driven decisions and measure key performance metrics
  • Experience with building production-grade observability and feedback systems
  • Strong product judgment and independence in decision-making
  • Preference for rapid prototyping and iterative, reliable system development

Pay Range and Compensation Package

The pay range and compensation package for this role will be determined based on the candidate’s experience, skills, and other relevant factors.

Equal Opportunity Statement

Our client is an equal opportunity employer. They celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, or national origin.

Note

RemoteHunter is not the Employer of Record (EOR) for this role. Our purpose in this opportunity is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their full application directly through the hiring company’s career page or ATS.

Key skills/competency

  • AI Evaluation
  • Agentic Systems
  • LLM Integration
  • Automated Testing
  • Python
  • TypeScript
  • Postgres
  • Observability Tools
  • Data-driven Decisions
  • Product Judgment

Tags:

AI Engineer, Quality
AI evaluation
agentic systems
automated testing
LLM integration
observability
data curation
prompt engineering
quality assurance
workflow automation
engineering best practices
TypeScript
React
Python
Postgres
LangSmith
LangGraph
LLMs
machine learning
production software
AI infrastructure

How to Get Hired at RemoteHunter

  • Research the company's vision: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, focusing on their impact in the assurance and audit sector.
  • Tailor your resume: Highlight your expertise in AI evaluation, agentic systems, Python, TypeScript, and LLM implementation, aligning with the AI Engineer, Quality role requirements.
  • Showcase agentic system expertise: Prepare to demonstrate your experience in building, deploying, and evaluating robust AI agents for production environments during interviews.
  • Prepare for technical interviews: Focus on concepts related to AI quality assurance, automated testing pipelines, observability, data curation, and prompt engineering specific to AI applications.
  • Emphasize collaboration skills: Be ready to discuss experiences working closely with machine learning engineers, subject matter experts, and product teams to embed quality throughout the development lifecycle.
