
AI Engineer, Quality

RemoteHunter

Hybrid
Full Time
$185,000

Job Overview

Job Title: AI Engineer, Quality
Job Type: Full Time
Offered Salary: $185,000
Location: Hybrid


Job Description

About the Company

RemoteHunter's client operates in the assurance and audit sector, focusing on cybersecurity, privacy, and financial audit for global commerce and capital markets. The company builds specialized software that automates and streamlines the work of audit and assurance practitioners, helping establish trust between businesses. Its products support more than 50 of the top 100 accounting and consulting firms in a market valued at over $100 billion. The client is a remote-first company headquartered in San Francisco, backed by notable technology and venture capital investors. The team values diversity and a supportive culture, with a strong emphasis on collaboration and mutual growth.

About the Role: AI Engineer, Quality

The AI Engineer, Quality will develop and maintain the evaluation infrastructure that ensures AI agents perform reliably at enterprise scale. The role focuses exclusively on elevating evaluation to a core engineering practice: building unified platforms, automated testing pipelines, and feedback mechanisms that make it fast to assess new AI models across critical workflows. The position upholds high standards of quality and reliability for AI-driven audit workflows, influencing both product development and customer satisfaction. It requires close collaboration with machine learning engineers and subject matter experts, with an emphasis on in-person teamwork at the San Francisco office.

Responsibilities

  • Design and build a unified evaluation platform for agentic systems and audit workflows
  • Develop observability tools to track agent behavior, execution, and failures in production
  • Manage evaluation infrastructure, including integration with LangSmith and LangGraph
  • Translate customer needs into actionable agent behaviors and workflows
  • Integrate LLMs, tools, retrieval systems, and logic into reliable agent experiences
  • Create automated pipelines for rapid model evaluation across critical workflows
  • Design evaluation frameworks measuring effectiveness, consistency, latency, and cost (a minimal sketch follows this list)
  • Implement monitoring systems to detect quality regressions before customer impact
  • Use AI-driven methods for designing, building, testing, and iterating evaluations
  • Collaborate with SMEs and ML engineers to curate evaluation datasets from production data
  • Develop prompts, retrieval pipelines, and orchestration systems for scalable performance
  • Define and document evaluation standards and best practices for the engineering team
  • Promote evaluation-driven development and facilitate evaluation processes for the team
  • Partner with product and ML teams to embed evaluation requirements from project start
  • Own large product areas related to evaluation infrastructure and quality assurance
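
To make the evaluation-framework bullet above concrete, here is a minimal, hedged Python sketch of a harness that scores an agent along the four dimensions the posting names: effectiveness, consistency, latency, and cost. Every name in it (EvalCase, run_agent, grade) is hypothetical and stands in for whatever the client's actual stack (e.g. LangSmith/LangGraph tooling) provides.

  # Minimal evaluation-harness sketch. All names are hypothetical
  # illustrations, not the client's actual API.
  import statistics
  import time
  from dataclasses import dataclass
  from typing import Callable

  @dataclass
  class EvalCase:
      prompt: str
      expected: str  # reference answer, e.g. curated with SMEs from production data

  @dataclass
  class EvalReport:
      effectiveness: float   # fraction of cases graded correct
      consistency: float     # fraction of cases with identical outputs across repeats
      p50_latency_s: float   # median wall-clock latency per run
      total_cost_usd: float  # summed per-run cost

  def evaluate(
      run_agent: Callable[[str], tuple[str, float]],  # returns (output, cost in USD)
      grade: Callable[[str, str], bool],              # e.g. exact match or an LLM judge
      cases: list[EvalCase],
      repeats: int = 3,
  ) -> EvalReport:
      correct, stable, latencies, cost = 0, 0, [], 0.0
      for case in cases:
          outputs = []
          for _ in range(repeats):
              start = time.perf_counter()
              output, run_cost = run_agent(case.prompt)
              latencies.append(time.perf_counter() - start)
              cost += run_cost
              outputs.append(output)
          correct += grade(outputs[0], case.expected)
          stable += len(set(outputs)) == 1
      n = len(cases)
      return EvalReport(correct / n, stable / n, statistics.median(latencies), cost)

An automated pipeline of the kind described above would run such a harness on every model or prompt change and flag any metric that regresses past an agreed threshold, before the change reaches customers.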

Qualifications

  • Multiple years of experience delivering production software in complex systems
  • Proficiency with TypeScript, React, Python, and Postgres
  • Experience building and deploying LLM-powered features in production environments
  • Experience implementing evaluation frameworks for model outputs and agent behaviors
  • Strong understanding of AI evaluation as an integral engineering function
  • Ability to make data-driven decisions and measure key performance metrics
  • Experience with building production-grade observability and feedback systems
  • Strong product judgment and independence in decision-making
  • Preference for rapid prototyping and iterative, reliable system development

Key Skills and Competencies

  • AI Evaluation
  • Agentic Systems
  • Automated Testing
  • Quality Assurance
  • LLM Integration
  • Observability Tools
  • Python/TypeScript
  • Postgres
  • Data-driven Decisions
  • Product Judgment

Tags:

AI Engineer
Quality Assurance
AI Evaluation
Agentic Systems
Automated Testing
LLMs
Observability
Python
TypeScript
Postgres
React
Machine Learning
Data Curation
Prompt Engineering
Workflow Design
Engineering Standards
Cloud Platforms
System Design
Product Development
Enterprise Software

How to Get Hired at RemoteHunter

  • Research the client's mission: Study the company's focus on cybersecurity, privacy, and financial audit within global markets to align your application.
  • Tailor your resume for AI evaluation: Highlight experience with LLM-powered features, agentic systems, and building robust evaluation frameworks.
  • Showcase technical proficiency: Emphasize your skills in TypeScript, React, Python, Postgres, and any experience with LangSmith/LangGraph.
  • Prepare for technical and behavioral interviews: Be ready to discuss designing complex evaluation infrastructure and your collaborative approach to quality assurance.
  • Demonstrate product judgment: Share examples of data-driven decision-making and your ability to own significant product areas in AI quality.
