AI Engineer, Quality
RemoteHunter

Job Description
About the Client
The client organization is a leader in the assurance and audit sector, providing specialized software solutions for cybersecurity, privacy, and financial audits within global commerce and capital markets. They aim to automate and streamline audit and assurance processes, enabling trust between businesses. Supporting over 50 of the top 100 accounting and consulting firms, the company operates in a market exceeding $100 billion. While remote-first, their base is in San Francisco, and they are backed by prominent technology and venture capital investors. The team fosters a diverse and supportive culture, prioritizing collaboration and mutual growth.
About the Opportunity: AI Engineer, Quality
As an AI Engineer, Quality, you will be instrumental in developing and maintaining the evaluation infrastructure crucial for ensuring the reliable performance of AI agents at an enterprise scale. This role is dedicated to establishing evaluations as a core engineering practice by building unified platforms, automated testing pipelines, and robust feedback mechanisms. Your work will enable rapid assessment of new AI models across critical audit workflows, directly contributing to high standards of quality and reliability in AI-driven solutions and impacting both product development and customer satisfaction. The position requires close collaboration with machine learning engineers and subject matter experts, with an emphasis on in-person teamwork at the San Francisco office.
Key Responsibilities
- Design and build a unified evaluation platform for agentic systems and audit workflows.
- Develop observability tools to track agent behavior, execution, and failures in production.
- Manage evaluation infrastructure, including integration with LangSmith and LangGraph.
- Translate customer needs into actionable agent behaviors and workflows.
- Integrate LLMs, tools, retrieval systems, and logic into reliable agent experiences.
- Create automated pipelines for rapid model evaluation across critical workflows.
- Design evaluation frameworks measuring effectiveness, consistency, latency, and cost.
- Implement monitoring systems to detect quality regressions before customer impact.
- Utilize AI-driven methods for designing, building, testing, and iterating evaluations.
- Collaborate with SMEs and ML engineers to curate evaluation datasets from production data.
- Develop prompts, retrieval pipelines, and orchestration systems for scalable performance.
- Define and document evaluation standards and best practices for the engineering team.
- Promote evaluation-driven development and facilitate evaluation processes for the team.
- Partner with product and ML teams to embed evaluation requirements from project start.
- Own large product areas related to evaluation infrastructure and quality assurance.
Requirements
- Multiple years of experience delivering production software in complex systems.
- Proficiency with TypeScript, React, Python, and Postgres.
- Experience building and deploying LLM-powered features in production environments.
- Experience implementing evaluation frameworks for model outputs and agent behaviors.
- Strong understanding of AI evaluation as an integral engineering function.
- Ability to make data-driven decisions and measure key performance metrics.
- Experience with building production-grade observability and feedback systems.
- Strong product judgment and independence in decision-making.
- Preference for rapid prototyping and iterative, reliable system development.
Key Skills and Competencies
- AI Evaluation
- LLM-powered Features
- Automated Testing Pipelines
- Observability Systems
- TypeScript
- React
- Python
- Postgres
- LangSmith
- LangGraph
How to Get Hired at RemoteHunter
- Research the client company's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, especially their focus on AI in audit.
- Tailor your resume for AI Engineer, Quality: Highlight experience with LLM-powered features, evaluation frameworks, and production-grade observability, using keywords from the job description.
- Showcase your technical expertise: Prepare to discuss projects involving TypeScript, React, Python, Postgres, and real-world AI evaluation challenges and solutions.
- Prepare for behavioral interviews: Emphasize collaboration, problem-solving, and your ability to translate customer needs into technical solutions, aligning with their team-oriented culture.
- Demonstrate product judgment: Be ready to discuss how you make data-driven decisions and your approach to rapid prototyping and iterative system development in an AI quality context.