8 days ago

Senior Software Engineer LLM Evaluation

Keystone Recruitment

Hybrid
Contractor
$200,000

Job Overview

Job Title: Senior Software Engineer LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $200,000
Location: Hybrid

Job Description

About the Opportunity

One of our global AI research clients is developing advanced evaluation and benchmarking datasets to improve the performance of large language models in real-world software engineering scenarios. This Senior Software Engineer LLM Evaluation role focuses on assessing AI-generated code and strengthening model reliability across production-grade engineering workflows.

Role Overview

In this role, you will help build high-quality datasets used for training and benchmarking large language models. You will work closely with researchers to curate code examples, provide precise technical solutions, and refine AI-generated outputs across multiple programming languages. The role blends hands-on software engineering expertise with structured AI evaluation and research collaboration.

Key Responsibilities

  • Curate and develop realistic software engineering tasks across languages such as Python, JavaScript (including React), C/C++, Java, Rust, and Go
  • Review, evaluate, and refine AI-generated code for efficiency, scalability, correctness, and maintainability
  • Collaborate with cross-functional research teams to enhance AI-driven coding solutions against industry performance benchmarks
  • Design verification mechanisms to automatically validate software engineering solutions
  • Analyze stages of the software development lifecycle (architecture design, API design, prototyping, production deployment, monitoring, and maintenance) and evaluate model performance across these stages
  • Build internal tools or agents to detect code quality issues and error patterns

Requirements

  • At least 5 years of professional software engineering experience
  • At least 2 years of continuous full-time experience at a product-focused technology company
  • Strong expertise in building and deploying scalable, production-grade applications
  • Deep understanding of software architecture, debugging, performance optimization, and code review standards
  • Experience working with modern development workflows and tooling
  • Strong written and verbal communication skills for documenting structured evaluation feedback

Engagement Details

  • Flexible engagement: minimum 10 hours per week, up to 40 hours per week
  • Partial overlap with Pacific Time required
  • Contractor engagement (no medical or paid leave benefits)
  • Initial duration: 1 month, with potential extension based on performance and project needs

Key Skills and Competencies

  • LLM Evaluation
  • AI Code Analysis
  • Software Engineering
  • Benchmarking
  • Dataset Curation
  • Code Review
  • Performance Optimization
  • Software Architecture
  • Debugging
  • Programming Languages (Python, JavaScript, C/C++, Java, Rust, Go)

How to Get Hired at Keystone Recruitment

  • Research Keystone Recruitment's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, focusing on their approach to talent acquisition for specialized tech roles.
  • Tailor your resume for LLM Evaluation: Customize your resume to highlight extensive software engineering experience, particularly in AI, LLMs, and code evaluation, using keywords like 'AI-generated code assessment,' 'large language model benchmarking,' and 'software architecture.'
  • Showcase your technical depth: Prepare to discuss specific projects where you've evaluated code quality, optimized performance, or designed scalable applications, demonstrating expertise in Python, JavaScript, C/C++, Java, Rust, or Go, crucial for the Senior Software Engineer LLM Evaluation role.
  • Prepare for a technical and behavioral interview: Expect a combination of technical assessments on software engineering principles, coding challenges, and behavioral questions that assess your problem-solving, collaboration, and communication skills, especially in a hybrid contractor setting.
  • Demonstrate strong communication skills: Because this role requires collaboration with AI research teams and partial overlap with Pacific Time, emphasize your ability to document structured evaluation feedback and communicate complex technical concepts clearly and concisely.
