7 days ago

Senior Software Engineer LLM Evaluation

Keystone Recruitment

Hybrid
Contractor
$250,000

Job Overview

Job Title: Senior Software Engineer LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $250,000
Location: Hybrid

Job Description

About the Opportunity

One of our global AI research clients, supported by Keystone Recruitment, builds evaluation and benchmarking datasets to improve how large language models perform on real-world software engineering tasks. As a Senior Software Engineer LLM Evaluation, you will assess AI-generated code and help strengthen model reliability across production-grade engineering workflows.

Role Overview

You will build the high-quality datasets used to train and benchmark large language models. You will work closely with researchers to curate code examples, provide precise technical solutions, and refine AI-generated outputs across multiple programming languages. The role combines hands-on software engineering with structured AI evaluation and collaborative research.

Key Responsibilities

  • Curate and develop realistic software engineering tasks across various languages, including Python, JavaScript (and React), C/C++, Java, Rust, and Go.
  • Review, evaluate, and refine AI-generated code for efficiency, scalability, correctness, and maintainability.
  • Collaborate with cross-functional research teams to improve AI-driven coding solutions against industry performance benchmarks.
  • Design robust verification mechanisms to automatically validate software engineering solutions.
  • Evaluate model performance across the stages of the software development lifecycle, including architecture design, API design, prototyping, production deployment, monitoring, and maintenance.
  • Build internal tools or agents designed to detect code quality issues and identify error patterns.

Requirements

  • 5 years of professional software engineering experience.
  • At least 2 years of continuous full-time experience at a product-focused technology company.
  • Strong expertise in building and deploying scalable, production-grade applications.
  • Deep understanding of software architecture, debugging, performance optimization, and code review standards.
  • Experience working with modern development workflows and tooling.
  • Strong written and verbal communication skills, crucial for documenting structured evaluation feedback.

Engagement Details

  • Flexible engagement of 10 to 40 hours per week.
  • Partial overlap with Pacific Time is required.
  • This is a contractor engagement; no medical or paid leave benefits are provided.
  • Initial duration is 1 month, with potential for extension based on performance and project needs.

Key skills/competencies

  • LLM Evaluation
  • AI-generated Code
  • Software Engineering
  • Dataset Curation
  • Benchmarking
  • Python
  • JavaScript
  • Go
  • Rust
  • Code Review

Tags:

Senior Software Engineer
LLM Evaluation
AI
Software Engineering
Dataset Curation
Benchmarking
Code Review
Python
JavaScript
C++
Java
Rust
Go
React
Production-grade Applications
Software Architecture
Debugging
Performance Optimization
Development Workflows
AI Models

How to Get Hired at Keystone Recruitment

  • Research Keystone Recruitment's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand their client focus.
  • Tailor your resume for LLM evaluation: Customize your resume to highlight experience in AI, large language models, code quality, and software architecture relevant to the Senior Software Engineer LLM Evaluation role.
  • Showcase diverse programming skills: Prepare to demonstrate proficiency in Python, JavaScript, C/C++, Java, Rust, and Go, emphasizing experience in production environments.
  • Prepare for technical assessments: Practice evaluating AI-generated code, debugging, and optimizing software solutions, aligning with the core responsibilities of this Senior Software Engineer LLM Evaluation position.
  • Highlight communication and collaboration: During interviews, emphasize your ability to communicate complex technical feedback and collaborate effectively with research teams.
