7 days ago

Senior Software Engineer LLM Evaluation

Quik Hire Staffing

Hybrid
Contractor
$200,000

Job Overview

Job Title: Senior Software Engineer LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $200,000
Location: Hybrid

Job Description

About the Opportunity

One of our global AI research clients, supported by Quik Hire Staffing, develops evaluation and benchmarking datasets that improve the performance of large language models (LLMs) in real-world software engineering scenarios. This Senior Software Engineer LLM Evaluation role focuses on assessing AI-generated code and improving model reliability across production-grade engineering workflows.

Role Overview

As a Senior Software Engineer LLM Evaluation, you will play a crucial role in building high-quality datasets essential for training and benchmarking large language models. Your work will involve close collaboration with researchers to curate compelling code examples, provide precise technical solutions, and refine AI-generated outputs across a diverse range of programming languages. This position uniquely blends hands-on software engineering expertise with structured AI evaluation methodologies and collaborative research efforts.

Key Responsibilities

  • Curate and develop realistic software engineering tasks across languages such as Python, JavaScript (including React), C/C++, Java, Rust, and Go.
  • Review, evaluate, and refine AI-generated code for efficiency, scalability, correctness, and maintainability.
  • Collaborate with cross-functional research teams to improve AI-driven coding solutions, measured against industry performance benchmarks.
  • Design robust verification mechanisms to automatically validate software engineering solutions.
  • Analyze stages of the software development lifecycle (architecture design, API design, prototyping, production deployment, monitoring, and maintenance) and evaluate model performance across these stages.
  • Build internal tools or agents to detect code quality issues and error patterns.

Requirements

  • Several years of professional software engineering experience.
  • At least 2 years of continuous full-time experience at a product-focused technology company.
  • Strong expertise in building and deploying scalable, production-grade applications.
  • Deep understanding of software architecture, debugging, performance optimization, and code review standards.
  • Experience working with modern development workflows and tooling.
  • Strong written and verbal communication skills for documenting structured evaluation feedback.

Engagement Details

  • Flexible engagement: minimum 10 hours per week, up to 40 hours per week.
  • Partial overlap with Pacific Time required.
  • Contractor engagement (no medical or paid leave benefits).
  • Initial duration: 1 month, with potential extension based on performance and project needs.

Key Skills and Competencies

  • LLM Evaluation
  • Software Engineering
  • AI-Generated Code Review
  • Python
  • JavaScript
  • C/C++/Java/Rust/Go
  • Scalable Applications
  • Software Architecture
  • Performance Optimization
  • Dataset Curation

Tags: Senior Software Engineer, LLM Evaluation, AI Engineering, Code Quality, Python, JavaScript, C++, Java, Rust, Go, Large Language Models, AI, Machine Learning, Software Architecture, Scalable Applications, Performance Optimization, Dataset Curation, Code Review, SDLC Analysis, Modern Development Workflows

How to Get Hired at Quik Hire Staffing

  • Research Quik Hire Staffing's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, and learn what you can about their client's AI research goals.
  • Tailor your resume for LLM Evaluation: Highlight extensive experience in software engineering, AI-generated code review, and dataset creation, emphasizing skills in Python, JavaScript, C++, Java, Rust, or Go.
  • Showcase deep software expertise: Prepare to discuss scalable application development, software architecture, debugging, and performance optimization with specific examples.
  • Practice AI evaluation scenarios: Anticipate questions on assessing AI model outputs, designing verification mechanisms, and improving code quality for LLMs.
  • Demonstrate strong communication skills: Be ready to articulate technical solutions and give clear, structured evaluation feedback, both of which are crucial for this collaborative role.
