Senior Software Engineer, LLM Evaluation

Nexus Consulting

Hybrid
Contractor
$260,000

Job Overview

Job Title: Senior Software Engineer, LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master's
Offered Salary: $260,000
Location: Hybrid

Job Description

As a Senior Software Engineer specializing in LLM Evaluation, you will join one of Nexus Consulting's global AI research clients. This critical role involves developing and refining advanced evaluation and benchmarking datasets to enhance the real-world performance of large language models in software engineering scenarios. You will specifically focus on assessing AI-generated code and strengthening model reliability across various production-grade engineering workflows.

This is an hourly contract position, offered on a remote basis, with flexible engagement options from a minimum of 10 hours up to 40 hours per week. A partial overlap with Pacific Time is required for collaboration.

Role Overview

In this role, you will be instrumental in building high-quality datasets essential for both training and benchmarking large language models. Your work will involve close collaboration with research teams to curate relevant code examples, develop precise technical solutions, and meticulously refine AI-generated outputs across a diverse set of programming languages. This position uniquely blends deep hands-on software engineering expertise with structured AI model evaluation and collaborative research.

Key Responsibilities

  • Curate and develop realistic software engineering tasks across multiple languages, including Python, JavaScript (and React), C/C++, Java, Rust, and Go.
  • Review, evaluate, and refine AI-generated code for critical attributes such as efficiency, scalability, correctness, and maintainability.
  • Collaborate effectively with cross-functional research teams to continuously enhance AI-driven coding solutions against established industry performance benchmarks.
  • Design robust verification mechanisms capable of automatically validating complex software engineering solutions.
  • Analyze various stages of the software development lifecycle, including architecture design, API design, prototyping, production deployment, monitoring, and maintenance, to evaluate model performance throughout.
  • Build internal tools or agents specifically designed to detect common code quality issues and identify recurring error patterns.
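To make the verification-mechanism responsibility above concrete, here is a minimal, hypothetical sketch of an automated harness that validates AI-generated solutions against expected-output cases. All names (`VerificationTask`, `verify`, the `solve` entry point) are illustrative assumptions, not part of the role or any client system:

```python
# Toy verification harness: execute AI-generated source in an isolated
# namespace, then score it against (arguments, expected result) cases.
from dataclasses import dataclass, field


@dataclass
class VerificationTask:
    name: str
    candidate_source: str   # AI-generated code expected to define `solve`
    cases: list = field(default_factory=list)  # (args_tuple, expected) pairs


def verify(task: VerificationTask) -> dict:
    """Run the candidate code and report how many cases it passes."""
    namespace: dict = {}
    try:
        # In production this would run in a sandbox, not via exec().
        exec(task.candidate_source, namespace)
    except Exception as exc:
        return {"task": task.name, "passed": 0,
                "total": len(task.cases), "error": repr(exc)}
    solve = namespace.get("solve")
    passed = 0
    for args, expected in task.cases:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case simply counts as a failure
    return {"task": task.name, "passed": passed,
            "total": len(task.cases), "error": None}


task = VerificationTask(
    name="add-two-numbers",
    candidate_source="def solve(a, b):\n    return a + b\n",
    cases=[((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)],
)
result = verify(task)
```

A real harness would add sandboxing, timeouts, and richer quality checks (style, complexity, resource use), but the core loop of "execute candidate, compare against ground truth, aggregate a score" is the same.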

Requirements

  • Several years of professional software engineering experience.
  • At least 2 years of continuous, full-time experience gained at a product-focused technology company.
  • Strong expertise in building and successfully deploying scalable, production-grade applications.
  • Deep understanding of fundamental software architecture principles, effective debugging techniques, performance optimization strategies, and established code review standards.
  • Proven experience working within modern development workflows and utilizing contemporary tooling.
  • Strong written and verbal communication skills, essential for documenting structured evaluation feedback and collaborating effectively.

Engagement Details

  • Flexible engagement: minimum 10 hours per week, up to 40 hours per week.
  • Partial overlap with Pacific Time required to facilitate team collaboration.
  • This is a contractor engagement; no medical or paid leave benefits are provided.
  • Initial duration: 1 month, with strong potential for extension based on performance and evolving project needs.

Key Skills and Competencies

  • LLM Evaluation
  • AI-Generated Code Analysis
  • Software Engineering
  • Python, JavaScript, C/C++, Java, Rust, Go
  • Software Architecture
  • Performance Optimization
  • Code Review
  • Dataset Curation
  • Debugging
  • Verification Mechanism Design

Tags:

Senior Software Engineer
LLM Evaluation
AI model evaluation
code review
dataset curation
software architecture
debugging
performance optimization
verification
software development lifecycle
tool building
research collaboration
Python
JavaScript
React
C/C++
Java
Rust
Go
Large Language Models
AI
Machine Learning

How to Get Hired at Nexus Consulting

  • Research Nexus Consulting's clients: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, especially focusing on their AI research initiatives.
  • Tailor your resume: Highlight deep software engineering, LLM evaluation, and diverse language skills (Python, JavaScript, C/C++, Java, Rust, Go) for this specialized role.
  • Showcase relevant projects: Demonstrate experience with AI-generated code analysis, LLM benchmarking, or building robust evaluation frameworks.
  • Prepare for technical deep-dives: Expect in-depth questions on software architecture, debugging complex systems, performance optimization, and code quality standards.
  • Emphasize communication: Practice articulating technical feedback clearly and demonstrating collaborative problem-solving, crucial for working with research teams.
