7 days ago

Senior Software Engineer, LLM Evaluation

Talent Bridge

Remote
Contractor
$120,000

Job Overview

Job Title: Senior Software Engineer, LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $120,000
Location: Remote

Job Description

About the Opportunity

One of our global AI research clients is building advanced evaluation and training datasets to improve large language models on realistic software engineering tasks. This project focuses on creating verifiable software engineering challenges derived from public repository histories using a structured, human-in-the-loop approach. The goal is to expand dataset coverage across programming languages, complexity levels, and real-world development scenarios.

Role Overview

We are seeking experienced, tech lead–level software engineers who are comfortable working with high-quality public GitHub repositories (500+ stars). This role combines hands-on engineering work with AI model evaluation, contributing directly to how AI systems interact with real-world codebases.

What You’ll Do

  • Analyze and triage GitHub issues across widely used open-source repositories
  • Set up and configure repositories, including Dockerization and development environment automation
  • Evaluate unit test coverage, quality, and reliability
  • Run, modify, and debug real-world codebases locally to assess AI model performance in bug-fixing and implementation tasks
  • Collaborate with AI researchers to identify challenging repositories and issue types for LLM evaluation
  • Contribute to designing structured, verifiable software engineering tasks
  • Potentially lead and mentor junior engineers on repository validation projects

Required Skills

  • 5+ years of professional software engineering experience
  • Strong expertise in at least one of the following: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby
  • Deep understanding of software architecture, debugging, and code quality standards
  • Proficiency with Git, Docker, and development pipeline setup
  • Ability to navigate and evaluate complex, production-grade codebases
  • Experience contributing to or reviewing open-source projects is a plus

Nice to Have

  • Experience participating in AI/LLM evaluation or research initiatives
  • Background in building developer tools, automation systems, or code verification agents
  • Experience leading small engineering teams

Engagement Details

  • Contractor assignment (no medical benefits or paid leave)
  • 20 hours per week, with partial overlap with PST working hours
  • Duration: 3 months
  • Expected start date: Next week
  • Fully remote

This role offers a unique opportunity to combine deep software engineering expertise with frontier AI research, directly influencing how large language models understand and solve real-world coding problems.

Key Skills and Competencies

  • LLM Evaluation
  • Software Engineering
  • Debugging
  • GitHub
  • Docker
  • Python
  • JavaScript
  • Software Architecture
  • Code Quality
  • AI Research

Tags:

Senior Software Engineer, LLM Evaluation
LLM evaluation
AI research
Software engineering
Debugging
GitHub
Docker
Python
JavaScript
Java
Go
Rust
C/C++
C#
Ruby
Software architecture
Code quality
Development pipeline
Automation systems
Code verification

How to Get Hired at Talent Bridge

  • Research Talent Bridge's client and culture: Study the global AI research client's mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand their focus on advanced LLM evaluation.
  • Tailor your resume for LLM evaluation: Customize your resume to highlight extensive software engineering experience, debugging skills, proficiency with Git and Docker, and any specific involvement in AI/LLM evaluation projects.
  • Showcase open-source contributions: Emphasize your experience contributing to or reviewing open-source projects, especially those on GitHub, demonstrating your ability to navigate complex codebases.
  • Prepare for technical challenges: Be ready to discuss your expertise in programming languages like Python or Java, software architecture, and your approach to setting up development environments and evaluating code quality.
  • Demonstrate collaborative problem-solving: During interviews, articulate how you've collaborated with research teams or mentored junior engineers, showcasing your ability to work within a human-in-the-loop AI development process.
