2 days ago

Senior Software Engineer, LLM Evaluation

Talent Bridge

Hybrid
Contractor
$120,000

Job Overview

Job Title: Senior Software Engineer, LLM Evaluation
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $120,000
Location: Hybrid

Job Description

About the Opportunity

Join one of our global AI research clients as a Senior Software Engineer, LLM Evaluation. This project focuses on building advanced evaluation and training datasets to enhance large language models for realistic software engineering tasks. You'll contribute to creating verifiable software engineering challenges from public repository histories using a structured, human-in-the-loop methodology, aiming to expand dataset coverage across various programming languages, complexity levels, and real-world development scenarios.

Role Overview

We are seeking experienced, tech lead–level software engineers comfortable working with high-quality public GitHub repositories (500+ stars). This remote role uniquely combines hands-on engineering with AI model evaluation, directly influencing how AI systems interact with real-world codebases.

What You’ll Do

  • Analyze and triage GitHub issues across widely used open-source repositories.
  • Set up and configure repositories, including Dockerization and development environment automation.
  • Evaluate unit test coverage, quality, and reliability.
  • Run, modify, and debug real-world codebases locally to assess AI model performance in bug-fixing and implementation tasks.
  • Collaborate with AI researchers to identify challenging repositories and issue types for LLM evaluation.
  • Contribute to designing structured, verifiable software engineering tasks.
  • Potentially lead and mentor junior engineers on repository validation projects.

Required Skills

  • 5+ years of professional software engineering experience.
  • Strong expertise in at least one of the following: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby.
  • Deep understanding of software architecture, debugging, and code quality standards.
  • Proficiency with Git, Docker, and development pipeline setup.
  • Ability to navigate and evaluate complex, production-grade codebases.
  • Experience contributing to or reviewing open-source projects is a plus.

Nice to Have

  • Experience participating in AI/LLM evaluation or research initiatives.
  • Background in building developer tools, automation systems, or code verification agents.
  • Experience leading small engineering teams.

Engagement Details

  • Contractor assignment (no medical or paid leave).
  • 20 hours per week with partial PST overlap.
  • Duration: 3 months.
  • Expected start date: Next week.
  • Fully remote.

This Senior Software Engineer, LLM Evaluation role offers a unique opportunity to combine deep software engineering expertise with frontier AI research, directly influencing how large language models understand and solve real-world coding problems.

Key skills/competencies

  • LLM Evaluation
  • Software Engineering
  • GitHub Repositories
  • Debugging
  • Code Quality
  • Docker
  • Git
  • Python/JavaScript/Java/Go/Rust/C/C++/C#/Ruby
  • AI Research Collaboration
  • Open Source Contribution

Tags:

Software Engineer
LLM Evaluation
AI Research
Debugging
Code Quality
Docker
Git
Python
JavaScript
Java
Go
Rust
C/C++
C#
Ruby
Open Source
Software Architecture
Development Pipelines
GitHub
Contract

How to Get Hired at Talent Bridge

  • Research Talent Bridge's client focus: Understand their global AI research client's mission and contributions to large language models.
  • Tailor your resume for AI/LLM evaluation: Highlight specific experience with LLM evaluation, software architecture, debugging, and open-source contributions.
  • Showcase GitHub and Docker proficiency: Provide concrete examples of your work with public GitHub repositories, Dockerization, and development pipeline setup.
  • Prepare for technical deep-dives: Be ready to discuss complex codebase analysis, debugging methodologies, and code quality standards in detail.
  • Network strategically: Identify and connect with current or past employees of Talent Bridge or their AI research client on LinkedIn for insights.
