Senior Software Engineer, LLM Evaluation
Talent Bridge

Job Description
One of Talent Bridge's global AI research clients is seeking a Senior Software Engineer, LLM Evaluation to join their team remotely. The role focuses on improving large language models on realistic software engineering tasks through advanced evaluation and training datasets. The project involves creating verifiable software engineering challenges from public repository histories using a structured, human-in-the-loop methodology, with the aim of broadening dataset coverage across programming languages, complexity levels, and real-world development scenarios.
The ideal candidate is an experienced, tech-lead-level software engineer comfortable working with high-quality public GitHub repositories (500+ stars). The role uniquely combines hands-on engineering with AI model evaluation, directly influencing how AI systems interact with real-world codebases.
What You’ll Do
- Analyze and triage GitHub issues across widely used open-source repositories.
- Set up and configure repositories, including Dockerization and development environment automation.
- Evaluate unit test coverage, quality, and reliability.
- Run, modify, and debug real-world codebases locally to assess AI model performance in bug-fixing and implementation tasks.
- Collaborate with AI researchers to identify challenging repositories and issue types for LLM evaluation.
- Contribute to designing structured, verifiable software engineering tasks.
- Potentially lead and mentor junior engineers on repository validation projects.
Required Skills
- 5+ years of professional software engineering experience.
- Strong expertise in at least one of the following: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby.
- Deep understanding of software architecture, debugging, and code quality standards.
- Proficiency with Git, Docker, and development pipeline setup.
- Ability to navigate and evaluate complex, production-grade codebases.
- Experience contributing to or reviewing open-source projects is a plus.
Nice to Have
- Experience participating in AI/LLM evaluation or research initiatives.
- Background in building developer tools, automation systems, or code verification agents.
- Experience leading small engineering teams.
Engagement Details
This is an hourly contract position for an independent contractor, requiring approximately 20 hours per week with partial PST overlap. The contract duration is 3 months, with an expected start date next week. This is a fully remote role.
This role offers a unique opportunity to combine deep software engineering expertise with frontier AI research, directly influencing how large language models understand and solve real-world coding problems.
Key Skills/Competencies
- LLM Evaluation
- Software Engineering
- GitHub Repositories
- Debugging
- Code Quality
- Docker
- Git
- Open Source
- Python
- JavaScript
How to Get Hired at Talent Bridge
- Research Talent Bridge's client: Investigate the global AI research landscape and companies focusing on LLM evaluation to understand their mission.
- Tailor your resume: Highlight your 5+ years of software engineering, LLM evaluation experience, debugging, and proficiency in languages like Python or Java.
- Showcase open-source contributions: Emphasize any experience contributing to or reviewing public GitHub repositories with 500+ stars.
- Prepare for technical depth: Be ready to discuss software architecture, debugging complex codebases, Git, and Docker expertise during interviews.
- Demonstrate collaboration: Prepare examples of how you've worked with researchers or mentored junior engineers in past projects.