Senior Software Engineer, LLM Evaluation
Nexus Consulting

Job Description
About the Opportunity
Nexus Consulting is partnering with a global AI research client to enhance large language models (LLMs) for real-world software engineering tasks. As a Senior Software Engineer, LLM Evaluation, you will play a crucial role in developing advanced evaluation and benchmarking datasets, focusing on assessing AI-generated code and strengthening model reliability across production-grade engineering workflows.
Role Overview
In this role, you will contribute to building high-quality datasets essential for training and benchmarking large language models. You will collaborate closely with researchers to curate code examples, provide precise technical solutions, and refine AI-generated outputs across various programming languages. The position uniquely combines hands-on software engineering expertise with structured AI evaluation and collaborative research.
Key Responsibilities
- Curate and develop realistic software engineering tasks across languages such as Python, JavaScript (including React), C/C++, Java, Rust, and Go.
- Review, evaluate, and refine AI-generated code for efficiency, scalability, correctness, and maintainability.
- Collaborate with cross-functional research teams to enhance AI-driven coding solutions against industry performance benchmarks.
- Design verification mechanisms to automatically validate software engineering solutions.
- Analyze stages of the software development lifecycle (architecture design, API design, prototyping, production deployment, monitoring, and maintenance) and evaluate model performance across these stages.
- Build internal tools or agents to detect code quality issues and error patterns.
Requirements
- Several years of professional software engineering experience.
- At least 2 years of continuous full-time experience at a product-focused technology company.
- Strong expertise in building and deploying scalable, production-grade applications.
- Deep understanding of software architecture, debugging, performance optimization, and code review standards.
- Experience working with modern development workflows and tooling.
- Strong written and verbal communication skills for documenting structured evaluation feedback.
Engagement Details
- Flexible engagement: minimum 10 hours per week, up to 40 hours per week.
- Partial overlap with Pacific Time required.
- Contractor engagement (no medical or paid leave benefits).
- Initial duration: 1 month, with potential extension based on performance and project needs.
Key Skills/Competencies
- LLM Evaluation
- Software Engineering
- AI-Generated Code Review
- Python
- JavaScript/React
- C/C++/Java/Rust/Go
- Software Architecture
- Performance Optimization
- Dataset Curation
- Debugging
How to Get Hired at Nexus Consulting
- Research Nexus Consulting's mission: Understand their client focus in AI research and how your skills align with advanced LLM evaluation.
- Tailor your resume: Highlight extensive software engineering experience, specifically in code quality, debugging, and LLM evaluation or AI-generated code review.
- Showcase programming language expertise: Emphasize proficiency in Python, JavaScript, C/C++, Java, Rust, and Go, with practical project examples.
- Prepare for technical discussions: Be ready to discuss scalable application architecture, performance optimization, and modern development workflows.
- Demonstrate analytical skills: Practice explaining how you'd assess code efficiency, correctness, and maintainability for AI-generated solutions.