10 days ago
Software Engineer
Keystone Recruitment
Hybrid
Contractor
$140,000
Job Overview
Job Title: Software Engineer
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $140,000
Location: Hybrid
Job Description
Software Engineer (AI Systems Evaluator)
Keystone Recruitment's client, a leading AI research organization, is seeking a Software Engineer to evaluate and improve advanced conversational AI systems. This role focuses on enhancing how large language models (LLMs) reason about code, generate solutions, and explain technical concepts across various programming and system design scenarios. This is an hourly contract position for independent contractors.
Key Responsibilities
- Evaluate AI-generated responses to software engineering and coding queries for correctness, clarity, and completeness.
- Execute and test code to validate functionality, performance, and edge-case handling.
- Perform fact-checking using authoritative technical references and public sources.
- Annotate model outputs by identifying strengths, weaknesses, bugs, and conceptual gaps.
- Assess code quality, readability, algorithmic soundness, and explanation quality.
- Ensure outputs align with established conversational and technical guidelines.
- Apply standardized evaluation rubrics and benchmarks consistently.
Required Qualifications
- Bachelor’s, Master’s, or PhD in Computer Science or a closely related field.
- Significant professional experience in software engineering or system design.
- Expert-level proficiency in at least one major programming language (e.g., Python, Java, C++, JavaScript, Go, Rust).
- Ability to independently solve medium-to-hard algorithmic problems.
- Experience contributing to open-source projects with accepted pull requests.
- Strong familiarity with using LLMs for coding and understanding their limitations.
- Exceptional attention to detail and ability to detect subtle technical errors.
Preferred Qualifications
- Prior experience with RLHF, model evaluation, or technical data annotation.
- Background in competitive programming or algorithmic problem solving.
- Experience reviewing or maintaining production-level code.
- Familiarity with multiple programming paradigms and technology stacks.
- Ability to explain complex technical topics to non-technical audiences.
What Success Looks Like
- You consistently identify logical errors, inefficiencies, and misleading explanations in AI-generated code.
- Your feedback measurably improves the accuracy, reliability, and clarity of model outputs.
- You deliver high-quality, reproducible evaluation artifacts that strengthen AI system performance.
Contract & Payment Terms
- Independent contractor engagement.
- Fully remote with flexible scheduling.
- Weekly payments via Stripe or Wise.
- Project scope and duration may vary based on performance and client needs.
- No access to confidential or proprietary employer data is required.
- H-1B and STEM OPT sponsorship is not available.
Key Skills and Competencies
- Large Language Models (LLMs)
- Software Engineering
- Code Evaluation
- Algorithmic Problem Solving
- Python (or Java, C++, JavaScript, Go, Rust)
- System Design
- Technical Fact-Checking
- AI Model Annotation
- Open-Source Contributions
- Debugging & Testing
How to Get Hired at Keystone Recruitment
- Research Keystone Recruitment's client: Understand the AI research organization's mission, recent breakthroughs, and the impact of their conversational AI systems.
- Showcase coding and LLM expertise: Tailor your resume to highlight significant professional experience in software engineering, system design, and strong familiarity with LLMs for coding tasks.
- Emphasize problem-solving skills: Prepare to demonstrate your ability to solve medium-to-hard algorithmic problems and articulate your approach to code evaluation.
- Highlight attention to detail: During interviews, provide examples of how you detect subtle technical errors, perform fact-checking, and ensure code quality and clarity.
- Prepare for the technical assessment: Expect a short technical and evaluation assessment designed to test your proficiency in evaluating AI-generated code and technical concepts.