10 days ago

Software Engineer

Keystone Recruitment

Hybrid
Contractor
$140,000

Job Overview

Job Title: Software Engineer
Job Type: Contractor
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $140,000
Location: Hybrid

Job Description

Software Engineer (AI Systems Evaluator)

Keystone Recruitment's client, a leading AI research organization, is seeking a Software Engineer to evaluate and improve advanced conversational AI systems. This role focuses on enhancing how large language models (LLMs) reason about code, generate solutions, and explain technical concepts across various programming and system design scenarios. This is an hourly contract position for independent contractors.

Key Responsibilities

  • Evaluate AI-generated responses to software engineering and coding queries for correctness, clarity, and completeness.
  • Execute and test code to validate functionality, performance, and edge-case handling.
  • Perform fact-checking using authoritative technical references and public sources.
  • Annotate model outputs by identifying strengths, weaknesses, bugs, and conceptual gaps.
  • Assess code quality, readability, algorithmic soundness, and explanation quality.
  • Ensure outputs align with established conversational and technical guidelines.
  • Apply standardized evaluation rubrics and benchmarks consistently.

Required Qualifications

  • Bachelor’s, Master’s, or PhD in Computer Science or a closely related field.
  • Significant professional experience in software engineering or system design.
  • Expert-level proficiency in at least one major programming language (e.g., Python, Java, C++, JavaScript, Go, Rust).
  • Ability to independently solve medium-to-hard algorithmic problems.
  • Experience contributing to open-source projects with accepted pull requests.
  • Strong familiarity with using LLMs for coding and understanding their limitations.
  • Exceptional attention to detail and ability to detect subtle technical errors.

Preferred Qualifications

  • Prior experience with RLHF, model evaluation, or technical data annotation.
  • Background in competitive programming or algorithmic problem solving.
  • Experience reviewing or maintaining production-level code.
  • Familiarity with multiple programming paradigms and technology stacks.
  • Ability to explain complex technical topics to non-technical audiences.

What Success Looks Like

  • You consistently identify logical errors, inefficiencies, and misleading explanations in AI-generated code.
  • Your feedback measurably improves the accuracy, reliability, and clarity of model outputs.
  • You deliver high-quality, reproducible evaluation artifacts that strengthen AI system performance.

Contract & Payment Terms

  • Independent contractor engagement.
  • Fully remote with flexible scheduling.
  • Weekly payments via Stripe or Wise.
  • Project scope and duration may vary based on performance and client needs.
  • No access to confidential or proprietary employer data is required.
  • H-1B and STEM OPT sponsorship is not available.

Key Skills and Competencies

  • Large Language Models (LLMs)
  • Software Engineering
  • Code Evaluation
  • Algorithmic Problem Solving
  • Python (or Java, C++, JavaScript, Go, Rust)
  • System Design
  • Technical Fact-Checking
  • AI Model Annotation
  • Open-Source Contributions
  • Debugging & Testing

Tags:

Software Engineer
AI Evaluation
Code Review
Algorithmic Problem Solving
Fact-Checking
Model Annotation
Software Quality
Debugging
Performance Testing
Systems Design
Python
Java
C++
JavaScript
Go
Rust
LLMs
AI
Machine Learning
Large Language Models

How to Get Hired at Keystone Recruitment

  • Research Keystone Recruitment's client: Understand the AI research organization's mission, recent breakthroughs, and the impact of their conversational AI systems.
  • Showcase coding and LLM expertise: Tailor your resume to highlight significant professional experience in software engineering, system design, and strong familiarity with LLMs for coding tasks.
  • Emphasize problem-solving skills: Prepare to demonstrate your ability to solve medium-to-hard algorithmic problems and articulate your approach to code evaluation.
  • Highlight attention to detail: During interviews, provide examples of how you detect subtle technical errors, perform fact-checking, and ensure code quality and clarity.
  • Prepare for the technical assessment: Expect a short technical and evaluation assessment designed to test your proficiency in evaluating AI-generated code and technical concepts.
