Senior Research Engineer, Post-training & Evaluation
Reddit, Inc.
Job Description
About Reddit, Inc.
Reddit is a vibrant community built on shared interests, passion, and trust, hosting the internet's most open and authentic conversations. With over 100,000 active communities and approximately 116 million daily active unique visitors, Reddit is a significant source of information. The company is actively expanding its teams with top talent.
This role offers complete remote flexibility within the United States. If you live near one of our physical offices in San Francisco, Los Angeles, New York City, or Chicago, you are welcome to work on-site as often as you prefer.
The AI Engineering Team
The AI Engineering team at Reddit is driving a strategic initiative to develop Reddit-native foundational Large Language Models (LLMs). This team operates at the intersection of applied research and massive-scale infrastructure, focusing on training models that deeply understand Reddit's unique culture, language, and community structures. You will join a team of distinguished engineers and safety experts, contributing to the core 'engine room' of Reddit's AI future. These foundational models will power critical functions like Safety & Moderation, Search, Ads, and the next generation of user products.
About the Role: Senior Research Engineer, Post-training & Evaluation
As a Senior Research Engineer, Post-training & Evaluation, you will own the critical 'feedback loop' of our model development process. While the pre-training team builds the base models, you will architect the comprehensive evaluation suites and fine-tuning pipelines that ensure these models are safe, intelligent, and truly 'Reddit-native.' You will establish the 'Reddit Benchmark' – our internal standard for model quality – and implement Supervised Fine-Tuning (SFT) workflows to adapt our models specifically for Safety and Moderation tasks.
Responsibilities
- Architect and maintain the 'Reddit Benchmark' evaluation suite: A comprehensive system for rigorously testing model capabilities across Safety, Reasoning, and Reddit-specific knowledge (slang, norms).
- Build scalable SFT (Supervised Fine-Tuning) pipelines: Implement efficient, distributed training loops for instruction tuning, transforming raw base models into helpful assistants.
- Develop Model-as-a-Judge systems: Engineer automated evaluation pipelines utilizing strong external models (e.g., GPT-5, Nova, Claude) to grade internal model outputs, enabling rapid iteration cycles.
- Execute Synthetic Data generation strategies: Create and curate high-quality instruction sets to enhance model generalization, especially where human data is limited.
- Collaborate with Safety Engineering: Translate high-level safety policies into concrete evaluation metrics and unit tests integrated into our CI/CD pipelines.
- Debug post-training instability: Investigate loss curves and evaluation logs to pinpoint where fine-tuning introduces an alignment tax or degrades capabilities.
Required Qualifications
- 4+ years of professional experience in machine learning engineering, with a strong focus on LLM fine-tuning or evaluation.
- Fluency in Python and PyTorch, including hands-on experience with libraries such as Hugging Face Transformers, vLLM, or lm-eval-harness.
- Deep understanding of Instruction Tuning (SFT) and the impact of data quality on model behavior.
- Proven experience building evaluation pipelines, including knowledge of benchmarks such as MMLU and GSM8K and the ability to construct custom domain-specific benchmarks.
- Familiarity with distributed training techniques (FSDP/DeepSpeed) for fine-tuning jobs.
- Strong data engineering skills for curating and cleaning instruction datasets.
Nice To Have
- Experience with MLflow, Weights & Biases, or other experiment tracking tools.
- Experience with Synthetic Data generation methodologies (e.g., Self-Instruct).
Benefits
Reddit, Inc. offers comprehensive benefits including Healthcare Benefits and Income Replacement Programs, 401k with Employer Match, Global Benefit programs (workspace, professional development, caregiving support), Family Planning Support, Gender-Affirming Care, Mental Health & Coaching Benefits, Flexible Vacation & Paid Volunteer Time Off, and Generous Paid Parental Leave.
Key skills/competencies
- LLM Fine-tuning
- Model Evaluation
- PyTorch
- Python
- Hugging Face Transformers
- Distributed Training
- Data Engineering
- Instruction Tuning (SFT)
- AI/Machine Learning
- Synthetic Data Generation
How to Get Hired at Reddit, Inc.
- Research Reddit, Inc.'s culture: Study their mission, values, recent AI initiatives, and employee testimonials on LinkedIn and Glassdoor to understand their community-centric approach.
- Tailor your resume strategically: Highlight your extensive experience in LLM fine-tuning, evaluation pipeline architecture, PyTorch proficiency, and strong data engineering skills, specifically for the Senior Research Engineer, Post-training & Evaluation role.
- Showcase relevant projects effectively: Prepare to discuss in detail projects involving custom domain-specific benchmarks, scalable SFT pipelines, or debugging post-training instability, demonstrating your practical expertise.
- Prepare for in-depth technical discussions: Expect rigorous questions on distributed training techniques, instruction tuning impact, model safety evaluation, and handling data quality challenges.
- Articulate your impact vision: Clearly connect your skills to enhancing Reddit's Safety & Moderation, Search, and next-gen user products, aligning with the company's AI future.