Deep Learning Scientist, LLM Training Datasets
@ NVIDIA

Santa Clara, CA
$200,000
On Site
Full Time
Posted 23 days ago

Job Details

Overview

NVIDIA is seeking a dedicated Deep Learning Scientist to work on LLM training datasets. You will engineer innovative data solutions to support LLM pre-training and post-training operations.

What You'll Be Doing

You will develop datasets for LLM pre-training, fine-tuning, and reinforcement learning. Responsibilities include designing data strategies, optimizing models, and evaluating performance.

  • Develop pre-training and fine-tuning datasets.
  • Design and implement data collection, cleaning, and augmentation routines.
  • Generate synthetic data and curate high-quality labeled datasets.
  • Implement post-training tasks including fine-tuning and RL.
  • Collaborate with ML researchers, data scientists, and infrastructure teams.

What We Need To See

Applicants should have a Master’s or PhD (or equivalent experience) in a relevant field, along with 3+ years of experience in dataset development and large language model training. Proficiency in Python and in machine learning libraries and frameworks such as PyTorch or TensorFlow Data is essential.

Ways To Stand Out From The Crowd

Candidates with a record of open-source contributions and research publications, as well as familiarity with cloud platforms, are highly desirable.

Compensation and Benefits

Competitive salaries, equity, and a comprehensive benefits package are offered. Salary ranges vary by level, with additional perks based on location and experience.

Key Skills/Competency

  • Deep learning
  • LLM training
  • Data engineering
  • Python
  • Machine learning
  • Data augmentation
  • Synthetic data
  • RL and SFT
  • Data curation
  • Collaboration

How to Get Hired at NVIDIA

🎯 Tips for Getting Hired

  • Customize your resume: Highlight relevant deep learning and data projects.
  • Showcase technical skills: Emphasize Python, PyTorch, and TensorFlow experience.
  • Research NVIDIA: Review company culture and technical achievements.
  • Prepare for interviews: Be ready with practical examples and case studies.

📝 Interview Preparation Advice

Technical Preparation

Review Python and ML library documentation.
Practice dataset engineering techniques.
Study LLM training and fine-tuning methods.
Familiarize yourself with synthetic and multi-modal data.

Behavioral Questions

Describe past collaboration experiences.
Explain problem-solving challenges you have faced.
Discuss managing project deadlines.
Share teamwork examples in technical projects.

Frequently Asked Questions