Job Overview

Job Description
About the Role
NVIDIA is at the forefront of the AI revolution, and our research is shaping the future of large language models. We are looking for a Senior Scientist, Synthetic Data and Privacy to join our team and help advance our capabilities in generating synthetic datasets and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation at scale while ensuring data privacy. This role combines hands-on software engineering with research in privacy-enhancing methods, and you will collaborate with research, engineering, product teams, and external labs.
What You'll Be Doing
- Build and implement advanced pipelines for generating synthetic datasets using innovative LLM-based methodologies and automated quality evaluation frameworks.
- Research and implement privacy-preserving techniques such as differentially private training (DP-SGD), NER-based detection and replacement of sensitive information, and protection against membership inference attacks.
- Design and maintain open-source software libraries and SDKs with clean APIs and developer-facing documentation, applying robust software design patterns.
- Drive software excellence through modern development tooling, configuration-driven architecture, and professional Git and CI/CD workflows.
- Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
- Mentor interns and junior researchers to foster technical growth within the team.
What We Need To See
- PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
- 5+ years of research experience in synthetic data generation, data privacy, or related areas such as differential privacy, federated learning, or trustworthy machine learning, or comparable experience.
- Proven track record of developing or maintaining software libraries used by a broad developer community.
- Deep technical understanding of PyTorch and the HuggingFace Transformers ecosystem including PEFT and LoRA.
- Technical familiarity with LLM inference frameworks such as vLLM or TGI.
- Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.
Ways To Stand Out From The Crowd
- Active contributions to open-source projects, particularly in ML, security, or privacy domains.
- Specialized expertise in differential privacy concepts and tools such as Opacus.
- Ability to build and optimize scalable data processing pipelines for large-scale models.
- Proficiency with NER-based PII detection and advanced anonymization techniques.
- Functional knowledge of global privacy regulations such as GDPR or CCPA.
Key skills/competencies
- Synthetic Data Generation
- Privacy-Preserving AI
- Large Language Models (LLMs)
- Differential Privacy
- PyTorch
- HuggingFace Transformers
- Open-Source Development
- Research Publication
- Software Engineering
- NER Models
How to Get Hired at NVIDIA
- Research NVIDIA's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume for NVIDIA: Highlight synthetic data generation, privacy-preserving AI, and LLM expertise.
- Master technical interview skills: Practice PyTorch, HuggingFace Transformers, and differential privacy concepts.
- Showcase open-source contributions: Emphasize your involvement in ML, security, or privacy projects.
- Network within NVIDIA: Connect with current employees in research and engineering for insights.