Deep Learning Performance Software Engineer
NVIDIA
Job Overview
Who's the hiring manager?
Sign up to PitchMeAI to discover the hiring manager's details for this job. We will also write them an intro email for you.

Job Description
About the Role
We are expanding our research and development for deep learning at NVIDIA. We seek excellent Software Engineers and Senior Software Engineers to join our team, specializing in developing GPU-accelerated Deep learning software. Researchers worldwide use NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. Your ability to work in a fast-paced, customer-oriented team is required, and excellent communication skills are necessary.
What You’ll Be Doing
- Develop TileGym, Triton CUDA TileIR backend and CUDA Tile
- Develop highly optimized deep learning kernels through tile-based GPU programming model
- End-to-end performance optimization through tile-based GPU programming model
- Do performance optimization, analysis, and tuning
What We Need To See
- Masters or PhD or equivalent experience in relevant discipline (CE, CS&E, CS, AI)
- SW Agile skills helpful
- Excellent C/C++ programming and software design skills
- Python experience a plus
- MLIR experience a plus
- AI agent experience a plus
- Performance modelling, profiling, debug, and code optimization or architectural knowledge of CPU and GPU
- GPU programming experience (CUDA or OpenCL) desired
- 3 years of relevant work experience
Why NVIDIA?
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and talented people on the planet working for us. If you're creative and autonomous, we want to hear from you!
Key skills/competency
- Deep Learning
- GPU Programming
- CUDA
- C/C++
- Python
- Performance Optimization
- MLIR
- Software Design
- Debugging
- Agile Methodologies
How to Get Hired at NVIDIA
- Research NVIDIA's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume for deep learning roles: Customize it to highlight GPU programming, CUDA, C/C++, and performance optimization expertise.
- Showcase relevant project experience: Detail your contributions to deep learning projects, especially those involving kernel development and performance tuning.
- Prepare for technical deep learning and CUDA questions: Expect rigorous questions on GPU architecture, parallel programming, and optimization techniques.
- Demonstrate problem-solving and communication skills: Be ready to discuss how you troubleshoot complex performance issues and collaborate effectively in a team.
Frequently Asked Questions
Find answers to common questions about this job opportunity
Explore similar opportunities that match your background