AI Test Architect
NVIDIA
Job Overview

Job Description
About the Role
The AI Test Architect will join NVIDIA's E2E Verification group to profile innovative, large-scale distributed training on NVIDIA AI end-to-end solutions in supercomputing clusters. The role involves providing insights on at-scale system design and on tuning mechanisms for large compute runs.
What You’ll Be Doing
You will be responsible for profiling, benchmarking, and analyzing deep learning models to identify optimization opportunities, with a special focus on networking. You will collaborate with data scientists, researchers, developers, and automation teams to design and implement scalable training pipelines. Staying current on deep learning algorithms, NVIDIA GPU technologies, and high-performance networking solutions is key. The role also involves optimizing deep learning models for performance, memory usage, and power efficiency, and addressing networking bottlenecks. Additionally, you will work closely with hardware engineers to integrate efficient networking solutions, exploring technologies such as RDMA and InfiniBand.
- Profile and benchmark deep learning models
- Collaborate with cross-functional teams
- Stay current with the latest deep learning and NVIDIA GPU technologies
- Optimize performance, memory and power usage
- Guide development of high-performance networking solutions
What We Need To See
A B.Sc. in Computer Science or Software Engineering, or equivalent experience, is required. Candidates should have 8+ years of experience in CUDA programming with deep learning frameworks such as TensorFlow and PyTorch, along with practical experience in high-performance networking. Strong analytical skills, excellent communication, and a deep understanding of profiling and optimizing deep learning workflows are essential.
Ways To Stand Out
Demonstrated experience in profiling and optimizing large-scale deep learning training, particularly with high-performance networking, will set you apart. Familiarity with distributed deep learning frameworks, NVIDIA's networking technologies (e.g., Mellanox InfiniBand), and the optimization of network parameters such as bandwidth and latency will help you excel in this role.
Key skills/competencies
- Deep Learning
- CUDA
- Distributed Systems
- Networking
- Benchmarking
- Profiling
- High-performance Computing
- InfiniBand
- RDMA
- Optimization
How to Get Hired at NVIDIA
- Research NVIDIA's culture: Review their innovations and technology focus.
- Customize your resume: Highlight deep learning and networking experience.
- Emphasize technical projects: Include CUDA, profiling, and optimization examples.
- Prepare for interviews: Practice system design and technical problem solving.