Question 1

What is the primary focus of the Systems Software Engineer, AI Infrastructure role at NVIDIA?

Accepted Answer

This role is primarily focused on developing and maintaining large-scale systems crucial for frontier AI model training, ensuring their reliability, operability, and scalability across global public and private cloud environments. You'll be instrumental in shaping NVIDIA's AI infrastructure.

Question 2

What technical skills are essential for a Systems Software Engineer, AI Infrastructure at NVIDIA?

Accepted Answer

Essential skills include strong proficiency in Python and another language such as C/C++ or Go, expertise in Linux/Windows systems engineering, and experience with major cloud platforms (AWS, Azure, GCP, OCI). A deep understanding of SRE principles and experience with observability tools such as ELK or Prometheus are also critical.

Question 3

How important is prior SRE experience for this NVIDIA Systems Software Engineer role?

Accepted Answer

Very important: SRE principles are fundamental to this role. You'll be expected to implement incident management, monitoring, performance optimization, and automation. A strong grasp of concepts such as error budgets, SLOs, and Infrastructure as Code (Terraform CDK) is explicitly required.

Question 4

What kind of systems will I be working on as a Systems Software Engineer, AI Infrastructure?

Accepted Answer

You will be working on systems that support critical use cases such as frontier AI model training, HPC, and GPU training workflows. This includes building observability tools, defining reliability metrics, and ensuring the high availability and performance of distributed systems.

Question 5

Does NVIDIA require experience with specific deep learning frameworks for this AI Infrastructure position?

Accepted Answer

While not strictly required, experience with deep learning frameworks such as PyTorch, TensorFlow, JAX, or Ray is listed among the posting's "Ways To Stand Out From The Crowd". Demonstrating this knowledge would be a significant advantage in your application for the Systems Software Engineer, AI Infrastructure role.

This job post expired on March 26, 2026

Systems Software Engineer, AI Infrastructure

NVIDIA
