Question 1

What specific experience does NVIDIA look for in a Senior System Architect for Infrastructure Reliability?

Accepted Answer

NVIDIA seeks candidates with a BS, MS, or PhD in Computer Science or Electrical Engineering and 6+ years of systems programming experience. Key areas include mastery of distributed systems, automated root cause analysis (RCA) pipelines, deep knowledge of CPU architecture (x86/ARM) metrics, strong C++ and Python skills, and familiarity with cluster resource managers such as Slurm or Kubernetes.

Question 2

How does NVIDIA handle salary for this Senior System Architect role?

Accepted Answer

Salary is determined by location, experience, and peer compensation. For this role, the base salary range is $184,000-$287,500 USD for Level 4 and $224,000-$356,500 USD for Level 5. The role is also eligible for equity and benefits.

Question 3

What are the key technical skills that will make a candidate stand out for the Senior System Architect position at NVIDIA?

Accepted Answer

Standout candidates will have expert knowledge of the Linux kernel and its error-reporting interfaces, deep experience with NVIDIA's DCGM and NVML for GPU monitoring, familiarity with non-intrusive monitoring tools, and experience with checkpoint/restore technologies like CRIU.

Question 4

How important is experience with specific cluster resource managers for the Senior System Architect role at NVIDIA?

Accepted Answer

Familiarity with cluster resource managers such as Slurm, LSF, or Kubernetes is important. Understanding how these systems manage job lifecycles and propagate signals is crucial for building the failure attribution framework at scale.

Question 5

What is the application deadline for the Senior System Architect, Infrastructure Reliability position at NVIDIA?

Accepted Answer

Applications for this Senior System Architect position will be accepted at least until March 1, 2026. This posting is for an existing vacancy.

Question 6

Does NVIDIA use AI in its hiring process for the Senior System Architect role?

Accepted Answer

Yes, NVIDIA utilizes AI tools in its recruiting processes, including for roles like the Senior System Architect, Infrastructure Reliability.

Question 7

What kind of impact will the Senior System Architect have at NVIDIA?

Accepted Answer

The Senior System Architect will develop an automated framework that drastically reduces the resource waste caused by job failures at scale. The framework identifies the root causes of failures in real time and distinguishes between hardware, infrastructure, and software issues.

This job post expired on April 19, 2026

Senior System Architect Infrastructure Reliability

NVIDIA

Job Overview

Job Description

Senior System Architect Infrastructure Reliability

What You'll Be Doing

What We Need To See

Ways To Stand Out From The Crowd

Key skills/competency

How to Get Hired at NVIDIA

Frequently Asked Questions
