4 days ago

Senior HPC Engineer

NVIDIA

Hybrid
Full Time
$200,000

Job Overview

Job Title: Senior HPC Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $200,000
Location: Hybrid

Job Description

About NVIDIA

NVIDIA has spent over 25 years at the forefront of computer graphics, PC gaming, and accelerated computing, a legacy built on innovation and exceptional talent. Today, NVIDIA is harnessing the immense power of AI to forge the next computing era, where GPUs serve as the intelligence for advanced computers, robots, and autonomous vehicles. Joining NVIDIA means contributing to groundbreaking work in a diverse, supportive environment that inspires the best.

The Opportunity: Senior HPC Engineer

NVIDIA is seeking a Senior AI/HPC Engineer to join its Infrastructure Specialist team. This role offers the chance to build and manage some of the world's largest and fastest AI/HPC systems. You will engage with academic and commercial customers, partners, and internal teams to analyze, define, and execute large-scale AI/HPC projects. The work spans networking, system design, and automation, and includes serving as a primary customer interface.

What You Will Be Doing

  • Deploying, managing, and maintaining AI/HPC infrastructure within Linux-based environments for new and existing customers.
  • Serving as the domain expert for customers throughout the planning and implementation phases.
  • Preparing handover documentation and conducting knowledge transfers to support customers in deploying sophisticated systems.
  • Providing crucial feedback to internal teams by reporting bugs, documenting workarounds, and suggesting system improvements.

What We Need To See

  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
  • At least 4 years of experience in providing in-depth support, deployment services, and troubleshooting for hardware and software products.
  • Extensive knowledge and experience with AI Factory / HPC concepts, including Linux System Administration, process management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting, and optimization.
  • Proficiency in cluster management technologies.
  • Strong scripting capabilities.
  • Excellent interpersonal skills, with the ability to effectively resolve customer-blocking issues.
  • Superior verbal and written English communication skills.
  • Robust organizational skills and the ability to prioritize and multitask effectively with minimal supervision.
  • Experience with job schedulers such as SLURM, LSF, or PBS.

Ways To Stand Out From The Crowd

  • Industry-standard Linux certifications.
  • Experience with InfiniBand or Ethernet networking.
  • Hands-on experience with GPU-focused hardware/software.
  • Familiarity with Message Passing Interface (MPI).
  • Background in automation tooling (e.g., Ansible, Salt, Puppet).

Why NVIDIA?

NVIDIA is renowned as a top technology employer, attracting the most forward-thinking and talented individuals globally. We are at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our commitment extends to offering highly competitive salaries, a comprehensive benefits package, and a work environment that champions diversity, inclusion, and flexibility. Join our team of driven, innovative professionals dedicated to pushing technological boundaries.

Key Skills/Competencies

  • AI/HPC Infrastructure
  • Linux System Administration
  • Cluster Management
  • Scripting
  • Troubleshooting
  • Networking (InfiniBand/Ethernet)
  • GPU Hardware/Software
  • Job Schedulers (SLURM, LSF, PBS)
  • Automation Tools (Ansible, Salt, Puppet)
  • Customer Support

Tags:

Senior HPC Engineer
AI/HPC Infrastructure
Linux Administration
Cluster Management
Scripting
Networking
GPU
SLURM
Ansible
Troubleshooting
System Design
Performance Optimization

How to Get Hired at NVIDIA

  • Research NVIDIA's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand their innovation-driven environment.
  • Tailor your resume for HPC: Highlight extensive experience in AI/HPC infrastructure, Linux system administration, cluster management, and scripting relevant to NVIDIA's core technologies.
  • Showcase technical expertise: Emphasize your proficiency with schedulers (SLURM, LSF), networking (InfiniBand, Ethernet), and GPU-focused hardware/software.
  • Prepare for technical challenges: Practice problem-solving scenarios related to large-scale system deployment, performance optimization, and complex troubleshooting in Linux environments.
  • Demonstrate strong communication: Be ready to discuss your experience in customer-facing roles, explaining complex technical concepts clearly and providing effective support.
