Senior Cloud Services Software Engineer
@ NVIDIA

Hybrid
$356,500
Full Time
Posted 1 day ago

Job Details

Overview

Join NVIDIA's DGX Cloud Team and contribute to the infrastructure powering innovative AI research. As a Senior Cloud Services Software Engineer, you will develop and optimize AI infrastructure services to deliver peak performance and resiliency for DGX Cloud.

Responsibilities

  • Develop solutions integrating machine learning, distributed systems, and HPC.
  • Design and optimize micro-services orchestrated by Kubernetes for large-scale AI workflows.
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks.
  • Create abstractions for long-running training jobs with auto-restart capabilities.
  • Develop modular services deployable on on-premises AI clusters.

Requirements

  • A Bachelor's degree in Computer Science or a related field.
  • At least 12 years of hands-on experience in backend development with languages such as Python, Go, or C/C++.
  • A proven record of building large-scale distributed systems.
  • Experience with cloud platforms (AWS, Azure, GCP), container technologies such as Docker and Kubernetes, and HPC/AI platforms such as Slurm.

Preferred Qualifications

  • Experience with DL frameworks and orchestrators (PyTorch, TensorFlow, JAX, Ray).
  • Background in framework plugin architectures and cluster scheduler integration.
  • Deep understanding of NVIDIA GPUs, network technologies, and failure patterns.
  • Practical experience with AI models and AI-based tools, plus code contributions.

About NVIDIA

NVIDIA leads groundbreaking developments in AI, HPC, and visualization. Work with world-class engineers to shape the future of technology.

Key Skills/Competency

  • Distributed Systems
  • Backend Development
  • Cloud Computing
  • Kubernetes
  • Python
  • Go
  • Microservices
  • High-Performance Computing
  • API Development
  • AI Infrastructure

How to Get Hired at NVIDIA

🎯 Tips for Getting Hired

  • Research NVIDIA's culture: Review mission, values, and recent projects.
  • Customize your resume: Highlight backend, cloud, and AI experience.
  • Tailor your portfolio: Showcase distributed systems and microservices.
  • Prepare for interviews: Familiarize yourself with Kubernetes and HPC concepts.

📝 Interview Preparation Advice

Technical Preparation

  • Review Kubernetes deployments and configurations.
  • Practice coding in Python and Go.
  • Study distributed system design and resilience.
  • Explore cloud platform services and API integration.

Behavioral Questions

  • Describe past team collaboration experiences.
  • Explain problem-solving in distributed systems.
  • Detail experiences handling project failures.
  • Discuss managing high-pressure technical challenges.

Frequently Asked Questions