4 days ago

Senior Solutions Architect, Cloud Infrastructure and DevOps

NVIDIA

Hybrid
Full Time
$200,000

Job Overview

Job Title: Senior Solutions Architect, Cloud Infrastructure and DevOps
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $200,000
Location: Hybrid

Job Description

About the Role

NVIDIA is seeking a Senior Solutions Architect, Cloud Infrastructure and DevOps to join its NVIDIA Infrastructure Specialist Team. Academic and commercial groups worldwide are using NVIDIA products to redefine deep learning and data analytics and to power state-of-the-art data centers. This role offers the opportunity to work with the team developing many of the largest and fastest AI/HPC systems globally. We are looking for an individual who can thrive in a dynamic, customer-focused environment that demands exceptional interpersonal skills. You will interact with customers, partners, and various internal departments to analyze, define, and implement large-scale networking projects. The scope of these efforts encompasses networking, system building, Kubernetes-based platforms, and automation, and you will serve as the primary technical interface to the customer.

What You'll Be Doing

  • Maintain large-scale computational and AI infrastructure, with a strong focus on monitoring, logging, and workload orchestration (Kubernetes and Linux job schedulers).
  • Perform end-to-end troubleshooting across the entire stack, from bare metal and operating system, through the software stack, container platform, networking, and storage.
  • Optimize scalable, production-ready Kubernetes-based container platforms integrated with enterprise-grade networking and storage solutions.
  • Serve as a key technical resource, developing, refining, and documenting standard methodologies and operational guidelines for internal teams.
  • Support Research & Development activities and engage in Proofs of Concept (POCs) and Proofs of Value (POVs) to validate new features, architectures, and upgrade approaches.
  • Create and deliver high-quality documentation, including runbooks, onboarding materials, and best-practice guides for customers and internal teams.
  • Become the technical leader for assigned customer accounts, providing strategic guidance on DevOps and platform architecture and influencing long-term infrastructure and operations decisions.

What We Need To See

  • BS/MS/PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields, combined with 8+ years of professional experience in managing scalable cloud environments and automation engineering roles.
  • Cloud & HPC Expertise: Proven understanding of networking fundamentals (TCP/IP stack), data center architectures, and hands-on experience managing HPC/AI clusters, including deployment, optimization, and troubleshooting.
  • Kubernetes & AI/ML Workloads: Extensive experience with Kubernetes for container orchestration, resource scheduling, scaling, and integration with HPC environments.
  • Hardware & Software Knowledge: Familiarity with HPC and AI technologies (CPUs, GPUs, high-speed interconnects) and supporting software stacks.
  • Linux & Storage Systems: Deep knowledge of Linux (RedHat/CentOS, Ubuntu), OS-level security, and protocols (TCP, DHCP, DNS). Experience with storage solutions such as Lustre, GPFS, ZFS, XFS, and emerging Kubernetes storage technologies.
  • Automation & Observability: Proficiency in Python and Bash scripting, configuration management, and Infrastructure-as-Code tools (e.g., Ansible, Terraform). Experience with observability stacks (Grafana, Loki, Prometheus) for monitoring, logging, and building fault-tolerant systems.
  • Solution Architecture & Customer Engagement: Strong background in crafting scalable solutions and providing consultative support to customers.

Ways To Stand Out From The Crowd

  • Knowledge of CI/CD pipelines for software deployment and automation.
  • Solid hands-on knowledge of Kubernetes and container-based microservices architectures.
  • Experience with GPU-focused hardware and software (e.g., NVIDIA DGX, CUDA, GPU Operator).
  • Background with RDMA-based fabrics (InfiniBand or RoCE) in HPC or AI environments.

Key Skills/Competencies

  • Cloud Infrastructure
  • DevOps
  • Kubernetes
  • HPC/AI Systems
  • Networking
  • Linux Administration
  • Automation (Python, Bash, Ansible, Terraform)
  • Observability (Grafana, Prometheus)
  • Troubleshooting
  • Customer Engagement

How to Get Hired at NVIDIA

  • Research NVIDIA's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand the company's pioneering spirit in AI and HPC.
  • Tailor your resume for AI/HPC: Highlight your deep experience with cloud infrastructure, DevOps practices, Kubernetes, and managing large-scale AI/HPC environments specifically relevant to NVIDIA's core business.
  • Showcase technical expertise: Prepare to discuss in detail your hands-on proficiency with Linux, networking fundamentals, storage solutions, automation tools like Ansible and Terraform, and observability stacks.
  • Emphasize problem-solving and customer focus: Be ready to share examples of your end-to-end troubleshooting skills and how you've provided strategic technical guidance and built strong customer relationships as a solutions architect.
  • Highlight innovation and continuous learning: Demonstrate your passion for new technologies, engagement in R&D, and ability to adapt to evolving cloud and AI landscapes, aligning with NVIDIA's fast-paced innovation.
