Software Engineering Intern, Cloud Inference
Modular

Job Description
About Modular
At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges.
If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.
About The Role
Our Cloud Inference team focuses on building a platform to serve massive foundation models with high throughput, low latency, and maximum efficiency across diverse hardware (NVIDIA, AMD, TPUs, and more). Our goal is to make inference not only the fastest and most scalable, but also the simplest to deploy and operate.
Location
Candidates based in the United States are welcome to apply. To support growth and collaboration, all interns will work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site) with relocation assistance provided for out-of-state candidates.
What You Will Do
As a Software Engineering Intern, Cloud Inference, you’ll contribute directly to the core components of Mammoth. You’ll work alongside engineers designing large-scale distributed systems to deploy and scale foundation models with state-of-the-art performance.
Depending on your interests and skills, you may work on:
- Efficient serving – designing high-throughput, low-latency inference services, with features such as KV-aware routing and disaggregated inference.
- KV-cache optimizations – developing a distributed KV-cache manager, KV-cache offloading, and other optimizations that improve cache utilization.
- Large-model inference – solving challenges in running large frontier models (e.g., DeepSeek R1) across multiple nodes.
- Scalable deployments – extending Kubernetes APIs and building controllers to support multi-model, multi-node, and multi-cluster deployments.
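To make the "KV-aware routing" bullet above concrete, here is a toy sketch in Go. Everything in it (the KVAwareRouter type, the backend names, the fixed prefix length) is a hypothetical illustration, not Mammoth's actual design; the underlying idea is simply that requests sharing a prompt prefix hash to the same backend, so the node that already holds the KV-cache for that prefix can reuse it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// KVAwareRouter is a minimal, hypothetical sketch of KV-aware routing:
// requests whose prompts share a prefix are hashed to the same backend,
// so the node already holding the KV-cache for that prefix serves them.
type KVAwareRouter struct {
	backends []string
}

// Route picks a backend by hashing the first prefixLen bytes of the prompt.
func (r *KVAwareRouter) Route(prompt string, prefixLen int) string {
	if prefixLen > len(prompt) {
		prefixLen = len(prompt)
	}
	h := fnv.New32a()
	h.Write([]byte(prompt[:prefixLen]))
	return r.backends[h.Sum32()%uint32(len(r.backends))]
}

func main() {
	router := &KVAwareRouter{backends: []string{"node-a", "node-b", "node-c"}}
	// Two requests sharing the same 28-byte system-prompt prefix land on
	// the same node, letting it reuse the cached KV state for that prefix.
	b1 := router.Route("You are a helpful assistant. Summarize X.", 28)
	b2 := router.Route("You are a helpful assistant. Translate Y.", 28)
	fmt.Println("same backend:", b1 == b2) // prints: same backend: true
}
```

A production router would of course track which prefixes are actually cached where, handle backend failures, and balance load, but the prefix-affinity principle is the same.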
What You’ll Gain
- The chance to help build a cloud inference platform from the ground up, applying the cutting-edge optimizations needed for state-of-the-art performance.
- Hands-on experience with large-scale AI infrastructure and model serving systems.
- Mentorship from engineers who have built AI systems at leading companies such as Google, Meta, and NVIDIA, and are now rebuilding the whole AI stack from the ground up.
- The opportunity to work on real production challenges in distributed inference with immediate impact.
What You Bring To The Table (Required)
- Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, Mathematics, or related field.
- Strong programming skills in any language.
- Interest in distributed systems, cloud infrastructure, or machine learning systems.
- Curiosity, problem-solving mindset, and ability to learn quickly in a fast-moving environment.
Helpful, But Not Required
- Familiarity with Kubernetes and cloud-native technologies.
- Strong programming skills in Go.
- Experience building efficient, scalable distributed systems.
- Understanding of LLMs and common serving optimizations.
What Modular Brings To The Table
- Amazing Team. We are a progressive and agile team with some of the industry’s best engineering and product leaders.
- Competitive Compensation. We offer very strong compensation packages, including stock options. We want people to be focused on their best work and believe in tailoring compensation plans to meet the needs of our workforce.
- Team Building Events. We organize regular team onsites and local meetups in Los Altos, CA.
Working at Modular will enable you to grow quickly as you work alongside incredibly motivated and talented people who have high standards, a growth mindset, and a purpose to truly change the world.
Compensation
The estimated base hourly range for this role is $47.00 - $57.00 USD.
The hourly rate for the successful applicant will depend on a variety of permissible, non-discriminatory job-related factors, which include but are not limited to education, training, work experience, business needs, or market demands. This range may be modified in the future.
If you fall outside of the listed requirements, we nevertheless encourage you to apply, as we may have openings at lower or higher levels than the ones advertised.
Key Skills/Competencies
- Cloud Inference
- Distributed Systems
- AI Infrastructure
- Machine Learning Systems
- Kubernetes
- Large Language Models (LLMs)
- Performance Optimization
- Go Programming
- System Design
- Cloud-native Technologies
How to Get Hired at Modular
- Research Modular's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight programming skills, distributed systems, and machine learning interests.
- Showcase relevant projects: Emphasize experience with AI infrastructure or cloud deployments.
- Prepare for technical challenges: Focus on algorithms, data structures, and system design principles.
- Demonstrate curiosity and problem-solving: Be ready to discuss how you learn and tackle new problems.