Software Engineering Intern, Cloud Inference
Modular

Job Description
About Modular
At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges.
If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.
About The Role
Our Cloud Inference team focuses on building a platform to serve massive foundation models with high throughput, low latency, and maximum efficiency across diverse hardware (NVIDIA, AMD, TPUs, and more). Our goal is to make inference not only the fastest and most scalable, but also the simplest to deploy and operate.
Location
Candidates based in the United States are welcome to apply. To support growth and collaboration, all interns will work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site) with relocation assistance provided for out-of-state candidates.
What You Will Do
As a Software Engineering Intern, Cloud Inference, you’ll contribute directly to the core components of Mammoth. You’ll work alongside engineers designing large-scale distributed systems to deploy and scale foundation models with state-of-the-art performance.
Depending on your interests and skills, you may work on:
- Efficient serving – designing high-throughput, low-latency inference services, with features such as KV-aware routing and disaggregated inference.
- KV-cache optimizations – developing a distributed KV-cache manager, KV-cache offloading, and other optimizations that improve cache utilization.
- Large-model inference – solving challenges in running large frontier models (e.g., DeepSeek R1) across multiple nodes.
- Scalable deployments – extending Kubernetes APIs and building controllers to support multi-model, multi-node, and multi-cluster deployments.
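To make the "KV-aware routing" bullet above concrete, here is a toy sketch in Go. Everything in it (the KVAwareRouter type, the backend names, the fixed prefix length) is a hypothetical illustration, not Mammoth's actual design; the underlying idea is simply that requests sharing a prompt prefix hash to the same backend, so the node that already holds the KV-cache for that prefix can reuse it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// KVAwareRouter is a minimal, hypothetical sketch of KV-aware routing:
// requests whose prompts share a prefix are hashed to the same backend,
// so the node already holding the KV-cache for that prefix serves them.
type KVAwareRouter struct {
	backends []string
}

// Route picks a backend by hashing the first prefixLen bytes of the prompt.
func (r *KVAwareRouter) Route(prompt string, prefixLen int) string {
	if prefixLen > len(prompt) {
		prefixLen = len(prompt)
	}
	h := fnv.New32a()
	h.Write([]byte(prompt[:prefixLen]))
	return r.backends[h.Sum32()%uint32(len(r.backends))]
}

func main() {
	router := &KVAwareRouter{backends: []string{"node-a", "node-b", "node-c"}}
	// Two requests sharing the same 28-byte system-prompt prefix land on
	// the same node, letting it reuse the cached KV state for that prefix.
	b1 := router.Route("You are a helpful assistant. Summarize X.", 28)
	b2 := router.Route("You are a helpful assistant. Translate Y.", 28)
	fmt.Println("same backend:", b1 == b2) // prints: same backend: true
}
```

A production router would of course track which prefixes are actually cached where, handle backend failures, and balance load, but the prefix-affinity principle is the same.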
What You’ll Gain
- The chance to help build a cloud inference platform from the ground up, applying the cutting-edge optimizations needed for state-of-the-art performance.
- Hands-on experience with large-scale AI infrastructure and model serving systems.
- Mentorship from engineers who have built AI systems at leading companies such as Google, Meta, and NVIDIA, and are now rebuilding the whole AI stack from the ground up.
- The opportunity to work on real production challenges in distributed inference with immediate impact.
What You Bring To The Table (Required)
- Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, Mathematics, or related field.
- Strong programming skills in any language.
- Interest in distributed systems, cloud infrastructure, or machine learning systems.
- Curiosity, problem-solving mindset, and ability to learn quickly in a fast-moving environment.
Helpful, But Not Required
- Familiarity with Kubernetes and cloud-native technologies.
- Strong programming skills in Go.
- Experience building efficient, scalable distributed systems.
- Understanding of LLMs and common serving optimizations.
What Modular Brings To The Table
- Amazing Team. We are a progressive and agile team with some of the industry’s best engineering and product leaders.
- Competitive Compensation. We offer very strong compensation packages, including stock options. We want people to be focused on their best work and believe in tailoring compensation plans to meet the needs of our workforce.
- Team Building Events. We organize regular team onsites and local meetups in Los Altos, CA.
Working at Modular will enable you to grow quickly as you work alongside incredibly motivated and talented people who have high standards, a growth mindset, and a purpose to truly change the world.
Compensation
The estimated base hourly range for this role is $47.00 - $57.00 USD.
The hourly rate for the successful applicant will depend on a variety of permissible, non-discriminatory job-related factors, which include but are not limited to education, training, work experience, business needs, or market demands. This range may be modified in the future.
If you fall outside of the listed requirements, we nevertheless encourage you to apply, as we may have openings at lower or higher levels than the ones advertised.
Key Skills/Competencies
- Cloud Inference
- Distributed Systems
- AI Infrastructure
- Machine Learning Systems
- Kubernetes
- Large Language Models (LLMs)
- Performance Optimization
- Go Programming
- System Design
- Cloud-native Technologies
How to Get Hired at Modular
- Research Modular's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight programming skills, distributed systems, and machine learning interests.
- Showcase relevant projects: Emphasize experience with AI infrastructure or cloud deployments.
- Prepare for technical challenges: Focus on algorithms, data structures, and system design principles.
- Demonstrate curiosity and problem-solving: Be ready to discuss how you learn and tackle new problems.