6 days ago

AI Inference Intern

Perplexity

On Site
Full Time
$40,000
London, England, United Kingdom

Job Overview

Job Title: AI Inference Intern
Job Type: Full Time
Offered Salary: $40,000
Location: London, England, United Kingdom

Job Description

UK Internship Program - AI Inference

Perplexity is excited to announce its Internship Program for exceptional Master’s or PhD students studying Computer Science or Engineering in the UK who are enrolled in the 2025-2026 academic year. In this intensive program you will work directly with our AI Inference team and gain valuable experience in a rapidly growing AI startup. Outstanding performers may be offered a full-time position at the end of the program.

About the AI Inference Team

Our AI Inference team runs the models behind the Perplexity products. The team maintains the inference engine and the deployments behind models ranging from single-node embeddings to distributed sparse Mixture-of-Experts models, and also maintains large GPU clusters. With a keen focus on latency and throughput, the team is responsible for the entire serving stack, from GPU kernels to networking and monitoring infrastructure.

Responsibilities

  • Work with the inference team to improve serving latency and throughput.
  • Bring up support for new models and state-of-the-art inference optimizations or quantization schemes.
  • Optimize inference across the entire stack, from GPU kernels to serving endpoints.

Qualifications

  • Strong engineering track record with proven knowledge of fundamentals and programming languages (multi-threaded programming, networking, compilation, systems programming, etc.).
  • Pursuing a Master's or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems).
  • Experience with ML frameworks (Torch, JAX).
  • Experience with GPU programming (CUDA, Triton).
  • Experience with High-Performance Computing (OpenMPI).

Schedule

Internship program: 13 weeks, full-time or part-time, in person at the London office (hybrid schedule: 3 days in the office, 2 days WFH).

Interview Process

  • Fill out the application on the Perplexity website.
  • If selected, you will go through People Ops and technical interviews.
  • Offer: We’re impressed! We’d love to welcome you to our Internship program!
  • Start: We have a desk waiting for you in our London office!

Key skills/competency

  • AI Inference
  • Machine Learning
  • GPU Programming
  • High-Performance Computing
  • Systems Programming
  • Serving Latency
  • Throughput Optimization
  • Distributed Systems
  • Computer Science
  • Engineering

Tags:

AI Internship
Inference Engineering
Machine Learning Intern
Computer Science Internship
PhD Internship
Master's Internship
UK Tech Jobs
HPC
GPU Computing
Systems Programming
Perplexity
AI Startup

How to Get Hired at Perplexity

  • Apply early: Submit your application via the Perplexity website promptly for consideration for the 2025-2026 academic year.
  • Tailor your resume: Highlight your Master's or PhD focus in Computer Science/Engineering and performance-related subjects like HPC, Compilers, or Distributed Systems. Emphasize experience with ML frameworks and GPU programming.
  • Prepare for technical interviews: Be ready to discuss multi-threaded programming, networking, compilation, and systems programming, and to demonstrate experience with ML frameworks (Torch, JAX) and GPU programming (CUDA, Triton).
  • Showcase your potential: Clearly articulate how your skills in AI inference and optimization align with Perplexity's needs to impress during People Ops and technical interviews.
