AI Training Infrastructure Engineer
Dex
Job Overview
Job Description
This role is with one of Dex’s trusted Partner companies. We work closely with their teams to truly understand their culture, goals, and what they’re looking for, so we can match you with the right opportunity and give you context about the role before you commit to a process.
If you're interested, sign up to Dex to apply.
Dex is an AI recruiter agent that helps you run your job search. Tell Dex your stack, seniority, and what you want to build. We will manage your applications and surface other opportunities that are a fit.
About the Company
A well-funded generative AI company building foundational models that create high-quality sound, speech, and music directly from video.
Their technology enables creators, platforms, and gaming companies to generate realistic, synchronised audio for previously silent or dynamically produced video content. Backed by leading global investors, the company is scaling rapidly across engineering and product.
The Opportunity
This role sits at the core of the model training stack. You will design and optimise the infrastructure that enables large-scale training of generative audio and video models. The focus is on GPU-level performance, distributed systems, and scalable pipelines that let researchers iterate efficiently.
Your work will directly influence training throughput, cost efficiency, and model performance.
What You’ll Do
- Design and optimise distributed training strategies across varying model sizes and compute constraints.
- Profile and debug GPU workloads to improve utilisation and throughput.
- Improve end-to-end training pipelines, including data loading, distributed execution, checkpointing, and logging.
- Architect and maintain scalable ML training clusters (SLURM-based).
- Implement experiment tracking, model versioning, and reproducibility systems.
- Optimise PyTorch code and inference pathways for performance and efficiency.
What They’re Looking For
- Strong hands-on experience optimising large-scale training workloads.
- Deep understanding of GPU architecture, memory hierarchies, and performance bottlenecks.
- Experience balancing compute-bound and memory-bound workloads.
- Expertise in distributed training and parallelism strategies.
- Strong systems thinking across data pipelines, storage, and cluster orchestration.
Nice to have:
- Experience implementing custom GPU kernels.
- Familiarity with diffusion or autoregressive models.
- Experience managing SLURM clusters at scale.
- Knowledge of high-performance storage systems for ML workloads.
Why It’s Compelling
- Foundational role shaping the infrastructure behind next-generation generative models.
- Significant autonomy and technical ownership.
- Backed by top-tier investors with strong early traction.
- Competitive compensation (€150k–€200k + equity).
- Remote-first with periodic collaboration in Berlin.
Key Skills/Competencies
- Infrastructure Design
- AI Training
- GPU Optimization
- Distributed Systems
- ML Pipelines
- PyTorch
- SLURM
- Generative Models
- Performance Tuning
- Scalability
How to Get Hired at Dex
- Research Dex's partner company: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight expertise in large-scale ML infrastructure, GPU optimization, and distributed systems.
- Showcase relevant projects: Provide examples of work with PyTorch, SLURM, and high-performance computing in AI.
- Prepare for technical deep-dives: Expect questions on GPU architecture, distributed training strategies, and debugging complex workloads.
- Demonstrate systems thinking: Articulate how you approach end-to-end training pipelines and cluster orchestration.