
ML/AI Software Engineer - Triton GPU Kernel Optimization
AMD · Belgrade, Serbia
This listing has closed.
- On site
- Full-time
- $120,000 / year
- Belgrade, Serbia
Job highlights
- Develop Triton GPU kernels for ML workloads.
- Optimize ML models for AMD Radeon/Ryzen.
- Analyze and debug GPU performance bottlenecks.
- Integrate kernels with PyTorch and JAX.
- Contribute to open-source Triton and ROCm.
About the role
About AMD and the Role
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
We are looking for an experienced ML/AI software engineer with deep expertise in GPU kernel development and building high‑performance primitives for training and inference. You’ll design, implement, and optimize custom Triton kernels for core ML workloads, integrate them into our frameworks and services, and drive end‑to‑end performance on the AMD ROCm platform across the AMD Radeon and Ryzen product families.
Key Responsibilities
- Design, implement, and maintain Triton GPU kernels for state‑of‑the‑art ML workloads, with a focus on fusion, tiling, vectorization, and memory‑hierarchy optimization for AMD RDNA GPU architectures.
- Analyze performance using profiling tools; identify bottlenecks in memory hierarchy, tensor core/matrix core utilization, warp/wavefront scheduling, and synchronization.
- Optimize kernels across problem shapes, batch sizes, and sequence lengths; implement autotuning strategies and performance heuristics.
- Integrate custom kernels with PyTorch (torch.compile, torch._inductor), JAX, or other frameworks; develop Python/C++ bindings and align with runtime/graph compilers.
- Track and contribute to upstream Triton and related compiler ecosystems; propose enhancements aligned with our workloads.
Preferred Experience
- Triton GPU kernel development and optimization experience
- CUDA or HIP kernel development experience (porting, performance tuning, and feature enablement)
- Experience optimizing kernels on AMD GPUs and familiarity with the ROCm software stack (HIP runtime, rocBLAS, MIOpen, rocWMMA, RCCL, etc.) is a plus
- Familiarity with compiler internals and IRs (LLVM, MLIR) and codegen for GPUs
- Strong background in linear algebra, convolution algorithms, attention mechanisms, or other core ML primitives
- Experience integrating kernels with PyTorch, JAX, or TensorFlow, including custom ops/extensions
- Knowledge of quantization (INT8/FP8), mixed precision, custom dtypes, and numerics (stability, error analysis)
- Experience with LLM, diffusion, and MoE workloads: FlashAttention‑style kernels, paged attention, grouped‑query attention, rotary embeddings, fused MLPs
- Contributions to open-source projects in Triton, ROCm, or related GPU/ML ecosystems are a plus
Academic Credentials
Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or a relevant field.
Key skills/competency
- ML AI Software Engineer
- Triton GPU Kernel Optimization
- C++ AI Development
- GPU Programming
- Performance Analysis
- Compiler Internals
- PyTorch Integration
- AMD ROCm Platform
- Linear Algebra
- Large Language Models (LLM)
Skills & topics
- ML Software Engineer
- AI Software Engineer
- Triton GPU Kernel Optimization
- GPU Programming
- CUDA
- HIP
- C++
- Machine Learning
- Deep Learning
- Performance Optimization
- ROCm
- PyTorch
- JAX
- Embedded Systems
- Data Centers
- RDNA Architecture
How to get hired
- Tailor your resume: Highlight Triton, CUDA/HIP, C++, and ML kernel optimization experience. Quantify achievements with performance gains.
- Showcase projects: Detail any contributions to open-source Triton, ROCm, or related ML/GPU ecosystems.
- Prepare for technical interviews: Expect deep dives into GPU architecture, kernel optimization techniques, and ML algorithms.
- Understand AMD's culture: Research AMD's focus on innovation, collaboration, and customer solutions.
- Apply strategically: Clearly state how your skills align with the ML/AI Software Engineer role.
Technical preparation
- Master Triton, CUDA, or HIP kernel programming.
- Deepen C++ and Python for ML development.
- Practice ML algorithm and linear algebra concepts.
- Familiarize yourself with ROCm and GPU profiling tools.
Behavioral questions
- Describe a complex performance bottleneck you solved.
- How do you approach technical leadership in projects?
- Share an experience collaborating on innovative ideas.
- How do you stay updated with AI and GPU trends?
Frequently asked questions
- What specific ML workloads are prioritized for Triton kernel optimization at AMD?
- AMD prioritizes optimization for state-of-the-art ML workloads, including LLMs, diffusion models, and MoE workloads. This encompasses areas like FlashAttention-style kernels, paged attention, grouped-query attention, rotary embeddings, and fused MLPs, all aiming to leverage AMD's RDNA GPU architectures effectively.
- How does AMD support contributions to open-source projects like Triton and ROCm for this ML/AI Software Engineer role?
- AMD encourages and values contributions to open-source projects. For this ML/AI Software Engineer role, contributing to upstream Triton and related compiler ecosystems is expected, with opportunities to propose enhancements and actively participate in community development.
- What is the expected level of experience with AMD's ROCm software stack for this position?
- While experience optimizing kernels on AMD GPUs and familiarity with the ROCm software stack (HIP runtime, rocBLAS, MIOpen, rocWMMA, RCCL, etc.) is considered a plus, a strong foundation in GPU programming (Triton, CUDA, HIP) and ML workloads is essential. The role offers opportunities to deepen ROCm expertise.
- Are there opportunities for career growth in AI and GPU computing at AMD for this role?
- Yes, AMD emphasizes career advancement. Joining AMD means shaping the future of AI and beyond, with opportunities to advance your career by working on cutting-edge technologies and solving important challenges in AI and high-performance computing.
- What academic background is typically required for the ML/AI Software Engineer role?
- The preferred academic credentials for this ML/AI Software Engineer position include a Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or a closely related technical field. Strong practical experience in C++ AI development and GPU programming is also highly valued.
- How does AMD use AI in its hiring process for roles like ML/AI Software Engineer?
- AMD may utilize Artificial Intelligence to assist in screening, assessing, or selecting candidates for positions like the ML/AI Software Engineer. Candidates can refer to AMD’s 'Responsible AI Policy' for more information on their AI usage in recruitment.