
Senior Principal Engineering Manager
Microsoft · Redmond, WA
This listing has closed.
- On site
- Full-time
- $163,000 - $296,400 / year (U.S. base pay range)
- Redmond, WA
Job highlights
- Lead AI infrastructure team building large GPU clusters.
- Manage and grow engineering talent globally.
- Drive execution of complex infrastructure projects.
- Set technical vision for AI research compute.
- Foster operational excellence and innovation.
About the role
Senior Principal Engineering Manager, AI Infrastructure
Microsoft Research (MSR) is transforming the future of artificial intelligence (AI) by bridging the gap between cutting-edge general AI and specialized, real-world applications. We are building world-class AI infrastructure that powers our models on large Graphics Processing Unit (GPU) clusters and accelerates our research lifecycle through agentic development.
Our team has a global scope, powering the work of every Microsoft Research lab worldwide. We are seeking a Senior Principal Engineering Manager to lead and grow our team that builds one of the world's largest research GPU clusters. This is a transformational leadership opportunity. You will grow a talented team of engineers, evolving it into a cohesive, high-performing organization that designs, builds, and operates world-class research compute infrastructure at scale. You will set the vision for how the team works, grows, and delivers, while driving the execution rigor needed to ship complex infrastructure reliably in a highly dynamic environment.
If you are passionate about leading teams at the frontier of AI infrastructure and want to shape the future of how research compute is built and operated, we invite you to explore this opportunity. At Microsoft, our mission—to empower every person and every organization on the planet to achieve more—guides how we partner with customers to deliver trusted, impactful solutions. With a growth mindset culture, we innovate responsibly and measure success by shared progress—people, teams, and customers. Join us to do meaningful work that changes the world and helps shape what’s next for everyone.
Responsibilities
- Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure.
- Recruit and develop exceptional engineering talent and build a diverse team, covering hiring, onboarding, career development, and performance management.
- Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality.
- Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development.
- Provide technical vision and judgment on the team's architecture, strategy, and roadmap—spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows—while empowering engineers to own deep technical details.
- Collaborate closely across disciplines with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together.
- Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate.
Qualifications
Required Qualifications
- Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred Qualifications
- 5+ years of people management experience leading software engineering teams, including managing principal engineers.
- Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning (ML) workloads.
- Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability.
- Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments.
- Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
- Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms.
- A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise.
- Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team).
- Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Key skills and competencies
- AI Infrastructure
- Engineering Management
- GPU Clusters
- Large-Scale Systems
- Distributed Systems
- Cloud Platforms
- Machine Learning
- High-Performance Computing (HPC)
- Technical Leadership
- Team Growth
How to get hired
- Tailor your resume: Highlight leadership in AI infrastructure, GPU clusters, and team management.
- Showcase technical depth: Emphasize experience with distributed systems, cloud platforms, and HPC environments.
- Quantify achievements: Use metrics to demonstrate successful project execution and team growth.
- Prepare for leadership questions: Discuss your experience mentoring engineers and fostering team culture.
- Research Microsoft Research: Understand its mission in AI and agentic development.
Frequently asked questions
- What is the expected salary range for a Senior Principal Engineering Manager at Microsoft Research?
- The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. Certain locations, such as the San Francisco Bay Area and the New York City metropolitan area, have a higher range of USD $220,800 - $331,200 per year. Actual pay can vary based on location and other factors.
- What kind of AI infrastructure does Microsoft Research build?
- According to the listing, Microsoft Research builds AI infrastructure spanning large GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows, with the goal of accelerating the AI research lifecycle.
- What are the key technical skills for this Senior Principal Engineering Manager role at Microsoft?
- Key technical skills include experience with large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), High-Performance Compute (HPC) environments, GPU programming (CUDA, NCCL), and frameworks like PyTorch. Expertise in networking, storage, or distributed training is also valuable.
- Does Microsoft Research offer opportunities for career growth for engineering managers?
- The listing points in that direction: Microsoft describes a growth mindset culture, and the role itself involves mentoring and developing engineering talent, including career development and performance management. The description does not detail specific growth paths for engineering managers themselves.
- What is the work arrangement for the Senior Principal Engineering Manager role at Microsoft Research?
- The listing is tagged as on site and full-time in Redmond, WA. The job description itself does not elaborate on the work arrangement, so it is best to confirm the details during the interview process.
- How does Microsoft Research approach AI development and research?
- Microsoft Research aims to bridge general AI with specialized applications, focusing on building AI infrastructure that powers models on GPU clusters and accelerates research through agentic development, fostering an AI-first mentality within their teams.
- What is the role of a Senior Principal Engineering Manager in managing an engineering team at Microsoft Research?
- The role involves leading, mentoring, and growing the team; recruiting and developing talent; driving execution and accountability for complex projects; fostering a culture of operational excellence and psychological safety; and providing technical vision and judgment.