Principal Software Engineering Manager
Microsoft
Job Overview
Job Description
About Microsoft's HPC/AI Organization
The HPC/AI (High-Performance Computing and Artificial Intelligence) organization is on a mission to build the next generation of distributed AI supercomputers—systems that deliver unprecedented computational power, scalability, and reliability to accelerate breakthroughs in artificial intelligence. Our teams design and develop world-class AI infrastructure that enables large-scale model training and inference, forming the backbone of Microsoft’s AI innovation.
The Role: Principal Software Engineering Manager
As a Principal Software Engineering Manager, you will lead a team building foundational components of Azure’s AI networking infrastructure—powering some of the largest and most complex distributed training systems in the world. This is a rare opportunity to work at the intersection of AI, cloud infrastructure, and high-performance networking, driving innovation across hardware and software boundaries. With the explosive growth of generative AI and the demand for low-latency, high-bandwidth systems, your work will directly impact the scale, performance, and reliability of Microsoft’s AI platforms. You will lead the design, development, and deployment of high-performance, scalable, and observable networking systems that connect AI accelerators at massive scale. The role requires deep technical acumen, strategic thinking, and a passion for engineering excellence. You’ll collaborate across Microsoft teams to define architecture, deliver solutions to complex infrastructure challenges, and ensure our systems meet the evolving needs of AI workloads. If you’re passionate about building large-scale distributed systems, pushing the boundaries of AI infrastructure, and leading teams that shape the future of supercomputing, we invite you to join us on this journey to define the next era of AI at Microsoft.
Microsoft's Mission and Culture
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation.
- Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure.
- Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy.
- Establish lean, scalable, and efficient processes that promote innovation and engineering rigor.
- Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable.
- Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps.
- Ensure live-site reliability and service health through robust monitoring, telemetry, and automation.
- Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions.
- Apply data-driven insights to optimize performance, scalability, and customer satisfaction.
- Champion Microsoft’s culture by modeling, coaching, and caring—nurturing diversity, inclusion, and continuous growth for your team and peers.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Other Requirements:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Additional Or Preferred Qualifications (PQs):
- Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- 4+ years people management experience.
- 10+ years of professional software design and development experience in large-scale distributed systems.
- Experience building and operating networking infrastructure for hyperscale datacenters or AI clusters.
- Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, MRC, NVLink).
- In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems.
- Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning.
- Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure.
- Experience with telemetry and observability tools for network monitoring at scale.
- Background in building scalable and fault-tolerant systems in large, distributed environments.
Key skills/competencies
- Distributed Systems
- AI Infrastructure
- High-Performance Networking
- Software Engineering Leadership
- Azure Cloud
- Network Protocols (RDMA, TCP/IP)
- AI Accelerators (GPUs, TPUs)
- System Scalability
- Observability
- Team Management
How to Get Hired at Microsoft
- Research Microsoft's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Customize for the Principal Software Engineering Manager role, emphasizing leadership, distributed systems, and AI/HPC networking.
- Highlight AI/HPC expertise: Showcase experience with AI infrastructure, high-performance networking, and relevant technologies like RDMA and GPUs.
- Prepare for technical deep-dives: Focus on networking protocols, cloud infrastructure design, and solving complex scalability challenges.
- Showcase leadership skills: Be ready to discuss team management, fostering innovation, cross-functional collaboration, and strategic planning.