10 days ago

Principal Software Engineering Manager

Microsoft

Hybrid
Full Time
$220,000

Job Overview

Job Title: Principal Software Engineering Manager
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $220,000
Location: Hybrid

Job Description

About Microsoft's HPC/AI Organization

The HPC/AI (High-Performance Computing and Artificial Intelligence) organization is on a mission to build the next generation of distributed AI supercomputers—systems that deliver unprecedented computational power, scalability, and reliability to accelerate breakthroughs in artificial intelligence. Our teams design and develop world-class AI infrastructure that enables large-scale model training and inference, forming the backbone of Microsoft’s AI innovation.

The Role: Principal Software Engineering Manager

As a Principal Software Engineering Manager, you will lead a team building foundational components of Azure’s AI networking infrastructure—powering some of the largest and most complex distributed training systems in the world. This is a rare opportunity to work at the intersection of AI, cloud infrastructure, and high-performance networking, driving innovation across hardware and software boundaries. With the explosive growth of generative AI and the demand for low-latency, high-bandwidth systems, your work will directly impact the scale, performance, and reliability of Microsoft’s AI platforms. You will lead the design, development, and deployment of high-performance, scalable, and observable networking systems that connect AI accelerators at massive scale. The role requires deep technical acumen, strategic thinking, and a passion for engineering excellence. You’ll collaborate across Microsoft teams to define architecture, deliver solutions to complex infrastructure challenges, and ensure our systems meet the evolving needs of AI workloads. If you’re passionate about building large-scale distributed systems, pushing the boundaries of AI infrastructure, and leading teams that shape the future of supercomputing, we invite you to join us on this journey to define the next era of AI at Microsoft.

Microsoft's Mission and Culture

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

  • Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation.
  • Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure.
  • Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy.
  • Establish lean, scalable, and efficient processes that promote innovation and engineering rigor.
  • Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable.
  • Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps.
  • Ensure live-site reliability and service health through robust monitoring, telemetry, and automation.
  • Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions.
  • Apply data-driven insights to optimize performance, scalability, and customer satisfaction.
  • Champion Microsoft’s culture by modeling, coaching, and caring—nurturing diversity, inclusion, and continuous growth for your team and peers.

Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Additional Or Preferred Qualifications (PQs):

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 4+ years people management experience.
  • 10+ years of professional software design and development experience in large-scale distributed systems.
  • Experience building and operating networking infrastructure for hyperscale datacenters or AI clusters.
  • Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, MRC, NVLink).
  • In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems.
  • Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning.
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure.
  • Experience with telemetry and observability tools for network monitoring at scale.
  • Background in building scalable and fault-tolerant systems in large, distributed environments.

Key Skills and Competencies

  • Distributed Systems
  • AI Infrastructure
  • High-Performance Networking
  • Software Engineering Leadership
  • Azure Cloud
  • Network Protocols (RDMA, TCP/IP)
  • AI Accelerators (GPUs, TPUs)
  • System Scalability
  • Observability
  • Team Management

Tags:

Principal Software Engineering Manager
Leadership
Distributed Systems
AI Infrastructure
Networking
System Design
Scalability
Reliability
Team Management
Engineering Excellence
Architecture
Azure
C++
Python
Java
C#
InfiniBand
RDMA
TCP/IP
GPUs
TPUs
SDN
Telemetry

How to Get Hired at Microsoft

  • Research Microsoft's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume: Customize for the Principal Software Engineering Manager role, emphasizing leadership, distributed systems, and AI/HPC networking.
  • Highlight AI/HPC expertise: Showcase experience with AI infrastructure, high-performance networking, and relevant technologies like RDMA and GPUs.
  • Prepare for technical deep-dives: Focus on networking protocols, cloud infrastructure design, and solving complex scalability challenges.
  • Showcase leadership skills: Be ready to discuss team management, fostering innovation, cross-functional collaboration, and strategic planning.
