4 days ago

Senior Software Engineer

Microsoft

Hybrid
Full Time
$240,000

Job Overview

Job Title: Senior Software Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $240,000
Location: Hybrid

Job Description

Overview of the Senior Software Engineer Role

The HPC/AI (High Performance Computing and Artificial Intelligence) team at Microsoft is dedicated to building the next-generation distributed AI supercomputer. This mission aims to drive breakthroughs in artificial intelligence by delivering unparalleled computational power, scalability, and reliability. We focus on designing and developing cutting-edge infrastructure to support high-performance AI model training at scale, thereby creating the foundation for innovations that will redefine AI's potential.

We are actively seeking passionate and innovative Senior Software Engineers to design and build state-of-the-art networking infrastructure for large-scale AI training. This role focuses on developing next-generation networking capabilities that deliver high performance, low latency, and minimal jitter for distributed AI workloads. Your work will help the most advanced AI systems reach their full potential.

As a Senior Software Engineer on the HPC/AI team, you will play a pivotal role in shaping the future networking infrastructure for AI training and inference within Azure Cloud. This position offers a unique opportunity to work at the confluence of AI and high-performance computing, two of the most dynamic fields in technology. With the exponential growth of generative AI and the increasing demand for large-scale, low-latency systems, this area represents the forefront of innovation and impact. You will engage with diverse network architectures and advanced processor and accelerator technologies, driving the design and delivery of comprehensive, end-to-end solutions with an unwavering focus on performance, scalability, and observability. If you are passionate about groundbreaking technology, large-scale systems, and AI infrastructure, join us to build the platform that will power the future of AI supercomputing!

Key Responsibilities

  • Design, develop, and optimize networking solutions specifically tailored for large-scale AI training infrastructure.
  • Architect and implement high-performance, low-latency, and low-jitter communication frameworks for distributed systems.
  • Benchmark, analyze, and enhance the scalability and reliability of networking systems to manage petabyte-scale data transfer.
  • Debug and resolve complex networking issues within large-scale, high-performance environments.
  • Drive the identification of dependencies and the development of design documents for products, applications, services, or platforms.
  • Create, implement, optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI).
  • Act as a Designated Responsible Individual (DRI): develop and follow playbooks that guide other engineers, and work on-call to monitor systems, products, and services for degradation, downtime, or interruptions. Alert stakeholders and take action to restore service for both simple and complex problems when necessary.
  • Proactively seek new knowledge and adapt to emerging AI trends, technical solutions, and patterns to improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale.

Required Qualifications

  • Bachelor's Degree in Computer Science or a related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.
  • 2+ years of experience with network virtualization, software-defined networking (SDN), or network performance tuning.

Other Requirements

  • Candidates must be able to meet Microsoft, customer, and/or government security screening requirements. These requirements include, but are not limited to, the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • Bachelor's Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR Master's Degree in Computer Science or a related technical field AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.
  • Hands-on experience with networking technologies used in AI hardware (e.g., InfiniBand, RoCE, NVLink).
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and an understanding of how they interact with networking infrastructure.
  • Experience with telemetry and observability tools for network monitoring at scale.
  • Background in building scalable and fault-tolerant systems in large, distributed environments.

Key Skills/Competencies

  • Distributed AI Systems
  • High Performance Computing (HPC)
  • Networking Infrastructure
  • Low Latency Systems
  • Software-Defined Networking (SDN)
  • Network Performance Tuning
  • C/C++/Python/Java
  • AI Accelerators (GPU, TPU)
  • Scalable Systems Design
  • Observability & Telemetry

Tags:

Senior Software Engineer
HPC
AI
Networking
Distributed Systems
Scalability
Low Latency
Performance Optimization
C++
Python
Java
C#
JavaScript
InfiniBand
RoCE
NVLink
GPU
TPU
Azure Cloud
SDN
Telemetry
Observability

How to Get Hired at Microsoft

  • Research Microsoft's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume for AI/HPC: Customize your resume to highlight experience in distributed systems, networking, and high-performance computing, using keywords from the Senior Software Engineer job description.
  • Showcase relevant projects: Prepare to discuss personal or professional projects involving large-scale data, low-latency networks, or AI infrastructure during your Microsoft interviews.
  • Master technical fundamentals: Brush up on data structures, algorithms, and core networking concepts critical for a Senior Software Engineer role at Microsoft.
  • Demonstrate problem-solving: Practice articulating your thought process for complex technical challenges, emphasizing your debugging and optimization skills.