5 days ago

Software Engineer II

Microsoft

Hybrid
Full Time
$175,000

Job Overview

Job Title: Software Engineer II
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $175,000
Location: Hybrid

Job Description

Overview

The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is actively seeking a passionate Software Engineer II to contribute to the building, operating, and support of hyperscale cloud infrastructure. This infrastructure is critical for some of the world’s largest supercomputing deployments. In this role, you will collaborate with experienced engineers to develop, monitor, and troubleshoot cloud-native supercomputing systems, directly enhancing the reliability and performance of Azure’s AI infrastructure offerings.

At the supercomputing scale, specialized tools and techniques are essential to maintain system availability, reliability, runtime performance, and health, thereby meeting customer Service Level Agreements (SLAs). Your responsibilities will include building and utilizing state-of-the-art cloud applications and services to monitor supercomputer health, identify operational gaps, and implement features for the smooth management of cloud-native supercomputers. As a Supercomputing Software Engineer II, you will also introduce best practices, drive architectural changes, and influence the roadmap of relevant software and hardware components. Your work will significantly impact the business objectives of a diverse user base and facilitate the next wave of growth and innovation in AI within the cloud environment.

Responsibilities

  • Be proactive and innovative in adding new metrics for monitoring the health of supercomputers.
  • Collaborate with team members and stakeholders to understand requirements and produce detailed, data-driven, collaborative designs for assigned features.
  • Independently use appropriate artificial intelligence tools and practices across the software development lifecycle to develop, test, debug, and maintain code for supercomputer health monitoring systems.
  • Remain current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale.
  • Act as a Designated Responsible Individual (DRI) working on-call to monitor system/product feature/service for degradation, downtime, or interruptions and gain approval to restore system/product/service for simple problems.

Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Bachelor’s Degree in Computer Science or a related technical field AND 4+ years of technical engineering experience OR Master’s Degree in Computer Science or a related technical field AND 3+ years of technical engineering experience.
  • Experience with monitoring, profiling, or debugging distributed systems or cloud applications.
  • Familiarity with AI/HPC workloads, GPU-based systems, AI-assisted software development, and secure software design practices.
  • Familiarity with the IaaS operating model and SLA commitments.

Key skills/competency

  • Azure
  • AI/HPC
  • Supercomputing
  • Cloud Infrastructure
  • Monitoring
  • Distributed Systems
  • Software Development
  • Python
  • C#
  • Reliability Engineering

Tags:

Software Engineer
Cloud Infrastructure
Monitoring
Reliability
Performance
Distributed Systems
Observability
Architectural Design
Software Development
On-call Support
Azure
C#
Python
C++
Java
JavaScript
AI
HPC
GPU
IaaS

How to Get Hired at Microsoft

  • Research Microsoft's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume for Azure AI/HPC: Highlight experience with cloud infrastructure, distributed systems, and coding languages like C# or Python.
  • Prepare for technical interviews: Practice data structures, algorithms, and system design, especially for large-scale, distributed cloud environments.
  • Showcase problem-solving skills: Be ready to discuss troubleshooting complex systems, monitoring solutions, and driving architectural improvements.
  • Demonstrate a growth mindset: Emphasize continuous learning, adaptability, and collaboration within a fast-paced, innovative team.
