Principal Site Reliability Engineer
@ Upstart

Hybrid
Posted 17 days ago

Job Details

About Upstart

Upstart is the leading AI lending marketplace partnering with banks and credit unions to expand access to affordable credit. By leveraging Upstart's AI marketplace, banks and credit unions can achieve higher approval rates and lower loss rates while delivering an exceptional digital-first lending experience.

The Team

Upstart’s Site Reliability Engineering (SRE) team is responsible for the reliability, resiliency, and observability of production systems. This team builds automation, tooling, and frameworks to ensure a healthy, scalable infrastructure while supporting seamless experiences for both engineers and customers.

Role Overview

As the Principal Site Reliability Engineer, you will be a thought leader and SRE evangelist. You will drive the adoption of SRE best practices across the organization, mentor engineers, and influence decisions across multiple teams, including Product Engineering, DevEx, Development Productivity (Quality), DevOps, Data Engineering, and Machine Learning.

How You’ll Make an Impact

  • Lead and advocate SRE principles across teams.
  • Shape long-term reliability, resiliency, and observability strategies with leadership.
  • Champion distributed tracing, real user monitoring (RUM), and performance metrics.
  • Build self-healing systems and drive improvements in incident response processes.
  • Manage cross-functional initiatives from concept through execution.

Requirements

Minimum Requirements:

  • 10+ years of combined experience in Software Engineering and SRE.
  • Strong communication and mentoring skills.
  • Proficiency in Python, Go, and JavaScript/TypeScript.
  • Experience with Infrastructure as Code tools (Terraform, CDK, CloudFormation).
  • Hands-on experience with observability and incident management.

Preferred Qualifications:

  • Experience with service mesh.
  • Full-stack development skills.
  • Experience in development productivity.
  • Experience in high-scale SaaS environments.
  • A background in building or extending observability platforms.

Position Details

This role is available remotely or in San Mateo, Columbus, or Austin. The team operates across all U.S. time zones, with occasional on-site collaboration sessions (3 days per quarter, with all travel expenses covered).

Compensation & Benefits

  • Competitive base pay with bonus and equity.
  • Comprehensive medical, dental, and vision coverage.
  • 401(k) with company match and immediate vesting.
  • Employee Stock Purchase Plan (ESPP) and additional benefits.

Equal Opportunity

Upstart is a proud Equal Opportunity Employer dedicated to diversity and inclusion. Applicants requiring accommodation should email candidate_accommodations@upstart.com.

Key skills/competency

  • SRE
  • Reliability
  • Resiliency
  • Observability
  • Distributed Tracing
  • Incident Management
  • Automation
  • Infrastructure as Code
  • Mentoring
  • Program Management

How to Get Hired at Upstart

🎯 Tips for Getting Hired

  • Research Upstart's culture: Understand its digital-first, AI lending focus.
  • Tailor your resume: Highlight SRE and automation experience.
  • Prepare technical examples: Emphasize distributed tracing, along with both coding and infrastructure projects.
  • Brush up on collaboration: Show proven cross-team communication skills.

📝 Interview Preparation Advice

Technical Preparation

Review your Python, Go, and JavaScript/TypeScript skills.
Practice Infrastructure as Code scripting.
Study distributed tracing and observability tools.
Simulate incident management scenarios.

Behavioral Questions

Describe a past mentoring experience.
Explain how you resolved a challenging system outage.
Give examples of cross-team communication.
Discuss decision-making under pressure.

Frequently Asked Questions