Site Reliability Engineer II
@ Atlan

Hybrid
$150,000
Full Time
Posted 21 hours ago

Job Details

About the Role

At Atlan, our Site Reliability Engineer II is a key member of the Platform & Reliability Engineering team. You will strengthen alert management and incident response capabilities to ensure fast, reliable, and uninterrupted customer experiences.

Your Mission at Atlan

As a Site Reliability Engineer II, you will:

  • Own and operate end-to-end system reliability
  • Manage incidents within defined SLAs (60 minutes for Critical, 180 minutes for High)
  • Enhance observability by refining monitoring systems and alerts
  • Automate operations and incident workflows to eliminate manual tasks
  • Collaborate with Platform, Observability, and Product Engineering teams
  • Contribute to documentation and playbooks for process improvement

What Makes You a Great Fit

You have proven experience managing alerts, incidents, and root cause analysis in production environments. You have hands-on experience with cloud platforms (AWS, GCP, or Azure) and Kubernetes, and you know monitoring tools such as Prometheus, Grafana, ELK/EFK, or Datadog. Strong scripting skills (Python, Bash, or Shell) and excellent communication skills are essential.

Why You'll Love Working at Atlan

Joining Atlan means real impact from day one with a modern tech stack including Kubernetes, Terraform, Prometheus, and Datadog. You will work with world-class engineers in a learning culture, enjoy autonomy, and have a clear growth path from SRE II to principal levels.

About Atlan

At Atlan, we transform data chaos into clarity for Fortune 500 leaders and hyper-growth startups alike. Backed by top investors and recognized by Gartner and Forrester, we are a fully remote company trusted by global leaders like Cisco, Nasdaq, and HubSpot.

Key skills/competencies

reliability, incident response, automation, monitoring, Kubernetes, cloud, scripting, observability, troubleshooting, documentation

How to Get Hired at Atlan

🎯 Tips for Getting Hired

  • Research Atlan's culture: Understand their mission, values, and tech stack.
  • Customize your resume: Highlight cloud, Kubernetes, and automation skills.
  • Prepare for technical interviews: Practice incident management and scripting challenges.
  • Showcase collaboration: Emphasize teamwork and communication experiences.

📝 Interview Preparation Advice

Technical Preparation

  • Review cloud and Kubernetes fundamentals.
  • Practice writing automation scripts in Python.
  • Study how monitoring tools are configured and troubleshot.
  • Prepare incident management case studies.

Behavioral Questions

  • Describe a challenging incident and its resolution.
  • Explain how you collaborate in high-pressure situations.
  • Discuss how you handle repetitive operational tasks.
  • Share teamwork experiences in distributed settings.
