Platform Monitoring Engineer
Databricks
Job Overview

Job Description
Position Overview
The Platform Monitoring Engineer role at Databricks is a high-impact technical position focused on platform reliability, incident response, and customer experience. You will lead incident investigations, design observability solutions, and drive systemic improvements.
The Impact You Will Have
- Lead platform incident investigations and coordinate cross-functional teams.
- Conduct thorough post-incident root cause analysis.
- Design and implement customer-focused alerting pipelines and observability workflows.
- Build automation tools, establish reusable monitoring patterns, and resolve reliability gaps.
What We Are Looking For
- Minimum 5 years of experience in an SRE, DevOps, Production Engineer or similar role.
- Production-level experience with a major cloud provider and container/orchestration technologies.
- Hands-on experience with tools such as ELK, Prometheus, Grafana, and PagerDuty.
- Strong proficiency in Python or similar languages for building automation tools.
- Experience managing incident lifecycles from detection to post-mortem analysis.
- Degree in Computer Science, Engineering, or a related field.
About Databricks
Databricks is a leading data and AI company empowering over 10,000 organizations worldwide to unify data, analytics, and AI. Founded by the creators of Lakehouse, Apache Spark, Delta Lake, and MLflow, Databricks drives innovation globally from its headquarters in San Francisco.
Benefits & Diversity
Databricks offers comprehensive benefits and is committed to fostering a diverse and inclusive culture. All candidates are considered without regard to any protected characteristic.
Compliance
If the role requires access to export-controlled technology, the employer may apply for a U.S. government license as needed.
Key skills/competencies
- Platform Reliability
- Incident Response
- Observability
- Automation
- Cloud Computing
- Containerization
- Python
- Root Cause Analysis
- Monitoring Tools
- SRE
How to Get Hired at Databricks
- Research Databricks culture: Understand their data and AI mission.
- Customize your resume: Highlight SRE and DevOps experience.
- Leverage cloud expertise: Detail your hands-on cloud projects.
- Prepare technical stories: Explain incident management processes.