14 days ago

Platform Monitoring Engineer

Databricks

On Site
Full Time
$150,000
São Paulo, São Paulo, Brazil

Job Overview

Job Title: Platform Monitoring Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $150,000
Location: São Paulo, São Paulo, Brazil

Job Description

Position Overview

The Platform Monitoring Engineer role at Databricks is a high-impact technical position focused on platform reliability, incident response, and customer experience. You will lead incident investigations, design observability solutions, and drive systemic improvements.

The Impact You Will Have

  • Lead platform incident investigations and coordinate cross-functional teams.
  • Conduct thorough post-incident root cause analysis.
  • Design and implement customer-focused alerting pipelines and observability workflows.
  • Build automation tools, establish reusable monitoring patterns, and resolve reliability gaps.

What We Are Looking For

  • Minimum 5 years of experience in an SRE, DevOps, Production Engineering, or similar role.
  • Production-level experience with a major cloud provider and container/orchestration technologies.
  • Hands-on experience with tools such as ELK, Prometheus, Grafana, and PagerDuty.
  • Strong proficiency in Python or similar languages for building automation tools.
  • Experience managing incident lifecycles from detection to post-mortem analysis.
  • Degree in Computer Science, Engineering or related field.

About Databricks

Databricks is a leading data and AI company empowering over 10,000 organizations worldwide to unify data, analytics, and AI. Founded by the creators of Lakehouse, Apache Spark, Delta Lake, and MLflow, Databricks drives innovation globally from its headquarters in San Francisco.

Benefits & Diversity

Databricks offers comprehensive benefits and is committed to fostering a diverse and inclusive culture. All candidates are considered without regard to any protected characteristic.

Compliance

If the role requires access to export-controlled technology, the employer may apply for a U.S. government license as needed.

Key Skills and Competencies

  • Platform Reliability
  • Incident Response
  • Observability
  • Automation
  • Cloud Computing
  • Containerization
  • Python
  • Root Cause Analysis
  • Monitoring Tools
  • SRE

Tags:

Platform Monitoring Engineer
SRE
DevOps
Cloud
Observability
Kubernetes
Automation
Python
Incident Response
Monitoring

How to Get Hired at Databricks

  • Research Databricks culture: Understand their data and AI mission.
  • Customize your resume: Highlight SRE and DevOps experience.
  • Leverage cloud expertise: Detail your hands-on cloud projects.
  • Prepare technical stories: Explain incident management processes.
