Site Reliability Engineer (SRE) (Remote)

Joveo AI · Bengaluru, Karnataka, India

  • Hybrid
  • Full-time
  • $150,000 / year
  • Bengaluru, Karnataka, India

Job highlights

  • Own production system availability and scalability.
  • Apply software engineering to infrastructure operations.
  • Reduce toil with automation and tooling.
  • Enhance system reliability and observability.
  • Partner with engineering on system design.

About the role

About Joveo

Every company says they're "AI-first." We actually are. Joveo's recruitment advertising platform processes millions of hiring decisions through machine learning, real-time bidding, and predictive analytics - helping the world's largest employers find the right people, faster and fairer. But we're not done. Not even close.

Role: Site Reliability Engineer (SRE)

Location: Remote

Role Overview:

We are hiring a Site Reliability Engineer to own the availability, performance, and scalability of Joveo's production systems. You will apply software engineering principles to infrastructure and operations - reducing toil, improving observability, and keeping our platform at the reliability levels our clients depend on.

Key Responsibilities:

  • Define and maintain SLOs, SLIs, and error budgets for critical services
  • Lead incident response, blameless postmortems, and reliability improvements
  • Build internal tooling and automation to reduce operational toil
  • Partner with engineering teams to bake reliability into system design
  • Implement and evolve observability stacks — metrics, logs, and traces
  • Manage on-call rotations and build scalable incident runbooks

Required Skills & Qualifications:

  • Strong software engineering background with SRE or production ops experience
  • Proficiency in Python, Go, or similar for automation and tooling
  • Experience with observability platforms (Datadog, New Relic, Prometheus/Grafana)
  • Deep understanding of distributed systems, failure modes, and reliability patterns
  • Experience with Kubernetes, container orchestration, and cloud-native infrastructure
  • Strong incident management skills and a calm, structured approach to outages

Equal Opportunity Employer:

Joveo is an equal opportunity employer. We are committed to building an inclusive workplace and welcome applications from all qualified individuals regardless of race, color, ethnicity, nationality, gender, gender identity or expression, sexual orientation, age, religion, disability, marital status, or any other characteristic protected by applicable law. All hiring decisions are made solely on the basis of qualifications, skills, and demonstrated ability.

If your dream job is one that doesn’t fit neatly into a job title — apply. Joveo. Where AI meets the future of work.

Key skills/competencies

  • Site Reliability Engineering
  • Kubernetes
  • Python
  • Go
  • Observability
  • Distributed Systems
  • Incident Management
  • Cloud-Native Infrastructure
  • Automation
  • SRE

Skills & topics

  • Site Reliability Engineer
  • SRE
  • Python
  • Go
  • Kubernetes
  • Observability
  • Distributed Systems
  • Cloud
  • Automation
  • Production Operations

How to get hired

  • Tailor your resume: Highlight SRE experience, Python/Go proficiency, and Kubernetes skills.
  • Craft a strong cover letter: Emphasize your understanding of distributed systems and incident management.
  • Prepare for technical interviews: Review SRE principles, automation challenges, and observability concepts.
  • Showcase problem-solving: Be ready to discuss past incidents and your approach to resolving them.
  • Research Joveo AI: Understand their AI-first mission and how reliability supports it.

Technical preparation

  • Master Python or Go for automation tasks.
  • Deepen Kubernetes and container orchestration knowledge.
  • Practice with Datadog or Prometheus/Grafana.
  • Study distributed system failure modes.

Behavioral questions

  • Describe a major production incident you handled.
  • How do you balance reliability with feature velocity?
  • How do you build trust with development teams?
  • Explain a time you reduced operational toil.

Frequently asked questions

What is the work arrangement for the Site Reliability Engineer role at Joveo AI?
The role description lists the location as Remote, while the listing summary shows Hybrid and Bengaluru, Karnataka, India.

What programming languages are most important for the SRE role at Joveo AI?
The posting asks for proficiency in Python, Go, or a similar language for automation and tooling.

What kind of experience is expected for the Site Reliability Engineer at Joveo AI?
The posting asks for a strong software engineering background and prior experience in SRE or production operations.

Does Joveo AI use specific observability platforms for their SRE team?
The posting requires experience with observability platforms and names Datadog, New Relic, and Prometheus/Grafana as examples. It does not say which of these Joveo AI runs in production.

What is Joveo AI's approach to incident management for their SRE team?
The Site Reliability Engineer will lead incident response and blameless postmortems, manage on-call rotations, and build incident runbooks. The posting asks for strong incident management skills and a calm, structured approach to outages.

What are SLOs and SLIs in the context of the Site Reliability Engineer role at Joveo AI?
An SLI (Service Level Indicator) is a measurement of service behavior, such as the fraction of requests that succeed. An SLO (Service Level Objective) is the target set for that measurement. The Site Reliability Engineer will define and maintain both, along with error budgets, for critical services at Joveo AI.

How does Joveo AI ensure reliability in system design for SREs?
The Site Reliability Engineer will partner with engineering teams to build reliability into system design.

What does 'reducing toil' mean for a Site Reliability Engineer at Joveo AI?
'Reducing toil' means building internal tooling and automation for repetitive operational tasks, which frees time for longer-term reliability work.
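
To illustrate the SLO, SLI, and error budget terms used in this posting, here is a minimal Python sketch of the arithmetic behind them. It is not taken from Joveo AI; the function name error_budget, its parameters, and the sample numbers are all hypothetical, and it assumes a simple request-based availability SLI measured over a single window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for one measurement window.

    Hypothetical helper for illustration only.
    slo_target is a fraction, e.g. 0.999 for a "99.9% of requests succeed" SLO.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    # Fraction of the budget already spent; above 1.0 means the budget is exhausted.
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Sample numbers are made up: a 99.9% SLO, one million requests, 250 failures.
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    for name, value in report.items():
        print(f"{name}: {value}")
```

A team might use a figure like budget_consumed to decide when to slow feature releases in favor of reliability work, which is the trade-off behind the behavioral question on balancing reliability with feature velocity.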