Senior Site Reliability Engineer @ Rocket.Chat
Location: Remote
Salary: $150,000
Workplace: Remote
Employment type: Full Time
Posted 8 hours ago
Job Details
Overview
The Senior Site Reliability Engineer at Rocket.Chat plays a crucial role in ensuring reliability, scalability, and performance across all critical systems and services. Reporting to the Head of Infrastructure and Deployment, you will be a key member of the Engineering team.
Your Responsibilities
As a Senior Site Reliability Engineer, you will:
- Enhance the reliability, performance, and scalability of Rocket.Chat's ecosystem.
- Design, develop, and maintain Kubernetes operators and manage core infrastructure.
- Implement robust monitoring, alerting, logging, and automation for operational efficiency.
- Lead incident management, on-call response, and blameless post-mortems.
- Collaborate with cross-functional teams to integrate reliability practices early in the product lifecycle.
Mandatory Hard Skills
- Expertise in Kubernetes and cloud platforms (AWS, GCP, Azure, OVH).
- Proficiency in programming/scripting languages (Go, Python, Bash).
- Experience with monitoring tools (Prometheus, Grafana, Loki) and IaC (Terraform, Pulumi, Ansible).
- Solid networking fundamentals and security principles.
- Familiarity with databases like MongoDB or Redis.
Desirable Skills and Soft Skills
- Knowledge of chaos engineering and disaster recovery planning.
- Experience with agile tools like Jira.
- A proactive, collaborative mindset with strong problem-solving skills.
- Leadership and clear communication skills, even during stressful incidents.
- Data-driven decision making and accountability.
What You'll Do
- Engineer and operate deployment and platform services.
- Manage and optimize core infrastructure and associated tools.
- Ensure service reliability through SLOs, error budgets, and robust monitoring.
- Automate operations and reduce manual toil.
- Foster cross-functional collaboration and implement advanced reliability practices.
Benefits
- Fully remote and flexible working hours.
- Flexible paid time off, holidays, and vacation.
- Company laptop and remote benefits.
- Access to Talki, courses, and books, plus stock options and a multicultural environment.
- A vibrant company culture and competitive compensation based on location.
Key Skills/Competencies
Kubernetes, AWS, GCP, Python, Terraform, CI/CD, Monitoring, Distributed Systems, Automation, Incident Management
How to Get Hired at Rocket.Chat
🎯 Tips for Getting Hired
- Customize your resume: Tailor your skills and projects to Rocket.Chat's requirements.
- Showcase SRE expertise: Highlight Kubernetes, cloud, and automation experience.
- Research Rocket.Chat: Understand their open-source communication platform and culture.
- Prepare for technical interviews: Practice incident management and system design questions.
📝 Interview Preparation Advice
Technical Preparation
- Review Kubernetes architecture and operator development.
- Practice using cloud platforms like AWS and GCP.
- Brush up on scripting languages like Python and Go.
- Familiarize yourself with IaC tools and CI/CD pipeline setups.
Behavioral Questions
- Describe a time you led incident management.
- Explain your approach to cross-team collaboration.
- Discuss how you have solved complex system outages.
- Share your experience with proactive problem prevention.
Frequently Asked Questions
What does a Senior Site Reliability Engineer at Rocket.Chat do?
How important is Kubernetes experience for Rocket.Chat's SRE role?
What kind of work arrangement does Rocket.Chat offer for this role?
How should I prepare my resume for the Senior SRE position at Rocket.Chat?