Senior Site Reliability Engineer
@ Remotivate

Hybrid
$150,000
Full Time
Posted 15 hours ago

Job Details

About the Company

Our client is one of the leading SMS providers for marketing teams in the US. Their advanced dashboard and queueing mechanisms help clients scale campaigns to the next level. With a global team in scale-up mode, they are looking for strong problem solvers who thrive on building reliable systems.

About the Role

We are looking for a Senior Site Reliability Engineer with strong infrastructure experience to ensure platform stability and optimize Python backend systems. You will play a key role in keeping the client's SMS marketing platform fast, reliable, and scalable. This highly technical position sits at the intersection of backend engineering and infrastructure.

You will work hands-on with Python/Flask applications, Linux servers, and network analysis tools to ensure millions of SMS messages are delivered without delay or downtime.

Key Responsibilities

  • Maintain and optimize Linux-based servers running Python/Flask apps.
  • Ensure high uptime by monitoring system health and addressing bottlenecks.
  • Troubleshoot complex issues using packet capture and diagnostic tools.
  • Optimize backend workflows for efficient message queuing and delivery.
  • Implement monitoring and alerting systems for enhanced visibility.
  • Automate infrastructure tasks through scripting and tool development.
  • Take ownership of infrastructure decisions to boost platform performance.

Requirements

  • 5+ years of experience in SRE, infrastructure, or backend systems engineering.
  • Experience running and maintaining Python/Flask applications in production.
  • Advanced Python development skills and familiarity with relevant libraries/frameworks.
  • In-depth knowledge of Linux (Debian/Ubuntu) server administration.
  • Proficiency in network analysis using tools like Wireshark, mitmproxy, tcpdump.
  • Understanding of distributed systems, scaling strategies, and performance tuning.
  • Experience with monitoring/logging systems (Prometheus, Grafana, ELK, Datadog) and CI/CD workflows.
  • Comfort with automation tools and scripting for infrastructure management.
  • Excellent troubleshooting and analytical skills, and a strong sense of ownership.

Growth Opportunities & Perks

  • Endless growth in a scale-up environment.
  • Potential to move into R&D or leadership roles.
  • Flexible working schedule with a fully remote setup.
  • Performance bonuses as the company grows.
  • Collaborate with highly skilled developers in a challenging industry.

Key Skills/Competencies

Senior Site Reliability Engineer, Python, Linux, Flask, Monitoring, Logging, Networking, Automation, CI/CD, Troubleshooting

How to Get Hired at Remotivate

🎯 Tips for Getting Hired

  • Customize your resume: Highlight infrastructure and Python expertise.
  • Showcase troubleshooting skills: Emphasize your network diagnostic experience.
  • Prepare for technical interviews: Review Linux administration and CI/CD workflows.
  • Research Remotivate: Understand their scale-up journey and SMS services.

📝 Interview Preparation Advice

Technical Preparation

  • Review Python and the Flask framework.
  • Practice Linux server management on Debian/Ubuntu.
  • Familiarize yourself with network diagnostic tools.
  • Study CI/CD pipelines and automation scripting.

Behavioral Questions

  • Describe a time you resolved a system outage.
  • Explain how you handle pressure during critical failures.
  • Discuss your approach to taking ownership.
  • Share an experience of learning a new technical skill.

Frequently Asked Questions