Site Reliability Engineer
Digistore24 DACH
Job Overview

Job Description
Are you an experienced developer or DevOps engineer eager to work remotely and grow in the field of site reliability at an internationally successful software and education company? Join our Site Reliability Engineering team to elevate our reliability to the next level.
Please note: Proficiency in both English and German is a MUST for this position. Please do not apply if you do not speak both languages.
Who is Digistore24?
We are one of Europe's fastest-growing tech companies, driven by our mission to shape the digital future. We empower people with our software and expertise to share their knowledge online, enabling them to fulfill their dream of owning a business. This allows millions to access information that helps them reach their goals. To sustain our growth, we are expanding our teams, and we place emphasis on working with experts and strong personalities who share our values, regardless of their location.
Your Responsibilities as a Site Reliability Engineer
- Automation and Infrastructure as Code (IaC): You will automate repetitive tasks, deployments, and system management to reduce human error and improve efficiency. This involves creating scripts, CI/CD pipelines, or automating infrastructure provisioning.
- Reliability and Performance Optimization: Continuously improve system uptime by identifying bottlenecks and optimizing system architecture.
- Capacity Planning and Scaling: Assess and predict system resource requirements (CPU, memory, storage) to ensure infrastructure scales with increasing demand. Implement auto-scaling solutions to handle load spikes without human intervention, ensuring systems remain performant under various conditions.
- System Monitoring and Incident Response: Continuously monitor system performance, uptime, and reliability using tools like Prometheus, Grafana, or Elasticsearch. The goal is to detect and respond to issues before they impact users. Manage and respond to incidents, outages, and failures quickly, aiming to minimize downtime. This includes managing incident documentation, communication, and post-incident analysis.
- Incident Postmortems and Continuous Improvement: Conduct root cause analysis (RCA) after incidents to identify what went wrong and how to prevent similar issues in the future. Implement fixes, improvements, and best practices based on learnings from postmortems to increase system reliability and reduce future incidents.
Benefits at Digistore24
- Play a crucial role in shaping cutting-edge projects in a collaborative work environment, enjoying flexibility in working time and location.
- Work in our partner's coworking spaces or in your home office, with uninterrupted internet access.
- Regular further education opportunities.
- Benefit from the stability of an extremely successful German high-tech company, funded by its product, not by investors.
- Work within outcome-focused teams and a culture of direct feedback.
- Modern equipment provided: ThinkPad or MacBook.
- Join an international, collaborative team with strong cohesion.
- Participate in spectacular team events across various European countries.
- Enjoy autonomy from day one.
- Contribution to a retirement scheme.
- Work with your team on a first-name basis, as equals, and without a dress code.
- Flexible working hours from Monday to Friday (core working hours from 10 AM to 4 PM).
Requirements for a Site Reliability Engineer
- Communication Mastery: You communicate precisely and with your audience in mind, defusing potential conflicts with sensitivity and a solution-oriented approach. You strike the right tone with stakeholders, developers, and your team, even under time pressure, and can seamlessly switch between German and English.
- Collaboration Wizardry: You collaborate effectively with developers, stakeholders, and operations, aligning everyone. You understand challenges across teams and find solutions benefiting the entire company.
- Automation Sorcery: You promote automation to save time and reduce errors, implementing tools that enhance team productivity.
- Problem-Solving Genius: You dive deep into problems, identify root causes, and devise solutions that prevent future incidents.
- Self-organization: You thrive on autonomy and excel at organizing and structuring complex projects while working remotely.
Technical Skills Required
- Kubernetes / Container Technology
- CI/CD (GitHub Workflows, Helm, Kustomize)
- Cloud Services (preferably Google, but others are also okay)
- Excellent spelling and grammar in German
- PHP language experience would be a plus
A Day in the Life of a Site Reliability Engineer
- Morning video call to discuss yesterday's progress and today's plans with your team.
- You work in a structured way, outlining your daily routine and goals. You block out time for the continuous development of SRE processes, supported by your team.
- Daily call with your team: Report priorities and blockers, and receive tangible tips for your challenges.
- Dedicated focus time to develop ideas for auto-scaling, monitoring, and alerting improvements. Test your ideas in practice and document successful principles for a one-on-one with the Head of IT Operations.
- After lunch, assist a developer with a new CI/CD workflow, discussing requirements and providing an initial prototype.
- Check and adjust the resource allocation of an application by reviewing current utilization and deployment settings.
- Identify an unmonitored endpoint, create a ticket, and immediately write Terraform code to add it to monitoring.
This Role is Not for You If...
- ... you do not identify with our values.
- ... you have less than 3 years of experience in IT operations.
- ... you can’t take ownership and need to discuss every detail with your supervisor or colleagues.
- ... you have difficulty planning and prioritizing your tasks.
- ... you don’t like to find solutions for complex problems.
- ... you are not confident speaking German AND English.
Our Values
Please take a REALLY close look at the values. Are you ready to live them?
Key skills/competency
- Site Reliability Engineering (SRE)
- DevOps
- Automation
- Infrastructure as Code (IaC)
- System Monitoring
- Incident Response
- Capacity Planning
- Kubernetes/Containerization
- CI/CD
- Cloud Services (GCP)
How to Get Hired at Digistore24 DACH
- Research Digistore24's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to understand their remote-first, outcome-focused environment.
- Tailor your resume: Highlight your Site Reliability Engineering, DevOps, automation, IaC, and incident response experience, showcasing specific achievements relevant to Digistore24's needs.
- Demonstrate German and English proficiency: Prepare to show strong communication skills in both languages, as this is a critical requirement for this Site Reliability Engineer role.
- Showcase problem-solving capabilities: Prepare detailed examples of how you've identified root causes, implemented lasting solutions, and driven continuous improvement in past SRE roles.
- Emphasize autonomy and collaboration: Share instances where you've thrived in self-organized teams, managed complex projects independently, and collaborated effectively with cross-functional stakeholders.