
Site Reliability Engineer - AI
Scaleway · Paris, Île-de-France, France
- On site
- Full-time
- €70,000 / year
Job highlights
- Ensure infrastructure reliability and scalability.
- Onboard product teams and deploy stacks.
- Implement and manage observability tools.
- Automate infrastructure using GitOps and IaC.
- Participate in on-call rotations.
About the role
About Scaleway
Join Scaleway and shape the sovereign cloud of tomorrow! Since 1999, we have been designing secure, sustainable infrastructures aimed at supporting the most ambitious companies. Historically known for our dedicated servers (Dedibox), we made a strategic shift to cloud computing in 2015. Staying true to our principles of simplicity, flexibility, and technical excellence, we have become one of the leading players in Europe in the sector. With the rise of artificial intelligence, we have strengthened our commitment, supported by the Iliad Group, which is investing €3 billion to develop a serious, sovereign AI alternative to American and Asian giants. Every day, thanks to our fast-growing portfolio of cloud and AI products (bare metal, containerization, serverless, AI, etc.), Scaleway proudly serves thousands of customers across the private and public sector, from corporations like France Télévisions or Hachette Livre, to fast-growing startups like Photoroom and Biolevate, to institutions like the City of Copenhagen.
Our offices are located in Paris, Lille, Toulouse, Rennes, Rouen, Bordeaux and Lyon.
Why We Need You
Our growth is driving us to strengthen our Platform SRE team to build, standardize, and improve the reliability of the infrastructure hosting Scaleway's products. Your mission will be to ensure the operational readiness of our infrastructure and the onboarding of product teams in order to maintain high performance standards, ensure continuous improvement, and support the deployment of product stacks across new regions.
Your Future Team
We work in a collaborative and international environment where the diversity of Scalers, combined with a spirit of sharing, helps bring new projects to life every day, advancing our ambitions together. You will be part of a team of 5 people, including your manager. The team operates in a stable environment with a fresh dynamic, focusing on onboarding multiple products and standardizing engineering practices. We use a mix of Scrum and Kanban methodologies and rotate product referents to keep the work engaging and mitigate "bus factor" risks.
Your Daily Routine
- Build, standardize, and enhance the reliability of the platform infrastructure.
- Onboard product teams and facilitate the deployment of product stacks in new regions.
- Implement and manage observability tools (Grafana, Thanos, Alertmanager).
- Automate infrastructure deployment and management using GitOps processes.
- Ensure configuration consistency through tools like Ansible or Salt.
- Manage operational maintenance (MCO) and handle production incidents.
- Define and monitor reliability metrics such as SLAs and SLOs.
- Maintain clear and accurate technical documentation.
- Manage security components and secrets (e.g., HashiCorp Vault).
- Participate in a weekly on-call rotation (approximately one week per month).
About You
Hard Skills:
- Senior-level experience in Linux system administration and infrastructure management.
- Proficiency with Kubernetes (K8s) and GitOps workflows (Argo CD, Flux CD).
- Strong mastery of Infrastructure as Code (IaC) and automation (Ansible, Salt).
- Solid understanding of Networking fundamentals and security protocols.
- Experience managing Secrets (e.g., Vault) and Observability stacks (Thanos, Grafana).
- Proficiency in scripting and automation for high-availability environments.
Soft Skills:
- Pragmatic approach to problem-solving.
- Strong listening skills and ability to collaborate across teams.
- High level of precision and attention to detail.
- Curiosity and a continuous improvement mindset.
- Open-mindedness and a collaborative spirit.
What You Will Find at Scaleway
- Hybrid work: We offer up to 3 days of remote work per week.
- Offices: Our offices are spacious, dynamic workspaces with bold design, conveniently located near public transport. Most of our offices feature outdoor spaces (terraces) and bike parking facilities.
- Dining: Our chef provides a healthy meal service at the headquarters, and breakfast is available across all our sites year-round. Scalers working from regional sites enjoy a Swile card for lunches.
- Well-being commitments: Whether it’s access to a gym, daycare places, or discounted care services, Scaleway is committed to supporting Scalers in maintaining a balanced life.
- International environment: With dozens of nationalities, Scaleway offers a stimulating environment where English is as widely spoken as French.
- Career & Mobility: Our managers value internal mobility, and opportunities to transition to other entities within the Iliad Group are accessible to all Scalers.
Why Join the Scaleway Adventure?
- A rich and diverse product offering: Scaleway offers over 100 public cloud products in IaaS, PaaS, and AI.
- A cutting-edge technical environment: Scaleway provides modern infrastructures, including high-performance bare metal servers, to tackle exciting technical challenges.
- Commitment to responsible cloud: Scaleway is dedicated to a more responsible cloud, with data centers powered solely by renewable energy since 2017, minimizing our ecological footprint and holding top-level certification.
The Next Steps
- Discovery call with a recruiter (30 min)
- Manager interview to understand your technical skills and approach to the role (45 min)
- Technical interview and use case with the team to validate your expertise (1h)
- Deep dive interview to deepen discussions and assess your fit with the team (45 min)
- Final validation with the Head of Tribe and office tour to meet your future colleagues
Key skills/competencies
- Linux System Administration
- Kubernetes (K8s)
- GitOps (Argo CD, Flux CD)
- Infrastructure as Code (IaC)
- Ansible
- Salt
- Networking
- Security Protocols
- Secrets Management (Vault)
- Observability (Thanos, Grafana)
Skills & topics
- Site Reliability Engineer
- Linux
- Kubernetes
- K8s
- GitOps
- Infrastructure as Code
- IaC
- Ansible
- Salt
- Automation
- Observability
- Grafana
- Thanos
- Networking
- Cloud Computing
- SRE
- System Administration
- High Availability
How to get hired
- Tailor your resume: Highlight Linux administration, Kubernetes, IaC, and GitOps experience relevant to Scaleway's SRE role.
- Showcase automation skills: Detail your experience with Ansible, Salt, and scripting for high-availability environments.
- Emphasize collaboration: Demonstrate your pragmatic problem-solving and ability to work in an international team.
- Prepare for technical interviews: Be ready to discuss infrastructure, networking, security, and observability concepts.
- Research Scaleway: Understand their commitment to sovereign cloud, AI, and responsible cloud practices.
Frequently asked questions
- What is the work arrangement for a Site Reliability Engineer at Scaleway?
- Scaleway offers a hybrid work model, allowing up to 3 days of remote work per week. This provides flexibility while maintaining opportunities for in-person collaboration.
- What are the key technical skills required for the Site Reliability Engineer role at Scaleway?
- Key technical skills include senior-level Linux system administration, proficiency with Kubernetes and GitOps workflows, strong Infrastructure as Code (IaC) and automation (Ansible, Salt) experience, solid understanding of networking and security protocols, and experience with observability stacks and secrets management.
- How does Scaleway support employee well-being and career development?
- Scaleway is committed to employee well-being through access to gyms, daycare, and discounted services. They also foster career growth through internal mobility and opportunities within the Iliad Group.
- What is Scaleway's approach to diversity and inclusion in hiring?
- Scaleway is committed to building an inclusive and respectful workplace, considering all applications regardless of age, gender, sexual orientation, background, religion, or disability. They believe diverse perspectives drive innovation.
- What is the typical interview process for a Site Reliability Engineer at Scaleway?
- The interview process typically includes a discovery call with a recruiter, a manager interview, a technical interview with a use case, a deep dive interview, and a final validation with the Head of Tribe.
- Does Scaleway offer opportunities for international collaboration for their Site Reliability Engineers?
- Yes, Scaleway has a diverse, international environment with dozens of nationalities, where English is widely spoken, facilitating collaboration on global projects.
- What are Scaleway's commitments regarding responsible cloud practices?
- Scaleway is dedicated to a responsible cloud, with data centers powered solely by renewable energy since 2017, minimizing their ecological footprint and holding top-level certification.