
Site Reliability Engineer
Future Secure AI · Toronto, ON
- On site
- Full-time
- $150,000 / year
Job highlights
- Build and operate reliable AI Co‑Worker platforms.
- Own Kubernetes and Terraform infrastructure.
- Automate deployments with Helm and CI/CD.
- Respond to incidents and reduce operational toil.
- Work with cutting-edge AI technology.
About the role
About Future Secure AI
At Future Secure AI, we're building something genuinely new, and we're looking for people bold enough to build it with us. We work at the frontier of AI, tackling big, real-world problems for global enterprises across multiple industries, armed with state-of-the-art technology and a culture that prizes courage, rigor, and relentless curiosity. Our BRAVER values aren't just words on a wall; they describe the kind of people we are and the standard we hold ourselves to every day. Our leadership team is entrepreneurial, experienced, and accessible, with an open-door policy that means you'll never be just a number here. We invest seriously in your growth because we know our success depends on yours. If you're ready to work alongside some of the brightest minds in the industry, push into uncharted territory, and do work that genuinely matters, Future Secure AI is the place for you.

Future Secure AI builds AI Co‑Workers that automate real operational work across enterprise environments. Our systems run in production, at scale, and under real‑world constraints. Reliability, resilience, and disciplined engineering are critical to everything we do.

We are looking for a Site Reliability Engineer to help design, build, and operate the platforms that power AI Co‑Workers. This is a hands‑on role for an engineer who enjoys owning reliability end‑to‑end and working closely with product, AI, and engineering teams.
The Role
- Design, build, and operate reliable production infrastructure supporting AI Co‑Workers
- Own Kubernetes‑based platforms used to deploy and run AI workloads
- Build and maintain infrastructure as code using Terraform
- Implement and maintain Helm‑based deployment workflows
- Define, measure, and improve system reliability using SLIs, SLOs, and SLAs
- Participate in on‑call rotation, incident response, root cause analysis, and post‑mortems
- Reduce operational toil through automation and engineering improvements
- Build and improve observability across monitoring, logging, and alerting
- Partner closely with engineers to ensure systems are resilient, scalable, and secure
- Work across the build, deploy, and operate phases of the software lifecycle
Must Have Criteria
- Hands‑on Kubernetes experience designing, building, or operating workloads on EKS, AKS, GKE, or self‑managed Kubernetes
- Hands‑on Terraform experience for infrastructure provisioning and automation
- Hands‑on Helm experience for Kubernetes application deployment
- Professional experience using at least two programming or scripting languages such as Python, Go, Java, Bash, PowerShell, or Ruby
- Direct Site Reliability Engineer experience or equivalent, including reliability engineering, on‑call, incident response, post‑mortems, and toil reduction
Should Have Criteria
- Experience working within a defined SDLC, including CI/CD, release processes, and end‑to‑end delivery from design to operations
- Hands‑on experience with at least one major cloud provider such as AWS, Azure, or Google Cloud
- Experience with ArgoCD or GitOps‑style deployment approaches
- Five or more years of relevant professional experience
- DevOps or DevSecOps experience, including CI/CD ownership, infrastructure automation, and security considerations
Preferable Criteria
- Relevant certifications such as CKA or CKAD, or cloud, DevOps, DevSecOps, or programming credentials
Why Join Us?
- A high-performance culture
- State-of-the-art technology
- World-class leadership
- Scale of impact and purpose
- A competitive salary and a huge growth trajectory
- Work with the best in the industry
- Flexible work environment
- Diversity and creativity
Key skills/competencies
- Site Reliability Engineering
- Kubernetes (EKS, AKS, GKE)
- Terraform
- Helm
- Python
- Go
- AWS
- DevOps
- CI/CD
- Observability
Skills & topics
- Site Reliability Engineer
- SRE
- Kubernetes
- Terraform
- Helm
- Cloud
- AWS
- Azure
- GCP
- Python
- Go
- DevOps
- CI/CD
- Observability
- Incident Response
- Automation
- AI Infrastructure
- Production Operations
How to get hired
- Tailor your resume: Highlight Kubernetes, Terraform, Helm, and scripting language experience.
- Showcase SRE experience: Emphasize on-call, incident response, and toil reduction achievements.
- Prepare for technical interviews: Brush up on Kubernetes, cloud platforms, and coding challenges.
- Demonstrate collaboration: Be ready to discuss working with product and engineering teams.
- Understand company values: Align your application with Future Secure AI's BRAVER principles.
Technical preparation
- Master Kubernetes concepts and operations.
- Practice provisioning infrastructure with Terraform.
- Implement application deployments using Helm.
- Build automation scripts in Python or Go.
Behavioral questions
- Describe a complex system reliability issue.
- How do you handle on-call incidents?
- Explain your approach to reducing operational toil.
- How do you collaborate with development teams?
Frequently asked questions
- What is the primary focus of the Site Reliability Engineer role at Future Secure AI?
- The Site Reliability Engineer at Future Secure AI will focus on designing, building, and operating reliable production infrastructure that powers their AI Co‑Workers. This involves owning Kubernetes-based platforms, utilizing infrastructure as code with Terraform, and ensuring system reliability through SLIs, SLOs, and SLAs.
- What are the 'must-have' technical skills for this Site Reliability Engineer position?
- Candidates must have hands-on Kubernetes experience (EKS, AKS, GKE, or self-managed), practical Terraform skills for infrastructure provisioning, and Helm experience for Kubernetes application deployment. Proficiency in at least two programming or scripting languages like Python, Go, Java, Bash, PowerShell, or Ruby is also required, along with direct SRE experience.
- Does Future Secure AI have a preferred cloud provider for their Site Reliability Engineers?
- The posting does not name a preferred cloud provider. Hands-on experience with at least one major provider such as AWS, Azure, or Google Cloud is a 'should-have' criterion, so demonstrating experience with any of them is beneficial for the Site Reliability Engineer role. AWS is the only provider listed among the key skills/competencies.
- What is the expected level of experience for a Site Reliability Engineer at Future Secure AI?
- Five or more years of relevant professional experience is listed as a 'should-have' criterion, meaning it is preferred rather than strictly required. The should-have criteria also include experience within a defined SDLC, CI/CD, and DevOps/DevSecOps practices.
- How does Future Secure AI approach career growth for its Site Reliability Engineers?
- Future Secure AI emphasizes investing seriously in employee growth, recognizing that the company's success depends on its employees' success. It offers a competitive salary and a huge growth trajectory, alongside opportunities to work with world-class leadership and the best minds in the industry.
- What is the work environment like at Future Secure AI for a Site Reliability Engineer?
- The work environment is described as high-performance, utilizing state-of-the-art technology. There's a culture that prizes courage, rigor, and curiosity, with an entrepreneurial and accessible leadership team. They also offer a flexible work environment and value diversity and creativity.
- What are the responsibilities related to incident management for this Site Reliability Engineer role?