Senior Site Reliability Engineer
TYK TECHNOLOGIES LIMITED

Job Description
About Tyk Technologies Limited
Tyk Technologies Limited is a global company founded in 2015 with offices in London (UK), London (Ontario), Atlanta, and Singapore. We are driving the connected world and powering new products and services through our API Management platform. Our mission is to connect every system in the world, starting with a flexible, default remote, and radically responsible API Management platform. We serve diverse industries including retail, finance, telecoms, healthcare, and media, with thousands of users worldwide.
Our Mission
Tyk is on a mission to connect every system in the world by building a robust API Management platform. We are committed to offering flexibility and autonomy to our employees, believing it enables them to achieve their best results and allows us to build the best possible team, unhindered by location or working hours.
The Role: Senior Site Reliability Engineer
At Tyk, we are dedicated to building software that solves problems. As a Senior Site Reliability Engineer (SRE), you will empower our users with a feature-rich, highly available, and performant platform. We are seeking an experienced and innovative individual to optimize, automate, and enhance our performance using real-time, large-scale data insights. You will be a critical thinker, a challenger, a technical leader, and a collaborative team member focused on continuous improvement.
Responsibilities:
- Collaborate with the Principal SRE to shape and implement the SRE strategic plan.
- Lead the SRE team in translating strategy into actionable plans through the SCRUM process.
- Address wellbeing and performance concerns, fostering a positive and productive team environment.
- Analyze wellbeing survey outcomes and develop improvement plans.
- Champion operational communication, ensuring high-quality and timely updates on team progress.
- Ensure SLA compliance for our cloud environment through proactive monitoring.
- Develop and oversee the roadmap for proactive alerting and monitoring.
- Define and track key performance metrics for cloud services, driving continuous improvement.
- Design and implement solutions to maintain and enhance KPIs.
- Lead performance tuning and fault finding by analyzing metrics from operating systems and applications.
- Optimize system and infrastructure performance, focusing on innovation and anticipating customer needs.
- Engage with commercial teams to understand growth plans and develop corresponding SRE strategies.
- Direct the analysis of cloud infrastructure, focusing on automation, scalability, and management.
- Align with the Principal SRE on automation strategies for cloud-operations tasks.
- Model excellence in software design and automation to enhance Tyk Cloud services, creating runbooks and sharing knowledge.
- Conduct blame-free root cause analysis postmortems, reporting findings and recommendations.
- Document operational processes and policies, ensuring replicability and adherence.
- Provide on-call support, ensuring effective response and resolution in line with SLAs.
- Plan and execute software upgrades to optimize cloud services.
- Assist commercial teams with data requests and account management.
- Champion and adhere to SCRUM methodologies within the SRE team.
What We’re Looking For:
- Proven experience in a senior SRE role or similar.
- Strong knowledge of cloud technologies and SLA, SLO, and SLI management.
- Experience leading teams and implementing SCRUM processes.
- Excellent communication and leadership skills.
- Experience line managing, mentoring, and coaching.
- Ability to analyze and improve operational processes and performance metrics.
- Experience in software design, automation, and root cause analysis.
- On-call support experience and a customer-focused mindset.
- Collaborative attitude with commercial and technical teams.
- Experience launching and operating production Kubernetes clusters.
- Experience designing and operating infrastructure on AWS and other providers.
- Experience operating MongoDB (or other document database) clusters.
- Experience operating Redis (or other key-value storage) clusters.
- Experience administering Linux servers.
- Experience maintaining distributed software.
- Experience operating Prometheus and Grafana.
- Experience operating log collection and analysis systems.
- Availability to work within 16:00 – 04:00 UTC.
Skills:
- Kubernetes (administrator)
- Go (advanced)
- AWS (proficient)
- Linux (proficient)
- Terraform and IaC in general (proficient)
- Helm (familiar)
- MongoDB (or similar)
- Redis (or similar)
- Monitoring & logging
- Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
- Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
Why Join Us:
- Unlimited paid holiday.
- Total flexibility in working hours.
- Employee share scheme.
- Generous maternity and paternity leave.
- Company retreats.
- Volunteering Days.
- Employee Wellbeing platform.
Our Values:
We value authenticity, respect, responsibility, independence, honesty, diversity, inclusion, and treating others as we wish to be treated. We encourage innovation, experimentation, trust, collaboration, and continuous improvement.
Key skills/competencies:
- Site Reliability Engineering
- Kubernetes
- AWS
- Linux
- Go
- Terraform
- Monitoring and Logging
- Performance Tuning
- Automation
- Cloud Infrastructure
How to Get Hired at TYK TECHNOLOGIES LIMITED
- Tailor your resume: Highlight SRE experience, Kubernetes, AWS, Go, and automation skills relevant to Tyk's needs.
- Showcase leadership: Emphasize team lead, mentoring, and SCRUM process implementation experience.
- Demonstrate technical expertise: Detail your experience with production Kubernetes, AWS, MongoDB, Redis, Linux, and monitoring tools.
- Align with values: Express your understanding of Tyk's culture of flexibility, radical responsibility, and continuous improvement.
- Prepare for technical interviews: Be ready to discuss system design, performance tuning, and fault-finding scenarios.