12 days ago

Senior Site Reliability Engineer

TYK TECHNOLOGIES LIMITED

Remote
Full Time
$140,000

Job Overview

Job Title: Senior Site Reliability Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $140,000
Location: Remote

Job Description

About Tyk Technologies Limited

The Tyk API Management platform is at the forefront of powering the connected world, enabling new products and services by connecting diverse systems and services. From internal to external, public to highly encrypted systems, Tyk drives value across industries including retail, finance, telecoms, healthcare, and media. If you've ever used online banking, news apps, or a connected car, Tyk's technology makes it possible. Founded in 2015 with global offices in London (UK), London (Ontario), Atlanta, and Singapore, Tyk serves thousands of B2B users worldwide, including major brands like Lotte, Bell, T-Mobile, RBS, Capital One, and Vinci.

Tyk's mission is to connect every system in the world, starting with its robust API Management platform. The company champions total flexibility, remote work from anywhere globally, and radical responsibility, offering unlimited paid holidays and autonomy to foster peak employee performance and build the best possible team, unconstrained by location or traditional working hours.

The Role: Senior Site Reliability Engineer

At Tyk, we are dedicated to building software that solves complex problems. As a Senior Site Reliability Engineer, you will play a crucial role in empowering users with a rich feature set, high availability, and stellar performance. With a growing customer base, we are seeking an experienced Senior Site Reliability Engineer to optimize, automate, and enhance performance, leveraging real-time insights from massive-scale data. We are looking for an original thinker, a challenger, a technical legend, and an opinionated collaborator eager to drive continuous improvement.

Key Responsibilities

  • Collaborate with the Principal SRE to strategize and implement the SRE strategic plan.
  • Lead the SRE team in translating strategic goals into actionable plans, coordinating through the SCRUM process.
  • Address team wellbeing and performance concerns, cultivating a positive and productive environment.
  • Work with the Principal SRE and Scrum Master to analyze wellbeing survey outcomes and develop improvement initiatives.
  • Champion operational communication, ensuring high-quality and timely updates on team progress.
  • Ensure SLA compliance for our cloud environment through proactive monitoring.
  • Develop and oversee the roadmap for proactive alerting and monitoring systems.
  • Define and track key performance metrics for cloud services to drive continuous improvement.
  • Design and implement solutions to maintain and enhance critical KPIs.
  • Lead performance tuning and fault finding by meticulously analyzing metrics from operating systems and applications.
  • Optimize system and infrastructure performance, focusing on innovation and anticipating customer needs.
  • Engage with commercial teams to understand growth plans and develop aligned SRE strategies.
  • Direct the analysis of cloud infrastructure, with a strong focus on automation, scalability, and effective management.
  • Align with the Principal SRE on comprehensive automation strategies for cloud-operations tasks.
  • Model excellence in software design and automation to enhance Tyk Cloud services, including creating runbooks and facilitating knowledge sharing.
  • Conduct blame-free root cause analysis postmortems, reporting findings and recommendations.
  • Document operational processes and policies, ensuring replicability and adherence.
  • Provide on-call support, ensuring effective response and resolution in line with SLAs.
  • Plan and execute software upgrades to optimize cloud services.
  • Assist commercial teams with data requests and account management.
  • Champion and adhere to SCRUM methodologies within the SRE team.

What We're Looking For

We are seeking a candidate with a proven track record in a senior SRE role or a similar capacity, demonstrating strong knowledge of cloud technologies and expert management of SLA, SLO, and SLI. Experience in leading teams and implementing SCRUM processes is essential, complemented by excellent communication and leadership skills. Candidates should have experience in line managing, mentoring, and coaching, along with the ability to analyze and improve operational processes and performance metrics. Experience in software design, automation, and root cause analysis is critical, as is on-call support experience and a customer-focused mindset. A collaborative attitude when working with commercial and technical teams is also highly valued.

Technical Expertise:
  • Launching and operating production Kubernetes clusters.
  • Designing and operating infrastructure on AWS and other cloud providers.
  • Operating MongoDB (or other document database) clusters.
  • Operating Redis (or other key-value storage) clusters.
  • Administering Linux servers.
  • Maintaining distributed software.
  • Operating Prometheus and Grafana.
  • Operating logging collection and analysis systems.

Please note: Working hours within 16:00 – 04:00 UTC are required for this role.

Key Skills

  • Kubernetes: Administrator level proficiency.
  • Go and/or Python: Advanced programming skills.
  • AWS: Proficient in Amazon Web Services.
  • Linux: Proficient in Linux administration.
  • Terraform and IaC: Proficient in Infrastructure as Code, especially Terraform.
  • Helm: Familiarity with Helm.
  • MongoDB (or similar): Experience with document databases.
  • Redis (or similar): Experience with key-value storage.
  • Monitoring & Logging: Expertise in related systems.
  • Networking Concepts: Strong grasp of subnets, routing, peering, load balancing, NAT, DNS, TCP/IP, HTTP, TLS, UDP.

Why Join Tyk Technologies Limited?

Tyk offers an unparalleled work environment defined by trust and flexibility. Enjoy unlimited paid holidays and complete flexibility in working hours, empowering you to work when most productive. The company provides an employee share scheme, generous maternity and paternity leave, company retreats, volunteering days, and an employee wellbeing platform. Tyk's culture is built on authenticity, respect, responsibility, independence, honesty, diversity, and inclusion, valuing every team member and fostering a collaborative, challenging, and supportive atmosphere where all ideas are encouraged. Tyk embraces change and constantly strives for improvement, embodying its values through principles like 'It’s ok to screw up!', 'Trust starts with you – make it count!', and 'Make things, better!'

Tyk is an equal opportunities employer committed to fair treatment for all applicants and employees. Learn more about their work life at tyk.io/worklife/ and the company at tyk.io.

Key skills/competency

  • Kubernetes
  • AWS
  • Go
  • Python
  • Terraform
  • MongoDB
  • Redis
  • Prometheus
  • Grafana
  • SRE principles

Tags:

Site Reliability Engineer
Cloud optimization
Automation
Performance tuning
Incident response
Team leadership
Strategic planning
Monitoring
Root cause analysis
Operational processes
On-call support
Kubernetes
AWS
Go
Python
Terraform
MongoDB
Redis
Prometheus
Grafana
Linux

How to Get Hired at TYK TECHNOLOGIES LIMITED

  • Research Tyk Technologies Limited's culture: Study their mission, values (flexibility, autonomy, radical responsibility), recent news, and employee testimonials on LinkedIn and Glassdoor to understand their remote-first, trust-based environment.
  • Tailor your resume for Senior SRE excellence: Customize your resume to highlight extensive experience with cloud technologies (AWS, Kubernetes), distributed systems, performance optimization, and team leadership, using keywords like SRE, automation, monitoring, and incident response.
  • Showcase your technical depth: Prepare to discuss hands-on experience with Go/Python, Terraform, MongoDB, Redis, Prometheus, Grafana, and Linux administration, demonstrating problem-solving skills and innovative solutions in SRE domains.
  • Emphasize leadership and collaboration: During interviews, articulate examples of leading SRE initiatives, fostering team wellbeing, implementing SCRUM processes, and effective communication with technical and commercial stakeholders, aligning with Tyk's collaborative values.
  • Demonstrate on-call and root cause analysis expertise: Be ready to share specific situations where you managed critical incidents, performed blame-free root cause analyses, and implemented preventative measures to ensure high availability and SLA compliance.
