Question 1

What is the typical career growth for a Site Reliability Engineer at Cognition?

Accepted Answer

At Cognition, Site Reliability Engineers can grow by taking on more complex system ownership, leading critical incident responses, and shaping the future of platform engineering for the company's AI products. Because the company emphasizes high ownership and keeps its team selective, individual engineers have significant room for impact and advancement.

Question 2

What are the main challenges a Site Reliability Engineer will face at Cognition?

Accepted Answer

The primary challenges for a Site Reliability Engineer at Cognition involve ensuring the extreme reliability of rapidly evolving AI products used by hundreds of thousands of developers. This includes managing complex production systems, leading incident response for critical services, and building scalable infrastructure in a fast-paced environment.

Question 3

How does Cognition approach on-call rotations for Site Reliability Engineers?

Accepted Answer

Cognition emphasizes making on-call sustainable and effective. This involves building robust runbooks and tooling, defining clear incident response procedures, and implementing systems that minimize unnecessary pages. The goal is that incidents, when they occur, are handled quickly and clearly and lead to durable improvements.

Question 4

What specific AI products will a Site Reliability Engineer be supporting at Cognition?

Accepted Answer

A Site Reliability Engineer at Cognition will be directly responsible for the production reliability and platform engineering of Devin, the first AI software engineer, and Windsurf, an AI-native IDE. These are Cognition's flagship products, used daily by a large developer community.

Question 5

Does Cognition require specific cloud provider experience for the Site Reliability Engineer role?

Accepted Answer

No single provider is required. Proficiency with cloud infrastructure is essential, and Cognition is open to candidates with deep experience in AWS, GCP, or Azure. The focus is on your ability to manage cloud infrastructure effectively using Infrastructure as Code principles, not on experience with one particular vendor.

Question 6

How important is software engineering ability for a Site Reliability Engineer at Cognition?

Accepted Answer

Software engineering fundamentals are critical for SREs at Cognition. The role involves writing real code to build and improve systems, automate tasks, and solve complex reliability challenges, rather than solely configuring existing tools. Demonstrating strong coding skills is a key requirement.

Question 7

What kind of security responsibilities are involved for a Site Reliability Engineer at Cognition?

Accepted Answer

Security is treated as an integral part of reliability at Cognition. Site Reliability Engineers are expected to ensure that security concerns, such as misconfigurations, vulnerabilities, and access failures, are addressed with the same urgency as traditional outages, embedding security practices within the reliability framework.
