
Staff Site Reliability Engineer, Core AI Infrastructure
Coinbase · United States
- Hybrid
- Full-time
- $237,262 / year
- United States
Job highlights
- Lead AI infrastructure reliability and automation.
- Build tools for operational IT workflows.
- Partner on CI/CD and network platforms.
- Enhance observability and documentation.
- Develop AI product applications using Go/Python.
About Coinbase
Ready to do the most impactful work of your career? At Coinbase, we are uncompromising on our mission to increase economic freedom. The bar is high, the environment is intense, and we like it that way. This isn't a place for complacency; it’s a place to be pushed past your perceived limits. If you're ready to build the future of finance alongside people who refuse to settle for "good enough," you belong here. Coinbase is a remote-first but not remote-only company. Expect to get together quarterly for intense in-person working sessions called “surges.” Learn more about working at Coinbase.
About the Role
You'll join a high-performing team of engineers driving AI transformation at Coinbase as a Staff Site Reliability Engineer on the IT Operations team. This team builds and scales the infrastructure powering Coinbase's AI products, with direct exposure to senior leadership in a fast-paced, incubator-style environment. You'll own the reliability and automation of critical AI infrastructure, ensuring our systems are resilient, observable, and secure at scale.
What you’ll be doing (i.e., job duties):
- Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
- Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
- Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
- Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
- Develop full-stack applications in Go or Python that power internal AI products and infrastructure.
What we look for in you (i.e., job requirements):
- 8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
- Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
- Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and version control workflows using Git-based CI/CD pipelines.
- Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
- Responsible use of generative AI, with human oversight, to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.
Nice to haves:
- Expertise with Linux, Bash, Ruby, Python, and/or Go
- Expertise automating EC2 or container deployments with Terraform
- Strong network security fundamentals
- Experience managing and leveraging log aggregation
- Experience working in a highly regulated environment
- Experience in a fast-paced, high-growth company
- Experience in a remote-first IT environment
Key skills/competencies
- Site Reliability Engineering
- Core AI Infrastructure
- AWS
- Kubernetes
- Docker
- Terraform
- Python
- Go
- CI/CD
- Incident Response
How to get hired
- Tailor your resume: Highlight your 8+ years of AWS infrastructure automation, your infrastructure-as-code experience (e.g., Terraform), your Docker and Kubernetes work, and your Go/Python proficiency.
- Showcase leadership: Emphasize your experience leading incident response and root cause analysis, and delivering measurable reliability improvements under strict SLAs.
- Quantify achievements: Provide specific examples of how you've streamlined operations, eliminated manual tasks, and improved deployment velocity using CI/CD frameworks.
- Demonstrate AI/ML understanding: Describe how you have used generative AI while maintaining human oversight to produce business-ready outputs and measurable efficiency gains.
- Prepare for technical depth: Be ready to discuss your experience with AWS, Kubernetes, CI/CD pipelines, and developing applications in Go or Python.
Frequently asked questions
- What is the base salary range for the Staff Site Reliability Engineer role at Coinbase?
- The annual base salary range for this Staff Site Reliability Engineer position at Coinbase is $218,025 USD to $256,500 USD, excluding equity and bonus eligibility. Actual compensation may vary based on factors like location and experience.
- Does Coinbase offer remote work for this Staff Site Reliability Engineer position?
- Coinbase describes itself as a remote-first but not remote-only company, and this listing is marked Hybrid. Expect to gather quarterly for in-person working sessions called "surges" for team collaboration.
- What programming languages are essential for the Staff Site Reliability Engineer role at Coinbase?
- Proficiency in at least one scripting or programming language like Python, Bash, Ruby, or Go is required for this Staff Site Reliability Engineer position. Experience with Go or Python is specifically mentioned for developing full-stack applications.
- What is the expected experience level for the Staff Site Reliability Engineer role at Coinbase?
- This role requires at least 8 years of experience automating and supporting cloud infrastructure (AWS) and network environments, with a strong track record in areas like IaC, containerization, and incident response.
- How does Coinbase utilize AI in its hiring process for roles like Staff Site Reliability Engineer?
- For select roles, Coinbase is piloting AI tools for initial screening interviews and for transcribing/summarizing interview notes. These AI tools are for testing purposes, and a human recruiter always reviews the responses to assess qualifications.
- What are the key responsibilities of a Staff Site Reliability Engineer at Coinbase focusing on Core AI Infrastructure?
- Key responsibilities include owning the reliability, monitoring, and incident response for AI infrastructure; building automation and tooling; partnering on CI/CD and network platforms; strengthening observability and documentation; and developing full-stack applications in Go or Python that power internal AI products.
- What are the essential infrastructure-as-code (IaC) tools for the Staff Site Reliability Engineer position?
- Hands-on experience with infrastructure-as-code tools such as Terraform, Ansible, Chef, Puppet, or Salt is a requirement for this Staff Site Reliability Engineer role.
