Principal Site Reliability Engineer

Early Warning · Scottsdale, AZ

On site
Full-time
$237,000 / year
Scottsdale, AZ

Job highlights

Design and implement resilient applications and infrastructure.
Build automation for deployments and disaster recovery.
Develop and evangelize observability and monitoring systems.
Identify and resolve performance bottlenecks.
Mentor team members and lead technical initiatives.

About the role

About Early Warning

At Early Warning, we’ve powered and protected the U.S. financial system for over thirty years with cutting-edge solutions like Zelle®, Paze℠, and so much more. As a trusted name in payments, we partner with thousands of institutions to increase access to financial services and protect transactions for hundreds of millions of consumers and small businesses. Our positions located in Scottsdale, San Francisco, Chicago, or New York follow a hybrid work model to allow for a more collaborative working environment. Candidates responding to this posting must independently possess the eligibility to work in the United States, for any employer, at the date of hire. This position is ineligible for employment visa sponsorship.

Overall Purpose

The Principal Site Reliability Engineer partners with development teams by designing availability and resiliency patterns in applications and infrastructure.

Essential Functions

Design and implement software and tools to improve performance, availability, scalability, and latency while delivering end products to customers with the highest efficiency and meeting all security standards.
Support the company’s commitment to risk management and protecting the integrity and confidentiality of systems and data.
Build automation and tooling around application management, such as deployments, configuration changes, and disaster recovery scenarios.
Design, implement, and evangelize observability and monitoring systems to proactively detect problems and identify their causes.
Evaluate capacity of the application on a continuous basis to provide stats to the Product/Business teams and recommend an efficient path to scale for future needs.
Identify performance bottlenecks and work with cross-functional teams to troubleshoot and resolve issues.
Serve as a technical liaison for the application and provide documents and runbooks to Level 1 and Level 2 teams.
Participate in a 24x7 on-call rotation.
Be a champion of excellent processes; take the initiative in developing repeatable patterns and standard, re-usable work across teams.
Work directly with application development teams to provide feedback and technical requirements to the software development lifecycle, implementing best-practice microservice design patterns and other modern software development approaches.
Understand and support the adoption of best-practice microservice design patterns and other modern software reliability approaches and techniques.
Be a thought leader: a senior point of expertise on site reliability engineering issues, industry trends and developing technologies.
Be a role model to others on the team. Coach and mentor team members.

Minimum Qualifications

Education and experience typically obtained through completion of a Bachelor’s Degree in Business and/or Computer Science or related field.
12+ years of related experience managing large, complex projects in a technical or software development environment, inclusive of a post-graduate degree.
Proven ability to lead a team through high-priority incidents and improve the Root Cause Analysis (RCA) process.
Excellent troubleshooting skills and proven experience resolving technical issues in complex environments.
Hands-on experience in designing and developing using one or more of the following technologies: Python, Go, Java.
Experience in Microservices Architecture.
Experience with Messaging frameworks such as Kafka, SQS or JMS.
Experience with database technologies such as Oracle, DynamoDB, and Aurora.
Experience with caching layers such as Redis and Memcached.
Strong understanding of Linux administration.
Experience with CI/CD pipeline implementation, including Git, Chef, Maven, and Jenkins.
Strong understanding of networking fundamentals.
Experience in leading cross-functional teams to create technical solutions.
Proven track record designing and building complex end-to-end systems (full stack developer).
Background and drug screen.

Preferred Qualifications

Good programming skills in one or more of the following languages: Java, Ruby, Python, JavaScript, and Go.
Hands-on experience supporting applications in a 24x7 customer-facing production environment.
Working knowledge of AWS, Docker, Kubernetes, Swarm.

Key skills/competency

Site Reliability Engineering, Application Performance, Scalability, Latency, Automation, Observability, Monitoring Systems, Microservices Architecture, Cloud Computing (AWS), CI/CD.

Skills & topics

Site Reliability Engineering
SRE
Principal Engineer
Cloud Computing
AWS
Microservices
Automation
Observability
Python
Go
Java
Linux
CI/CD
Kafka
Docker
Kubernetes

How to get hired

Tailor your resume: Highlight your 12+ years of experience in technical project management, SRE, and microservices. Emphasize Python, Go, Java, and AWS skills.
Showcase leadership: Detail your experience leading teams through high-priority incidents and improving RCA processes.
Demonstrate technical expertise: Provide specific examples of designing and building complex end-to-end systems and implementing CI/CD pipelines.
Understand company values: Research Early Warning's mission to protect the U.S. financial system and their commitment to risk management.
Prepare for interviews: Be ready to discuss your experience with microservices, cloud technologies, and troubleshooting complex environments.

Technical preparation

Master Python, Go, or Java for development.
Deep dive into microservices and cloud architectures.
Practice CI/CD pipeline implementation and automation.
Prepare to troubleshoot complex distributed systems.

Behavioral questions

Describe leading teams through high-priority incidents.
How do you improve a Root Cause Analysis process?
Share an example of mentoring junior engineers.
How do you champion process improvements across teams?

Frequently asked questions

What is the work arrangement for a Principal Site Reliability Engineer at Early Warning?

The position follows a hybrid work model in Scottsdale, San Francisco, Chicago, or New York, blending in-office and remote work to allow for a more collaborative working environment.

What are the key technical skills required for the Principal Site Reliability Engineer role at Early Warning?

Key technical skills include hands-on experience with Python, Go, or Java; microservices architecture; messaging frameworks (Kafka, SQS, JMS); database technologies (Oracle, DynamoDB, Aurora); caching layers (Redis, Memcached); Linux administration; CI/CD pipelines (Git, Chef, Maven, Jenkins); and networking fundamentals. Working knowledge of AWS, Docker, and Kubernetes is a preferred qualification.

What is the required experience level for the Principal Site Reliability Engineer position?

Candidates should have at least 12 years of related experience managing large, complex projects in a technical or software development environment, inclusive of a post-graduate degree. Proven ability to lead teams through high-priority incidents and improve the Root Cause Analysis (RCA) process is also required.

Does Early Warning offer visa sponsorship for the Principal Site Reliability Engineer role?

No, Early Warning does not offer employment visa sponsorship for this position. Candidates must independently possess the eligibility to work in the United States, for any employer, at the date of hire.

What is the typical salary range for a Principal Site Reliability Engineer at Early Warning in Phoenix, AZ?

The base pay scale for this position in Phoenix, AZ is $194,000 - $237,000 USD per year. Actual compensation may vary based on factors like job scope, market rates, geographic location, and candidate experience.

What kind of benefits does Early Warning offer to its employees?

Early Warning provides a benefits package including medical, dental, and vision plans, 401(k) with company match, flexible time off, paid parental leave, and family planning support. Specifics may be shared during the interview process.

What is the role of a Principal Site Reliability Engineer in application development at Early Warning?

The Principal SRE partners with development teams to design availability and resiliency patterns, implement automation for deployments and disaster recovery, and evangelize observability and monitoring. They also provide feedback on technical requirements and support the adoption of microservice design patterns.
