Site Reliability Engineer, Distributed Systems @ Workday
Job Details
About Workday
Your work days are brighter here. As a Fortune 500 company and a leading AI platform for managing people, money, and agents, Workday is shaping the future of work with integrity, empathy, and shared enthusiasm. Join a team where hard work pays off, and meaningful work with supportive colleagues is the norm.
About the Team
The Data Platform and Observability team spans Pleasanton, CA; Boston, MA; Atlanta, GA; Dublin, Ireland; and Chennai, India. It develops large-scale distributed data systems that support Workday products, including core HCM, Financials, AI/ML, and internal data products, delivering real-time insights and processing hundreds of terabytes of data.
About the Role
The Messaging, Streaming and Caching team is a full-service Distributed Systems Engineering team dedicated to designing and providing async messaging, streaming, and NoSQL platforms. As a Site Reliability Engineer, Distributed Systems, your responsibilities will include:
- Designing, building, and enhancing distributed services such as Kafka, Redis, and RabbitMQ.
- Developing and maintaining core distributed software for streaming, messaging, and caching.
- Creating observability modules, alerts, and automation for dashboard lifecycle management.
- Deploying, operating, and tuning infrastructure components in production environments.
- Evaluating and implementing open-source and cloud-native tools across Kubernetes, OpenStack, and bare-metal deployments.
- Participating in on-call rotations and managing distributed services in AWS, GCP, and private cloud environments.
Required Qualifications
Applicants should have 4-8 years of software engineering experience (Java/Scala, Golang), 3+ years in distributed systems, and 3+ years in designing and operating large-scale deployments, with at least 1 year leading NoSQL-related product development.
Preferred Qualifications
Preferred candidates will have expertise in distributed systems software, performance analysis, and optimization, along with hands-on experience with Kafka, RabbitMQ, Redis, and Cassandra. Experience with CI/CD tools, Agile methodologies, configuration management using Chef, Kubernetes deployments via Helm and ArgoCD, and Linux system internals is also desired.
Work Arrangement & Culture
Workday offers Flex Work, combining in-person and remote work. Employees are expected to spend at least 50% of their time in the office or in the field each quarter, ensuring both flexibility and community connection. Inclusion, belonging, and equity, reflected in Workday's VIBE™ commitment, are at the core of the company's values.
Key Skills/Competencies
- Distributed Systems
- Messaging
- Streaming
- Caching
- DevOps
- Scalability
- Observability
- Cloud Computing
- Automation
- Performance Optimization
How to Get Hired at Workday
- Customize your resume: Highlight relevant distributed systems and DevOps experience.
- Research Workday's culture: Learn about its values and mission.
- Prepare technical examples: Emphasize hands-on experience with Kafka, Redis, and cloud platforms.
- Practice behavioral questions: Showcase teamwork and problem-solving skills.
- Network with current employees: Use LinkedIn to gather insights.