Site Reliability Engineer II - Platform Security at Elastic

About the role

About Elastic: The Search AI Company

Elastic enables everyone to find the answers they need in real time, using all their data, at scale — unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

The Role: Infrastructure Team

The Infrastructure team is the foundation of the Elastic product stack, enabling engineering efforts across the entire company. We are software developers specializing in managing state, including CI pipelines, cloud resources, and cross-team integrations. We own everything needed to build the Elastic Stack on our infrastructure and act as internal software consultants, utilizing our own products for Elastic's business.

What You Will Be Doing

Design and develop tooling that facilitates building, testing, and shipping the Elastic Stack.
Build and operate production services that power core Elastic business functions, such as downloads, the Docker registry, and the maps service.
Support internal adoption of the Elastic Stack for software development and analytics use cases.

What You Bring

Software Development: Deep proficiency in at least one programming language (Python, JavaScript, Clojure, Haskell, Java, Go) and a broad development background.
Site-Reliability Engineering: Experience in an SRE or equivalent role, with a core mission of keeping systems operational through code.
Service-Oriented Operations: Multiple years of hands-on experience administering Linux systems, ideally at scale and in distributed environments. SaaS platform operation experience is a plus.
Infrastructure-as-Code: Comfort automating production systems collaboratively using tools like Docker, Terraform, Puppet, Chef, Ansible, Salt, Packer, Kubernetes, or shell scripts, with configuration managed through version control.

Bonus Points

A strong drive to automate and monitor everything.
Experience building reusable software components; open-source contributions are a plus.
Comfort with a versioned, Git-based workflow.
Strong Linux fundamentals, including syscall tracing, TCP internals, and init systems.
A passion for open-source communities.
Experience thriving in a distributed, asynchronous work environment with strong written communication.
Appreciation for diverse, globally distributed teams and a collaborative, inclusive approach.

Security & Privacy Responsibilities

Take ownership of protecting the confidentiality, integrity, and availability of organizational data and systems by following applicable privacy and security policies, standards, and procedures.
Ensure individual contributions align with Elastic’s Secure Software Development Framework (SSDF).
Proactively participate in mandatory role-based training to ensure personal technical execution consistently aligns with the highest standards of data protection, data privacy, and system resilience.

Key skills/competencies

Site Reliability Engineering
Platform Security
Infrastructure as Code
Linux Administration
Python
Docker
Kubernetes
Terraform
CI/CD
System Monitoring

How to get hired

Tailor your resume: Highlight SRE experience and IaC tools.
Showcase your code: Emphasize development proficiency and automation skills.
Demonstrate Linux expertise: Detail your system administration experience.
Prepare for technical interviews: Review SRE principles and brush up on the Elastic Stack.
Highlight collaboration: Mention experience with distributed teams and Git workflows.

Frequently asked questions

What programming languages are most valued for the Site Reliability Engineer role at Elastic?

Elastic asks for deep proficiency in at least one programming language; the posting lists Python, JavaScript, Clojure, Haskell, Java, and Go as examples. The depth of your programming expertise matters more than the specific language, and a broad development background is also expected.

What is the expected level of Linux administration experience for this Site Reliability Engineer position?

The role requires multiple years of hands-on experience administering Linux systems, ideally at scale and in distributed environments. Strong Linux fundamentals, including syscall tracing, TCP internals, and init systems, are listed as bonus points.

How important is experience with Infrastructure-as-Code (IaC) for this role at Elastic?

Experience with IaC is crucial. The role involves automating production systems collaboratively, treating configuration as code, and managing it through version control. Familiarity with tools such as Docker, Terraform, Puppet, Chef, Ansible, Salt, Packer, Kubernetes, or shell scripts is expected.

Does Elastic offer remote work opportunities for the Site Reliability Engineer position?

Elastic is a distributed company, and this posting asks for experience thriving in a distributed, asynchronous work environment, which suggests that remote or hybrid arrangements may be possible. Specific work arrangements depend on the role and can be discussed during the application process.

What are the key responsibilities regarding security and privacy for this Site Reliability Engineer role?

As a Site Reliability Engineer at Elastic, you will take ownership of protecting data and systems by following security policies, standards, and procedures. This includes adhering to Elastic’s Secure Software Development Framework (SSDF) and participating in training to ensure alignment with data protection and system resilience standards.

What kind of production services does the Infrastructure team at Elastic manage?

The Infrastructure team builds and operates production services that are critical to the Elastic business. These include services for downloads, the Docker registry, and the maps service, among others.

Are open-source contributions a requirement for the Site Reliability Engineer role at Elastic?

Open-source contributions are considered a bonus point, not a strict requirement. Experience building reusable software components or contributing to open-source projects (via libraries, patches, or documentation) can strengthen an application.
