Senior DevOps Engineer
@ TextLayer

Ottawa, Ontario, Canada
CA$200,000 - CA$220,000
On Site
Full Time
Posted 2 days ago

Job Details

About TextLayer

TextLayer helps enterprises and funded startups deploy advanced AI systems without rewriting their infrastructure. We bridge the gap between AI potential and practical implementation in sectors like fintech and healthtech.

The Role

The Senior DevOps Engineer will architect production-grade monitoring, logging, and tracing systems for AI workloads. This role includes implementing OpenTelemetry pipelines, building deployment workflows with Infrastructure as Code, and creating resilient observability solutions for LLM applications and conversational AI systems.

Key Responsibilities

  • Design and maintain OpenTelemetry-based observability infrastructure.
  • Build and scale ELK stack deployments for log aggregation and visualization.
  • Implement tracing and monitoring for LLM inference and AI workflows.
  • Develop data ingestion pipelines for high-volume telemetry data.
  • Configure and optimize OpenSearch clusters for real-time analytics.
  • Deploy and manage observability platforms like Langfuse and OpenLLMetry.
  • Implement IaC using Terraform, CloudFormation, and similar tools.
  • Build automated alerting and incident response systems.
  • Collaborate with engineering teams to ensure services are properly instrumented for telemetry.
  • Optimize data retention, indexing strategies, and query performance.

What You Will Bring

Deep expertise in observability infrastructure, hands-on experience with OpenTelemetry and the ELK stack, and a passion for scaling AI workloads. Strong skills in IaC, container orchestration, and scripting are required.

Required Qualifications

  • 4+ years in DevOps/Infrastructure engineering with a focus on observability.
  • Expert-level experience with OpenTelemetry implementation and customization.
  • Production experience with the ELK stack and cluster management.
  • Strong knowledge of distributed tracing, metrics collection, and log aggregation.
  • Experience with containers and container orchestration (Docker, Kubernetes) and cloud platforms (AWS/GCP/Azure).
  • Proficiency in IaC tools like Terraform, Ansible, and CloudFormation.
  • Experience with high-throughput data ingestion and real-time analytics systems.
  • Strong scripting skills in Python and Bash.
  • Knowledge of observability best practices, SLIs/SLOs, and incident response.
  • Familiarity with monitoring tools like Prometheus, Grafana, or Datadog.

Bonus Points

  • Experience with LLMOps observability tools (Langfuse, LiteLLM, etc.).
  • Proficiency in Golang, Rust, or C/C++.
  • Knowledge of AI/ML system monitoring patterns and telemetry.
  • Experience with OpenSearch, ClickHouse, and conversational AI analytics.
  • Contributions to open-source observability or LLMOps projects.
  • Familiarity with eval-driven development and automated AI system testing frameworks.

Key Skills and Competencies

  • OpenTelemetry
  • ELK
  • Observability
  • Monitoring
  • Infrastructure as Code
  • Kubernetes
  • Terraform
  • Scripting
  • Telemetry
  • AI Workloads

How to Get Hired at TextLayer

🎯 Tips for Getting Hired

  • Research TextLayer's culture: Explore its mission, values, and projects.
  • Tailor your resume: Highlight DevOps and observability achievements.
  • Showcase technical skills: Demonstrate OpenTelemetry and IaC experience.
  • Practice interview questions: Prepare for technical and behavioral queries.

📝 Interview Preparation Advice

Technical Preparation

  • Review OpenTelemetry configuration practices.
  • Practice ELK stack deployment scenarios.
  • Simulate IaC deployments with Terraform.
  • Study container orchestration and cloud setups.

Behavioral Questions

  • Describe a high-pressure problem-solving experience.
  • Explain collaboration in multi-team projects.
  • Discuss managing unexpected system failures.
  • Share experiences with continuous learning.

Frequently Asked Questions