
AI Observability Engineer
Bytespoke · India
- Hybrid
- Contract
- $150,000 / year
- India
Job highlights
- Design and implement OpenTelemetry observability solutions.
- Instrument applications across microservices and cloud-native environments.
- Define observability standards and best practices.
- Experience with Arize AI / Arize AX preferred.
- Requires 7+ years of engineering experience.
About the role
Observability Engineer
We are seeking an experienced Observability Engineer with 7+ years of experience to design, implement, and govern observability solutions across modern distributed systems. The ideal candidate will have strong hands-on experience with OpenTelemetry, be capable of executing and validating observability-related test cases, and define standards, best practices, and reusable blueprints. Familiarity with Arize AI / Arize AX is a strong plus, especially in environments involving ML or AI-powered systems.
Key Responsibilities
Observability Design & Implementation
- Design, implement, and maintain observability solutions using OpenTelemetry for metrics, logs, and traces.
- Support use-case-specific needs and custom requirements.
- Instrument applications and services across microservices, cloud-native, and hybrid environments.
- Ensure consistent telemetry data collection aligned with architectural and organizational standards.
Standards, Blueprints & Governance
- Define and maintain observability standards, conventions, and naming strategies.
- Create reusable blueprints, reference architectures, and dashboards for application teams.
- Collaborate with platform, SRE, and engineering teams to enforce observability best practices.
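As an illustration of how a naming standard like the one described above could be checked automatically, here is a minimal Python sketch. It is not Bytespoke's actual standard. The rule it enforces (lowercase, dot-separated namespaces with snake_case segments, loosely following OpenTelemetry semantic-convention style) and the required resource keys are assumptions. The helper names `invalid_attribute_names` and `missing_resource_keys` and the sample attribute `llm.token_count` are hypothetical.

```python
import re

# Assumed convention (hypothetical, for illustration only): attribute names are
# lowercase, contain at least one dot-separated namespace, and use snake_case
# within each segment, e.g. "service.name" or "llm.token_count".
_SEGMENT = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"
_NAME_RE = re.compile(rf"^{_SEGMENT}(?:\.{_SEGMENT})+$")

# Assumed set of resource attributes every service must report.
REQUIRED_RESOURCE_KEYS = ("service.name", "service.version")


def invalid_attribute_names(attributes: dict) -> list:
    """Return the attribute names that violate the assumed naming rule, sorted."""
    return sorted(name for name in attributes if not _NAME_RE.match(name))


def missing_resource_keys(resource: dict) -> list:
    """Return required resource keys absent from the given resource attributes."""
    return [key for key in REQUIRED_RESOURCE_KEYS if key not in resource]


if __name__ == "__main__":
    span_attributes = {
        "service.name": "checkout",
        "llm.token_count": 42,   # hypothetical attribute name
        "HttpMethod": "GET",     # violates the rule: uppercase, no namespace
        "user-id": 1,            # violates the rule: hyphen, no namespace
    }
    print("Invalid names:", invalid_attribute_names(span_attributes))
    print("Missing resource keys:", missing_resource_keys({"service.name": "checkout"}))
```

A check of this kind could run in CI or in a review step so that application teams get feedback before non-conforming telemetry reaches a backend. Where such a check should sit is a design choice the posting does not specify.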
Qualifications
- Good understanding of LLM and AI applications.
- Proficiency in at least one programming language (e.g., Python or JavaScript).
- Experience with Arize AI / Arize AX for ML observability.
- Hands-on experience with OpenTelemetry (OTel) SDKs, collectors, and pipelines.
- Experience with observability backends (primarily Arize; also Prometheus, Grafana, Azure Monitor, Datadog, New Relic, or Elastic).
Key skills/competencies
- Observability Engineer
- OpenTelemetry
- Arize AI / Arize AX
- ML Observability
- AI Applications
- Distributed Systems
- Telemetry Data
- Metrics, Logs, Traces
- Python/JavaScript
- SRE
Skills & topics
- Observability Engineer
- AI Observability
- ML Observability
- OpenTelemetry
- Arize AI
- Arize AX
- Distributed Systems
- Cloud-Native
- Microservices
- SRE
- Python
- JavaScript
- Telemetry
- Metrics
- Logs
- Traces
- Observability
- Platform Engineering
How to get hired
- Tailor your resume: Highlight your 7+ years of experience in observability, OpenTelemetry, and AI/ML systems. Emphasize Arize AI/AX if applicable.
- Showcase technical skills: Detail your experience with metrics, logs, traces, and specific observability backends like Prometheus, Grafana, Datadog, or Elastic.
- Demonstrate governance experience: Provide examples of how you've defined standards, best practices, and reusable blueprints for observability.
- Prepare for technical interviews: Be ready to discuss your approach to designing and implementing observability solutions for distributed systems, including AI/ML applications.
- Understand company needs: Research Bytespoke's focus on modern distributed systems and AI, and how your skills align with their goals.
Technical preparation
- Master OpenTelemetry SDKs, collectors, and pipelines.
- Gain experience with Arize AI / Arize AX.
- Instrument microservices, cloud-native, and hybrid systems.
- Practice Python or JavaScript for telemetry instrumentation.
Behavioral questions
- Describe a complex observability challenge you solved.
- How do you define and enforce observability standards?
- How would you collaborate with SRE and engineering teams?
- Explain your approach to instrumenting AI/ML systems.
Frequently asked questions
- What specific OpenTelemetry components are most critical for this AI Observability Engineer role at Bytespoke?
- For this AI Observability Engineer position at Bytespoke, hands-on experience with OpenTelemetry (OTel) SDKs, collectors, and pipelines is crucial. The role involves designing and implementing observability solutions using these components for metrics, logs, and traces within distributed systems, especially those involving AI/ML applications.
- How important is Arize AI / Arize AX experience for the Observability Engineer job at Bytespoke?
- Experience with Arize AI / Arize AX for ML observability is a strong plus for the Observability Engineer role at Bytespoke. While not strictly mandatory, it is highly valued, particularly in environments leveraging ML or AI-powered systems, as it directly relates to the specialized observability needs of such applications.
- What programming languages are preferred for the Observability Engineer position at Bytespoke?
- Candidates for the Observability Engineer role should be proficient in at least one programming language. The posting cites Python and JavaScript as examples, both of which are widely used for instrumenting applications and services in distributed and cloud-native environments.
- What kind of distributed systems will an AI Observability Engineer at Bytespoke work with?
- An AI Observability Engineer at Bytespoke will work with modern distributed systems, including microservices, cloud-native architectures, and hybrid environments. The focus is on ensuring consistent telemetry data collection across these complex systems to enhance their observability and performance.
- What does 'governance' entail for an Observability Engineer at Bytespoke?
- For an Observability Engineer at Bytespoke, governance involves defining and maintaining observability standards, conventions, and naming strategies. It also includes collaborating with teams to enforce best practices and creating reusable blueprints and reference architectures to ensure consistency and efficiency in observability solutions.