10 days ago

AI Research Engineer

Datadog

On Site
Full Time
$250,000
Paris, Île-de-France, France

Job Overview

Job Title: AI Research Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $250,000
Location: Paris, Île-de-France, France

Job Description

About Datadog AI Research (DAIR)

As an AI Research Engineer on our team, you will partner with research scientists to transform innovative research ideas into robust working systems. This involves building essential data infrastructure, developing specialized tooling, and setting up the underlying frameworks that facilitate rapid iteration, ensure trustworthy evaluation, and provide a seamless transition from prototype to production.

Building on Datadog's successful history of AI-powered solutions, such as Bits AI, Watchdog, and Toto, Datadog AI Research (DAIR) is dedicated to pursuing high-impact, high-reward projects that address real-world challenges in cloud observability and security.

We are currently focused on three core research areas:

  • Observability Foundation Models: Developing state-of-the-art models for advanced forecasting, anomaly detection, and multi-modal telemetry analysis (logs, metrics, traces, etc.). These models will also serve as the foundational technology for our agents (described below) to natively analyze telemetry data.
  • Site Reliability Engineering (SRE) Autonomous Agents: Creating AI agents designed to automatically detect, diagnose, and resolve incidents in production environments. This pushes the boundaries of multi-step planning, reasoning, and domain-specific knowledge application.
  • Production Code Repair Agents: Developing intelligent agents and models that leverage code, logs, runtime data, and other signals to identify, fix, and even preempt performance issues and security vulnerabilities in production code.

What You’ll Do as an AI Research Engineer

  • Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling.
  • Implement models, run experiments at scale, and profile for reliability, performance, and cost efficiency.
  • Orchestrate distributed training and distributed Reinforcement Learning (RL) with Ray, including managing scheduling, scaling, and failure recovery.
  • Ensure the research stack is observable, reproducible, and user-friendly.
  • Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks.
  • Collaborate with Research Scientists, Product, and Engineering teams to integrate advanced AI capabilities into Datadog’s product ecosystem and harden prototypes into reliable services.
  • Contribute high-quality code, comprehensive documentation, and open-source artifacts that empower the community and internal teams to reproduce, extend, and evaluate results.

Who You Are

  • You possess strong software engineering skills with relevant experience in domains such as observability, SRE, or security.
  • You have deep expertise in distributed computing and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a significant advantage.
  • You are proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and comfortable working with modern cloud and data infrastructure.
  • You have practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration.
  • You are familiar with efficient training, fine-tuning, and inference techniques for large foundation models.
  • You can articulate design and performance trade-offs clearly to both technical and non-technical audiences.
  • You have a strong interest in open-science and open-source contributions, including establishing rigorous benchmarks and sharing artifacts with the community.

Bonus Points

  • You have a demonstrated ability to bridge cutting-edge research prototypes and real-world product applications, ideally with large foundation models, generative AI agents, or domain-specific LLM deployments.
  • You are passionate about advancing the frontiers of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies.
  • You have hands-on experience with GPU programming and optimization, including CUDA.
  • You have experience writing production data pipelines and applications.
  • You have experience supporting or contributing to research publications.

Benefits and Growth at Datadog

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about AI Research and want to grow your skills, we encourage you to apply.

  • Competitive global benefits package.
  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP).
  • Opportunity to collaborate closely with colleagues across Datadog offices in New York City and Paris.
  • Opportunity to attend and present at conferences and meetups.
  • Intra-departmental mentor and buddy program for in-house networking.
  • An inclusive company culture and ability to join our Community Guilds (Datadog employee resource groups).

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.

Key skills/competency

  • AI Research
  • Machine Learning Engineering
  • Distributed Systems
  • MLOps
  • Python
  • Cloud Infrastructure
  • Deep Learning
  • Data Pipelines
  • Model Deployment
  • Observability

Tags:

AI Research Engineer
Machine Learning
Distributed Systems
MLOps
Foundation Models
Observability
SRE
Autonomous Agents
Code Repair
Data Pipelines
Python
Ray
PyTorch
JAX
CUDA
Cloud Infrastructure
Containerization
Orchestration
GPU Acceleration
Go

How to Get Hired at Datadog

  • Research Datadog's culture: Study the company's mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor. Understand its focus on observability and cloud security.
  • Tailor your resume for AI Research Engineer: Highlight experience with distributed ML systems, Python, cloud infrastructure, and specifically Ray, PyTorch, or JAX. Emphasize projects bridging research and product.
  • Showcase your technical depth: Prepare to discuss your practical experience with ML training/inference, GPU acceleration, and efficient large model techniques. Demonstrate strong problem-solving in distributed environments.
  • Prepare for behavioral questions: Reflect on experiences demonstrating collaboration with scientists, product teams, and engineering. Be ready to articulate design trade-offs and your passion for customer impact and responsible AI deployment.
  • Engage with Datadog's open-source initiatives: If applicable, contributing to relevant open-source projects or having a strong GitHub profile can significantly strengthen your application. Highlight any work on benchmarks or shared artifacts.
