Senior Research Data Engineer
DeepL

Job Description
About DeepL
DeepL is a global AI product and research company dedicated to building secure, intelligent solutions for complex business problems. Trusted by over 200,000 business customers and millions of individuals across 228 global markets, DeepL's Language AI platform provides human-like translation, improved writing, and real-time voice translation. Building on a history of innovation, quality, and security, DeepL is expanding its offerings beyond language, including DeepL Agent, an autonomous AI assistant. Founded in 2017 by CEO Jarek Kutylowski, DeepL boasts over 1,000 passionate employees and is supported by world-renowned investors like Benchmark, IVP, and Index Ventures.
Our goal is to become the global leader in trusted, intelligent AI technology, creating products that drive better communication, foster connections, and create meaningful impact. We are looking for talented individuals to join our journey and shape the future of AI in a fast-moving, purpose-driven environment.
What Sets DeepL Apart
DeepL offers a unique blend of cutting-edge AI technology, meaningful work, and a thriving culture. We are a team of innovators, researchers, and creators driven by a shared purpose: to unlock human potential by making work simpler, smarter, and more connected. Our technology helps millions of people and businesses communicate and work better every day, underpinned by a culture of trust, curiosity, and care. Being part of DeepL means joining a team dedicated to innovation, growth, and well-being. Discover more about life at DeepL on LinkedIn, Instagram, and our Blog.
Meet the Foundation Model Team
Innovation at DeepL begins in the research department, driven by researchers, engineers, and developers passionate about advancing AI. Data is the lifeblood fueling this passion, crucial for model training and quality evaluation. You will join our Foundation Model track, a cross-functional group of research scientists and data engineers specializing in machine learning. This team develops foundation models for use in DeepL's AI products. Data engineers in this team create, refine, and manage multi-modal training corpora, owning the associated data collection and preparation pipelines. The team works with unstructured data on a petabyte scale and leverages tens of thousands of cores in a hybrid cloud setting for ambitious projects.
Your Responsibilities as a Senior Research Data Engineer
- Work on ambitious frontier research projects as part of a team comprising research scientists and research data engineers.
- Architect, design, and build data pipelines capable of handling petabytes of multi-modal unstructured data.
- Develop a modern data engineering stack based on state-of-the-art technologies for orchestration and parallel computation, making extensive use of actively developed open-source solutions.
- Identify performance bottlenecks, debug issues, and create stable pipelines, from individual components to system-wide views.
- Leverage DeepL's large on-prem data centers and AWS cloud infrastructure for blazing-fast data processing.
- Go beyond traditional “Big Data” and ETL, engineering and operating complex Python data solutions for real-world unstructured data, including text, code, image, and audio modalities.
- Collaborate effectively with stakeholders, research scientists, other research data engineers, and data tooling/platform teams.
- Raise the standard for excellence and act as owner and champion for the quality and availability of our foundation model training data.
- Ensure mission-critical reliability of data pipeline jobs and maintain high-quality code.
We encourage you to contribute with creativity, thoroughness, pragmatism, foresight, ingenuity, persistence, and every quality that elevates the team.
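To give a flavor of the "stable, reproducible pipeline" work described above, here is a minimal stdlib-only sketch of an idempotent processing stage: outputs are keyed by a content hash, so rerunning the stage skips work already done. The stage name, record shape, and cleaning rule are illustrative assumptions, not DeepL's actual stack.

```python
import hashlib
import json
from pathlib import Path

def content_key(record: dict) -> str:
    """Derive a stable key from record content so reruns are idempotent."""
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]

def clean(record: dict) -> dict:
    """Toy normalization step: strip whitespace, drop records with empty text."""
    text = record.get("text", "").strip()
    return {**record, "text": text} if text else {}

def run_stage(records: list[dict], out_dir: Path) -> int:
    """Process records, skipping outputs that already exist.

    Rerunning on the same inputs writes nothing new, which is the
    property that makes large pipeline jobs safely restartable.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for record in records:
        cleaned = clean(record)
        if not cleaned:
            continue  # filtered out by the quality-control step
        path = out_dir / f"{content_key(cleaned)}.json"
        if path.exists():
            continue  # already produced by a previous run
        path.write_text(json.dumps(cleaned, sort_keys=True))
        written += 1
    return written
```

At petabyte scale the same idea applies per shard rather than per record, but the contract is identical: a stage rerun over unchanged inputs is a no-op.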
Qualities We Look For
- Professional Experience: In data, platform, or software engineering, ideally with a focus on large-scale unstructured data.
- Python Expertise: Extensive professional experience in Python software engineering, ideally maintaining proprietary or open-source software products.
- Data Handling: Experience with exploratory data analysis, cleaning, validation, and quality control at a scale beyond business intelligence and analytics.
- Pipeline Development: Experience building reproducible pipelines for storing and processing petabytes of data.
- Operations Proficiency: In containerization and automatic deployment, ideally with Kubernetes and cloud infrastructure.
- Scaling Knowledge: Experience with highly scalable, parallel compute workloads (e.g., Dask, Ray, Celery).
- Performance Optimization: Experience writing and optimizing highly performant code.
- Cross-functional Affinity: Ability to collaborate directly with researchers and engineering stakeholders to translate needs into data products with desired user experience and performance.
- Soft Skills: Excellent problem-solving abilities, strong communication skills, and a collaborative mindset.
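The "highly scalable, parallel compute workloads" point can be illustrated with the map/gather pattern that frameworks like Dask and Ray generalize across clusters. This stdlib-only sketch uses a local thread pool; the per-document token-count task is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def token_count(doc: str) -> int:
    """Toy per-document task: whitespace token count."""
    return len(doc.split())

def map_over_docs(docs: list[str], workers: int = 4) -> list[int]:
    """Fan work out across a worker pool and gather results in input order.

    Dask and Ray apply the same map/gather idea, but schedule the tasks
    across many processes or machines instead of local threads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(token_count, docs))
```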
Ideally, You Have Domain-Specific Experience
- LLM or VLM training data preparation.
- NLP, text classification, reinforcement learning, model-based/GPU workflows.
- Dynamic workflow orchestration frameworks like Argo Workflows, Airflow, Dagster, or Flyte.
- Linguistics expertise or speaking multiple languages.
- Experience in a high-performance programming language like C++, Go, or Rust.
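Orchestration frameworks like Argo Workflows, Airflow, Dagster, and Flyte all schedule tasks from a dependency graph. As a rough sketch of that core idea using only the standard library, here is a hypothetical four-stage training-data DAG resolved with `graphlib.TopologicalSorter`; the task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DAG = {
    "crawl": set(),
    "dedupe": {"crawl"},
    "filter": {"crawl"},
    "tokenize": {"dedupe", "filter"},
}

def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid task order respecting all dependencies.

    Real orchestrators go further: independent tasks (here, dedupe and
    filter) run in parallel, and failed tasks are retried or resumed.
    """
    return list(TopologicalSorter(dag).static_order())
```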
What DeepL Offers
- Diverse & International Team: Join a global community of over 90 nationalities, with a presence in the UK, Germany, Netherlands, Poland, US, and Japan.
- Open Communication & Feedback: A culture that values clear, honest communication, smooth collaboration, direct feedback, and a growth mindset.
- Hybrid Work & Flexible Hours: Hybrid schedule (2 days in office), flexible working hours, and trust in your productivity.
- Regular In-person Team Events: Vibrant local, business unit, new-joiner, and company-wide gatherings.
- Monthly Full-day Hacking Sessions: “Hack Fridays” for passionate project work and cross-team collaboration.
- Generous Annual Leave: 30 days of annual leave (excluding public holidays) and access to mental health resources.
- Competitive Benefits: Tailored benefits package reflecting the diversity of our global team and locations.
If this role resonates with you, but you don't check every box, we encourage you to apply. DeepL values the potential you bring and the growth we can foster together.
Key skills/competency
- Data Engineering
- Foundation Models
- Python Programming
- Large-Scale Data
- Distributed Systems
- Cloud Infrastructure (AWS)
- Data Pipelines
- Performance Optimization
- Kubernetes
- Machine Learning Data
How to Get Hired at DeepL
- Research DeepL's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Customize your resume: Tailor your Senior Research Data Engineer resume to highlight experience with large-scale data, Python, cloud platforms, and ML data pipelines.
- Showcase data expertise: Prepare examples demonstrating your ability to architect, build, and optimize data solutions for unstructured, petabyte-scale data.
- Understand foundation models: Familiarize yourself with DeepL's AI products and the role of data in training and evaluating large language/vision models.
- Prepare for technical interviews: Be ready to discuss data architecture, distributed systems, Python performance, and problem-solving relevant to complex data challenges.