Senior Research Data Engineer - Foundation Models
DeepL
Job Description
About DeepL
DeepL is a global AI product and research company focused on building secure, intelligent solutions to complex business problems. Over 200,000 business customers and millions of individuals across 228 global markets today trust DeepL's Language AI platform for human-like translation, improved writing and real-time voice translation. Building on a history of innovation, quality and security, DeepL continues to expand its offerings beyond the field of Language, including DeepL Agent - an autonomous AI assistant designed to transform the way businesses and knowledge workers get work done. Founded in 2017 by CEO Jarek Kutylowski, DeepL now has over 1,000 passionate employees and is supported by world-renowned investors including Benchmark, IVP, and Index Ventures.
Our goal is to become the global leader in trusted, intelligent AI technology, building products that drive better communication, foster connections, and create a meaningful impact. To achieve this, we need talented people like you to join our journey. If you’re ready to shape the future of AI and grow your career in a fast-moving, purpose-driven environment, DeepL is your next destination.
What sets us apart is our blend of cutting-edge AI technology, meaningful work, and a culture where people truly thrive. We’re a team of innovators, researchers, and creators driven by a shared purpose to unlock human potential by making work simpler, smarter, and more connected.
When we share what it’s like to work at DeepL, the reactions are overwhelmingly positive. This might be because of our technology that helps millions of people and businesses communicate and work better every day, or because of the trust, curiosity, and care that shape our culture.
What we know for sure is this: being part of DeepL means joining a team dedicated to innovation, growth, and well-being. Discover more about life at DeepL on LinkedIn, Instagram, and our Blog.
The Team: Foundation Models
Every innovation at DeepL begins in the research department, in the minds and hands of researchers, engineers and developers who are passionate about advancing AI. Data is the lifeblood that fuels this passion, and a crucial part of our work, from model training to quality evaluation.
You will join our Foundation Model track. As a cross-functional group of research scientists and data engineers specialising in machine learning, we develop foundation models for use in our AI products. Our data engineers create, refine and manage multi-modal training corpora, and own the associated data collection and preparation pipelines. We work with unstructured data at petabyte scale and with tens of thousands of cores in a hybrid cloud setting to fuel our most ambitious projects.
Your Responsibilities as a Senior Research Data Engineer - Foundation Models
- Work on ambitious frontier research projects as part of a team consisting of research scientists and research data engineers.
- Architect, design and build data pipelines that can handle petabytes of multi-modal unstructured data.
- Build a modern data engineering stack grounded in state-of-the-art technology for orchestration and parallel computation, and make extensive use of actively developing open-source solutions.
- Find performance bottlenecks, debug issues, and create pipelines with a focus on stability, from the lowest-level components to the bird's-eye view of a system.
- Leverage our large on-prem data centers and AWS cloud infrastructure for blazing-fast data processing.
- Go beyond “Big Data” and ETL, and engineer and operate complex Python data solutions for real-world unstructured data incl. text, code, image and audio modalities.
- Collaborate with stakeholders, research scientists, other research data engineers and data tooling and platform teams.
- Raise the standard for excellence and act as owner and champion for the quality and availability of our foundation model training data.
- Ensure mission-critical reliability of data pipeline jobs, and maintain high quality code.
Play to your strengths and contribute with creativity, thoroughness, pragmatism, foresight, ingenuity, persistence, and every part of you that elevates the team.
Qualities We Look For
- Professional experience: In data, platform or software engineering, ideally with a focus on large-scale unstructured data.
- Python: Extensive professional experience in Python software engineering. Ideally, experience in maintaining proprietary or open-source software products.
- Data: Experience with exploratory data analysis, cleaning, validation and quality control beyond business intelligence and analytics scale.
- Pipelines: Experience with building reproducible pipelines for storing and processing petabytes of data.
- Operations: Proficiency in containerization and automatic deployment. Ideally, experience with container orchestration with Kubernetes and cloud infrastructure.
- Scaling: Experience with highly scalable, parallel compute workloads (e.g., Dask, Ray, Celery).
- Performance: Experience with writing and optimizing highly performant code.
- Cross-functional Affinity: Ability to work directly with our researchers and engineering stakeholders to translate their needs into data products with the desired user experience and performance.
- Soft Skills: Excellent problem-solving abilities, strong communication skills, and a collaborative mindset.
Ideally, You Have Domain-Specific Experience
- LLM or VLM training data preparation.
- NLP, text classification, reinforcement learning, model-based/GPU workflows.
- Dynamic workflow orchestration frameworks like Argo Workflows, Airflow, Dagster or Flyte.
- Linguistics expertise or speaking multiple languages.
- Experience in a high-performance programming language like C++, Go or Rust.
Tell us what you bring to the table and let us experience what you’re passionate about.
What DeepL Offers
- Diverse and internationally distributed team: Joining our team means becoming part of a large, global community with people of more than 90 nationalities. We're more than just colleagues; we're a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing–we've doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.
- Open communication, regular feedback: As a language-focused company, we understand the importance of clear, honest communication. We value smooth collaboration and direct, actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
- Hybrid work, flexible hours: We offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. We offer flexible working hours and trust in your productivity, staying in sync with your team’s general locations and time zones to foster effective and seamless collaboration.
- Regular in-person team events: We bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together–literally.
- Monthly full-day hacking sessions: Every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about and get the opportunity to work with other teams–we value your initiatives, impact, and creativity.
- 30 days of annual leave: We value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.
- Competitive benefits: Just as our team spans the globe, so does our benefits package. We've crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.
If this role and our mission resonate with you, but you're hesitant because you don't check all the boxes, don't let that hold you back. At DeepL, it's all about the value you bring and the growth we can foster together. Go ahead, apply—let's discover your potential together. We can't wait to meet you!
We are an equal opportunity employer
You are welcome at DeepL for who you are—we appreciate authenticity here. Our product is for everyone, and so is our workplace. The more voices we have represented and amplified in our business, the more we will all succeed, contribute, and think forward! So bring us your personal experience, your perspectives, and your background. It’s in our diversity that we will find the power to break down language barriers in the world.
Key skills/competency
- Large-scale unstructured data
- Python software engineering
- Petabyte-scale data pipelines
- Containerization & Kubernetes
- Cloud infrastructure (AWS)
- Parallel compute (Dask, Ray)
- Performance optimization
- Cross-functional collaboration
- Foundation model data preparation
- LLM/VLM training data
How to Get Hired at DeepL
- Research DeepL's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to align with their AI vision.
- Tailor your resume: Highlight extensive experience with large-scale unstructured data, Python software engineering, and building robust data pipelines specifically for AI/ML.
- Showcase relevant projects: Provide concrete examples of designing and operating petabyte-scale multi-modal data solutions and optimizing parallel compute workloads.
- Prepare for technical deep-dives: Be ready to discuss distributed systems, cloud infrastructure (AWS), container orchestration (Kubernetes), and performance tuning in detail.
- Emphasize collaboration & problem-solving: Illustrate your ability to work cross-functionally with research scientists and proactively solve complex data challenges for AI products.