
Job Description
About Us
Owkin is an AI company on a mission to solve the complexity of biology. It is building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software. At the heart of this system is Owkin K, an AI copilot powered by Owkin Zero, a new LLM fine-tuned on biology, used by researchers, clinicians, and drug developers to better understand biology, validate scientific hypotheses, and deliver better diagnostics and therapies faster.
This position is based in our London office, or remotely in the UK or Germany.
Please submit your CV in English.
About The Role: Data Engineer
As a Data Engineer at Owkin, you will be an integral part of the Engineering team. You will design, build, and optimize scalable ETL/ELT pipelines with Airflow to process complex datasets with high reliability and performance, and organize and structure data systems to align with business objectives. Drawing on your expertise in scientific and healthcare information systems, you will deliver data products tailored for machine learning and AI research. The position demands clear reporting and meticulous attention to detail, alongside the ability to manage high-volume, complex workstreams and prioritize multiple deadlines effectively. Strong interpersonal skills are essential for collaborating with diverse stakeholders across biotechnology, and you will streamline production workflows for scientific processing and quality assurance.
- Organize and structure data systems at both macro and micro levels, designing and implementing data architectures that support business goals.
- Optimize data pipelines for performance, reliability, and scalability.
- Design, build, and maintain scalable ETL/ELT pipelines with Airflow to process large-scale, complex datasets.
- Demonstrate ability to deliver data products useful for machine learning and AI research and development (data models, metadata, and semantics).
- Possess strong organizational skills to effectively manage high-volume, complex workstreams while prioritizing multiple deadlines.
- Demonstrate knowledge of scientific and healthcare information systems and data sources, along with relevant software tools.
- Show ability to handle a variety of activities across operational delivery, development, and initiatives.
- Demonstrate professional interpersonal skills, working both independently and collaboratively with diverse stakeholders in complex biotechnology areas.
- Streamline the process of taking scientific processing and quality checks into production, ensuring proper monitoring of production workflows.
In Particular, You Will
- Design and optimize data pipelines using Airflow.
- Develop robust solutions in Python and SQL.
- Design, develop, and operate scalable ETL/ELT pipelines to process and transform datasets.
- Work with cross-functional teams, including data scientists, business developers, software engineers, and biomedical researchers, to deliver high-quality data solutions.
- Manage and monitor containerized data infrastructure with Docker and Kubernetes on cloud platforms.
- Implement and enforce best practices for data governance, security, and compliance.
- Build, optimize, and maintain data architectures, including data lakes and data warehouses, to deliver analytical insights.
- Productionize data processing pipelines, setting and enforcing standards and best practices across scientific teams to deliver high-quality data in an efficient and scalable way.
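The pipeline work described above follows the classic extract–transform–load pattern. A minimal sketch in plain Python (all data and names are illustrative; in a production setting at the scale this role describes, each step would typically be a task in an Airflow DAG writing to a real warehouse rather than in-memory SQLite):

```python
import sqlite3

# Illustrative records standing in for a dataset pulled from a source system.
RAW_ROWS = [
    {"sample_id": "S1", "assay": "rnaseq", "value": "12.5"},
    {"sample_id": "S2", "assay": "rnaseq", "value": "bad"},
    {"sample_id": "S3", "assay": "proteomics", "value": "7.1"},
]

def extract():
    """Pull raw records from the source (here, an in-memory list)."""
    return list(RAW_ROWS)

def transform(rows):
    """Cast types and drop rows that fail basic validation."""
    clean = []
    for row in rows:
        try:
            clean.append((row["sample_id"], row["assay"], float(row["value"])))
        except ValueError:
            continue  # a real pipeline would log and quarantine bad rows
    return clean

def load(rows, conn):
    """Write cleaned rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(sample_id TEXT, assay TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    conn.commit()

def run_pipeline(conn):
    """Run the three stages in order: extract -> transform -> load."""
    load(transform(extract()), conn)

conn = sqlite3.connect(":memory:")
run_pipeline(conn)
```

Keeping each stage a pure, separately testable function is what makes the pipeline easy to schedule, retry, and monitor once orchestrated.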
About You
Required qualifications / experience:
- Master's degree in computer science or a data specialization.
- Significant experience (5+ years) as a Data Engineer, with good knowledge of DataOps practices.
- Experience in Python and SQL; familiarity with R.
- Experience in architectural design of complex data platforms.
- Proficiency with technologies such as Airflow, AWS Step Functions, PostgreSQL, Docker, Kubernetes, Grafana, and Infrastructure as Code.
- Autonomous, meticulous, and a team player.
- Software development experience with a focus on code quality, simplicity, and maintainability.
- Experience in designing data architecture and building data products.
- Experience handling sensitive personal information.
- Fluent in English.
Preferred qualifications / bonus:
- Knowledge in healthcare or biology areas.
- Experience with data quality tools such as Great Expectations, Pydantic, Pandera, SQLMesh, etc.
- Debugging and refactoring skills.
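Data quality tools like Great Expectations and Pandera let you declare expectations over tabular data and partition rows by whether they meet them. A minimal stdlib-only sketch of that idea (the helper names below are illustrative, not the API of either library):

```python
# Declarative data-quality checks, in the spirit of Great Expectations /
# Pandera: each expectation is a predicate over a row (a dict).

def expect_not_null(column):
    """Expectation: the column must be present and non-null."""
    return lambda row: row.get(column) is not None

def expect_in_range(column, lo, hi):
    """Expectation: the column's value must lie within [lo, hi]."""
    return lambda row: row.get(column) is not None and lo <= row[column] <= hi

EXPECTATIONS = [
    expect_not_null("sample_id"),
    expect_in_range("value", 0.0, 100.0),
]

def validate(rows, expectations=EXPECTATIONS):
    """Split rows into (passing, failing) under the declared expectations."""
    passed, failed = [], []
    for row in rows:
        ok = all(check(row) for check in expectations)
        (passed if ok else failed).append(row)
    return passed, failed

rows = [
    {"sample_id": "S1", "value": 12.5},
    {"sample_id": None, "value": 7.1},    # fails not-null
    {"sample_id": "S3", "value": 250.0},  # fails range
]
passed, failed = validate(rows)
```

Dedicated tools add richer checks (schemas, distributions, uniqueness) and reporting, but the core pattern is the same: validation rules declared as data, applied uniformly before rows enter production tables.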
What we offer
- Flexible work organization.
- Friendly and informal working environment.
- Opportunity to work with an international team with strong technical and scientific backgrounds.
Key skills/competency
- Data Engineering
- ETL/ELT
- Airflow
- Python
- SQL
- Data Architecture
- Docker
- Kubernetes
- AWS Step Functions
- PostgreSQL
- DataOps
- Cloud Platforms
- Data Governance
How to Get Hired at Owkin
- Research Owkin's mission: Understand their AI in biology, BASI, Owkin K, and Owkin Zero initiatives to align your application.
- Tailor your resume: Highlight significant Data Engineering experience (5+ years), expertise in Python, SQL, Airflow, and DataOps practices.
- Showcase data architecture skills: Emphasize experience designing complex data platforms, ETL/ELT pipelines, and managing containerized infrastructures with Docker/Kubernetes.
- Prepare for technical depth: Be ready to discuss specific projects involving data modeling, pipeline optimization, and data governance within scientific or healthcare contexts.
- Demonstrate collaborative spirit: Prepare examples of cross-functional teamwork, problem-solving in complex biotech areas, and attention to detail in high-volume environments.