1 month ago

Data Engineer

Runway

Hybrid
Full Time
$150,000

Job Overview

Job Title: Data Engineer
Job Type: Full Time
Offered Salary: $150,000
Location: Hybrid

Job Description

Data Engineer at Runway

Runway is building AI to simulate the world by merging art and science. We believe world models are the frontier of AI progress, crucial for advancements in robotics, disease research, and scientific discovery. Our mission requires models that learn through experience and trial-and-error, which can be accelerated through simulation. World models offer a clear path to general-purpose simulation, transforming storytelling, scientific progress, and humanity's future endeavors.

Our team is composed of creative, open-minded, caring, and ambitious individuals dedicated to global change. We continuously strive to build the impossible, and our success depends on assembling an exceptional team. If you share this drive, we want to hear from you.

About The Role

We are seeking a Data Engineer to construct and scale the data infrastructure essential for Runway's AI research and business intelligence. You will be responsible for critical data pipelines, encompassing production databases, analytics warehouses, and extensive ML training datasets. This position integrates data engineering, ML infrastructure, and analytics, supporting both cutting-edge research and data-informed business decisions.

You will tackle complex challenges at scale, including managing billions of rows of multimodal training data, establishing CDC streams from production systems, optimizing vector databases for ML workflows, and developing the foundational data layer for the entire company.

Technical Stack Overview

Our data infrastructure uses specialized systems: LanceDB for vector storage and versioning of multimodal training data, ClickHouse as an analytics warehouse fed by CDC streams from production Postgres via AWS Kinesis, and BigQuery for training-run logs and evaluation results. We use Ray on managed Kubernetes clusters for large-scale distributed data processing, including preprocessing, feature generation, and dataset curation.

We are actively building out our data platform: integrating dbt for standardized transformations, improving dataset versioning and data lineage tracking, scaling data sourcing, and implementing robust data quality practices. Monitoring runs on Prometheus and Grafana, and infrastructure is managed with Terraform. This role is an opportunity to establish best practices and provide technical leadership as we mature our data infrastructure to meet growing ML training and research demands.

Responsibilities

  • Build and manage pipelines for the creation, curation, and processing of large-scale multimodal datasets, including LanceDB management and ML metadata query optimization.
  • Develop and maintain ETL and CDC pipelines from production databases (Postgres, ClickHouse) into analytics warehouses.
  • Create standardized data transformation layers using dbt to transition from ad-hoc SQL queries to maintainable data models for business analytics.
  • Manage production databases (Postgres, ClickHouse) and ensure optimal performance and reliability.

Qualifications

  • 4+ years of industry experience in data engineering.
  • Strong proficiency in Python.
  • Experience with data quality, deduplication, and large-scale data cleaning.
  • Comfort working with cloud storage (S3) and managing large datasets.
  • Proven experience building and maintaining scalable ETL/CDC pipelines.
  • Strong SQL skills and experience with multiple database systems (e.g., Postgres, columnar databases like ClickHouse/Redshift).
  • Humility and an open mind: at Runway, we value a willingness to learn from colleagues.

Nice to Have

  • Experience with large-scale data processing frameworks (e.g., Spark, Ray) and ML frameworks (e.g., PyTorch, JAX).
  • Knowledge of cloud platforms (AWS, GCP, Azure) and their data services.
  • Understanding of data privacy and security best practices.
  • Experience with business intelligence and visualization tools (e.g., Looker, Tableau, PowerBI, Metabase).
  • Experience in a high-growth startup environment.

Compensation and Benefits

Runway is committed to recruiting and retaining exceptional talent from diverse backgrounds while ensuring pay equity. Salary ranges reflect competitive market rates for a company of our size, stage, and industry, and salary is only one component of our comprehensive compensation package. Factors that influence salary include relevant experience, skill level, qualifications assessed during the interview process, and internal equity. The stated range is a general expectation for U.S.-based candidates; adjustments may apply for international candidates or those with different experience levels. We will communicate any updates to the expected salary range.

Working at Runway

Great achievements stem from great teams. We encourage you to apply if you are driven by our mission.

We are dedicated to fostering an inclusive environment where employees can be their authentic selves and have equal opportunities for success. If our mission resonates with you, we welcome your application regardless of race, gender identity or expression, sexual orientation, religion, origin, ability, age, or veteran status.

About Runway

  • Universal World Simulator
  • GWM-1
  • Gen-4.5
  • General World Models
  • Robotics SDK
  • Conversational Real-time Agents
  • Runway Studios

We are proud to be recognized as one of the best places to work by:

  • Crain's
  • InHerSight
  • BuiltIn NYC
  • Inc.

Key Skills/Competencies

  • Data Engineering
  • Python
  • SQL
  • ETL/CDC Pipelines
  • Cloud Storage (S3)
  • Vector Databases (LanceDB)
  • Distributed Data Processing (Ray)
  • Data Warehousing (ClickHouse)
  • Data Quality
  • Machine Learning Infrastructure

Tags:

Data Engineer
Data Infrastructure
AI
Machine Learning
Python
SQL
ETL
CDC
Vector Databases
Cloud Storage
BigQuery
ClickHouse
LanceDB
Ray
dbt
Runway

How to Get Hired at Runway

  • Tailor your resume: Highlight your 4+ years of data engineering experience, Python proficiency, and SQL skills. Emphasize experience with ETL/CDC pipelines, cloud storage (S3), and databases like Postgres and ClickHouse.
  • Showcase ML infrastructure knowledge: Detail any experience with vector databases (LanceDB), large-scale data processing (Ray, Spark), and ML frameworks.
  • Demonstrate cloud and data quality expertise: Provide examples of managing large datasets in cloud environments and implementing data quality, deduplication, and cleaning processes.
  • Research Runway's mission: Understand their focus on AI world models and simulation. Articulate how your skills align with building the data foundation for this innovative field.
  • Prepare for technical interviews: Be ready to discuss your experience with pipeline architecture, database optimization, and solving complex data challenges.