
Member of Technical Staff, Data Infrastructure
Runway · United States
- Hybrid
- Full-time
- $150,000 / year
Job highlights
- Build and scale data infrastructure for AI research.
- Own critical data pipelines from production to ML datasets.
- Work with large-scale multimodal datasets and vector databases.
- Optimize data systems using Python, SQL, and cloud technologies.
- Contribute to a growing AI company focused on world simulation.
About the role
Data Engineer at Runway
Runway is building AI to simulate the world by merging art and science. We believe world models are the frontier of AI progress, crucial for advancements in robotics, disease research, and scientific discovery. Our mission requires models that learn through experience and trial-and-error, which can be accelerated through simulation. World models offer a clear path to general-purpose simulation, transforming storytelling, scientific progress, and humanity's future endeavors.
Our team is composed of creative, open-minded, caring, and ambitious individuals dedicated to global change. We continuously strive to build the impossible, and our success depends on assembling an exceptional team. If you share this drive, we want to hear from you.
About The Role
We are seeking a Data Engineer to construct and scale the data infrastructure essential for Runway's AI research and business intelligence. You will be responsible for critical data pipelines, encompassing production databases, analytics warehouses, and extensive ML training datasets. This position integrates data engineering, ML infrastructure, and analytics, supporting both cutting-edge research and data-informed business decisions.
You will tackle complex challenges at scale, including managing billions of rows of multimodal training data, establishing CDC streams from production systems, optimizing vector databases for ML workflows, and developing the foundational data layer for the entire company.
Technical Stack Overview
Our data infrastructure relies on specialized systems: LanceDB for vector storage and multimodal training data versioning; ClickHouse as an analytics warehouse, fed by CDC streams from production Postgres via AWS Kinesis; and BigQuery for training run logs and evaluation results. We use Ray for large-scale distributed data processing on managed Kubernetes clusters, covering preprocessing, feature generation, and dataset curation. We are actively developing our data platform: integrating dbt for standardized transformations, improving dataset versioning and data lineage tracking, scaling data sourcing, and implementing robust data quality practices. Monitoring is handled by Prometheus and Grafana, and infrastructure is managed with Terraform. This role offers an opportunity to establish best practices and provide technical leadership as we mature our data infrastructure to meet growing ML training and research demands.
Responsibilities
- Build and manage pipelines for the creation, curation, and processing of large-scale multimodal datasets, including LanceDB management and ML metadata query optimization.
- Develop and maintain ETL pipelines and CDC streams from Postgres and ClickHouse to analytics warehouses.
- Create standardized data transformation layers using dbt to transition from ad-hoc SQL queries to maintainable data models for business analytics.
- Manage production databases (Postgres, ClickHouse) and ensure optimal performance and reliability.
Qualifications
- 4+ years of industry experience in data engineering.
- Strong proficiency in Python.
- Experience with data quality, deduplication, and large-scale data cleaning.
- Comfort working with cloud storage (S3) and managing large datasets.
- Proven experience building and maintaining scalable ETL/CDC pipelines.
- Strong SQL skills and experience with multiple database systems (e.g., Postgres, columnar databases like ClickHouse/Redshift).
- Humility and an open mind; a willingness to learn from colleagues is valued at Runway.
Nice to Have
- Experience with large-scale data processing frameworks (e.g., Spark, Ray) and ML frameworks (e.g., PyTorch, JAX).
- Knowledge of cloud platforms (AWS, GCP, Azure) and their data services.
- Understanding of data privacy and security best practices.
- Experience with business intelligence and visualization tools (e.g., Looker, Tableau, PowerBI, Metabase).
- Experience in a high-growth startup environment.
Compensation and Benefits
Runway is committed to recruiting and retaining exceptional talent from diverse backgrounds while ensuring pay equity. Salary ranges reflect competitive market rates for our company size, stage, and industry, and salary is one component of our comprehensive compensation package. Factors influencing salary include relevant experience, skill level, qualifications assessed during the interview process, and internal equity. The provided range is a general expectation for U.S.-based candidates, and adjustments may apply for international candidates or those with different experience levels. We will communicate any updates to the expected salary range.
Working at Runway
Great achievements stem from great teams. We encourage you to apply if you are driven by our mission.
We are dedicated to fostering an inclusive environment where employees can be their authentic selves and have equal opportunities for success. If our mission resonates with you, we welcome your application regardless of race, gender identity or expression, sexual orientation, religion, origin, ability, age, or veteran status.
About Runway
- Universal World Simulator
- GWM-1
- Gen-4.5
- General World Models
- Robotics SDK
- Conversational Real-time Agents
- Runway Studios
We are proud to be recognized as a best place to work by:
- Crain's
- InHerSight
- BuiltIn NYC
- INC
Key skills/competency
- Data Engineering
- Python
- SQL
- ETL/CDC Pipelines
- Cloud Storage (S3)
- Vector Databases (LanceDB)
- Distributed Data Processing (Ray)
- Data Warehousing (ClickHouse)
- Data Quality
- Machine Learning Infrastructure
Skills & topics
- Data Engineer
- Data Infrastructure
- AI
- Machine Learning
- Python
- SQL
- ETL
- CDC
- Vector Databases
- Cloud Storage
- BigQuery
- ClickHouse
- LanceDB
- Ray
- dbt
- Runway
How to get hired
- Tailor your resume: Highlight your 4+ years of data engineering experience, Python proficiency, and SQL skills. Emphasize experience with ETL/CDC pipelines, cloud storage (S3), and databases like Postgres and ClickHouse.
- Showcase ML infrastructure knowledge: Detail any experience with vector databases (LanceDB), large-scale data processing (Ray, Spark), and ML frameworks.
- Demonstrate cloud and data quality expertise: Provide examples of managing large datasets in cloud environments and implementing data quality, deduplication, and cleaning processes.
- Research Runway's mission: Understand their focus on AI world models and simulation. Articulate how your skills align with building the data foundation for this innovative field.
- Prepare for technical interviews: Be ready to discuss your experience with pipeline architecture and database optimization, and how you have solved complex data problems.
Frequently asked questions
- What are the primary responsibilities for a Data Engineer at Runway?
- As a Data Engineer at Runway, you will be responsible for building and scaling the data infrastructure that powers AI research and business intelligence. This includes owning critical data pipelines, managing large-scale multimodal datasets, optimizing vector databases, and ensuring the reliability and performance of production databases like Postgres and ClickHouse.
- What is the technical stack used by Runway's data infrastructure team?
- Runway's data infrastructure utilizes LanceDB for vector storage, ClickHouse as an analytics warehouse, and BigQuery for training logs. They use AWS Kinesis for CDC streams, Ray for distributed data processing on Kubernetes, dbt for transformations, Prometheus/Grafana for monitoring, and Terraform for infrastructure management.
- What experience is essential for the Data Engineer role at Runway?
- Essential qualifications include 4+ years of industry experience in data engineering, strong Python and SQL skills, experience with ETL/CDC pipelines, cloud storage (S3), and multiple database systems. Comfort with data quality, deduplication, and cleaning at scale is also required.
- Are there specific data processing or ML frameworks that are beneficial for this role?
- While not strictly required, experience with frameworks like Spark or Ray for large-scale data processing, and ML frameworks such as PyTorch or JAX, would be highly beneficial. Familiarity with business intelligence tools is also considered a plus.
- How does Runway approach compensation and benefits for its employees?
- Runway is committed to pay equity and offers competitive market-based compensation. The overall package includes salary and other benefits. Salary is determined by factors such as experience, skill level, qualifications, and internal equity, with specific ranges provided for U.S.-based candidates.
- What is Runway's company culture like?
- Runway fosters a culture of creative, open-minded, caring, and ambitious individuals dedicated to changing the world. They emphasize continuous learning, building impossible things, and creating an inclusive environment where employees can be their authentic selves.
- What kind of data challenges can I expect as a Data Engineer at Runway?
- You can expect to work on challenging problems at scale, such as managing billions of rows of multimodal training data, building CDC streams, optimizing vector databases for ML, and creating the foundational data layer for a rapidly growing AI company.
- Does Runway have specific requirements regarding cloud platforms?
- While knowledge of cloud platforms like AWS, GCP, or Azure and their data services is listed as a 'Nice to Have,' experience with cloud storage (S3) is a requirement. Familiarity with these platforms will be beneficial.