
Data Engineer (Databrick + Pyspark)
Capco · India; India - Pune
- On site
- India; India - Pune
Email the hiring manager to get a response.
Get their verified email + an intro that's ready to send.
Subject: Interested in the Data Engineer (Databrick + Pyspark) role at Capco
Hi Casey — I came across the Data Engineer (Databrick + Pyspark) opening and wanted to reach out directly. I've spent the last few years doing exactly this kind of work, and Capco stood out because…
✎ Personalized to your résumé after sign-up.
- ✓ Verified email of the hiring manager
- ✓ Intro email personalized to your résumé
- ✓ $9/mo = unlimited — any job link
Secure checkout · cancel anytime
About the role
Job Title: Data Engineer (PySpark / Databricks)
Experience: 5–9 Years Location: Pune (Hybrid – Capco Office)
Job Summary
We are looking for a skilled Data Engineer with strong expertise in PySpark, Databricks, and modern data engineering practices. The ideal candidate will have hands-on experience in building scalable data pipelines, working with large datasets, and leveraging cloud-based data platforms.
Key Responsibilities Design, develop, and maintain scalable ETL/ELT data pipelines Work extensively with PySpark and Apache Spark for large-scale data processing Build and manage workflows using Apache Airflow Develop and optimize data solutions on Databricks (Jobs, Delta Lake) Work with cloud-based data lakes (S3 or equivalent) Write efficient and complex SQL queries for data transformation and analysis Run and manage Spark workloads on EMR Serverless or other managed Spark platforms Ensure data quality, reliability, and performance optimization of pipelines Must Have Skills Strong hands-on experience with PySpark and Apache Spark internals Experience with Databricks (Jobs, Delta Lake) Proficiency in Apache Airflow for workflow orchestration Solid experience building ETL/ELT pipelines at scale Strong SQL skills and experience with Data Warehouse (DWH) systems Experience running Spark workloads on EMR Serverless or managed Spark platforms Hands-on experience with cloud data lakes (S3 or equivalent) Good to Have Skills Experience with Delta Lake / Apache Iceberg Exposure to streaming frameworks (Spark Structured Streaming, Kafka) Familiarity with CI/CD pipelines for data engineering workflows Knowledge of data governance, cataloging, and lineage tools
