PitchMeAI
Capgemini

Data Science

Capgemini · Navi Mumbai, Maharashtra, India

  • On site
  • Full-time
  • $120,000 / year

Job highlights

  • Design scalable data pipelines using Azure Data Services.
  • Develop and optimize PySpark and SQL workflows.
  • Ensure data quality, security, and governance.
  • Collaborate with AI/ML teams on LLM applications.
  • Focus on career growth and flexible work.

About the role

Data Engineer

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Your Role

We are looking for experienced Data Engineers to support the design, development, and deployment of scalable data pipelines. The ideal candidate will have strong hands-on experience with Azure Data Services and Databricks. They will be expected to:

  • Design and implement scalable data pipelines using Azure Data Factory, Data Lake, and Databricks, ensuring they are optimized for data processing and analytics.
  • Develop and optimize PySpark and SQL workflows for data transformation and orchestration.
  • Create and refine processes for data modeling, mining, and production to support operational and analytical needs.
  • Establish and enforce data quality checks and validation routines to maintain high standards of data accuracy and reliability.
  • Collaborate with AI/ML teams to support RAG-based LLM applications.
  • Ensure data security and governance using Azure Key Vault and related services.

Your Profile

  • Over 6 years of hands-on experience with Azure Data Services including Data Lake, Data Factory, Key Vault and Cognitive Search.
  • Proficient in the Databricks ecosystem, including cluster optimization and performance tuning, with expertise in Delta Lake, PySpark, and workflow orchestration.
  • Experience with relational SQL databases (SQL Server/Oracle), NoSQL databases (MongoDB/DynamoDB), and Snowflake, along with strong expertise in Python/PySpark for large-scale data processing.
  • Experience with real-time systems like Event Hubs, Apache Kafka, Spark-Streaming, etc.
  • Experience with Big Data frameworks such as Spark, Kafka, Hive, and Hadoop.
  • Strong programming skills in Python and SQL for data engineering and analytics.
  • A basic understanding of GenAI concepts, including RAG and related AI/ML technologies, is preferred, as is experience generating embeddings for both structured and unstructured data sources.
  • Familiarity with DevOps practices, including CI/CD pipelines and automation strategies, and experience with tools like GitHub and Bitbucket are good to have.
  • Working knowledge of BI tools (Tableau, Power BI) and data engineering platforms (Microsoft Fabric, Apache Storm, Apache NiFi) for reporting and pipeline setup is beneficial.

What You Will Love About Working Here

  • We recognize the importance of flexible work arrangements. Whether through remote work or flexible hours, you will have an environment that supports a healthy work-life balance.
  • At the heart of our mission is your career growth. Our array of career growth programs and diverse professions are crafted to support you in exploring a world of opportunities.
  • Equip yourself with valuable certifications in the latest technologies such as Generative AI.

Capgemini is an AI-powered global business and technology transformation partner, delivering tangible business value. We imagine the future of organizations and make it real with AI, technology and people. With our strong heritage of nearly 60 years, we are a responsible and diverse group of 420,000 team members in more than 50 countries. We deliver end-to-end services and solutions with our deep industry expertise and strong partner ecosystem, leveraging our capabilities across strategy, technology, design, engineering and business operations. The Group reported 2024 global revenues of €22.1 billion.

Make it real | www.capgemini.com

Key skills/competency

  • Data Engineering
  • Azure Data Services
  • Databricks
  • PySpark
  • SQL
  • Data Pipelines
  • Data Modeling
  • Data Quality
  • RAG
  • DevOps

Skills & topics

  • Data Engineer
  • Data Pipelines
  • Azure Data Services
  • Databricks
  • PySpark
  • SQL
  • Data Modeling
  • Data Quality
  • GenAI
  • DevOps
  • Big Data
  • Cloud Computing
  • ETL
  • Data Transformation
  • Data Lake

How to get hired

  • Tailor your resume: Highlight your Azure Data Services, Databricks, PySpark, and SQL experience. Quantify achievements where possible.
  • Showcase relevant projects: Detail your experience with data pipeline design, optimization, and data quality initiatives.
  • Prepare for technical questions: Be ready to discuss your experience with Azure Data Factory, Data Lake, Databricks, PySpark, and SQL.
  • Understand Capgemini's values: Research their commitment to sustainability, inclusivity, and technological innovation.
  • Highlight GenAI understanding: Emphasize any knowledge or experience with GenAI concepts like RAG.

Technical preparation

  • Practice designing and optimizing Azure data pipelines.
  • Refine your PySpark and SQL coding for large datasets.
  • Build projects demonstrating data modeling and quality.
  • Study Azure Key Vault and RAG for LLM apps.

Behavioral questions

  • Describe a complex data pipeline you designed.
  • How do you ensure data quality and reliability?
  • Share an experience collaborating with AI/ML teams.
  • How do you approach optimizing workflow performance?

Frequently asked questions

What specific Azure Data Services are most important for this Data Engineer role at Capgemini?
For this Data Engineer position at Capgemini, strong hands-on experience with Azure Data Factory, Azure Data Lake, and Azure Key Vault is crucial. Familiarity with Cognitive Search is also highly valued.

How important is Databricks experience for this Data Engineer job at Capgemini?
Databricks experience is very important for this role. The job description specifically mentions proficiency in the Databricks ecosystem, including cluster optimization, performance tuning, Delta Lake, PySpark, and workflow orchestration.

Does Capgemini require extensive experience with Generative AI for this Data Engineer position?
While not strictly required, a basic understanding of GenAI concepts like RAG and experience generating embeddings is preferred for this Data Engineer role at Capgemini. It's an area where demonstrating familiarity can set you apart.

What kind of data transformation and processing skills are Capgemini looking for in a Data Engineer?
Capgemini is looking for strong skills in developing and optimizing PySpark and SQL workflows for data transformation and orchestration. Expertise in Python/PySpark for large-scale data processing is also essential.

Are real-time data processing systems a key requirement for the Data Engineer role at Capgemini?
Yes, experience with real-time systems like Event Hubs, Apache Kafka, and Spark-Streaming is a key requirement mentioned in the job description for this Data Engineer position at Capgemini.

What are the expected benefits of working as a Data Engineer at Capgemini regarding work-life balance?
Capgemini emphasizes flexible work arrangements, including remote work and flexible hours, to support a healthy work-life balance for their Data Engineers. They offer an environment conducive to maintaining personal well-being.

Does Capgemini offer opportunities for career growth and development for Data Engineers?
Absolutely. Capgemini highlights career growth as a core mission, offering a range of programs and diverse professional paths to help Data Engineers explore opportunities and develop their careers.

Is certification in Generative AI encouraged for Data Engineers at Capgemini?
Yes, Capgemini encourages its employees to equip themselves with valuable certifications in the latest technologies, including Generative AI, supporting continuous learning and skill development for Data Engineers.