
Data Engineer
SourcingXPress · Bengaluru, Karnataka, India
- On site
- Full-time
- ₹1,700,000 / year
Job highlights
- Build scalable data pipelines with PySpark.
- Develop high-performance batch and real-time systems.
- Transform complex data from various sources.
- Optimize SQL queries and data transformations.
- Collaborate on data-driven applications.
About the role
PySpark Data Engineer at IGS GLOBAL
We are seeking a talented PySpark Data Engineer to join our team. In this role, you will be responsible for designing, building, and optimizing scalable data pipelines and distributed data processing systems. You will work with high-performance ETL workflows using Spark and cloud-native technologies to ensure reliable and efficient data delivery.
This is a hands-on individual contributor position focused on developing robust data products within a fast-paced, engineering-driven environment.
Key Responsibilities
- Design, build, and maintain scalable ETL pipelines using PySpark.
- Develop high-performance batch and real-time data processing systems.
- Extract, transform, and load data from multiple sources (databases, APIs, files).
- Work extensively with large and complex datasets, including nested JSON structures.
- Build efficient data transformation logic using Python, Spark, NumPy, and Pandas.
- Write and optimize complex SQL queries for data processing and analytics.
- Collaborate with data scientists, analysts, and engineers to support data-driven applications.
- Ensure data quality, consistency, and performance across pipelines.
- Optimize Spark jobs for scalability and performance.
Must-Have Skills
- Strong hands-on experience with PySpark (mandatory).
- Advanced Python skills with NumPy and Pandas.
- Strong SQL skills (complex queries, joins, transformations).
- Experience building data pipelines and ETL workflows.
- Ability to handle complex data transformations and nested JSON structures.
- Experience integrating data from APIs, databases, and flat files.
- Strong problem-solving skills in data processing and manipulation.
Good-to-Have Skills
- Experience with Apache Airflow or similar workflow orchestration tools.
- Familiarity with AWS or GCP cloud platforms.
- Knowledge of modern data lake technologies (e.g., Iceberg, Delta Lake).
- Experience working in startup or fast-paced product environments.
- Exposure to distributed systems and large-scale data platforms.
Ideal Candidate Profile
We are looking for engineers who are:
- Strong in hands-on coding (not just theoretical knowledge).
- Passionate about data engineering and scalable systems.
- Comfortable working in fast-paced, high-ownership environments.
- Highly analytical, detail-oriented, and driven to solve problems.
- Proactive, energetic, and committed to delivering high-quality data solutions.
Key skills/competencies
- PySpark
- Data Engineering
- ETL
- Python
- SQL
- Data Pipelines
- Spark
- NumPy
- Pandas
- Cloud Technologies
Skills & topics
- PySpark Data Engineer
- Data Engineering
- ETL
- Python
- Spark
- SQL
- Data Pipelines
- Cloud Data
- Big Data
- Information Technology
How to get hired
- Customize your resume: Highlight your PySpark, Python, SQL, and ETL experience, aligning it with the job description's requirements.
- Showcase coding proficiency: Emphasize your hands-on coding projects and contributions to scalable data systems.
- Demonstrate problem-solving: Prepare examples of how you handled complex data transformations and optimizations.
- Research IGS GLOBAL: Understand their B2B product/service focus and Series C funding stage to tailor your application.
- Prepare for technical interviews: Be ready to discuss your experience with PySpark, data pipelines, and SQL query optimization.
Technical preparation
- Practice PySpark coding and Spark optimizations.
- Build sample ETL pipelines with Python and SQL.
- Work with nested JSON data structures.
- Optimize complex SQL queries for performance.
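For the nested JSON practice item, a small warm-up exercise is to flatten a nested record into dotted column names before moving on to Spark. The sketch below is only an illustration in plain Python: the `flatten` helper and the sample order record are hypothetical and are not taken from the job description or from any IGS GLOBAL codebase.

```python
import json
from typing import Any


def flatten(record: Any, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts and lists into one level with dotted keys.

    Hypothetical helper for practice only. Empty dicts and lists
    produce no keys at all.
    """
    items: dict = {}
    if isinstance(record, dict):
        for key, value in record.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(record, list):
        # List positions become part of the key, e.g. "items.0.sku".
        for index, value in enumerate(record):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        # Scalar leaf: store it under the accumulated key.
        items[parent_key] = record
    return items


# Made-up sample record, used only to exercise the helper.
raw = (
    '{"order_id": 7, '
    '"customer": {"name": "Asha", "address": {"city": "Bengaluru"}}, '
    '"items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}'
)
flat = flatten(json.loads(raw))

for key in sorted(flat):
    print(key, "=", flat[key])
```

Indexing list elements into the key keeps the example short. In a real pipeline, arrays are more often expanded into one row per element, so treat this as a starting point for discussion rather than a recommended design.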
Behavioral questions
- Describe a challenging data pipeline problem.
- How do you ensure data quality and consistency?
- Tell me about a time you worked in a fast-paced environment.
- How do you approach problem-solving for complex data issues?
Frequently asked questions
- What are the key responsibilities for a PySpark Data Engineer at IGS GLOBAL?
- As a PySpark Data Engineer at IGS GLOBAL, you will design, build, and maintain scalable ETL pipelines, develop high-performance data processing systems, transform complex data from various sources, and optimize Spark jobs. You'll also write SQL queries and collaborate with data scientists and analysts.
- What are the mandatory skills for this PySpark Data Engineer role at IGS GLOBAL?
- The mandatory skills for this role include strong hands-on PySpark experience, advanced Python with NumPy and Pandas, strong SQL skills for complex queries, and experience building data pipelines and ETL workflows. You must also be able to handle complex data transformations and integrate data from diverse sources.
- Does IGS GLOBAL offer opportunities for professional development in cloud technologies?
- The job description does not describe a formal professional development program. Familiarity with AWS or GCP cloud platforms is listed as a good-to-have skill rather than a requirement, which suggests IGS GLOBAL uses these technologies and that the role may offer opportunities to work with or learn more about cloud environments.
- What kind of work environment can I expect as a Data Engineer at IGS GLOBAL?
- IGS GLOBAL is a small/medium business in the Information Technology industry, funded at Series C. They emphasize a fast-paced, engineering-driven environment with high ownership, suitable for proactive and energetic individuals passionate about data engineering and scalable systems.
- How does IGS GLOBAL assess candidates for the PySpark Data Engineer position?
- IGS GLOBAL looks for engineers with strong hands-on coding skills, not just theoretical knowledge. They value passion for data engineering, comfort in fast-paced environments, analytical thinking, problem-solving abilities, and a commitment to delivering high-quality solutions.
- What is the salary range for the PySpark Data Engineer role at IGS GLOBAL?
- The salary range for this PySpark Data Engineer position at IGS GLOBAL is ₹14-17 lakh per annum. The listing header shows ₹1,700,000 per year, which is the top of that range.
- Is this a remote or on-site position for the PySpark Data Engineer at IGS GLOBAL?
- This is an on-site position. The listing marks the role as on site in Bengaluru, Karnataka, India; the job description itself does not mention a remote or hybrid option.