
Lead Systems Engineer - Data DevOps/MLOps
EPAM Systems · Chennai, Tamil Nadu, India
- On site
- Full-time
- $150,000 / year
Job highlights
- Lead systems engineering for data and ML operations.
- Design and manage CI/CD pipelines for ML models.
- Build cloud infrastructure for ML processing and serving.
- Automate data validation, transformation, and orchestration.
- Collaborate with data scientists and engineers on production ML.
About the role
We are seeking a skilled and passionate Lead Systems Engineer with Data DevOps/MLOps expertise to drive innovation and efficiency across our data and machine learning operations.
Responsibilities
- Design, deploy, and manage CI/CD pipelines for seamless data integration and ML model deployment
- Establish robust infrastructure for processing, training, and serving machine learning models using cloud-based solutions
- Automate critical workflows such as data validation, transformation, and orchestration for streamlined operations
- Collaborate with cross-functional teams, including data scientists and engineers, to integrate ML solutions into production environments
- Improve model serving, performance monitoring, and reliability in production ecosystems
- Ensure data versioning, lineage tracking, and reproducibility across ML experiments and workflows
- Identify and implement opportunities to improve scalability, efficiency, and resilience of the infrastructure
- Enforce rigorous security measures to safeguard data and ensure compliance with relevant regulations
- Debug and resolve technical issues in data pipelines and ML deployment workflows
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
- 8+ years of experience in Data DevOps, MLOps, or related disciplines
- Expertise in cloud platforms such as Azure, AWS, or GCP
- Skills in Infrastructure as Code tools like Terraform, CloudFormation, or Ansible
- Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
- Hands-on experience with data processing frameworks including Apache Spark and Databricks
- Proficiency in Python and familiarity with libraries such as Pandas, TensorFlow, and PyTorch
- Knowledge of CI/CD tools such as Jenkins, GitLab CI/CD, and GitHub Actions
- Experience with version control systems and MLOps platforms including Git, MLflow, and Kubeflow
- Understanding of monitoring and alerting tools like Prometheus and Grafana
- Strong problem-solving and independent decision-making capabilities
- Effective communication and technical documentation skills
Nice to have
- Background in DataOps methodologies and tools such as Airflow or dbt
- Knowledge of data governance platforms like Collibra
- Familiarity with Big Data technologies such as Hadoop or Hive
- Certifications in cloud platforms or data engineering tools
Key skills/competency
- Data DevOps
- MLOps
- Cloud Platforms (Azure, AWS, GCP)
- Infrastructure as Code (Terraform, CloudFormation, Ansible)
- Containerization (Docker, Kubernetes)
- Data Processing (Spark, Databricks)
- Python
- CI/CD (Jenkins, GitLab CI/CD, GitHub Actions)
- Version Control (Git)
- MLOps Platforms (MLflow, Kubeflow)
Skills & topics
- Lead Systems Engineer
- Data DevOps
- MLOps
- Cloud Engineering
- CI/CD
- Python
- Docker
- Kubernetes
- Apache Spark
- AWS
- Azure
- GCP
- Terraform
- Ansible
- Jenkins
- GitLab CI/CD
- GitHub Actions
- MLflow
- Kubeflow
- Databricks
- TensorFlow
- PyTorch
- Prometheus
- Grafana
- Airflow
- dbt
- Collibra
- Hadoop
- Hive
- Systems Engineering
How to get hired
- Tailor your resume: Highlight your experience with Data DevOps, MLOps, cloud platforms (Azure, AWS, GCP), and specific tools like Terraform, Docker, Kubernetes, Spark, and Python.
- Showcase your skills: Quantify your achievements in designing CI/CD pipelines, automating workflows, and managing ML infrastructure.
- Prepare for technical questions: Be ready to discuss your experience with data processing, containerization, and cloud-native solutions.
- Demonstrate problem-solving: Prepare examples of how you've debugged complex issues in data pipelines and ML deployments.
- Research EPAM Systems: Understand their work in digital transformation and how your skills align with their client projects.
Technical preparation
- Master CI/CD concepts and tools.
- Gain expertise in cloud platforms (AWS, Azure, GCP).
- Practice Infrastructure as Code (Terraform, Ansible).
- Build projects with Docker and Kubernetes.
Behavioral questions
- Describe a complex data pipeline issue you resolved.
- How do you collaborate with data scientists?
- Share an experience automating critical workflows.
- How do you ensure ML model reliability in production?
Frequently asked questions
- What are the key MLOps tools used at EPAM Systems for this Lead Systems Engineer role?
- For the Lead Systems Engineer Data DevOps/MLOps position at EPAM Systems, proficiency in MLOps tools such as MLflow and Kubeflow is highly valued. Experience with CI/CD tools like Jenkins, GitLab CI/CD, and GitHub Actions is also crucial for managing the ML model lifecycle.
- What cloud platforms are most relevant for the Lead Systems Engineer role at EPAM Systems?
- EPAM Systems leverages major cloud platforms for its operations. For this Lead Systems Engineer role, expertise in Azure, AWS, or GCP is essential for designing and managing the cloud-based infrastructure required for data processing and ML model deployment.
- How important is Python programming for the Lead Systems Engineer Data DevOps/MLOps position at EPAM Systems?
- Python is a core requirement for the Lead Systems Engineer Data DevOps/MLOps role at EPAM Systems. Proficiency in Python, along with familiarity with data science libraries like Pandas, TensorFlow, and PyTorch, is critical for developing and managing data pipelines and ML workflows.
- What level of experience is expected for the Lead Systems Engineer - Data DevOps/MLOps role at EPAM Systems?
- EPAM Systems is looking for a Lead Systems Engineer with at least 8 years of experience in Data DevOps, MLOps, or closely related fields. A Bachelor's or Master's degree in Computer Science, Data Engineering, or a similar discipline is also required.
- What are the primary responsibilities of a Lead Systems Engineer at EPAM Systems focusing on Data DevOps/MLOps?
- As a Lead Systems Engineer at EPAM Systems, you will be responsible for designing and managing CI/CD pipelines, establishing robust cloud infrastructure for ML, automating data workflows, and collaborating with cross-functional teams to deploy ML solutions into production environments.