
AI/ML DevOps Engineer, AS
Deutsche Bank · Pune Division, Maharashtra, India
- On site
- Full-time
- $100,000 / year
Job highlights
- Operate and support AI/ML systems in production.
- Monitor and troubleshoot AI/ML models proactively.
- Automate AI/ML model deployment pipelines.
- Optimize AI/ML infrastructure for performance and cost.
- Collaborate with data scientists and engineers.
About the role
AI/ML DevOps Engineer
Deutsche Bank's Technology, Data & Innovation (TDI) function is seeking an L2 AI/ML DevOps Engineer to join its global Innovation team. The role focuses on the daily operations and real-time support of AI/ML systems in production, using current technologies for financial services innovation.
About the Team
DB Technology is a global team of tech specialists committed to technical excellence and financial services innovation. Our India tech center is a key growing hub, dedicated to building a diverse workforce and providing excellent opportunities for talented engineers.
What We’ll Offer You
- Best-in-class leave policy
- Gender-neutral parental leave
- 100% reimbursement under childcare assistance
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program
- Comprehensive Hospitalization Insurance
- Accident and Term Life Insurance
- Complimentary Health Screening
Your Key Responsibilities
- Manage Incident, Service, Problem, and Change Management of Shared AI Platforms.
- Monitor production AI/ML models for performance, latency, accuracy, data drift, and model drift.
- Proactively troubleshoot production issues and collaborate with L3 engineers and data scientists for escalations.
- Automate model packaging, versioning, and rollbacks.
- Optimize resource allocation for cost-effective AI workloads.
- Detect and mitigate data drift affecting model performance.
- Troubleshoot model failures, latency issues, and deployment errors.
- Utilize containerization technologies like Docker to package models and dependencies.
- Develop and maintain CI/CD pipelines for automating testing, integration, and deployment of ML models.
- Implement version control for code and model artifacts.
- Establish monitoring solutions for performance and health of deployed models.
- Set up logging mechanisms for debugging and auditing.
- Optimize ML infrastructure for scalability and cost-effectiveness.
- Implement auto-scaling mechanisms for varying workloads.
- Enforce security best practices and ensure compliance with regulations.
- Oversee management of data pipelines and storage systems.
- Implement data versioning and lineage tracking.
- Collaborate with DevOps teams to align MLOps practices.
- Continuously optimize and fine-tune ML models.
- Identify and address system bottlenecks.
- Maintain clear and comprehensive documentation of MLOps processes and procedures.
Your Skills And Experience
- Excellent communication and presentation skills, highly organized and disciplined.
- Experienced in working with multiple stakeholders and maintaining good business relationships.
- Comfortable working in VUCA (volatile, uncertain, complex, ambiguous) environments.
- Expertise required in: Google Cloud (GKE, Terraform, IAM, BigQuery, Cloud Shell, Cloud Storage), AI/ML (AI Agents, ML models, Vertex AI, AutoML, BigQuery ML), MLOps & CI/CD Pipelines, Kubeflow, Vertex AI pipelines.
- Proficiency in designing, deploying, and managing AI agents (e.g., chatbots, virtual assistants).
- Familiarity with GCP Networking, Security concepts, VPC, Load balancers.
- Basic Unix server administration.
- Proficiency in Python, Shell Scripting, SQL.
- Familiarity with fine-tuning and deploying large language models on GCP.
- Understanding of security best practices, data governance, encryption, and AI regulations.
- Experience with GCP Cloud Logging, Cloud Monitoring, and AI Model Performance Tracking.
- 4+ years of IT work experience (AVP – 6+, Associate – 4+).
- Strong problem-solving skills and passion for AI research.
- Good interpersonal skills and ability to collaborate.
Educational Qualifications
- B.E. / B. Tech. / Master’s degree in Computer Science or equivalent.
- Added advantage: GCP Certifications, Kubernetes Certifications, AI/ML Educational background or Certifications.
How We’ll Support You
- Training and development opportunities.
- Coaching and support from experts.
- A culture of continuous learning.
- A range of flexible benefits.
Key skills/competencies
- AI/ML DevOps
- Google Cloud Platform (GCP)
- Kubernetes
- CI/CD Pipelines
- Model Monitoring
- Data Drift Detection
- Python
- Shell Scripting
- Docker
- Large Language Models (LLMs)
Skills & topics
- AI/ML DevOps Engineer
- Deutsche Bank
- Google Cloud
- GKE
- Terraform
- AI/ML
- Vertex AI
- MLOps
- CI/CD
- Python
- Pune
- India
- Financial Services
- LLM
- Docker
- Kubernetes
How to get hired
- Tailor your resume: Highlight your 4+ years of IT experience, emphasizing expertise in Google Cloud (GKE, Terraform, BigQuery), AI/ML concepts (Vertex AI, AutoML), and MLOps/CI/CD pipelines. Quantify achievements where possible.
- Showcase GCP & AI/ML skills: Demonstrate proficiency in Python, Shell Scripting, SQL, Docker, and experience with deploying AI agents and LLMs on GCP. Mention any relevant GCP or Kubernetes certifications.
- Address technical requirements: Prepare to discuss your experience with model monitoring, data drift detection, resource optimization, and security best practices within AI/ML systems.
- Highlight collaboration: Emphasize your ability to work with diverse stakeholders, communicate effectively, and collaborate with L3 engineers and data scientists, especially in dynamic environments.
- Prepare for behavioral questions: Be ready to discuss your problem-solving skills, passion for AI research, and ability to work in ambiguous situations.
Technical preparation
- Master GCP services: GKE, BigQuery, Vertex AI.
- Build and manage CI/CD pipelines for ML models.
- Implement Docker for model packaging and deployment.
- Practice Python scripting and SQL for data tasks.
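As a practice exercise for the Python and data drift items above, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), for a single numeric feature. It is an illustration only and not taken from the posting: the function name `population_stability_index`, the equal-width binning, and the `eps` floor are all assumptions, and a production setup would more likely rely on a managed monitoring service.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples of one numeric feature.

    Bin edges come from the baseline sample (equal-width bins, an
    assumption of this sketch). Identical binned distributions give 0.0.
    """
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 when the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor each fraction at eps so the logarithm is always defined.
        return [max(c / total, eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]
    serving = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
    print(f"PSI = {population_stability_index(training, serving):.3f}")
```

A commonly cited rule of thumb treats a PSI below 0.1 as little or no shift and a PSI above 0.25 as a significant shift, but suitable thresholds depend on the model and the feature.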
Behavioral questions
- Describe handling production AI/ML system issues.
- Share experience troubleshooting model performance.
- Discuss collaborating with data scientists.
- Explain managing resources for cost efficiency.
Frequently asked questions
- What specific Google Cloud services are most critical for the AI/ML DevOps Engineer role at Deutsche Bank?
- For the AI/ML DevOps Engineer position at Deutsche Bank, expertise in Google Kubernetes Engine (GKE), Terraform, IAM, BigQuery, Cloud Shell, and Cloud Storage is crucial. Proficiency in Vertex AI, AutoML, and BigQuery ML for AI/ML tasks is also highly valued, alongside experience with Kubeflow and Vertex AI pipelines for MLOps and CI/CD.
- How does Deutsche Bank support professional development for its AI/ML DevOps Engineers?
- Deutsche Bank supports its AI/ML DevOps Engineers through dedicated training and development programs, coaching from experts, and fostering a culture of continuous learning. They also offer sponsorship for industry-relevant certifications and education, along with a range of flexible benefits to aid career progression.
- What is the expected experience level for an AI/ML DevOps Engineer at Deutsche Bank?
- The role requires a minimum of 4 years of IT work experience. For AVP level roles, 6+ years are expected, while Associate level roles typically require 4+ years. A B.E./B.Tech./Master's degree in Computer Science or equivalent is also a standard educational qualification.
- How does Deutsche Bank approach MLOps and CI/CD for AI/ML models?
- Deutsche Bank emphasizes robust MLOps practices and CI/CD pipelines for AI/ML models. This includes automating model packaging, versioning, rollbacks, testing, integration, and deployment. They utilize tools like Kubeflow and Vertex AI pipelines to streamline these processes and ensure efficient model lifecycle management.
- What are the key responsibilities regarding AI/ML model monitoring at Deutsche Bank?
- Key responsibilities include continuously monitoring production AI/ML models for performance, latency, accuracy, data drift, and model drift. Engineers are expected to proactively troubleshoot issues, set up logging and monitoring solutions using GCP tools like Cloud Logging and Cloud Monitoring, and track AI model performance to ensure operational health.
- Does Deutsche Bank require experience with Large Language Models (LLMs) for this role?
- Yes, familiarity with fine-tuning and deploying Large Language Models (LLMs) on GCP is listed as a requirement. Experience with AI agents, such as chatbots and virtual assistants, is also a key aspect of the role.