
Senior AIOps Solution Architect
EPAM Systems · Pune Division, Maharashtra, India
- On site
- Full-time
- $150,000 / year
Job highlights
- Design Gen-AI AIOps solutions for enterprises.
- Architect Gen-AI & LLM Engineering solutions.
- Develop MLOps pipelines and ML techniques.
- Implement RAG, Vector DBs, and semantic search.
- Lead Kubernetes and cloud-native initiatives.
About the role
We are seeking a highly experienced Senior AIOps Solution Architect with exceptional expertise in Gen-AI-enabled Cloud Engineering, Observability, Operational Intelligence, and AI-driven automation. The ideal candidate will bring 10+ years of enterprise-level architecture experience, with a focus on building innovative Gen-AI-enabled platforms, data-driven automation frameworks, and enterprise-grade AIOps solutions to advance operational efficiency.
Responsibilities
- Design and deliver scalable Gen-AI-powered AIOps solutions for large enterprise platforms to reduce MTTR, automate incident resolution, and drive operational excellence.
- Architect and implement Gen-AI & LLM Engineering solutions using tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, and LangChain.
- Develop and optimize MLOps pipelines and model deployment workflows on SageMaker and Azure ML, applying techniques such as clustering, topic modeling, and anomaly detection.
- Implement RAG, vector databases, and advanced semantic search across platforms using pgvector, Elasticsearch, and Amazon Bedrock Knowledge Bases.
- Build and automate cloud platform and infrastructure solutions on AWS, Azure, and GCP using Terraform, CloudFormation, Helm, Python, and shell scripting.
- Lead Kubernetes-based container orchestration and DevSecOps initiatives, including CI/CD pipelines, Istio service mesh deployments, and KEDA-based event-driven autoscaling.
- Design and integrate serverless and cloud-native architectures using API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis.
- Implement end-to-end Observability solutions using Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda.
- Ensure seamless ITSM integration, including ServiceNow, for AI-driven operations and automation.
- Work with ITSM tools such as ServiceNow, Jira Service Management, and ManageEngine to streamline incident management workflows.
- Provide thought leadership in AIOps, automation, and AI-powered operational intelligence to leadership and engineering teams.
Requirements
- 19+ years of overall IT experience.
- 10+ years of professional experience in Enterprise Cloud, Infrastructure Engineering, SRE, Automation, and Architecture roles.
- Proven track record of delivering Gen-AI-powered AIOps solutions in production environments, driving efficiencies like MTTR improvement and operational automation.
- Expertise in Gen-AI and LLM Engineering tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, LangChain, and Bedrock Agents.
- Proficiency in RAG, vector databases, and semantic search solutions such as pgvector, Elasticsearch, and Amazon Bedrock Knowledge Bases.
- Background in MLOps, model development, and machine learning on SageMaker and Azure ML, including techniques such as clustering, topic modeling, and anomaly detection.
- Skills in cloud engineering and automation technologies, including AWS, Azure, GCP, Terraform, CloudFormation, Helm, Python, and Shell Scripting.
- Capability to design and operate Kubernetes-based infrastructure, CI/CD pipelines, security automation, Istio, and KEDA.
- Familiarity with serverless computing and cloud-native tools like API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis.
- Knowledge of Observability platforms such as Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda.
- Understanding of ITSM platforms, including ServiceNow, Jira Service Management, and ManageEngine.
- Demonstrated AI and machine learning expertise in areas such as anomaly detection, Gen-AI implementation, and agentic AI solutions.
- Ability to communicate effectively in both written and spoken English (B2 level or higher).
Nice to have
- Experience leading AIOps/Cloud Practices or platform engineering organizations.
- Certifications in AWS ML, Cloud Architecture, or AI Leadership.
Key skills/competencies
- AIOps Solution Architect
- Gen-AI Cloud Engineering
- Observability
- Operational Intelligence
- AI-driven Automation
- LLM Engineering
- MLOps
- Kubernetes
- Cloud Platforms (AWS, Azure, GCP)
- ITSM (ServiceNow)
Skills & topics
- AIOps
- Solution Architect
- GenAI
- Cloud Engineering
- Observability
- Automation
- LLM
- MLOps
- Kubernetes
- AWS
- Azure
- GCP
- ServiceNow
- SRE
- Enterprise Architecture
- EPAM Systems
How to get hired
- Tailor your resume: Highlight your 10+ years of enterprise architecture and Gen-AI AIOps experience, focusing on MTTR improvement and automation.
- Showcase technical expertise: Emphasize your proficiency with Gen-AI tools (Bedrock, Azure OpenAI), MLOps, cloud platforms (AWS, Azure, GCP), Kubernetes, and Observability tools.
- Quantify achievements: Provide specific examples of AIOps solutions you've delivered and their impact on operational efficiency and incident resolution.
- Prepare for technical interviews: Be ready to discuss complex architectural designs, cloud-native solutions, and AI/ML concepts.
- Demonstrate leadership: Highlight experience in thought leadership and communication with stakeholders regarding AIOps and automation strategies.
Technical preparation
- Master Gen-AI/LLM tools: Bedrock, Azure OpenAI.
- Deepen cloud skills: AWS, Azure, GCP, Terraform.
- Practice Kubernetes and container orchestration.
- Study Observability and ITSM tool integration.
Behavioral questions
- Describe a complex AIOps solution you designed.
- How do you drive operational efficiency with AI?
- Explain your experience with Gen-AI in production.
- How do you communicate technical strategies to leadership?
Frequently asked questions
- What are the key Gen-AI tools I should highlight for the Senior AIOps Solution Architect role at EPAM Systems?
- For the Senior AIOps Solution Architect position at EPAM Systems, it's crucial to highlight your experience with Gen-AI and LLM Engineering tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, and LangChain. Demonstrating proficiency in these specific technologies will align well with the role's requirements.
- How important is experience with Observability platforms for this EPAM Systems AIOps role?
- Experience with Observability platforms is highly important for this Senior AIOps Solution Architect role at EPAM Systems. The job description specifically mentions needing knowledge of tools like Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda. Showcase your ability to implement end-to-end observability solutions.
- What specific cloud platforms and automation technologies are most relevant for the Senior AIOps Solution Architect position at EPAM?
- For the Senior AIOps Solution Architect role at EPAM Systems, your skills in major cloud platforms like AWS, Azure, and GCP are essential. Additionally, demonstrate expertise in automation technologies such as Terraform, CloudFormation, Helm, Python, and Shell Scripting, as these are core to building and managing cloud infrastructure.
- Does EPAM Systems require specific ITSM experience for the Senior AIOps Solution Architect role?
- Yes, EPAM Systems requires an understanding of ITSM platforms for this Senior AIOps Solution Architect role. Proficiency with tools such as ServiceNow, Jira Service Management, and ManageEngine is important for streamlining incident management workflows and ensuring seamless integration for AI-driven operations.
- What is the minimum experience level EPAM Systems looks for in a Senior AIOps Solution Architect?
- EPAM Systems requires a significant amount of experience for the Senior AIOps Solution Architect role. You'll need 19+ years of overall IT experience, including at least 10 years in Enterprise Cloud, Infrastructure Engineering, SRE, Automation, and Architecture roles. A proven track record of delivering Gen-AI-powered AIOps solutions is also a must.
- Are there specific MLOps or machine learning techniques that are critical for the Senior AIOps Solution Architect job at EPAM Systems?
- Yes, for the Senior AIOps Solution Architect position at EPAM Systems, a strong background in MLOps, model development, and machine learning techniques is critical. Experience with SageMaker, Azure ML, clustering, topic modeling, and anomaly detection will be highly valued.