
AI/ML/LLM Systems Engineer - Enterprise AI Platform Engineer - Relocate to Saudi Arabia, Permanent Expat Relocation Package
aramco · Houston, TX
- On site
- Full-time
- SAR 150,000 / year
Job highlights
- Develop enterprise-scale AI platforms for LLMs.
- Deploy and optimize models on NVIDIA SuperPods.
- Build scalable inference pipelines with Kubernetes.
- Integrate vector and relational databases.
- Implement CI/CD for model delivery.
About the role
AI/ML/LLM Systems Engineer - Enterprise AI Platform Engineer
This position requires full relocation to Saudi Arabia. It is a permanent full-time Expat position with an attractive relocation package. Please note only qualified candidates will be contacted. 8+ years of experience in Python/SQL, LLM and AI/ML systems is REQUIRED.
Overview
We are seeking an AI/ML/LLM Systems Engineer to join our Digital & AI Center of Excellence and contribute to the development of enterprise-scale AI platforms that support advanced machine learning and language model inference across Saudi Aramco’s operations. The Digital & AI Center of Excellence is responsible for delivering scalable, secure, and high-performance AI/ML/LLM systems that drive innovation and operational efficiency. In this role, you will design and maintain infrastructure for deploying and optimizing large language models (LLMs) and vision models hosted on NVIDIA SuperPods, cloud, and containerized environments. Your primary responsibility is to ensure the efficient and scalable operation of AI models within enterprise platforms. You will deploy, monitor, and optimize inference workloads, integrate vector and relational databases, and implement orchestration and DevOps pipelines to support continuous model improvement and delivery.
Duties & Responsibilities
- Deploy and manage LLMs and vision models on NVIDIA SuperPods and cloud infrastructure, ensuring high performance and efficient use of GPU resources.
- Build and maintain scalable inference pipelines using Kubernetes (K8s), Docker, and OpenShift for enterprise AI platforms.
- Optimize inference performance through multiple techniques.
- Benchmark and evaluate LLMs for performance, accuracy, latency, and resource utilization across different hardware and software configurations.
- Implement and support LLMOps frameworks with full observability, including logging, tracing, and model performance tracking.
- Integrate and manage vector databases (Elasticsearch) and relational databases (PostgreSQL) for efficient data retrieval and user interaction history tracking.
- Implement and maintain CI/CD (Continuous Integration/Continuous Delivery) pipelines for model and platform updates using Git, Bitbucket, Jenkins, and ArgoCD.
- Ensure high availability and reliability of AI application workflows using frameworks like Haystack.
- Collaborate with infrastructure teams on GPU provisioning and resource allocation for AI workloads.
- Develop and maintain monitoring, alerting, and dashboarding systems for AI/ML workloads to ensure SLA/SLO compliance.
Minimum Requirements
- Hold a Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Have at least 8 years of experience in AI/ML systems or cloud-native infrastructure, including at least 4 years in LLM deployment and optimization.
- Proficiency in Python and SQL is required, with experience in building and optimizing AI/ML applications.
- Ability to work with Kubernetes (K8s), Docker, and OpenShift in production environments.
- Experience deploying and optimizing LLMs and vision models on NVIDIA GPU clusters, in high-performance computing (HPC) environments, and in cloud environments.
- Ability to demonstrate proficiency in inference scaling, distributed computing, and SLA/SLO planning for AI workloads.
- Strong knowledge of Elasticsearch, PostgreSQL, and workflow frameworks like Haystack for AI application development.
- Ability to implement CI/CD pipelines using tools like Git, Bitbucket, Jenkins, and ArgoCD.
- Experience in benchmarking and evaluating LLMs for performance, accuracy, and efficiency is required.
- Experience with monitoring and dashboarding for AI/ML systems is also required.
Key Skills/Competency
- Python
- SQL
- LLM Deployment
- AI/ML Systems
- Kubernetes (K8s)
- Docker
- OpenShift
- NVIDIA GPUs
- CI/CD
- Observability
Skills & topics
- AI Engineer
- ML Engineer
- LLM Engineer
- Systems Engineer
- Enterprise AI
- Platform Engineer
- Python
- SQL
- Kubernetes
- NVIDIA GPUs
- Saudi Arabia
- Relocation
How to get hired
- Tailor your resume: Highlight your 8+ years of Python/SQL experience and specific LLM/AI/ML systems expertise, aligning with job requirements.
- Showcase platform skills: Emphasize experience with Kubernetes, Docker, OpenShift, and NVIDIA GPU environments in your application.
- Quantify achievements: Provide examples of optimizing inference performance and deploying LLMs/vision models with measurable results.
- Address relocation: Clearly state your willingness and readiness to relocate to Saudi Arabia.
- Prepare for technical questions: Be ready to discuss LLMOps, CI/CD pipelines, and database integrations.
Frequently asked questions
- What are the key technical skills required for the AI/ML/LLM Systems Engineer role at Aramco?
- The role demands a strong foundation in Python and SQL, extensive experience with AI/ML systems, and specialized knowledge in deploying and optimizing Large Language Models (LLMs). Proficiency in container and orchestration technologies like Kubernetes, Docker, and OpenShift, along with experience on NVIDIA GPU clusters, is crucial. Familiarity with CI/CD tools (Git, Bitbucket, Jenkins, ArgoCD), vector databases (Elasticsearch), and relational databases (PostgreSQL) is also essential.
- Does Aramco provide relocation assistance for this AI/ML/LLM Systems Engineer position?
- Yes, this AI/ML/LLM Systems Engineer position is advertised as a permanent Expat role with an attractive relocation package, requiring full relocation to Saudi Arabia. Aramco supports international hires with the necessary assistance for their move.
- What is the minimum experience required for the AI/ML/LLM Systems Engineer role?
- The minimum requirement for the AI/ML/LLM Systems Engineer role is 8 years of experience in AI/ML systems or cloud-native infrastructure. This must include at least 4 years specifically focused on LLM deployment and optimization. A Bachelor's degree in Computer Science, Software Engineering, or a related field is also mandatory.
- What kind of AI models will I be working with as an AI/ML/LLM Systems Engineer at Aramco?
- As an AI/ML/LLM Systems Engineer at Aramco, you will be responsible for deploying and optimizing Large Language Models (LLMs) and vision models. These models will be hosted on enterprise-scale AI platforms, leveraging NVIDIA SuperPods/Cloud and containerized environments to drive innovation across Saudi Aramco's operations.
- What are the career growth opportunities for an AI/ML/LLM Systems Engineer at Aramco?
- Aramco invests heavily in talent development, offering significant opportunities for professional and technical growth. As an AI/ML/LLM Systems Engineer, you'll work on world-scale projects with cutting-edge technology, supported by extensive workforce development programs designed to enhance sector-specific knowledge and competencies.