ML Ops Engineer

Achievers

On Site
Full Time
CA$126,000
Toronto, ON

Job Overview

Job Title: ML Ops Engineer
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: CA$126,000
Location: Toronto, ON

Job Description

About Achievers' Data Science Team

Our Data Science team is a highly motivated and curious group spearheading Achievers' efforts to build AI-powered products. The team thrives on solving the complex problems that come with building at scale, and we foster a flexible environment where team members can shape both the work and the craft.

The Opportunity: ML Ops Engineer

We are seeking a skilled and driven ML Ops Engineer to support the full operational lifecycle of both traditional machine learning systems and emerging generative AI applications. The role spans infrastructure, automation, quality, and reliability engineering. A key focus is enabling scalable training, evaluation, deployment, and monitoring for a wide range of ML and GenAI workloads, including model upgrades, framework version management, regression testing, routine maintenance, and sustaining performance across systems and solutions.

Why You'll Love This Role at Achievers

  • Lead high-impact initiatives that shape how millions of people experience work around the world.
  • Bring your unique perspective to complex and challenging projects, applying your expertise in data science, influencing technical direction, and sharing knowledge.
  • Join a close-knit, no-ego, high-performing team that solves meaningful problems and celebrates successes together.
  • Work alongside an experienced leadership team genuinely invested in your career growth.
  • Thrive in a fast-paced, high-growth environment where innovation is encouraged and your voice truly matters.

How You'll Shape ML Ops at Achievers

This role involves extensive work with Google Cloud’s AI/ML ecosystem, including Vertex AI (ML and GenAI), managed pipelines, vector databases, embeddings workflows, and model optimization tools.

Model Deployment & Serving (ML + GenAI)
  • Deploy and operate ML models and LLMs using Vertex AI, Cloud Run, and GKE.
  • Automate packaging, versioning, and release of models, prompts, embeddings, and related artifacts.
  • Design scalable inference architectures (sync, async, agentic), including batching and GPU/TPU autoscaling.
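The batching concern in the last bullet can be illustrated independently of any cloud service. Below is a minimal sketch of an asyncio micro-batcher, with a hypothetical `predict_batch` function standing in for the real model call (e.g., a Vertex AI endpoint); names and defaults are illustrative, not taken from any library.

```python
import asyncio

def predict_batch(inputs):
    # Hypothetical stand-in for the real model call (e.g., a Vertex AI endpoint).
    return [x * 2 for x in inputs]

class MicroBatcher:
    """Collects concurrent requests and serves them in one batched model call."""

    def __init__(self, max_batch=8, max_wait_s=0.01):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self._worker = None

    async def predict(self, x):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        if self._worker is None:
            # Lazily start the background batching loop on first use.
            self._worker = asyncio.create_task(self._run())
        return await fut

    async def _run(self):
        while True:
            # Block for the first request, then gather more until the
            # batch is full or the wait deadline expires.
            items = [await self.queue.get()]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + self.max_wait_s
            while len(items) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [x for x, _ in items]
            for (_, fut), y in zip(items, predict_batch(inputs)):
                fut.set_result(y)

async def main():
    batcher = MicroBatcher()
    # Five concurrent requests are answered by a single batched call.
    return await asyncio.gather(*(batcher.predict(i) for i in range(5)))

print(asyncio.run(main()))  # → [0, 2, 4, 6, 8]
```

The same shape scales to GPU/TPU serving: the deadline bounds tail latency while the batch cap bounds memory per forward pass.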
Pipeline Engineering & Automation
  • Build and maintain ML and GenAI workflows using Vertex AI Pipelines, Cloud Composer (Airflow), or custom orchestration.
  • Implement CI/CD for ML code and GenAI artifacts (prompts, fine-tuned models, evaluation suites).
  • Add automated validation for data quality, model performance, regression, and LLM evaluation metrics.
  • Implement quality gates in production pipelines, designing and building tests that gate deployment changes and surface production issues.
  • Schedule retraining, re-embedding, and re-indexing to ensure model freshness.
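A quality gate of the kind described above often reduces to comparing a candidate model's evaluation metrics against a baseline and against absolute floors. The sketch below is a hypothetical, framework-free illustration; the function name, thresholds, and metric names are assumptions for the example.

```python
def passes_quality_gate(candidate, baseline, max_regression=0.01, min_scores=None):
    """Return (ok, reasons) deciding whether a candidate model may be promoted.

    candidate/baseline: dicts of metric name -> score (higher is better).
    max_regression: largest allowed per-metric drop versus the baseline.
    min_scores: absolute floors that must hold regardless of the baseline.
    """
    reasons = []
    for name, floor in (min_scores or {}).items():
        if candidate.get(name, float("-inf")) < floor:
            reasons.append(f"{name} below floor {floor}")
    for name, base in baseline.items():
        if candidate.get(name, float("-inf")) < base - max_regression:
            reasons.append(f"{name} regressed vs baseline {base}")
    return (not reasons, reasons)

baseline = {"accuracy": 0.91, "f1": 0.88}
candidate = {"accuracy": 0.92, "f1": 0.86}
ok, reasons = passes_quality_gate(candidate, baseline, min_scores={"accuracy": 0.90})
print(ok, reasons)  # f1 dropped by 0.02 > 0.01, so the gate fails
```

In a real pipeline this check would run as a step before the deployment stage, failing the run (and blocking promotion) when `ok` is false.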
GenAIOps & Artifact Lifecycle
  • Manage and version prompts, system instructions, RAG components, and agent workflows.
  • Operationalize fine-tuned or custom models using Vertex AI tuning capabilities.
  • Implement safety guardrails, filtering, and approval workflows for generative systems.
  • Enable experimentation across prompts, models, and RAG strategies.
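Prompt and artifact versioning, as described above, can be made deterministic by content-addressing: hashing the prompt template together with the model name and generation parameters, so any change produces a new version ID. The sketch below is illustrative; the model name and parameters are placeholder values, not a prescribed scheme.

```python
import hashlib
import json

def prompt_version(prompt_template, model, params):
    """Derive a stable, content-addressed version ID for a prompt artifact."""
    # Canonical JSON (sorted keys) makes the hash independent of dict order.
    payload = json.dumps(
        {"template": prompt_template, "model": model, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

v1 = prompt_version("Summarize: {text}", "gemini-1.5-pro", {"temperature": 0.2})
v2 = prompt_version("Summarize: {text}", "gemini-1.5-pro", {"temperature": 0.3})
assert v1 != v2  # any change to template, model, or params yields a new version
print(v1)
```

Version IDs like this can then be attached to evaluation runs and production logs, so regressions trace back to the exact prompt/model/parameter combination that produced them.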
Cloud Infrastructure & Reliability
  • Build scalable training and inference environments using GCP services (Vertex AI, BigQuery ML, Dataflow/Dataproc, Cloud Storage, Cloud Run/GKE).
  • Manage infrastructure as code using Terraform or Deployment Manager.
  • Apply cost optimization, reliability, and scaling best practices.
Observability, Monitoring & Governance
  • Monitor model, data, and embedding drift.
  • Track LLM-specific metrics (latency, cost, prompt performance, safety triggers).
  • Implement logging, lineage, and metadata using Vertex ML Metadata and Cloud Logging.
  • Embed AI governance controls (explainability, bias, performance, data usage).
  • Support audit-ready workflows with model cards, prompt cards, and evaluation documentation.
  • Align operational practices with emerging external AI regulations and frameworks (e.g., responsible AI, model risk management, audit readiness).
  • Partner with security, legal, privacy, and risk teams to operationalize AI governance without slowing experimentation.
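One common way to monitor the data drift mentioned above is the Population Stability Index (PSI), which compares the binned distribution of a production sample against a training-time baseline. The sketch below uses only the standard library; bin count and thresholds are conventional rules of thumb, not fixed standards.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a production sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frequencies(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Smooth zero buckets so the log ratio stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = frequencies(expected), frequencies(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]         # uniform scores in [0, 1)
shifted = [min(1.0, v + 0.3) for v in baseline]  # production scores drifted upward
print(round(population_stability_index(baseline, shifted), 3))
```

The same calculation applies to embedding drift by running it per dimension (or on distances to a reference centroid) and alerting when the index crosses the chosen threshold.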
Cross-Functional Collaboration
  • Partner with data scientists, GenAI engineers, product managers, and engineers to deliver production-ready ML systems.
  • Promote best practices for reliable, scalable, and governed ML and GenAI operations.

Experience That Will Set You Up for Success

  • Experience in ML Ops, ML platform engineering, or cloud-based AI infrastructure.
  • Strong hands-on cloud experience (GCP preferred but not required), especially Vertex AI (ML & GenAI), BigQuery/BigQuery ML, Cloud Run or GKE, and Cloud Composer.
  • Strong Python skills with experience in testing, CI/CD, containerization, and infrastructure automation (Terraform).
  • Experience with LLM workflows: embeddings, vector databases, prompt engineering, and evaluation.
  • Exposure to agentic workflows and related standards such as MCP (Model Context Protocol).
  • Familiarity with Vertex AI Model Garden, tuning, monitoring, and vector search technologies.
  • Exposure to LLM safety, moderation, or red-teaming workflows.

Soft Skills

  • Strong communication and cross-functional collaboration skills.
  • Detail-oriented, reliability-focused mindset.
  • Comfortable working in fast-evolving environments.
  • Strong sense of ownership and accountability.

Key Skills & Competencies

  • GCP
  • Vertex AI
  • Python
  • MLOps
  • LLMs
  • CI/CD
  • Terraform
  • Data Science
  • Machine Learning
  • Cloud Infrastructure

Tags:

ML Ops Engineer
ML lifecycle
model deployment
pipeline automation
GenAIOps
cloud infrastructure
observability
AI governance
cross-functional collaboration
reliability engineering
scalable training
Google Cloud Platform
Vertex AI
Python
Terraform
Kubernetes
BigQuery
Apache Airflow
CI/CD
Vector Databases
LLMs

How to Get Hired at Achievers

  • Research Achievers' culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume: Highlight ML Ops, GCP, Python, and GenAI experience for the ML Ops Engineer role at Achievers.
  • Showcase your impact: Prepare examples of how your work drove scalable, reliable ML/GenAI solutions.
  • Understand their products: Familiarize yourself with Achievers' recognition platform and how AI enhances it.
  • Prepare for technical depth: Be ready to discuss Vertex AI, CI/CD, and LLM operationalization challenges.
