2 days ago

Data Scientist, Private AI/Big Data

Apoddo

Hybrid
Full Time
$160,000

Job Overview

Job Title: Data Scientist, Private AI/Big Data
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $160,000
Location: Hybrid

Job Description

Data Scientist, Private AI/Big Data at Apoddo

We’re looking for a Data Scientist to work side by side with a Senior AI Engineer, building AI agents inside a private-infrastructure SaaS platform that handles 100,000 records a day.

What You’ll Do

  • Define Golden Record logic: Specify how to resolve conflicts within a duplicate cluster (which record is correct, which is stale/erroneous, and what should be merged).
  • Design entity resolution strategies: Build and evaluate matching approaches (rules + ML), including blocking/candidate generation, pair scoring, and cluster-level decisions.
  • Build labeling + evaluation pipelines: Create ground-truth datasets and annotation guidelines; measure precision/recall, cluster purity, and “bad merge” risk.
  • Develop features & signals: Engineer features from structured data (names, addresses, timestamps, metadata, relationships across tables) and from embeddings/vector search signals.
  • Reduce risk with confidence + explainability: Provide confidence scoring, evidence summaries, and decision reasoning to support human-in-the-loop verification.
  • Partner with AI Engineering: Collaborate on agent prompts/workflows, retrieval strategies, and safe execution checkpoints before UPDATE/DELETE operations.
  • Monitor production quality: Define metrics, drift detection, and feedback loops so the system improves over time.
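
To make the entity-resolution responsibilities above concrete, here is a minimal Python sketch of blocking, pair scoring, clustering, golden-record selection, and pairwise precision/recall. Everything in it is an illustrative assumption rather than Apoddo's actual method: the toy records, the field names, the ZIP-code blocking key, the `difflib` name similarity, the 0.85 threshold, and the "newest non-null value wins" merge rule are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; field names are illustrative only.
RECORDS = [
    {"id": 1, "name": "Jon Smith", "zip": "10001", "email": None,
     "updated": "2023-01-05"},
    {"id": 2, "name": "John Smith", "zip": "10001", "email": "john@example.com",
     "updated": "2024-03-10"},
    {"id": 3, "name": "Jane Doe", "zip": "10001", "email": "jane@example.com",
     "updated": "2024-02-01"},
    {"id": 4, "name": "Alan Turing", "zip": "94105", "email": "alan@example.com",
     "updated": "2022-07-19"},
]


def block_by(records, key):
    """Blocking: only records sharing the same key value become candidate pairs."""
    blocks = defaultdict(list)
    for r in records:
        blocks[r[key]].append(r)
    return blocks


def name_similarity(a, b):
    """Pair scoring: case-insensitive string similarity of names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cluster(records, threshold=0.85):
    """Link candidate pairs scoring >= threshold, then group with union-find."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for block in block_by(records, "zip").values():
        for a, b in combinations(block, 2):
            if name_similarity(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r)
    return list(clusters.values())


def golden_record(members):
    """Assumed merge rule: per field, take the newest non-null value."""
    ordered = sorted(members, key=lambda r: r["updated"], reverse=True)
    fields = [f for f in ordered[0] if f != "id"]
    merged = {f: next((r[f] for r in ordered if r[f] is not None), None)
              for f in fields}
    merged["source_ids"] = sorted(r["id"] for r in members)
    return merged


def pairwise_precision_recall(clusters, true_pairs):
    """Compare predicted same-cluster pairs against labeled duplicate pairs."""
    predicted = {frozenset((a["id"], b["id"]))
                 for c in clusters for a, b in combinations(c, 2)}
    tp = len(predicted & true_pairs)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(true_pairs) if true_pairs else 1.0
    return precision, recall
```

Calling `cluster(RECORDS)` groups records 1 and 2 together and leaves 3 and 4 as singletons; `golden_record` on that pair keeps the newer name and the only non-null email. A false merge would lower precision in `pairwise_precision_recall`, which is the kind of "bad merge" risk the role is asked to measure before any UPDATE/DELETE is executed.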

Key Tech & Data Environment

  • Datastores: PostgreSQL, Elasticsearch (vector/kNN search)
  • AI Workflows: LangGraph (stateful reasoning), agent orchestration patterns
  • Models (Private): Llama 3.x / Mistral via vLLM or Ollama
  • Infra: Kubernetes, Docker, private AWS/Azure environments

Who You Are

  • 3+ years (or equivalent) experience as a Data Scientist / ML Engineer / Applied Scientist working with real-world messy data.
  • Strong grasp of entity resolution / record linkage / deduplication concepts (matching, blocking, clustering, merge strategies).
  • Comfortable with SQL and large-scale data analysis; can navigate multi-table relational schemas confidently.
  • Experience designing evaluation frameworks and datasets (labeling strategy, sampling, measuring trade-offs).
  • Pragmatic mindset: you care about false merges as much as false negatives, and you know how to reduce operational risk.
  • Strong communication skills: able to turn ambiguous data issues into clear rules, metrics, and experiments.

Nice to Have

  • Experience with customer/account/event data domains (CRM, transactions, identity, behavioral events).
  • Familiarity with vector search, embeddings, or retrieval-based matching.
  • Experience collaborating on LLM/agent systems (RAG, tool use, human-in-the-loop validation).
  • Background in data quality, MDM, or data governance.

Key skills/competency

  • Entity Resolution
  • Machine Learning
  • Data Quality
  • Feature Engineering
  • Evaluation Frameworks
  • LLM/AI Agents
  • Big Data Analysis
  • SQL
  • Production Monitoring
  • Cloud Infrastructure

Tags:

Data Scientist
Entity Resolution
Machine Learning
Data Quality
Feature Engineering
Evaluation
LLM
Big Data
SQL
Production Monitoring
AI Engineering
PostgreSQL
Elasticsearch
LangGraph
Llama
Mistral
Kubernetes
Docker
AWS
Azure
Vector Search

How to Get Hired at Apoddo

  • Research Apoddo's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume: Highlight experience in entity resolution, ML pipeline development, and big data processing, aligning with Apoddo's focus.
  • Showcase relevant projects: Present portfolio work demonstrating your ability to handle messy, real-world data and implement AI/ML solutions.
  • Master technical concepts: Prepare for in-depth questions on SQL, evaluation frameworks, vector search, and LLM agent orchestration.
  • Prepare for behavioral questions: Emphasize your pragmatic mindset, risk reduction strategies, and strong communication skills for technical discussions.
