Data Scientist, Private AI/Big Data
Apoddo
Hybrid
Full Time
$160,000
Job Overview
Job Title: Data Scientist, Private AI/Big Data
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master's
Offered Salary: $160,000
Location: Hybrid

Job Description
Data Scientist, Private AI/Big Data at Apoddo
We’re looking for a Data Scientist to work side by side with a Senior AI Engineer to build AI agents inside a private-infrastructure SaaS platform that handles 100,000 records daily.
What You’ll Do
- Define Golden Record logic: Specify how to resolve conflicts within a duplicate cluster (which record is correct, which is stale/erroneous, and what should be merged).
- Design entity resolution strategies: Build and evaluate matching approaches (rules + ML), including blocking/candidate generation, pair scoring, and cluster-level decisions.
- Build labeling + evaluation pipelines: Create ground-truth datasets and annotation guidelines; measure precision/recall, cluster purity, and “bad merge” risk.
- Develop features & signals: Engineer features from structured data (names, addresses, timestamps, metadata, relationships across tables) and from embeddings/vector search signals.
- Reduce risk with confidence + explainability: Provide confidence scoring, evidence summaries, and decision reasoning to support human-in-the-loop verification.
- Partner with AI Engineering: Collaborate on agent prompts/workflows, retrieval strategies, and safe execution checkpoints before UPDATE/DELETE operations.
- Monitor production quality: Define metrics, drift detection, and feedback loops so the system improves over time.
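The entity resolution steps above are blocking/candidate generation, pair scoring, cluster-level decisions, and Golden Record logic. Here is a minimal, stdlib-only Python sketch of them. It is illustrative only and is not Apoddo's method. The toy records, field names, blocking key, 0.7/0.3 score weights, 0.6 match threshold, and the "newest non-empty value wins" survivorship rule are all hypothetical assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; field names and values are illustrative assumptions.
RECORDS = [
    {"id": 1, "name": "Acme Corp", "zip": "10001", "email": "info@acme.com", "updated": "2024-03-01"},
    {"id": 2, "name": "ACME Corporation", "zip": "10001", "email": "info@acme.com", "updated": "2024-05-10"},
    {"id": 3, "name": "Acme Corp.", "zip": "10001", "email": "", "updated": "2023-11-20"},
    {"id": 4, "name": "Globex", "zip": "94105", "email": "hi@globex.io", "updated": "2024-01-15"},
]


def blocking_key(rec):
    """Candidate generation: only records sharing ZIP + first name letter are compared."""
    return (rec["zip"], rec["name"][:1].lower())


def pair_score(a, b):
    """Weighted pair score: fuzzy name similarity plus exact (non-empty) email match."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    email_match = 1.0 if a["email"] and a["email"] == b["email"] else 0.0
    return 0.7 * name_sim + 0.3 * email_match


def cluster(records, threshold=0.6):
    """Union-find over above-threshold pairs within each block (transitive closure)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    blocks = defaultdict(list)
    for r in records:
        blocks[blocking_key(r)].append(r)
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if pair_score(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r)
    return list(clusters.values())


def golden_record(members):
    """Survivorship (assumed rule): per field, the most recently updated non-empty value wins."""
    ordered = sorted(members, key=lambda r: r["updated"], reverse=True)
    golden = {"source_ids": sorted(r["id"] for r in members)}
    for field in ("name", "zip", "email"):
        golden[field] = next((r[field] for r in ordered if r[field]), "")
    return golden


if __name__ == "__main__":
    for members in cluster(RECORDS):
        print(golden_record(members))
```

Records 2 and 3 score below the threshold against each other. Union-find still merges them transitively through record 1. This is the kind of "bad merge" risk that cluster-level checks, confidence scoring, and human-in-the-loop review are meant to catch before any UPDATE/DELETE runs.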
Key Tech & Data Environment
- Datastores: PostgreSQL, Elasticsearch (vector/kNN search)
- AI Workflows: LangGraph (stateful reasoning), agent orchestration patterns
- Models (Private): Llama 3.x / Mistral via vLLM or Ollama
- Infra: Kubernetes, Docker, private AWS/Azure environments
Who You Are
- 3+ years (or equivalent) of experience as a Data Scientist / ML Engineer / Applied Scientist working with messy, real-world data.
- Strong grasp of entity resolution / record linkage / deduplication concepts (matching, blocking, clustering, merge strategies).
- Comfortable with SQL and large-scale data analysis; can navigate multi-table relational schemas confidently.
- Experience designing evaluation frameworks and datasets (labeling strategy, sampling, measuring trade-offs).
- Pragmatic mindset: you care about false merges as much as false negatives, and you know how to reduce operational risk.
- Strong communication skills—able to turn ambiguous data issues into clear rules, metrics, and experiments.
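A labeled ground-truth set can be used to measure precision/recall, cluster purity, and bad-merge risk. This sketch does that with pairwise precision/recall plus cluster purity. That is one common convention, assumed here for illustration; other cluster metrics such as B-cubed also exist. The function names and toy record ids are hypothetical. False merges and missed matches are counted separately so that both error types stay visible instead of being folded into a single score.

```python
from collections import Counter
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand clusters (lists of record ids) into the set of unordered same-entity pairs."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def pairwise_metrics(predicted, gold):
    """Pairwise precision/recall, with false merges and missed matches counted separately."""
    pred_pairs = pairs_from_clusters(predicted)
    gold_pairs = pairs_from_clusters(gold)
    true_pos = len(pred_pairs & gold_pairs)
    false_merges = len(pred_pairs - gold_pairs)  # merged, but actually distinct entities
    missed = len(gold_pairs - pred_pairs)        # true duplicates left unmerged
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    return {"precision": precision, "recall": recall,
            "false_merges": false_merges, "missed": missed}


def cluster_purity(predicted, gold):
    """Share of records that belong to their predicted cluster's majority gold entity."""
    gold_label = {rid: i for i, c in enumerate(gold) for rid in c}
    total = sum(len(c) for c in predicted)
    majority = sum(Counter(gold_label[r] for r in c).most_common(1)[0][1]
                   for c in predicted if c)
    return majority / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical labeled ground truth vs. a model's output (record ids only).
    gold = [[1, 2], [3], [4, 5]]
    predicted = [[1, 2, 3], [4], [5]]
    print(pairwise_metrics(predicted, gold))
    print(cluster_purity(predicted, gold))
```

In the toy data, wrongly merging record 3 into {1, 2} produces two false merges. Splitting {4, 5} produces one missed match. The result is precision 1/3, recall 0.5, and purity 0.8.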
Nice to Have
- Experience with customer/account/event data domains (CRM, transactions, identity, behavioral events).
- Familiarity with vector search, embeddings, or retrieval-based matching.
- Experience collaborating on LLM/agent systems (RAG, tool use, human-in-the-loop validation).
- Background in data quality, MDM, or data governance.
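Retrieval-based matching uses embedding similarity to propose candidate pairs that exact blocking keys would miss. Here is a minimal, brute-force cosine kNN sketch in plain Python that shows the idea. At scale, a typically approximate nearest-neighbour search, such as the Elasticsearch vector/kNN search listed in the tech environment, would replace this loop; its API is not shown here. The toy vectors, `k`, and the 0.8 similarity floor are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_candidates(query_id, vectors, k=2, min_sim=0.8):
    """Exact (brute-force) k-nearest neighbours, kept only above a similarity floor."""
    query = vectors[query_id]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(other, sim) for other, sim in scored[:k] if sim >= min_sim]


if __name__ == "__main__":
    # Hypothetical 3-d embeddings; real embeddings come from a model and are much larger.
    vectors = {
        "r1": [1.0, 0.0, 0.0],
        "r2": [0.9, 0.1, 0.0],
        "r3": [0.0, 1.0, 0.0],
        "r4": [0.7, 0.0, 0.7],
    }
    print(knn_candidates("r1", vectors))
```

The similarity floor is a design choice. Without it, the top-k list would always return k neighbours even when none of them is a plausible duplicate.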
Key Skills and Competencies
- Entity Resolution
- Machine Learning
- Data Quality
- Feature Engineering
- Evaluation Frameworks
- LLM/AI Agents
- Big Data Analysis
- SQL
- Production Monitoring
- Cloud Infrastructure
How to Get Hired at Apoddo
- Research Apoddo's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight experience in entity resolution, ML pipeline development, and big data processing, aligning with Apoddo's focus.
- Showcase relevant projects: Present portfolio work demonstrating your ability to handle messy, real-world data and implement AI/ML solutions.
- Master technical concepts: Prepare for in-depth questions on SQL, evaluation frameworks, vector search, and LLM agent orchestration.
- Prepare for behavioral questions: Emphasize your pragmatic mindset, risk reduction strategies, and strong communication skills for technical discussions.
Frequently Asked Questions
1. What specific AI agents will a Data Scientist work on at Apoddo?
2. How does Apoddo manage data privacy within its AI infrastructure for the Data Scientist role?
3. What is Apoddo's approach to human-in-the-loop verification for AI decisions?
4. Can you describe a typical data lifecycle for a Data Scientist at Apoddo?
5. What tools does Apoddo use for production quality monitoring and drift detection in this role?
6. What kind of 'messy' data challenges will I encounter as a Data Scientist at Apoddo?
7. How does Apoddo support professional growth for its Data Scientists?