Senior Data Engineer
Discovered Labs

Job Description
About Discovered Labs
At Discovered Labs, we partner with $10M - $50M ARR companies to significantly boost their leads, users, and customers from platforms like Google, Bing, and AI assistants such as ChatGPT, Claude, and Perplexity.
Our marketing approach mirrors engineering systems: data-driven insights, continuous feedback loops, and measurable outcomes guide every decision. We design workflows that eliminate manual bottlenecks and deliver compounding returns over time.
High-Level Overview of Our Approach
- Data-driven automation: We treat marketing programs as products, instrumenting everything, automating repetitive tasks, and focusing human effort on high-leverage problems.
- First principles thinking: We delve into the fundamental mechanics of how search and AI systems operate, then build solutions from that core understanding, rather than just copying others.
- Full-stack ownership: SEO and AEO are rarely isolated tasks. We engage across the entire funnel and multiple surfaces so we can own client outcomes end to end.
The Team
We are a deeply technical team striving to be the SpaceX of the AEO & SEO space. You'll collaborate with engineers who have developed fraud engines for Stripe, Plaid, and Coinbase; built self-driving car systems at Aurora; and conducted AI research at Stanford. Our flat structure means you'll work directly with founders deeply knowledgeable in architecture, code, and product, without layers of management.
This Role
As a Senior Data Engineer, you will own the critical data infrastructure that powers automated reporting, AI visibility monitoring, competitive intelligence, and proactive alerting for a growing multi-tenant client base.
The primary challenge in this role is operational complexity rather than petabyte-scale volume. We manage many clients, each with distinct data sources, varied schemas, different API rate limits, diverse failure modes, and unique freshness requirements. Your solutions must build in fault isolation, graceful degradation, and per-tenant reliability from the outset, so a failure for one client never cascades to the others (see the sketch at the end of this section).
This is largely a greenfield opportunity. You will be responsible for building out comprehensive monitoring, observability, data quality layers, and robust pipeline orchestration systems.
You will report directly to the CTO and work closely with product engineers who consume your data layer for feature development. Together, you'll define interfaces and data contracts. There is no separate platform team; you will own your infrastructure, CI/CD, and monitoring end-to-end.
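To give the fault-isolation expectation a concrete flavor, here is a minimal Python sketch of one way per-tenant runs can be kept independent so one client's failure never takes down the whole batch. It is purely illustrative: the TenantConfig fields and the run_tenant_pipeline entry point are hypothetical placeholders, not our actual codebase.

```python
# Illustrative only: a minimal per-tenant fault-isolation pattern.
from dataclasses import dataclass


@dataclass
class TenantConfig:
    tenant_id: str
    sources: list[str]            # e.g. ["ga4", "gsc"] -- hypothetical source names
    freshness_sla_hours: int      # how stale this tenant's data may get before alerting
    max_requests_per_min: int     # per-tenant API budget


def run_tenant_pipeline(cfg: TenantConfig) -> None:
    """Placeholder for the real ingest -> validate -> transform steps."""
    ...


def run_all(tenants: list[TenantConfig]) -> dict[str, str]:
    """Run every tenant independently so a failure degrades only that tenant."""
    results: dict[str, str] = {}
    for cfg in tenants:
        try:
            run_tenant_pipeline(cfg)
            results[cfg.tenant_id] = "ok"
        except Exception as exc:  # contain the blast radius, record it, keep going
            results[cfg.tenant_id] = f"failed: {exc}"
    return results
```

In practice the same idea extends to scheduling, alerting, and freshness SLAs: configuration, budgets, and failure handling are all scoped per tenant rather than shared globally.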
What You'll Do
- Design and implement multi-tenant data infrastructure, covering ingestion, validation, and transformation across diverse data sources. Focus on fault isolation, schema variation, and graceful handling of upstream failures.
- Integrate with third-party APIs, building robust and resilient connectors that manage complex authentication flows, rate limits, pagination quirks, and breaking changes across many client accounts.
- Develop and maintain data quality systems, including automated checks on distributions, volumes, null rates, and freshness. Implement statistical validation beyond schema validation so bad data doesn't propagate downstream (see the sketch after this list).
- Establish comprehensive data observability, including freshness monitoring, volume anomaly detection, schema drift detection, lineage tracking, and blast radius analysis. You'll ensure systems verify data correctness, not just code execution.
- Design effective alerting systems. This involves threshold tuning, noise reduction, and strategies to prevent alert fatigue, with Mean Time to Detection as a core metric.
- Define and implement freshness SLAs per data source, building the necessary infrastructure to meet them and proactively alerting before breaches occur.
- Develop event-driven trigger infrastructure to surface performance changes, quality regressions, and freshness violations for consumption by downstream systems.
- Design data models for client, competitor, and content entities, owning schema evolution and ensuring backward compatibility.
- Manage the operational environment, including CI/CD, containers, deployment pipelines, and credential management, ensuring every deploy passes CI before reaching production.
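As a flavor of the statistical validation mentioned in the data quality item above, the Python sketch below shows batch-level checks on volume, null rates, and freshness. The thresholds, the event_time column, and the pandas DataFrame input are assumptions made for the example, not our production rules.

```python
# Illustrative only: batch-level data quality checks.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_batch(df: pd.DataFrame, expected_min_rows: int,
                max_null_rate: float, max_age: timedelta) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures: list[str] = []

    # Volume check: an unusually small batch often signals a broken upstream export.
    if len(df) < expected_min_rows:
        failures.append(f"row count {len(df)} below expected minimum {expected_min_rows}")

    # Null-rate check per column -- statistical validation beyond schema validation.
    for col in df.columns:
        null_rate = df[col].isna().mean()
        if null_rate > max_null_rate:
            failures.append(f"{col}: null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")

    # Freshness check against an assumed 'event_time' timestamp column.
    if "event_time" in df.columns and not df.empty:
        newest = pd.to_datetime(df["event_time"], utc=True).max()
        if datetime.now(timezone.utc) - newest > max_age:
            failures.append(f"newest record {newest} is older than the allowed {max_age}")

    return failures
```

Checks like these would typically run per tenant and per source, immediately after ingestion and before anything lands in shared downstream tables.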
The Ideal Person for This Role
- A builder who ships: You prioritize getting working systems into production over endless planning or excessive polish, with a track record of building reliable data infrastructure.
- An operator, not just an architect: You don't just design systems; you run them, finding satisfaction in making them consistently reliable, not just functional once.
- An owner: You take full responsibility for outcomes, not just completing tasks. When a pipeline breaks at 3 AM, you fix it and implement measures to prevent recurrence.
- Humble and curious: You acknowledge gaps in your knowledge, ask insightful questions, and possess a genuine desire to learn, viewing feedback as an opportunity for growth.
- A first-principles thinker: You understand the underlying 'why' behind system mechanics, capable of diving deep into schema decisions, validation strategies, and architecture tradeoffs.
- Always improving: You are not content with 'good enough,' actively seeking ways to enhance your craft and improve systems over time.
Requirements
- 4+ years in data engineering, platform engineering, or infrastructure-heavy backend work.
- Proficiency in Python, SQL, and pipeline orchestration tools (e.g., Airflow, Dagster, Prefect).
- Experience with event-driven architectures or real-time data processing.
- Proven ability to integrate with third-party APIs, building resilient connectors that handle authentication flows, rate limits, pagination, and breaking changes in production (see the sketch after this list).
- Strong understanding of pipeline fundamentals: idempotent pipelines, backfill strategies, and graceful schema evolution in production.
- Experience implementing data quality systems in production, including automated checks on distributions, volumes, freshness, and null rates.
- Expertise in data observability: freshness monitoring, anomaly detection, lineage tracking, and blast radius analysis.
- Demonstrated ability in alerting design: threshold tuning, noise reduction, and establishing effective escalation paths, with a focus on minimizing false positives and missed detections.
- Capability to own your infrastructure, including containers, CI/CD, deployment pipelines, monitoring, and credential management, without relying on a separate platform team.
- Experience with multi-tenant or multi-client data systems, including tenant isolation, per-client configuration, and managing operational overhead at scale.
- Experience building APIs or service layers for data exposure that other systems consume.
- Strong collaborative skills, working closely with product engineers to define data contracts and interfaces, communicating tradeoffs clearly in writing, documenting decisions, and writing clear specifications.
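For the API-integration requirement above, this sketch shows one generic way a resilient, paginated connector might handle rate limits and retries. The endpoint, query parameters, and empty-page termination convention are illustrative assumptions rather than any specific vendor's API.

```python
# Illustrative only: a generic paginated connector with retry/backoff handling.
import time

import requests


def fetch_all_pages(base_url: str, token: str, page_size: int = 100,
                    max_retries: int = 5) -> list[dict]:
    """Page through an HTTP API, backing off on 429 and 5xx responses."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    records: list[dict] = []
    page = 1
    while True:
        for attempt in range(max_retries):
            resp = session.get(base_url,
                               params={"page": page, "per_page": page_size},
                               timeout=30)
            if resp.status_code == 429 or resp.status_code >= 500:
                # Respect Retry-After when the server sends it; otherwise back off exponentially.
                time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
                continue
            resp.raise_for_status()  # fail loudly on other client errors
            break
        else:
            raise RuntimeError(f"page {page}: gave up after {max_retries} attempts")
        batch = resp.json()
        if not batch:  # assumed convention: an empty page marks the end of pagination
            return records
        records.extend(batch)
        page += 1
```

In a multi-tenant setting, a connector like this would also respect each tenant's per-client rate budget and isolate credential failures so one account's expired token never blocks another's sync.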
Preferred Qualifications
- Experience with marketing or analytics data (e.g., GA4, GSC, SEO tools).
- Prior experience at a fast-moving startup.
What's in It for You
- Fully remote position.
- Direct collaboration with the CTO on high-impact projects.
- High ownership and autonomy with no micromanagement.
- First-hand exposure to cutting-edge AI and search technology.
- Your work will directly impact the performance of well-known ($10M+ ARR) companies.
- Join a fast-growing company at the intersection of AI and marketing.
Our Hiring Process
- Application
- Take-Home Project
- Technical Deep Dive
- Leadership Interview
- Reference Checks
Key Skills and Competencies
- Data Engineering
- Multi-tenant Architecture
- API Integration
- Data Quality
- Data Observability
- Pipeline Orchestration
- Python
- SQL
- CI/CD
- Fault Tolerance
How to Get Hired at Discovered Labs
- Research Discovered Labs' culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, focusing on their engineering-first, data-driven approach to AI/SEO.
- Tailor your resume strategically: Highlight direct experience with multi-tenant data systems, complex API integrations, data quality, and observability, emphasizing Python and SQL proficiency.
- Prepare for the take-home project: Showcase your ability to build, operate, and ship robust data infrastructure, and demonstrate first-principles thinking.
- Master technical deep dives: Be ready to discuss system architecture, pipeline fundamentals, fault tolerance, and data quality design with founders and senior engineers.
- Demonstrate ownership and curiosity: Emphasize your commitment to reliable systems, proactive problem-solving, and continuous learning, aligning with Discovered Labs' builder-operator mindset.