Senior Data Engineer
Discovered Labs

Job Description
About Discovered Labs
At Discovered Labs, we partner with $10M - $50M ARR companies to significantly boost their leads, users, and customers from platforms like Google, Bing, and AI assistants such as ChatGPT, Claude, and Perplexity.
Our marketing approach mirrors engineering systems: data-driven insights, continuous feedback loops, and measurable outcomes guide every decision. We design workflows that eliminate manual bottlenecks and deliver compounding returns over time.
High-Level Overview of Our Approach
- Data-driven automation: We treat marketing programs as products, instrumenting everything, automating repetitive tasks, and focusing human effort on high-leverage problems.
- First principles thinking: We delve into the fundamental mechanics of how search and AI systems operate, then build solutions from that core understanding, rather than just copying others.
- Full-stack ownership: SEO and AEO are rarely isolated tasks. We engage across the entire funnel and multiple surfaces so we can own client outcomes end to end.
The Team
We are a deeply technical team striving to be the SpaceX of the AEO & SEO space. You'll collaborate with engineers who have developed fraud engines for Stripe, Plaid, and Coinbase; built self-driving car systems at Aurora; and conducted AI research at Stanford. Our flat structure means you'll work directly with founders deeply knowledgeable in architecture, code, and product, without layers of management.
This Role
As a Senior Data Engineer, you will own the critical data infrastructure that powers automated reporting, AI visibility monitoring, competitive intelligence, and proactive alerting for a growing multi-tenant client base.
The primary challenge in this role is operational complexity rather than petabyte-scale volume. We manage many clients, each with distinct data sources, varied schemas, different API rate limits, diverse failure modes, and unique freshness requirements. Your solutions must build in fault isolation, graceful degradation, and per-tenant reliability from the outset, so a failure for one client never cascades to the others (see the sketch at the end of this section).
This is largely a greenfield opportunity. You will be responsible for building out comprehensive monitoring, observability, data quality layers, and robust pipeline orchestration systems.
You will report directly to the CTO and work closely with product engineers who consume your data layer for feature development. Together, you'll define interfaces and data contracts. There is no separate platform team; you will own your infrastructure, CI/CD, and monitoring end-to-end.
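To give the fault-isolation expectation a concrete flavor, here is a minimal Python sketch of one way per-tenant runs can be kept independent so one client's failure never takes down the whole batch. It is purely illustrative: the TenantConfig fields and the run_tenant_pipeline entry point are hypothetical placeholders, not our actual codebase.

```python
# Illustrative only: a minimal per-tenant fault-isolation pattern.
from dataclasses import dataclass


@dataclass
class TenantConfig:
    tenant_id: str
    sources: list[str]            # e.g. ["ga4", "gsc"] -- hypothetical source names
    freshness_sla_hours: int      # how stale this tenant's data may get before alerting
    max_requests_per_min: int     # per-tenant API budget


def run_tenant_pipeline(cfg: TenantConfig) -> None:
    """Placeholder for the real ingest -> validate -> transform steps."""
    ...


def run_all(tenants: list[TenantConfig]) -> dict[str, str]:
    """Run every tenant independently so a failure degrades only that tenant."""
    results: dict[str, str] = {}
    for cfg in tenants:
        try:
            run_tenant_pipeline(cfg)
            results[cfg.tenant_id] = "ok"
        except Exception as exc:  # contain the blast radius, record it, keep going
            results[cfg.tenant_id] = f"failed: {exc}"
    return results
```

In practice the same idea extends to scheduling, alerting, and freshness SLAs: configuration, budgets, and failure handling are all scoped per tenant rather than shared globally.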
What You'll Do
- Design and implement multi-tenant data infrastructure, covering ingestion, validation, and transformation across diverse data sources. Focus on fault isolation, schema variation, and graceful handling of upstream failures.
- Integrate with third-party APIs, building robust and resilient connectors that manage complex authentication flows, rate limits, pagination quirks, and breaking changes across many client accounts.
- Develop and maintain data quality systems, including automated checks on distributions, volumes, null rates, and freshness. Implement statistical validation beyond schema validation so bad data doesn't propagate downstream (see the sketch after this list).
- Establish comprehensive data observability, including freshness monitoring, volume anomaly detection, schema drift detection, lineage tracking, and blast radius analysis. You'll ensure systems verify data correctness, not just code execution.
- Design effective alerting systems. This involves threshold tuning, noise reduction, and strategies to prevent alert fatigue, with Mean Time to Detection as a core metric.
- Define and implement freshness SLAs per data source, building the necessary infrastructure to meet them and proactively alerting before breaches occur.
- Develop event-driven trigger infrastructure to surface performance changes, quality regressions, and freshness violations for consumption by downstream systems.
- Design data models for client, competitor, and content entities, owning schema evolution and ensuring backward compatibility.
- Manage the operational environment, including CI/CD, containers, deployment pipelines, and credential management, ensuring every deploy passes CI before reaching production.
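As a flavor of the statistical validation mentioned in the data quality item above, the Python sketch below shows batch-level checks on volume, null rates, and freshness. The thresholds, the event_time column, and the pandas DataFrame input are assumptions made for the example, not our production rules.

```python
# Illustrative only: batch-level data quality checks.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_batch(df: pd.DataFrame, expected_min_rows: int,
                max_null_rate: float, max_age: timedelta) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures: list[str] = []

    # Volume check: an unusually small batch often signals a broken upstream export.
    if len(df) < expected_min_rows:
        failures.append(f"row count {len(df)} below expected minimum {expected_min_rows}")

    # Null-rate check per column -- statistical validation beyond schema validation.
    for col in df.columns:
        null_rate = df[col].isna().mean()
        if null_rate > max_null_rate:
            failures.append(f"{col}: null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")

    # Freshness check against an assumed 'event_time' timestamp column.
    if "event_time" in df.columns and not df.empty:
        newest = pd.to_datetime(df["event_time"], utc=True).max()
        if datetime.now(timezone.utc) - newest > max_age:
            failures.append(f"newest record {newest} is older than the allowed {max_age}")

    return failures
```

Checks like these would typically run per tenant and per source, immediately after ingestion and before anything lands in shared downstream tables.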
The Ideal Person for This Role
- A builder who ships: You prioritize getting working systems into production over endless planning or excessive polish, with a track record of building reliable data infrastructure.
- An operator, not just an architect: You don't just design systems; you run them, finding satisfaction in making them consistently reliable, not just functional once.
- An owner: You take full responsibility for outcomes, not just completing tasks. When a pipeline breaks at 3 AM, you fix it and implement measures to prevent recurrence.
- Humble and curious: You acknowledge gaps in your knowledge, ask insightful questions, and possess a genuine desire to learn, viewing feedback as an opportunity for growth.
- A first-principles thinker: You understand the underlying 'why' behind system mechanics, capable of diving deep into schema decisions, validation strategies, and architecture tradeoffs.
- Always improving: You are not content with 'good enough,' actively seeking ways to enhance your craft and improve systems over time.
Requirements
- 4+ years in data engineering, platform engineering, or infrastructure-heavy backend work.
- Proficiency in Python, SQL, and pipeline orchestration tools (e.g., Airflow, Dagster, Prefect).
- Experience with event-driven architectures or real-time data processing.
- Proven ability to integrate with third-party APIs, building resilient connectors that handle authentication flows, rate limits, pagination, and breaking changes in production (see the sketch after this list).
- Strong understanding of pipeline fundamentals: idempotent pipelines, backfill strategies, and graceful schema evolution in production.
- Experience implementing data quality systems in production, including automated checks on distributions, volumes, freshness, and null rates.
- Expertise in data observability: freshness monitoring, anomaly detection, lineage tracking, and blast radius analysis.
- Demonstrated ability in alerting design: threshold tuning, noise reduction, and establishing effective escalation paths, with a focus on minimizing false positives and missed detections.
- Capability to own your infrastructure, including containers, CI/CD, deployment pipelines, monitoring, and credential management, without relying on a separate platform team.
- Experience with multi-tenant or multi-client data systems, including tenant isolation, per-client configuration, and managing operational overhead at scale.
- Experience building APIs or service layers for data exposure that other systems consume.
- Strong collaborative skills, working closely with product engineers to define data contracts and interfaces, communicating tradeoffs clearly in writing, documenting decisions, and writing clear specifications.
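For the API-integration requirement above, this sketch shows one generic way a resilient, paginated connector might handle rate limits and retries. The endpoint, query parameters, and empty-page termination convention are illustrative assumptions rather than any specific vendor's API.

```python
# Illustrative only: a generic paginated connector with retry/backoff handling.
import time

import requests


def fetch_all_pages(base_url: str, token: str, page_size: int = 100,
                    max_retries: int = 5) -> list[dict]:
    """Page through an HTTP API, backing off on 429 and 5xx responses."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    records: list[dict] = []
    page = 1
    while True:
        for attempt in range(max_retries):
            resp = session.get(base_url,
                               params={"page": page, "per_page": page_size},
                               timeout=30)
            if resp.status_code == 429 or resp.status_code >= 500:
                # Respect Retry-After when the server sends it; otherwise back off exponentially.
                time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
                continue
            resp.raise_for_status()  # fail loudly on other client errors
            break
        else:
            raise RuntimeError(f"page {page}: gave up after {max_retries} attempts")
        batch = resp.json()
        if not batch:  # assumed convention: an empty page marks the end of pagination
            return records
        records.extend(batch)
        page += 1
```

In a multi-tenant setting, a connector like this would also respect each tenant's per-client rate budget and isolate credential failures so one account's expired token never blocks another's sync.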
Preferred Qualifications
- Experience with marketing or analytics data (e.g., GA4, GSC, SEO tools).
- Prior experience at a fast-moving startup.
What's in It for You
- Fully remote position.
- Direct collaboration with the CTO on high-impact projects.
- High ownership and autonomy with no micromanagement.
- First-hand exposure to cutting-edge AI and search technology.
- Your work will directly impact the performance of well-known ($10M+ ARR) companies.
- Join a fast-growing company at the intersection of AI and marketing.
Our Hiring Process
- Application
- Take-Home Project
- Technical Deep Dive
- Leadership Interview
- Reference Checks
Key Skills and Competencies
- Data Engineering
- Multi-tenant Architecture
- API Integration
- Data Quality
- Data Observability
- Pipeline Orchestration
- Python
- SQL
- CI/CD
- Fault Tolerance
How to Get Hired at Discovered Labs
- Research Discovered Labs' culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, focusing on their engineering-first, data-driven approach to AI/SEO.
- Tailor your resume strategically: Highlight direct experience with multi-tenant data systems, complex API integrations, data quality, and observability, emphasizing Python and SQL proficiency.
- Prepare for the take-home project: Showcase your ability to build, operate, and ship robust data infrastructure, and demonstrate first-principles thinking.
- Master technical deep dives: Be ready to discuss system architecture, pipeline fundamentals, fault tolerance, and data quality design with founders and senior engineers.
- Demonstrate ownership and curiosity: Emphasize your commitment to reliable systems, proactive problem-solving, and continuous learning, aligning with Discovered Labs' builder-operator mindset.