Day 19 — Data quality frameworks

Phase 1: Foundations & Frameworks | Category: Data Quality & Governance

Why This Is a Senior-Level Differentiator

Data quality is where DE interviews separate good candidates from great ones. Everyone can build a pipeline. Senior engineers own the trustworthiness of data end-to-end — from ingestion to the dashboard a VP uses to make a $10M decision. As Surfalytics puts it: “Data quality is one of the biggest pain points in data engineering. Show you take it seriously.” At Meta, Netflix, Google, OpenAI, and Anthropic, you’re expected to have an opinionated answer to: “How do you know your data is correct, fresh, and complete — and how do your consumers know they can trust it?”

The Six Dimensions of Data Quality

These are the standard dimensions your interviewers expect you to enumerate and apply. Not as a checklist recitation — as a framework for designing validation strategies.

Dimension	Definition	Example check	What breaks without it
Completeness	All required data is present	NOT NULL on order_id, daily row count ≥ expected minimum	Silent data loss — reports show partial numbers, nobody knows
Accuracy	Values reflect reality	Revenue is positive, ages are 0–120, GPS coordinates are valid ranges	Wrong decisions from plausible-but-wrong numbers
Consistency	Same entity has same values across systems	User’s country in CRM matches warehouse. No order_date after ship_date	Conflicting reports from different teams
Timeliness / Freshness	Data arrives when expected	Table refreshed by 8 AM SLA, max event age < 2 hours	Stale dashboards used for time-sensitive decisions
Validity	Values conform to expected format/type	Email matches regex, ISO country codes only, category is in allowed list	Type errors in downstream models, foreign key violations
Uniqueness	No duplicate records	Order IDs are unique, no duplicate user events	Double-counted metrics, inflated KPIs

Senior-level framing: In an interview, don’t just list these. Map them to the layers they’re enforced at:

Completeness + freshness → pipeline monitoring (Airflow SLAs, row count checks)
Validity + accuracy → transformation-time tests (dbt tests, Great Expectations)
Consistency → cross-table and cross-system reconciliation tests
Uniqueness → primary key constraints, deduplication logic

Where Quality Checks Belong: The Defense-in-Depth Model

The biggest mistake junior engineers make: testing quality at only one layer. Senior engineers build multiple layers of defense:

Source Systems
    ↓ [Layer 1: Ingestion Gate]
    Schema validation — reject malformed records
    Row count sanity — flag drops > 20% from prior day
    Required fields — NULL check at ingestion
    ↓ Bronze (raw)
    ↓ [Layer 2: Source Assertions]
    dbt source tests — freshness, not_null, unique on keys
    Volume anomaly detection — Z-score on daily row counts
    ↓ Silver (cleaned)
    ↓ [Layer 3: Transformation Tests]
    dbt model tests — business logic validation
    Referential integrity — FK relationships
    Range checks — order_total > 0, age BETWEEN 0 AND 120
    Custom SQL tests — "revenue today within 30% of 7-day avg"
    ↓ Gold (serving)
    ↓ [Layer 4: Output Validation]
    Row count / null rate must match SLO
    Critical metric sanity — DAU shouldn't drop 50% overnight
    Schema continuity — no unexpected column drops
    ↓ Consumers (BI, ML, APIs)
    ↓ [Layer 5: Consumer-Side Monitoring]
    Anomaly detection on KPIs — alert on metric spikes/drops
    Freshness monitoring — is the dashboard data from today?

What to say in an interview: “I embed quality checks at every layer — not just at the end. By the time data reaches gold, it’s passed three validation gates. Late detection means analysts have already acted on bad data. Early detection means we fix upstream before anyone notices.”

Tool Deep Dive

dbt Tests: Quality in the Transformation Layer

dbt’s test framework runs assertions against your SQL models. Two categories:

Generic tests (built-in, declared in YAML):

models:
  - name: fact_orders
    columns:
      - name: order_id
        tests:
          - unique           # No duplicate order IDs
          - not_null         # Every order has an ID
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customer')
              field: customer_id  # Referential integrity
      - name: order_status
        tests:
          - accepted_values:
              values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']

Singular tests (custom SQL — returns rows that FAIL the test):

-- tests/no_future_order_dates.sql
-- Returns rows where order date is in the future (should return 0 rows)
SELECT order_id, order_date
FROM {{ ref('fact_orders') }}
WHERE order_date > CURRENT_DATE

dbt source freshness (checks when data last arrived):

sources:
  - name: raw_orders
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 12, period: hour}
    loaded_at_field: _ingested_at

Test severity (warn vs error — prevents alert fatigue):

- name: order_id
  tests:
    - unique:
        severity: error    # Blocks pipeline
    - not_null:
        severity: warn     # Alerts but continues

What dbt tests buy you:

Tests run in CI/CD — bad code doesn’t reach production
Tests run on schedule — catches data issues in production
Results stored in metadata — audit trail of quality over time
Documentation is auto-generated from tests

Great Expectations: Profiling + Advanced Validation

Great Expectations (GX) is more powerful than dbt for:

Statistical assertions (“this column’s mean should be within 2 standard deviations of last week’s”)
Volume assertions (“row count should be between 95% and 110% of yesterday”)
Distribution checks (“null rate on email_address shouldn’t exceed 5%”)
Multi-column assertions (“ship_date must be >= order_date”)

# Great Expectations expectation suite
expect_column_values_to_not_be_null(column="order_id")
expect_column_values_to_be_unique(column="order_id")
expect_column_values_to_be_between(
    column="order_total", min_value=0, max_value=1000000
)
expect_table_row_count_to_be_between(
    min_value=1000000, max_value=50000000  # Daily volume SLO
)
expect_column_value_z_scores_to_be_less_than(
    column="daily_revenue", threshold=3.0  # Anomaly detection
)

When to use GX vs dbt tests:

Scenario	Tool
Basic schema validation, FK integrity, accepted values	dbt (simpler, native)
Statistical anomaly detection, distribution drift	Great Expectations
Complex multi-table assertions	dbt singular tests
Profiling new datasets before onboarding	Great Expectations
Integration into non-dbt pipelines (Spark, Airflow)	Great Expectations
SQL-native pipeline teams	dbt

Other Tools in 2026

Tool	Best For
Soda	Business-friendly data quality platform, SQL-based checks, SLO tracking
Monte Carlo	ML-powered anomaly detection, “data observability” platform, automated lineage
Bigeye	Automated anomaly detection with Snowflake/Big Query/Redshift native integration
Dataplex (GCP)	Native GCP data quality scanning + auto data discovery
AWS Glue Data Quality	AWS-native, integrates with Glue pipelines and Data Catalog

Data Quality SLIs, SLOs, and SLAs

Borrowing from SRE (Site Reliability Engineering), this framework turns vague quality aspirations into measurable commitments (Monte Carlo, Customer Science):

SLI (Service Level Indicator): The metric you measure.

- Data freshness: max(event_timestamp) in the orders table
- Completeness: (non-null order_id rows / total rows) × 100
- Volume: daily row count for fact_orders
- Error rate: failed quality checks / total checks × 100

SLO (Service Level Objective): The target for that indicator.

- Freshness SLO: fact_orders updated by 8:00 AM PT for 29 of 30 days (96.7%)
- Completeness SLO: order_id null rate < 0.01% for 99% of daily runs
- Volume SLO: daily row count within 10% of 7-day average for 95% of days
- Error rate SLO: < 0.1% quality check failure rate per week

SLA (Service Level Agreement): External commitment (to business teams, analysts, downstream systems).

- "The daily revenue dashboard reflects data from the prior day by 8:00 AM PT"
- "GDPR deletion requests are reflected in all reports within 24 hours"

Error budgets: If the SLO is 96.7% (29/30 days), you have a budget of 1 failure/month. If you’re burning through your error budget faster than expected, you stop new feature work and fix reliability. This is the SRE approach applied to data pipelines.

Alerting: Design for Action, Not Volume

The most common failure mode: too many alerts → alert fatigue → engineers ignore everything → real incidents missed.

Alert design principles:

Topic	Details
P1 Critical	Data missing for SLA-critical table Pager Duty / phone15 min response, immediate fix
P2 High	Quality check failure on tier-1 dataset Slack #data-alerts Same business day investigation
P3 Warning	Volume anomaly, freshness degrading Slack digest / ticket Next sprint investigation
P4 Info	Non-critical check failed Daily summary email Review weekly

Multi-window burn rate alerting (SRE-style): Alert when you’re consuming your error budget faster than expected. A 1-day window catches fast burns (sudden disaster). A 7-day window catches slow burns (gradual quality degradation). Both together = complete coverage.

Avoid these alerting mistakes:

Alerting on every test failure regardless of business impact
Static thresholds that don’t account for seasonality (holiday traffic spikes)
Alerting on non-actionable failures (source system is down — you can’t fix that)
No runbooks attached to alerts (on-call engineer doesn’t know what to do)

Tiering Your Data Assets

Not all tables deserve the same quality investment. Define tiers:

Topic	Details
Tier 1 (Critical)	Powers executive dashboards, regulatory reports, revenue calculations Full test suite, SLAs, Pager Duty alerts, runbooks
Tier 2 (Important)	Product analytics, operational dashboards Core tests, SLOs, Slack alerts
Tier 3 (Exploratory)	Data science experiments, ad-hoc analysis Basic not-null and freshness checks

What to say in interviews: “I’d tier our data assets by business impact. Tier-1 tables have comprehensive test suites, defined SLOs, and PagerDuty alerting for violations. Tier-3 exploratory tables have basic freshness checks. This prevents alert fatigue while ensuring our most critical data is protected.”

Interview Questions

Q1: “How would you build a data quality framework for a new analytics platform from scratch?”

Model Answer: “I’d approach it in four phases. First, asset inventory and tiering: catalog all datasets, classify each as Tier 1/2/3 based on business impact. Start with the 5-10 tables that power the most critical decisions. Second, implement the defense-in-depth model: ingestion-time schema validation and row count checks, dbt source freshness tests on raw tables, dbt model tests (unique, not_null, FK integrity, accepted values, custom SQL) on silver and gold, and output validation before serving. Third, define SLIs and SLOs per tier: freshness SLOs (‘fact_orders updated by 8 AM for 29/30 days’), volume SLOs (‘daily row count within 15% of 7-day average’), completeness SLOs. Fourth, wire alerting to SLO breaches — not to every test failure. Tier-1 SLO violations go to PagerDuty with a runbook attached; Tier-2 go to a Slack channel. I’d use dbt for transformation-layer tests and Great Expectations or Monte Carlo for anomaly detection. The key principle: tests in CI/CD prevent bad code from reaching production; runtime monitoring catches bad data in production.”

Q2: “A dashboard that the VP of Product uses shows a 40% drop in DAU overnight. How do you diagnose it?”

Model Answer: “This is a data quality incident. I’d work backwards from the consumption layer. First, check if it’s a display issue — is the BI tool connected, is the cache stale, is the date filter correct? If the underlying data is genuinely down 40%, step two: check data freshness — did the pipeline run on time? Check the orchestration tool (Airflow) for job failures or delayed runs. If the pipeline ran, step three: check row counts in the fact table vs. the prior day and the 7-day average. A sudden volume drop points to a source-side issue — the upstream system sent fewer events. Step four: check the source system — was there a deployment, an API outage, a schema change that caused events to fail validation and get dropped? Step five: if the data looks right but DAU shows a drop, check the SQL definition — did someone change the deduplication logic or session definition? This exact scenario is why I’d have automated volume anomaly detection that fires before the VP opens their dashboard at 9 AM. Mean-time-to-detect is the key metric — catching this at 7 AM beats catching it at 9 AM.”

Think About This

You’re in a Meta interview. The prompt: “Meta’s ads revenue dashboard shows a 25% revenue drop for yesterday. Half the data team thinks it’s a real business event; half thinks it’s a data quality issue. How would you design a system that answers this question in under 30 minutes?”

Walk through:

What checks would definitively rule out a data quality issue? (Volume check on raw impression/click events vs. prior day. Freshness check — did all pipeline jobs complete. Schema check — did any upstream field change. Cross-system reconciliation — does the ads billing system agree with the warehouse?)
What would you build proactively so this situation never happens? (Automated anomaly detection that fires before business hours. Tiered alerting: a 25% revenue drop on a Tier-1 table is a P1 incident. Data lineage showing which upstream tables feed the revenue figure.)
How do you prevent false positive investigations? (Statistical thresholds that account for day-of-week seasonality — Sunday revenue is genuinely lower than Friday. “25% below the prior same-day-of-week” is more meaningful than “25% below yesterday.”)
What’s the SLO for this table? (Revenue dashboard available with verified data by 8 AM PT. Quality checks passed for 99% of business days.)

The insight: the question “is this a business event or a data issue?” should be answerable in < 5 minutes, not 30 minutes, if you have proper quality infrastructure. The 30-minute investigation means the quality framework is missing.

Quick Reference

Six dimensions: Completeness, Accuracy, Consistency, Timeliness, Validity, Uniqueness — map each to a pipeline layer and tool
Defense-in-depth: Quality checks at ingestion, source, transformation, output, and consumer layers — not just one place
dbt for transformation tests (generic + singular tests, CI/CD integration). Great Expectations for statistical/distribution checks and non-dbt pipelines
SLI → SLO → SLA: Measure (SLI) → target (SLO) → commitment (SLA). Apply error budgets to prioritize reliability work
Alert on SLO breaches, not every test failure — prevents alert fatigue while ensuring critical issues get attention
Tier your assets: Tier-1 critical tables get full test suites + PagerDuty. Tier-3 exploratory tables get basic checks. Invest proportionally to business impact

Tomorrow’s Preview

Day 20: Data Lineage & Cataloging — Lineage tracking (column-level vs table-level), tools (OpenLineage, Datahub, Amundsen), data catalogs for discovery, and how lineage helps you debug the hardest pipeline failures — the governance foundation that senior DEs are expected to design and champion.

Day 19: Data quality frameworks