Phase 2: Deep Dives | Category: Advanced Data Modeling

Why Time-Series Is a Fundamentally Different Problem

Time-series data has four properties that make it unlike analytical warehouse data:

  1. Writes are append-only: New measurements always arrive at the current time — no random updates, only inserts at the “now” end of the timeline

  2. Queries are almost always range-based: “Give me all readings for sensor X between T1 and T2” is the dominant access pattern

  3. Compression is extreme: Consecutive timestamps differ by a fixed interval (e.g., 1000ms), so delta encoding compresses timestamps to almost nothing. The Facebook Gorilla paper showed XOR compression achieving 1.37 bytes per data point for float64 values

  4. Data ages out: Recent data at full resolution; older data downsampled or deleted. This lifecycle is automatic in purpose-built TSDBs

When these properties appear in your system design: pipeline observability (Kafka lag, Flink checkpoint duration), product analytics (DAU over time), ML training metrics (loss curves), IoT sensors, infrastructure monitoring (CPU, memory), financial tick data. Any system with a “time” axis as the primary dimension.

The Three Major Time-Series Databases

InfluxDB: Purpose-Built TSDB

Data model (tags + fields + timestamp):

Measurement
: sensor_temperature  Tags (indexed, string, used for filtering):    device_id = "sensor-042"    location = "factory-floor-3"    unit = "celsius"  Fields (un-indexed, the actual values):    temperature = 23.4    humidity = 61.2  Timestamp: 2026-04-12T00:00:00Z

Critical design rule — tags vs fields:

  • Tags: metadata for filtering, grouping. Low cardinality (device_id, not request_id). Always strings. Indexed. Stored in the index structure.
  • Fields: actual measurements. Any type (float, int, string, bool). NOT indexed. Value you’re tracking.
  • Cardinality trap: Using a high-cardinality value as a tag (user_id = 500M users) creates 500M separate time series → massive index memory pressure → InfluxDB’s performance collapses. This is the #1 InfluxDB production mistake.

TSM Storage Engine (Time-Structured Merge Tree):

  • Write path: WAL → in-memory cache → TSM files (columnar, compressed)
  • Compression: delta encoding for timestamps (if data arrives every second, delta is always 1000ms → compresses to nearly nothing), XOR for floats
  • InfluxDB 3.0: replaced TSM with Apache Parquet on object storage — compute-storage separation, dramatically lower storage cost

Query (Flux language):

from
(bucket: "sensors")    |> range(start: -1h)    |> filter(fn: (r) => r.device_id == "sensor-042")    |> aggregateWindow(every: 5m, fn: mean)    |> yield(name: "5min_avg_temperature")

Or SQL (InfluxDB 3.0+):

SELECT
TIME_BUCKET('5 minutes', time) AS bucket,         device_id, AVG(temperature) AS avg_temp  FROM sensor_temperature  WHERE time >= NOW() - INTERVAL '1 hour'    AND device_id = 'sensor-042'  GROUP BY 1, 2  ORDER BY 1;

Retention policy per bucket:

Bucket
: "sensors-raw"       retention: 30 days   (full resolution, 10-sec intervals)  Bucket: "sensors-hourly"    retention: 1 year    (hourly aggregates, downsampled)  Bucket: "sensors-daily"     retention: 7 years   (daily aggregates, compliance)

TimescaleDB: PostgreSQL + Time-Series Superpowers

The key insight: TimescaleDB IS PostgreSQL. Every tool, driver, ORM, BI connector, and SQL feature that works with PostgreSQL works with TimescaleDB.

Hypertable creation:

--
Step 1: Create a regular PostgreSQL table  CREATE TABLE sensor_readings (      time         TIMESTAMPTZ NOT NULL,      device_id    TEXT NOT NULL,      location     TEXT,      temperature  FLOAT8,      humidity     FLOAT8  );  -- Step 2: Convert to hypertable (automatic time-based partitioning)  SELECT create_hypertable('sensor_readings', 'time',       chunk_time_interval => INTERVAL '1 day');  -- TimescaleDB automatically creates a new chunk/partition each day  -- Queries with time filters automatically skip irrelevant chunks (chunk exclusion)

Continuous Aggregates (the killer feature):

--
Define an automatically-refreshed materialized view at hourly granularity  CREATE MATERIALIZED VIEW sensor_hourly  WITH (timescaledb.continuous) AS  SELECT      time_bucket('1 hour', time) AS bucket,      device_id,      AVG(temperature) AS avg_temp,      MAX(temperature) AS max_temp,      MIN(temperature) AS min_temp,      COUNT(*) AS reading_count  FROM sensor_readings  GROUP BY 1, 2;  -- Set auto-refresh policy: update when new data arrives  SELECT add_continuous_aggregate_policy('sensor_hourly',      start_offset => INTERVAL '3 hours',      end_offset => INTERVAL '1 hour',      schedule_interval => INTERVAL '1 hour');

A query for “hourly avg temperature last 30 days” automatically reads from sensor_hourly (1,440 rows) instead of sensor_readings (26M rows for a 10-second sensor). Zero application code change.

Native compression:

--
Compress chunks older than 1 day (90-97% compression ratio typical)  SELECT add_compression_policy('sensor_readings', INTERVAL '1 day');  -- Compressed chunks remain queryable — no application change needed  SELECT time, temperature FROM sensor_readings  -- reads compressed + uncompressed seamlessly  WHERE time > NOW() - INTERVAL '7 days'    AND device_id = 'sensor-042';

Retention + downsampling policy:

--
Delete raw data older than 30 days  SELECT add_retention_policy('sensor_readings', INTERVAL '30 days');  -- Keep hourly aggregates for 1 year  SELECT add_retention_policy('sensor_hourly', INTERVAL '365 days');  -- Result: 30 days of 10-sec raw data → 1 year of hourly aggregates → forever at daily level

Why TimescaleDB wins when data must JOIN with relational tables:

--
This query is trivial in TimescaleDB — impossible in InfluxDB without application-level work  SELECT      sr.device_id,      d.asset_name,      d.maintenance_schedule,      d.location_building,      AVG(sr.temperature) AS avg_temp  FROM sensor_readings sr  JOIN devices d ON sr.device_id = d.id    -- JOIN with regular PostgreSQL table!  WHERE sr.time >= NOW() - INTERVAL '24 hours'    AND sr.temperature > d.temperature_threshold  -- JOIN condition using dimension table  GROUP BY 1, 2, 3, 4  ORDER BY avg_temp DESC;

Prometheus: The Kubernetes Monitoring Standard

Pull model (the key architectural difference):

  • Prometheus doesn’t receive data — it scrapes /metrics endpoints at configured intervals
  • Application exposes: http_requests_total{method="GET",status="200"} 1234
  • Prometheus pulls this value every 15-60 seconds
  • Advantage: Prometheus controls scrape rate → won’t be overwhelmed by a misbehaving producer

PromQL (pull-oriented query language):

#
Rate of HTTP requests per second over last 5 minutes  rate(http_requests_total{job="api-server"}[5m])  # P99 latency from histogram  histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))  # Alert: error rate > 5%  sum(rate(http_requests_total{status=~"5.."}[5m])) /  sum(rate(http_requests_total[5m])) > 0.05

The cardinality problem: Labels with high cardinality (user_id, request_id) create one time series per unique label combination. At 1M unique request IDs × 10 metrics each = 10M time series → Prometheus memory explodes. Rule: never use high-cardinality values as Prometheus labels.

For long-term storage: Prometheus default retention is 15 days. For longer retention, pair with Thanos (object storage) or Mimir (horizontally scalable Prometheus). These add block compaction + downsampling at 5-min and 1-hour resolution for old data.

Downsampling: The Critical Time-Series Lifecycle Pattern

The problem: A sensor emitting every 10 seconds generates:

  • 8,640 data points/day/sensor
  • 3,153,600 data points/year/sensor
  • At 10,000 sensors: 31.5 billion data points/year

Without downsampling: storage explodes, old-data queries get slower.

The tiered resolution pattern:

Time
Range           Resolution    Storage           Use Case  ────────────────────────────────────────────────────────────────────  Last 24 hours        10 seconds    Hot (SSD)         Real-time dashboards  Last 30 days         1 minute      Warm (SSD/HDD)    Operational monitoring  Last 1 year          1 hour        Cool (Object)     Trend analysis  Last 7 years         1 day         Cold (Archive)    Compliance, reporting

Implementation in TimescaleDB:

--
3 continuous aggregates = automatic tiered resolution  CREATE MATERIALIZED VIEW sensor_1min WITH (timescaledb.continuous) AS  SELECT time_bucket('1 minute', time) bucket, device_id, AVG(temperature) avg_temp  FROM sensor_readings GROUP BY 1,2;  CREATE MATERIALIZED VIEW sensor_1hour WITH (timescaledb.continuous) AS  SELECT time_bucket('1 hour', bucket) bucket, device_id, AVG(avg_temp) avg_temp  FROM sensor_1min GROUP BY 1,2;  CREATE MATERIALIZED VIEW sensor_1day WITH (timescaledb.continuous) AS  SELECT time_bucket('1 day', bucket) bucket, device_id, AVG(avg_temp) avg_temp  FROM sensor_1hour GROUP BY 1,2;  -- TimescaleDB query routing: automatically uses the right resolution  -- Query for last 1 hour → reads sensor_readings (raw 10-sec, ~360 rows)  -- Query for last 30 days → reads sensor_1min (43,200 rows)  -- Query for last 1 year → reads sensor_1hour (8,760 rows)  -- Query for last 7 years → reads sensor_1day (2,555 rows)

Time-Series vs Analytical Warehouse: The Decision Framework

RequirementTime-Series DB (InfluxDB/TimescaleDB)Analytical Warehouse (BigQuery/Snowflake)Sub-second write latency✅ Optimized append writes❌ Batch loading is preferredAuto-downsampling + retention✅ Built-in policies❌ Manual dbt jobs requiredMillisecond read latency✅ Chunk exclusion, compressed scans❌ Typically seconds minimumAggregating 1B rows per query❌ Not the sweet spot✅ MPP execution on petabytesJOIN with business entitiesTimescaleDB ✅ / InfluxDB ❌✅ Native SQL JOINsBI tool compatibilityTimescaleDB ✅ / InfluxDB ⚠️✅ NativeCost at petabyte scale❌ Expensive per-row storage✅ Columnar compression, cheapDevOps/Kubernetes monitoringPrometheus ✅❌ Wrong tool entirely

The hybrid architecture (what production systems actually do):

Sensors
/Servers/Apps      ↓  Prometheus (short-term metrics, alerting, Kubernetes monitoring)      ↓ remote_write / archival  TimescaleDB or InfluxDB (operational time-series, 1-90 days hot)      ↓ downsampled aggregates via nightly batch  BigQuery / Snowflake (long-term analytics, joining with business data, BI dashboards)

Time-Series Modeling Anti-Patterns

1. High-cardinality tags (InfluxDB)

BAD
:  tags: {user_id: "u-12345678"}     ← millions of unique series  GOOD: tags: {user_tier: "premium"}      ← low cardinality        fields: {user_id: "u-12345678"}   ← high cardinality in fields (not indexed)

2. Using timestamps as primary keys in relational DBs

--
BAD: in PostgreSQL without TimescaleDB  CREATE TABLE metrics (      id BIGSERIAL PRIMARY KEY,      timestamp TIMESTAMP,   -- not the primary access path      device_id TEXT,      value FLOAT  );  -- Result: scans entire table for time-range queries on timestamp column  -- GOOD: TimescaleDB hypertable with time as the partitioning dimension  SELECT create_hypertable('metrics', 'time');  -- Time-range queries now do chunk exclusion: O(n_chunks_in_range) vs O(n_total_rows)

3. Not setting retention policies (storage explosion)

Production
incident: "Our InfluxDB instance grew from 500 GB to 5 TB in 3 months"  Root cause: no retention policy set  Fix: set retention policy immediately, implement tiered downsampling

4. Querying raw high-frequency data for long time ranges

--
BAD: scanning 10-second data for a 1-year trend  SELECT date_trunc('day', time), AVG(temperature)  FROM sensor_readings                    -- 31.5M rows for 1 sensor/year  WHERE device_id = 'sensor-042'  GROUP BY 1;  -- GOOD: query the daily continuous aggregate  SELECT bucket, avg_temp  FROM sensor_1day                        -- 365 rows for 1 sensor/year  WHERE device_id = 'sensor-042';  -- 86,000x fewer rows to scan

Interview Questions

Q1: “You’re building observability for a data pipeline processing 10 million Kafka events per second. What metrics would you collect and how would you store them?”

Model Answer: “Five metric categories, each with different collection strategies. First, Kafka metrics: consumer lag per topic/partition, producer throughput, broker disk/network utilization. These are already exposed via JMX → collected by Prometheus Kafka exporter, stored in Prometheus with 15-day retention. Critical alert: consumer lag growing continuously for > 5 minutes = throughput gap, pages on-call. Second, Flink metrics: checkpoint duration, restart count, backpressure ratio, processing latency percentiles. Flink exposes these via its metrics interface → Prometheus. Third, data quality metrics: records processed per minute, schema validation failures, DLQ message count. These come from my pipeline code instrumented with Prometheus client library. Fourth, data freshness: MAX(event_timestamp) in the destination table vs current time. This is NOT a Prometheus metric — I’d run a SQL query every 5 minutes against the serving table and write the result to InfluxDB or as a Prometheus gauge. Fifth, infrastructure: CPU, memory, disk I/O on cluster nodes. Prometheus node_exporter, scraping every 15 seconds. Architecture: Prometheus for all operational metrics (short-term, alerting), with Thanos for long-term retention. For pipeline-specific business metrics (data freshness, quality rates) that analysts need in dashboards — write to TimescaleDB so they can JOIN with pipeline configuration tables and use standard SQL in Grafana.”

Q2: “A factory deploys 50,000 IoT sensors each emitting temperature, humidity, and pressure every 10 seconds. Design the data storage layer.”

Model Answer: “At 50,000 sensors × 3 metrics × 6 readings/minute = 900,000 data points/minute = 15,000 points/second. Five-year retention required. First, the write path: sensors → MQTT broker (Mosquitto) → MQTT bridge → InfluxDB 3.0 (object storage backend). InfluxDB 3.0’s Parquet-on-S3 storage handles 15K points/sec easily and costs ~10x less than InfluxDB 2.x local disk at this scale. Second, data model: measurement: factory_metrics, tags: {sensor_id, location_building, asset_type} (low cardinality — 50K sensors is actually fine as a tag in InfluxDB 3.0’s improved architecture), fields: {temperature, humidity, pressure}. Third, retention and downsampling: raw 10-second data retained 7 days in hot InfluxDB bucket. 1-minute aggregates retained 30 days. 1-hour aggregates retained 1 year. 1-day aggregates retained 5 years. This is configured as InfluxDB tasks (or continuous aggregates in TimescaleDB if we needed SQL JOINs). Fourth, alerting: Telegraf agents at the edge do threshold alerting (temperature > 80°C) before data even reaches InfluxDB — sub-second local response. Prometheus handles infrastructure monitoring for the MQTT and InfluxDB cluster itself. Fifth, analytics: for business reporting (maintenance scheduling, efficiency analysis), a daily Dataflow/Spark job reads daily aggregates from InfluxDB and writes to BigQuery for SQL-accessible historical analytics with JOINs against asset registration and maintenance records.”

Think About This

You’re in a Netflix interview. The prompt: “Netflix monitors 15,000+ Flink jobs streaming 60 PB/day. Design the observability data platform for these pipelines.”

Walk through:

  1. What are the metric categories? (Flink internals: checkpoint duration, backpressure, restart count. Data metrics: records/sec, lag, DLQ rate. Infrastructure: CPU, memory, network per task manager.)

  2. Which tool for each? (Flink + infrastructure → Prometheus + Grafana, 30-day retention with Thanos. Data freshness → custom probes writing to TimescaleDB since analysts query it alongside pipeline config tables. Long-term trend analysis → BigQuery with daily aggregates from TimescaleDB.)

  3. What’s the critical alert for data freshness? (MAX(event_time) > threshold for any of 15K jobs — a join against a job registry to know expected freshness per job. Some jobs process real-time (alert at 2 min lag), others are hourly batch (alert at 2 hour lag).)

  4. How do you avoid alert fatigue at 15K jobs? (Tiered: P1 = SLO-critical pipelines only (200 jobs) → PagerDuty. P2 = tier-2 pipelines → Slack channel. P3 = everything else → daily digest. The alert routing rules live in a database where each pipeline has a tier field.)

Quick Reference

  • Time-series properties: append-only writes, time-range queries, extreme compression (timestamps: delta encoding; values: XOR/delta), data lifecycle (downsample then delete).
  • InfluxDB: purpose-built TSDB, tags+fields+timestamp model. Avoid high-cardinality tags. InfluxDB 3.0 = Parquet on S3, 10x cheaper storage, SQL support.
  • TimescaleDB: PostgreSQL extension. Hypertables = automatic time-based partitioning. Continuous aggregates = auto-refreshing tiered resolution. Full SQL + JOINs with relational tables. The choice when time-series must coexist with relational data.
  • Prometheus: pull-model, scrapes /metrics endpoints. Kubernetes-native. Short retention (15 days). PromQL for rate(), histogram_quantile(). High-cardinality labels kill performance.
  • Downsampling tiers: Raw → 1-min → 1-hour → 1-day. Each tier reduces storage ~60-86x. TimescaleDB continuous aggregates automate this.
  • The hybrid architecture: Prometheus (short-term operational) → TimescaleDB/InfluxDB (medium-term, sub-second queries) → BigQuery/Snowflake (long-term analytical + business JOINs).
  • Decision threshold: Start with PostgreSQL (< 50M rows, simple queries). Add TimescaleDB (millions of writes/day, time-range queries dominant). Use InfluxDB (pure sensor/monitoring workload, no SQL JOINs needed). Use Prometheus (Kubernetes observability, alerting).

Tomorrow’s Preview

Day 44: Exactly-Once Semantics Deep Dive — At-most-once, at-least-once, exactly-once. Kafka transactions, Flink checkpointing, idempotent producers, two-phase commit. Why exactly-once is hard and when you actually need it vs when at-least-once + deduplication is the right (and cheaper) answer.