Day 27 — API design for data systems

Phase 1: Foundations & Frameworks | Category: API & Interface Design

Why This Shows Up in Senior DE Interviews

Data engineers increasingly design APIs — not just for internal pipelines, but for data products, feature stores, self-serve analytics platforms, and ML inference. At OpenAI and Anthropic, the data engineer role explicitly includes designing the interfaces that consumers (ML teams, product teams, external developers) use to access data. As System Design Handbook puts it: “What they’re testing: API idempotency, pagination, statelessness, performance.” A senior DE who can only build pipelines but can’t design a clean, scalable API for data consumers is incomplete.

The Three Protocols: Decision Framework First

Topic	REST	gRPC	GraphQL
Transport	HTTP/1.1 or HTTP/2	HTTP/2	HTTP/1.1 or HTTP/2
Serialization	JSON (human-readable)	Protobuf (binary, compact)	JSON
Performance	Good	Usually best (often cited as roughly 5–10× faster than JSON over REST for the same payload; the gain depends on payload shape and network)	Good
Schema contract	Loose (OpenAPI/Swagger)	Strict (.proto files)	Strict (GraphQL schema)
Caching	Excellent (GET is cacheable)	Difficult (HTTP/2, binary)	Difficult (single endpoint)
Browser support	Native	Requires grpc-web or proxy	Native
Streaming	Limited (SSE, long polling)	Native bidirectional streaming	Subscriptions (limited)
Error handling	HTTP status codes	gRPC status codes	Application-layer errors
Learning curve	Low	Medium	Medium–High
Best for	Public APIs, CRUD, external consumers	Internal microservices, high-throughput	Frontend-driven apps, flexible queries

The decision rule (Kong, LinkedIn):

Is this a public API or external-facing?
├── YES → REST. Simplicity, cacheability, browser native, widest client support.
└── NO (internal) → Is latency/throughput the primary concern?
    ├── YES → gRPC. Binary encoding, HTTP/2 multiplexing, strongly typed.
    └── NO → Do clients need flexible, ad-hoc queries (different subsets of data)?
        ├── YES → GraphQL. Client-driven queries; reduces over/under-fetching.
        └── NO → REST (default). Simpler than GraphQL; sufficient for fixed query patterns.

Separately, for any branch above: is real-time push to clients needed?
└── YES → WebSocket (persistent, bidirectional) or SSE (server-to-client only).
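
The decision rule above can be written down as a few ordered checks. This is only a sketch: the function name and its three boolean parameters are illustrative assumptions, and a real decision weighs more factors than three flags.

```python
def choose_protocol(external: bool, latency_critical: bool, flexible_queries: bool) -> str:
    """Walk the decision rule top to bottom and return the suggested protocol."""
    if external:
        return "REST"      # public or external-facing
    if latency_critical:
        return "gRPC"      # internal, latency/throughput is the primary concern
    if flexible_queries:
        return "GraphQL"   # internal, clients need ad-hoc field subsets
    return "REST"          # internal default for fixed query patterns


print(choose_protocol(external=False, latency_critical=True, flexible_queries=False))  # gRPC
```

The order of the checks matters: an external API gets REST even when it is also latency-sensitive, which mirrors the tree.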

In practice, modern systems combine protocols:

REST for the external analytics API (public, cacheable, simple)
gRPC for internal service-to-service communication (feature store → inference service)
GraphQL for the self-serve analytics portal (analysts query flexible subsets)
WebSocket for live dashboard pushes

Data API Design Patterns

1. The Analytics Query API (REST)

The most common data API for FAANG interviews — an HTTP interface that lets consumers query aggregated metrics.

Endpoint design:

GET /v1/metrics/{metric_name}
  ?dimensions=country,device_type
  &filters=country:US,country:UK
  &start_date=2026-04-01
  &end_date=2026-04-10
  &granularity=daily
  &limit=100
  &cursor=eyJ0eXBlIjoiY3Vyc29yIn0=

Response structure:

{
  "meta": {
    "metric": "dau",
    "granularity": "daily",
    "query_time_ms": 142,
    "next_cursor": "eyJ0eXBlIjoiY3Vyc29yIn0=",
    "total_rows": 20
  },
  "data": [
    {
      "date": "2026-04-01",
      "country": "US",
      "device_type": "mobile",
      "value": 1243567
    }
  ]
}

Versioning via URL path (/v1/, /v2/): The most explicit and common approach. When breaking changes are needed, publish /v2/ alongside /v1/ with a migration timeline.

Header-based versioning alternative: Accept: application/vnd.analytics.v2+json — less visible but cleaner URLs.
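
A server that supports both styles has to resolve one version per request. The sketch below assumes the URL path wins over the Accept header and that v1 is the default; that precedence, the function name, and the default are assumptions for illustration, not something the two approaches prescribe.

```python
import re
from typing import Optional

# Matches the vendor media type used above, e.g. application/vnd.analytics.v2+json
_VENDOR_TYPE = re.compile(r"application/vnd\.analytics\.v(\d+)\+json")


def resolve_version(path: str, accept: Optional[str] = None, default: int = 1) -> int:
    """Return the API version: URL prefix first, then Accept header, then default."""
    path_match = re.match(r"^/v(\d+)/", path)
    if path_match:
        return int(path_match.group(1))
    if accept:
        header_match = _VENDOR_TYPE.search(accept)
        if header_match:
            return int(header_match.group(1))
    return default


print(resolve_version("/v2/metrics/dau"))                                    # 2
print(resolve_version("/metrics/dau", "application/vnd.analytics.v2+json"))  # 2
```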

2. The Feature Serving API (gRPC)

For ML inference pipelines, the feature store is accessed via gRPC for minimal latency. Per Anthropic’s interview expectations, the feature API design is a first-class concern.

Proto definition:

syntax = "proto3";

service FeatureStore {
  // Fetch online features for a single entity
  rpc GetFeatures(FeatureRequest) returns (FeatureResponse);
  // Batch fetch for multiple entities (inference batching)
  rpc GetFeaturesBatch(FeatureBatchRequest) returns (FeatureBatchResponse);
  // Server-streaming for real-time feature updates
  rpc WatchFeatures(WatchRequest) returns (stream FeatureUpdate);
}

message FeatureRequest {
  string entity_type = 1; // "user", "item", "session"
  string entity_id = 2; // "user-12345"
  repeated string feature_names = 3; // ["ltv_30d", "engagement_score", "risk_tier"]
  int64 timestamp_ms = 4; // point-in-time correctness
}

message FeatureResponse {
  map<string, double> features = 1;
  int64 served_at_ms = 2;
  string entity_id = 3;
}

Why gRPC for feature serving:

Protobuf binary encoding: noticeably smaller payloads than JSON for numerical feature vectors (~5x is a commonly quoted figure; the actual ratio depends on the schema)
HTTP/2 multiplexing: multiple in-flight feature requests over one connection
Client-side load balancing: built into gRPC client libraries
Strongly typed schema: prevents feature type mismatch bugs at compile time (catching ML training-serving skew early)
Target SLO: p99 < 10ms — JSON over REST at this latency budget is wasteful
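
To get a feel for the payload-size argument, the sketch below compares JSON with fixed-width binary packing from Python's struct module. struct is only a stand-in here, not Protobuf: real Protobuf adds field tags, and a map<string, double> also carries the key strings, so the ratio on the wire will differ. The feature values are made up.

```python
import json
import struct

# Hypothetical feature vector for one entity
features = {"ltv_30d": 412.75, "engagement_score": 0.8731, "risk_tier": 2.0}

# Text encoding: names and decimal digits are spelled out as characters
json_payload = json.dumps(features).encode("utf-8")

# Binary encoding: three little-endian float64 values, names implied by position
binary_payload = struct.pack("<3d", *features.values())

print(len(json_payload), "bytes as JSON")
print(len(binary_payload), "bytes packed")  # 24 bytes: 3 values x 8 bytes each

# The binary form round-trips exactly
assert list(struct.unpack("<3d", binary_payload)) == list(features.values())
```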

3. The Self-Serve Analytics API (GraphQL)

When different teams need different subsets of the same underlying data, GraphQL eliminates over-fetching:

# Finance team — financial fields only
query FinanceMetrics {
  orderMetrics(
    dateRange: { start: "2026-04-01", end: "2026-04-10" }
    groupBy: [COUNTRY, PAYMENT_METHOD]
  ) {
    date
    country
    paymentMethod
    revenue
    grossMargin
    orderCount
  }
}

# Marketing team — different fields, same API
query MarketingMetrics {
  orderMetrics(
    dateRange: { start: "2026-04-01", end: "2026-04-10" }
    groupBy: [CHANNEL, DEVICE_TYPE]
  ) {
    date
    channel
    deviceType
    conversionRate
    clickThroughRate
    newCustomerCount
  }
}

Both queries hit the same /graphql endpoint. The server resolves only the requested fields. No REST endpoint proliferation.

GraphQL trade-offs to mention:

No HTTP caching: GraphQL queries are usually sent as POST requests to a single endpoint, so CDNs and HTTP caches can’t cache them by default. Solution: persisted queries (pre-registered queries with a hash ID → GET request that is cacheable).
N+1 query problem: Nested resolvers can trigger N database queries for N records. Solution: DataLoader pattern (batches and deduplicates DB calls within a request).
Query complexity limits: Clients can craft deeply nested queries that hammer the backend. Solution: query depth limits + complexity scoring.
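
Depth limits and complexity scoring can be sketched on a simplified view of the query. The example below represents a selection set as nested dicts rather than parsing real GraphQL, and the one-point-per-field scoring and the default limits are assumptions; a production server would use its GraphQL library's validation hooks.

```python
from typing import Optional

# A selection set as nested dicts: field name -> nested selection, or None for a leaf field
Selection = Optional[dict]


def depth(selection: Selection) -> int:
    """Number of nesting levels in the selection set."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())


def complexity(selection: Selection) -> int:
    """One point per selected field, counted recursively."""
    if not selection:
        return 0
    return sum(1 + complexity(child) for child in selection.values())


def validate(selection: Selection, max_depth: int = 3, max_complexity: int = 25) -> None:
    """Reject a query before execution if it is too deep or too expensive."""
    if depth(selection) > max_depth:
        raise ValueError("query too deep")
    if complexity(selection) > max_complexity:
        raise ValueError("query too complex")


finance_query = {"orderMetrics": {"date": None, "country": None, "paymentMethod": None,
                                  "revenue": None, "grossMargin": None, "orderCount": None}}
validate(finance_query)  # depth 2, complexity 7: accepted
```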

Pagination: The Critical API Design Decision

Every data API that returns more than ~100 records needs pagination. The wrong choice causes correctness bugs at scale. (Pipeline to Insights, APIs You Won’t Hate)

Offset-Based (Simple, Problematic at Scale)

GET /v1/orders?limit=100&offset=0
  → rows 1–100

GET /v1/orders?limit=100&offset=100
  → rows 101–200

GET /v1/orders?limit=100&offset=1000000
  → rows 1000001–1000100

Problems:

Performance degrades: OFFSET 1000000 requires the DB to scan and discard 1M rows
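Data consistency: if a row is inserted ahead of the current offset between requests (for example, a new order in a newest-first listing), every later row shifts down by one, so the last row of page 1 reappears as the first row of page 2 (a duplicate). A deletion shifts rows the other way, and one row is silently skipped.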
Total count queries: COUNT(*) on large tables is slow

Use when: Small datasets, simple admin interfaces, users who need jump-to-page navigation.
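
The duplicate problem is easy to reproduce in memory. The sketch below assumes a newest-first listing (highest order ID first) and uses a plain list in place of a table; the page function is a hypothetical helper.

```python
def page(rows, limit, offset):
    """Offset pagination over rows sorted newest-first (highest ID first)."""
    ordered = sorted(rows, reverse=True)
    return ordered[offset:offset + limit]


order_ids = [1, 2, 3, 4, 5, 6]

first = page(order_ids, limit=3, offset=0)    # [6, 5, 4]
order_ids.append(7)                           # a new order arrives between requests
second = page(order_ids, limit=3, offset=3)   # [4, 3, 2]

duplicates = set(first) & set(second)
print(duplicates)  # {4}: the last row of page 1 is served again on page 2
```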

Cursor-Based (Correct, Scalable — Production Default)

GET /v1/orders?limit=100
  → response includes: "next_cursor": "eyJvcmRlcl9pZCI6IDEwMH0="

GET /v1/orders?limit=100&cursor=eyJvcmRlcl9pZCI6IDEwMH0=
  → WHERE order_id > 100 ORDER BY order_id LIMIT 100
  → consistent; an index seek (O(log n)) regardless of position

How the cursor works: The cursor encodes the last record’s sort key (a base64-encoded payload like {"order_id": 100}). The next query resumes from that exact position using a WHERE clause, so no skipped rows are scanned.

Properties:

Consistent: insertions/deletions don’t cause duplicate or missing rows
Performance: O(log n) with an index on the sort key regardless of page position
Forward-only: can’t jump to arbitrary page (trade-off for correctness)

Cursor design principles:

{
  "next_cursor": "eyJvcmRlcl9pZCI6IDEwMCwgImNyZWF0ZWRfYXQiOiAiMjAyNi0wNC0xMCJ9"
}

Opaque cursor (best practice) — clients don’t decode it. Example of what it encodes internally (server-side only):

{ "order_id": 100, "created_at": "2026-04-10" }

Use multiple fields in the cursor when sorting by a non-unique column (created_at can have duplicates) — always include the primary key as a tiebreaker.

Use cursor-based for: Large datasets (millions of records), streaming exports, any API where consistency matters, ML training data fetches.
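
The pieces above (opaque cursor, WHERE clause, primary-key tiebreaker) fit together roughly as follows. This is a sketch against an in-memory SQLite table; the orders table, its columns, and the helper names are assumptions for illustration.

```python
import base64
import json
import sqlite3


def encode_cursor(created_at, order_id):
    """Pack the last row's sort key into an opaque token."""
    payload = json.dumps({"created_at": created_at, "order_id": order_id})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor):
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return payload["created_at"], payload["order_id"]


def fetch_page(conn, limit, cursor=None):
    """Return (rows, next_cursor), ordered by created_at with order_id as tiebreaker."""
    if cursor is None:
        rows = conn.execute(
            "SELECT created_at, order_id FROM orders "
            "ORDER BY created_at, order_id LIMIT ?", (limit,)).fetchall()
    else:
        created_at, order_id = decode_cursor(cursor)
        rows = conn.execute(
            "SELECT created_at, order_id FROM orders "
            "WHERE created_at > ? OR (created_at = ? AND order_id > ?) "
            "ORDER BY created_at, order_id LIMIT ?",
            (created_at, created_at, order_id, limit)).fetchall()
    # A short page means the end was reached; a full final page yields one extra empty fetch
    next_cursor = encode_cursor(*rows[-1]) if len(rows) == limit else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2026-04-09"), (2, "2026-04-10"), (3, "2026-04-10"),
    (4, "2026-04-10"), (5, "2026-04-11")])

page1, cursor1 = fetch_page(conn, limit=2)
page2, cursor2 = fetch_page(conn, limit=2, cursor=cursor1)
page3, cursor3 = fetch_page(conn, limit=2, cursor=cursor2)
print([order_id for _, order_id in page1 + page2 + page3])  # [1, 2, 3, 4, 5]
```

Three orders share the same created_at here; without order_id in the cursor, the second page could not tell which of them had already been returned.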

Rate Limiting: Design at Scale

Every data API needs rate limiting. Uncontrolled access can overwhelm a warehouse or serving store, and internal teams are just as capable of sending runaway query loops as external clients.

Rate limiting strategies:

Token Bucket (most common):

Each client has a bucket with capacity C tokens
Each request costs 1 token (or more for complex queries)
Tokens replenish at rate R per second
If bucket empty: 429 Too Many Requests

Allows bursts (a client that has been idle accumulates up to C tokens and can spend them all at once) while enforcing an average rate of R over time. Best for APIs where clients occasionally burst.
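
A minimal single-process sketch of the token bucket follows. The class name, the injectable clock, and the in-memory state are assumptions for illustration; a shared limiter would keep this state somewhere like Redis and update it atomically.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # C: maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # R: tokens added per second
        self.clock = clock                # injectable so tests can control time
        self.tokens = capacity            # start full, which permits an initial burst
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; False means respond with 429."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(capacity=5, refill_rate=1, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(6)])  # five True, then False
fake_now[0] = 2.0                          # two seconds later: two tokens are back
print(bucket.allow(cost=2))                # True: a heavier request spends both
```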

Sliding Window Counter (smoothest):

Track the request count in a rolling N-second window. More accurate than fixed windows, which let a client send up to twice the limit by bursting on either side of a window boundary.
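
One common way to approximate a rolling window without storing every request timestamp is to weight the previous fixed window's count by how much of it still overlaps the rolling window. The sketch below assumes that approach; the class name and injectable clock are illustrative.

```python
import time


class SlidingWindowCounter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.current_index = int(clock() // window_seconds)
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Roll forward; if a whole window was skipped, the previous one is empty
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        fraction_elapsed = (now % self.window) / self.window
        # Weight the previous window by the share of it still inside the rolling window
        estimated = self.previous_count * (1 - fraction_elapsed) + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False


fake_now = [30.0]
limiter = SlidingWindowCounter(limit=10, window_seconds=60, clock=lambda: fake_now[0])
print(sum(limiter.allow() for _ in range(12)))  # 10 allowed in the first window
fake_now[0] = 61.0                              # just past the window boundary
print(sum(limiter.allow() for _ in range(10)))  # 1: the earlier burst still counts
```

A fixed-window limiter would have admitted all ten requests at t=61, for twenty requests in about half a minute.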

Implementation tiers for data APIs:

Tier 1 (default): 100 req/min, 10,000 req/day, max 100 rows/request
Tier 2 (analytics teams): 1,000 req/min, 100,000 req/day, max 10,000 rows/request
Tier 3 (ML pipelines): 10,000 req/min, no daily cap, max 1M rows/request (streaming)

Response headers (expose rate limit state):

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1744385400
Retry-After: 60   (sent on 429 responses only)

Cost-based rate limiting (for expensive queries): Simple request counting isn’t enough for analytical queries where a simple count costs 10ms but a complex ad-hoc join costs 10 seconds. Assign query cost scores based on estimated execution cost and deduct from the client’s token bucket proportionally.

Data API Versioning

API versioning is how you evolve your data API without breaking consumers.

Strategy 1: URL versioning (most explicit)

/v1/metrics — current, stable
/v2/metrics — new response format (breaking change)
/v1/ deprecated → sunset date: 2026-10-01

Simple, visible, widely understood. Every breaking change requires a new version. Maintain N and N-1 simultaneously.

Strategy 2: Additive evolution (preferred for non-breaking changes)

Before: {"dau": 1243567}
After:  {"dau": 1243567, "dau_7d_avg": 1150000}  ← new field added, same /v1/ URL

Adding fields to responses is non-breaking as long as clients ignore unknown fields (strict deserializers that reject unknown fields will still break). Reserve URL version bumps for truly breaking changes (field removal, type change, semantic change).

Breaking vs non-breaking changes:

Change	Breaking?	Strategy
Add new response field	No	Additive — no version bump
Add optional query parameter	No	Additive
Remove response field	Yes	New URL version
Change field type	Yes	New URL version
Rename field	Yes	New URL version, or dual-field with deprecation notice
Change pagination behavior	Yes	New URL version
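
The structural rows of this table can be checked mechanically, for example in CI before a release. The sketch below compares two response schemas given as field-to-type dicts; that representation and the function name are assumptions. Semantic changes and pagination behavior cannot be detected this way.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes that would break existing clients; an empty list means additive only."""
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")  # a rename shows up as a removal
        elif new[field] != old_type:
            problems.append(f"type change: {field} {old_type} -> {new[field]}")
    return problems


v1 = {"dau": "int"}
print(breaking_changes(v1, {"dau": "int", "dau_7d_avg": "int"}))  # []
print(breaking_changes(v1, {"daily_active_users": "int"}))        # ['removed field: dau']
```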

Feature Serving API: The Senior DE Signature Topic

At OpenAI, Anthropic, and Meta, data engineers own the feature store pipeline AND the serving API. Key design decisions:

SLO targets:

p50: < 2ms (online feature lookup)
p95: < 5ms
p99: < 10ms
Availability: 99.99%

Architecture:

ML Inference Service (gRPC call)
    ↓ GetFeatures(user_id, feature_names)
Feature Store API (gRPC)
    ↓ cache check
Redis (L1 cache, 1ms, TTL=5min)
    ↓ cache miss
Online Feature Store (DynamoDB / Bigtable, 3–5ms)
    ↓ return features
Feature Store API
    ↓ populate Redis cache
ML Inference Service
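
The read path in this diagram is the cache-aside pattern. The sketch below uses an in-process dict with per-entry TTL as a stand-in for Redis and a plain callable as a stand-in for the online store; both, along with all names, are assumptions for illustration.

```python
import time


class FeatureCache:
    """In-process stand-in for the Redis L1 cache, with a per-entry TTL."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)


def get_features(entity_id, feature_names, cache, load_from_store):
    """Check the cache first; on a miss, read the online store and populate the cache."""
    cached = cache.get(entity_id)
    if cached is None:
        cached = load_from_store(entity_id)
        cache.set(entity_id, cached)
    return {name: cached[name] for name in feature_names if name in cached}


store_reads = []

def fake_store(entity_id):
    store_reads.append(entity_id)  # record each trip to the online store
    return {"ltv_30d": 412.75, "engagement_score": 0.87}

cache = FeatureCache(ttl_seconds=300)
get_features("user-12345", ["ltv_30d"], cache, fake_store)
get_features("user-12345", ["engagement_score"], cache, fake_store)
print(len(store_reads))  # 1: the second call was served from the cache
```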

Point-in-time correctness: The timestamp_ms parameter in the feature request is critical for avoiding training-serving skew. During model training, features are computed as-of the training label time. During inference, features must be computed with the same logic. The feature serving API should support serving features as they existed at a given historical timestamp — not just the current value.
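
An as-of lookup is the core of point-in-time serving: return the latest value written at or before the requested timestamp. The sketch below assumes each feature's history is kept as a list of (timestamp_ms, value) pairs sorted by timestamp; the function name and that layout are illustrative.

```python
import bisect
from typing import List, Optional, Tuple


def feature_as_of(history: List[Tuple[int, float]], timestamp_ms: int) -> Optional[float]:
    """Return the latest value written at or before timestamp_ms, or None if none exists."""
    timestamps = [ts for ts, _ in history]
    position = bisect.bisect_right(timestamps, timestamp_ms)
    if position == 0:
        return None  # the feature did not exist yet at that time
    return history[position - 1][1]


engagement_history = [(1000, 0.2), (2000, 0.5), (3000, 0.9)]
print(feature_as_of(engagement_history, 2500))  # 0.5, not the current value 0.9
```

Returning 0.9 for a label observed at t=2500 would leak future information into training.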

Idempotency in inference APIs:

POST /v1/infer
Content-Type: application/json

{
  "idempotency_key": "req-abc-123",
  "entity_id": "user-456",
  "model_version": "recommendation-v2.3"
}

If the server receives the same idempotency_key twice (network retry), it returns the cached response without re-running inference. Prevents duplicate predictions being logged for billing or experimentation.
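
Server-side, the idempotency check amounts to a lookup before the expensive call. The sketch below keeps responses in a dict and takes the model as a callable; both are assumptions. A production version would also expire keys and use an atomic set-if-absent so that two concurrent retries cannot both run inference.

```python
class IdempotentInference:
    def __init__(self, run_model):
        self._run_model = run_model   # callable that performs the actual inference
        self._responses = {}          # idempotency_key -> cached response

    def infer(self, request):
        key = request["idempotency_key"]
        if key in self._responses:
            return self._responses[key]   # retry: replay the stored response
        response = self._run_model(request)
        self._responses[key] = response
        return response


model_runs = []

def fake_model(request):
    model_runs.append(request["entity_id"])  # record each real inference
    return {"entity_id": request["entity_id"], "score": 0.42}

service = IdempotentInference(fake_model)
request = {"idempotency_key": "req-abc-123", "entity_id": "user-456",
           "model_version": "recommendation-v2.3"}
first = service.infer(request)
retry = service.infer(request)
print(len(model_runs), first == retry)  # 1 True
```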

Interview Questions

Q1: “Design an API that lets analysts query any metric aggregated by any dimension combination. How do you handle the combinatorial explosion of possible queries?”

Model Answer: “I’d use GraphQL for this use case because different analyst teams query completely different subsets of dimensions, and forcing them through fixed REST endpoints causes API proliferation. The GraphQL schema defines the available metrics and dimensions, and clients compose queries dynamically. For the backend, the GraphQL resolver translates the query into a parameterized SQL template against the warehouse: SELECT <requested_dims>, SUM(<metric>) FROM gold.<metric_table> WHERE <date_range partition filter> AND <filters> GROUP BY <requested_dims>. Dimension and metric names are validated against the schema before they reach the SQL text; filter values are bound as parameters. To prevent runaway queries, I’d implement query complexity scoring: each dimension and filter adds cost, and requests exceeding a cost budget return 429 with a ‘query too complex’ message. For frequently run queries, I’d use persisted queries (the client pre-registers the query by hash); these get full HTTP caching via CDN and bypass the per-request complexity check. The API is versioned at the schema level, not URL level: non-breaking additions (new metrics, new dimensions) just extend the schema.”
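
The resolver step in this answer, turning a request into SQL, might look like the sketch below. Identifiers cannot be bound as SQL parameters, so they are checked against an allowlist; values are bound with placeholders. The allowlists, the date column, and the table mapping are hypothetical.

```python
from typing import Dict, List, Tuple

ALLOWED_DIMENSIONS = {"country", "device_type", "channel", "payment_method"}   # hypothetical
METRIC_TABLES = {"revenue": "gold.order_metrics", "order_count": "gold.order_metrics"}  # hypothetical


def build_query(metric: str, dimensions: List[str], start_date: str, end_date: str,
                filters: Dict[str, str]) -> Tuple[str, list]:
    """Return (sql, params) for an aggregate query, rejecting unknown identifiers."""
    if metric not in METRIC_TABLES:
        raise ValueError(f"unknown metric: {metric}")
    unknown = [name for name in list(dimensions) + list(filters) if name not in ALLOWED_DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    select_list = list(dimensions) + [f"SUM({metric}) AS value"]
    where = ["date BETWEEN ? AND ?"]          # partition filter always comes first
    params = [start_date, end_date]
    for column, value in filters.items():
        where.append(f"{column} = ?")
        params.append(value)

    sql = (f"SELECT {', '.join(select_list)} FROM {METRIC_TABLES[metric]} "
           f"WHERE {' AND '.join(where)}")
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql, params


sql, params = build_query("revenue", ["country"], "2026-04-01", "2026-04-10",
                          {"device_type": "mobile"})
print(sql)
print(params)
```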

Q2: “Your internal feature serving API is used by 20 ML models at different teams. How do you ensure one model’s traffic spike doesn’t degrade latency for other models?”

Model Answer: “Multi-tenant isolation. I’d implement per-model quotas with separate token buckets in Redis. Each model gets a budget of requests/second and tokens/second (for cost-weighted limiting). When a model exhausts its quota, it receives 429, so its degradation is isolated from other tenants. At the infrastructure layer, I’d namespace the L1 cache per model (key prefix: model:<model_id>:<entity_id>) and give each model its own memory budget or Redis instance, because a key prefix alone does not stop one model’s hot keys from evicting another model’s cache. For the backing online store (DynamoDB/Bigtable), read request units are pre-provisioned per model based on expected traffic. If a model genuinely needs to burst, it requests a temporary quota increase via a self-service UI backed by the platform team’s approval workflow. Circuit breakers protect the online store from a runaway model: if a model’s error rate exceeds 20% over 30 seconds, its requests are automatically rejected with a 503 until the circuit resets, preventing it from hammering the store during recovery.”

Think About This

You’re in an Anthropic interview. The prompt: “Design the data API for Claude.ai’s usage analytics. Enterprise customers need to query their own conversation analytics — token usage, latency, cost, and model performance metrics. They need both a programmatic REST API and a dashboard UI.”

Walk through:

What protocol for the programmatic API? (REST: it’s external, developer-facing, and needs caching and broad client support. gRPC is the wrong fit for external customers.)
What protocol for the dashboard UI? (REST for standard queries + WebSocket for any real-time usage alerts. GraphQL could work if the UI needs highly flexible queries.)
How do you enforce tenant isolation? (Every API call requires an organization API key. The backend derives the org ID from the authenticated key, never from a request parameter, and applies row-level filtering to every query: WHERE org_id = authenticated_org_id.)
What pagination strategy? (Cursor-based — usage logs can be millions of rows. Cursor on request_id or timestamp + request_id for stable iteration.)
What rate limits? (Cost-based limiting — a query for 1 row vs 1M rows should cost differently. Free tier: 100 API calls/day, 10K rows/request. Enterprise: custom limits per contract.)

Quick Reference

REST = public APIs, external consumers, caching matters. Default choice unless requirements push elsewhere.
gRPC = internal services, high throughput, low latency, strongly typed. Feature stores, ML inference.
GraphQL = flexible client queries, frontend-heavy, analyst self-serve. Cost: no HTTP caching, N+1 risk, query complexity.
Pagination rule: Offset for small datasets; cursor-based for production (consistent, O(log n), no duplicates on inserts).
Rate limiting: Token bucket for API quotas. Cost-based limiting for analytical queries. Always expose X-RateLimit-* headers.
Versioning: Additive changes (add fields) are non-breaking. Breaking changes (remove, rename, type change) need URL versioning + 30-60 day migration window.
Feature serving: gRPC + point-in-time correctness + per-tenant isolation + idempotency keys. SLO: p99 < 10ms.

Tomorrow’s Preview

Day 28: Caching Strategies for Data Systems — Redis, Memcached. Cache-aside, write-through, write-behind. TTL strategies. Materialized views as caching. Pre-computation vs on-demand for analytics. How caching decisions affect consistency, freshness, and cost in data systems.
