Phase 2: Company-Specific | Category: Meta-Specific
The Prompt
“Design the data infrastructure for Meta’s News Feed. The system must support real-time ranking (content shown to users must be ranked by relevance, not just recency), feature engineering for ML models, analytics for product teams, and privacy-compliant data access. Scale: 3 billion daily active users.”
This is the most commonly asked Meta DE interview question. Set a 45-minute timer.
Step 1: Requirements (5 min)
Functional:
- Ingest all user engagement events (views, likes, comments, shares, clicks, dwells) across Feed, Reels, Stories
- Support real-time ranking signals for the News Feed ML model (freshness-sensitive features)
- Support batch feature computation for training (historical engagement patterns)
- Provide analytics for product teams (engagement metrics, A/B test results, retention)
- Comply with privacy requirements (no individual user data exposed to advertisers or analysts)
Non-functional:
Scale
: 3B DAU × ~500 events/user/day = 1.5T events/day ≈ 17M events/sec peak Latency: Ranking features must be available < 100ms after event occurs (online path) Batch features updated daily for training (offline path) Freshness: Real-time features: < 30 second lag Analytics dashboards: < 4 hours (batch) Availability: 99.99% for ranking feature serving (feed breaks without it) Privacy: User-level data never exposed externally. Minimum aggregation thresholds enforced. Retention: Hot: 30 days. Warm: 1 year. Cold: 7 years (regulatory)
Scoping for the interview: “I’ll design three interconnected pipelines: (1) the event ingestion layer, (2) the real-time feature pipeline for serving, and (3) the batch analytics pipeline. I’ll connect them to the ranking system at a high level. I won’t design the ML model itself — the focus is data infrastructure.”
Step 2: Scale Estimation (3 min)
17M
events/sec × 300 bytes avg (Scribe log format, structured JSON) = ~5 GB/sec raw Compressed Parquet (6x): ~850 MB/sec = ~73 TB/day compressed Annual storage at 6 years warm+cold: ~160 PB Fan-out calculation (critical for Meta): Avg user has 300 friends/follows 17M events/sec → need to update feed caches for affected users Fan-out ratio: 1 event → ~300 cache writes = 5B cache writes/sec PEAK → Cannot fan-out synchronously for every event → must be selective Ranking model features: Per user: ~2,000 features (historical preferences, affinity scores) Per content piece: ~500 features (engagement velocity, content quality signals) Feature serving: 3B DAU × 10 feed opens/day × ~2,500 feature lookups = 30T lookups/day = 350M lookups/sec → Requires massive in-memory feature store with multiple layers of caching
Architecture implications: 17M events/sec rules out synchronous fan-out for all users. 350M feature lookups/sec requires tiered caching (in-memory → Redis → persistent feature store). 73 TB/day requires distributed storage with tiering.
Step 3: High-Level Architecture
┌────────────────────────────────────────────────────────────────────┐ │ EVENT SOURCES │ │ Facebook web/mobile → Scribe SDK (batch 100ms or 50 events) │ │ Instagram → Scribe SDK │ │ WhatsApp (anonymized signals) → Scribe SDK │ │ Backend services (impression logging) → Scribe direct │ └───────────────────────────┬────────────────────────────────────────┘ ↓ structured event logs ┌────────────────────────────────────────────────────────────────────┐ │ SCRIBE (Meta's internal Kafka-equivalent) │ │ Topics: user_engagements, content_views, ad_impressions, signals │ │ Partitioned by user_id (per-user event ordering) │ │ Retention: 7 days hot, tiered to cold storage │ └──────────────┬────────────────────────────┬────────────────────────┘ ↓ (real-time path) ↓ (batch path) ┌──────────────────────────┐ ┌────────────────────────────────────┐ │ FLINK STREAM PROCESSOR │ │ SPARK BATCH PROCESSOR │ │ • Dedup (1hr window) │ │ • Daily batch runs │ │ • PII tokenization │ │ • Complex feature computation │ │ • Session detection │ │ • Training dataset generation │ │ • Real-time aggregation │ │ • Gold layer analytics tables │ │ • Affinity scoring │ │ Reads from: Hive/Iceberg bronze │ └──────┬───────────────────┘ └──────────────┬─────────────────────┘ ↓ ↓ ┌──────────────────────────┐ ┌────────────────────────────────────┐ │ ONLINE FEATURE STORE │ │ OFFLINE DATA LAKE (Hive/Iceberg) │ │ (RocksDB + Redis cache) │ │ Bronze: raw events (immutable) │ │ • Real-time features │ │ Silver: cleaned, deduped, joined │ │ • User engagement state │ │ Gold: fact tables, features, A/B │ │ • Content signals │ │ Partitioned: date + surface_type │ │ p99 read: < 5ms │ │ Query: Presto (interactive) │ └──────┬───────────────────┘ │ Spark (batch ETL) │ │ └──────────────┬─────────────────────┘ ↓ ↓ ┌──────────────────────────┐ ┌────────────────────────────────────┐ │ FEED RANKING SERVICE │ │ SCUBA (Real-time OLAP) │ │ Calls feature store at │ │ • Analyst self-serve │ │ feed request time │ │ • Last 30 days hot data │ │ Presto for feature │ │ • Product team dashboards │ │ federation queries │ │ • A/B experiment results │ └──────────────────────────┘ └────────────────────────────────────┘
Step 4: The Fan-Out Architecture — The Meta Signature Topic
Per Hello Interview and LinkedIn, this is the critical design decision Meta interviewers probe:
The Three Fan-Out Strategies
Fan-out on Write (Push Model):
User
A posts → Fetch A's 300 followers → Write post to each follower's feed cache Result: Pre-computed feeds, sub-ms read latency Cost: 300 writes per post. For 100M posts/day × 300 followers = 30B cache writes/day Problem: What if User A is a celebrity with 50M followers? 50M cache writes for one post = thundering herd
Fan-out on Read (Pull Model):
User
opens feed → Fetch all users they follow → Fetch recent posts from each → Rank → Serve Result: Always fresh, no pre-computation Cost: High read latency (merge 300+ user timelines + ranking on every request) Problem: If user follows 500 accounts, 500 DB reads per feed open × 3B DAU = prohibitive
Meta’s Hybrid (Production):
Regular
users (< 10K followers): Fan-out on WRITE → When they post, pre-populate followers' feed caches → Fast reads, manageable write amplification Celebrities / High-follower accounts (> 10K followers): Fan-out on READ → Their posts are NOT pushed to all follower caches → At feed-load time, fetch their recent posts separately and merge Active vs Inactive users: → Active users: pre-compute feed (fan-out on write) → Inactive users (not logged in > 30 days): no pre-computation On return, compute feed on-demand (fan-out on read)
The data pipeline implication: The fan-out service needs a real-time classifier of user type (regular vs celebrity, active vs inactive). This is a Flink stateful job that maintains per-user follower counts and activity state, and routes each post to the appropriate fan-out path.
Step 5: The Ranking Pipeline — Four Stages
Per Meta’s engineering blog and YouTube ML system design interview, News Feed ranking uses a multi-pass pipeline:
Feed
Request (user opens Facebook) ↓ STAGE 1: CANDIDATE GENERATION - Pull eligible posts from feed cache (fan-out on write) or from social graph query (fan-out on read for celebrities) - ~500-2,000 candidate posts for each user - Fast retrieval, coarse filtering (deleted posts, blocked users, seen content) ↓ STAGE 2: LIGHTWEIGHT SCORING (Pass 1) - Apply fast logistic regression model on ~50 features - Reduces 2,000 candidates to top 500 - Features: recency, author affinity, content type preference - Latency budget: ~10ms ↓ STAGE 3: HEAVYWEIGHT SCORING (Pass 2) - Apply deep neural network on top 500 candidates - ~2,500 features per candidate (from feature store) - Multi-task: predicts likelihood of like, comment, share, video watch time - Aggregate into single ranking score V = w1×P(like) + w2×P(comment) + w3×P(share) - Latency budget: ~50ms ↓ STAGE 4: POST-RANKING ADJUSTMENTS - Content diversity (don't show 5 videos in a row) - Business rules (ads insertion, integrity filtering) - Freshness boost for very recent content - Final ranked list of 20-50 items ↓ Serve to user
The data pipeline’s job: Provide the features for Stages 2 and 3 in < 100ms total.
Step 6: Feature Engineering — The Core DE Deliverable
The data pipeline must produce two types of features:
Online Features (Real-Time, < 30 sec lag)
These are computed by Flink from the Scribe stream and stored in RocksDB:
| Feature | Computation | Update frequency |
|---|---|---|
post_like_count_last_1hr | Tumbling 1-hour window, COUNT(likes) per post_id | Every minute |
user_session_length_min | Session window per user (gap = 30 min), current duration | Real-time |
user_engagement_last_30min | Sliding 30-min window, SUM(actions) per user | Every 30 seconds |
author_post_velocity | Count of posts by author in last 6 hours | Every 5 minutes |
content_virality_score | Rate of change of like count (delta / time) | Every minute |
Flink job example:
on
Flink stateful aggregation: 1-hour tumbling window per post stream .keyBy("post_id") .window(TumblingEventTimeWindows.of(Time.hours(1))) .aggregate(CountAggregate()) .addSink(RocksDBSink("post_like_count_last_1hr"))
Offline Features (Batch, Updated Daily)
These are computed by Spark from the data lake and stored in Hive/Iceberg:
| Feature | Computation | Update frequency |
|---|---|---|
user_category_affinity | Last 90 days of views by content category, normalized | Daily |
user_friend_interaction_score | Mutual engagement history per friend pair | Daily |
author_quality_score | Engagement rate, share rate, report rate over 30 days | Daily |
user_video_watch_rate_p90 | Percentile of video completion rates | Daily |
user_time_of_day_preference | Engagement distribution by hour | Daily |
The point-in-time correctness requirement: When training a model on historical data, features must reflect what was known at the time of the training label — not future knowledge. This requires time-windowed feature computation in the training pipeline: “For each (user, post, timestamp) training example, compute features using only data available before timestamp.”
Step 7: Privacy-Aware Design
This is where Meta interviews differentiate. You must proactively address privacy — don’t wait to be asked.
PII tokenization at ingestion (Layer 1):
Raw
event: { "user_id": "12345", "ip": "192.168.1.1", "name": "Alice" } After PII gate: { "user_id_hash": "a8f2c...", "geo_region": "US-West", "name": [REDACTED] } Only the hash enters the pipeline — the raw PII never leaves the ingestion layer
Access control by layer (Layer 2):
Bronze
(raw, hashed): accessible to data platform team only Silver (cleaned, aggregated): accessible to ML and product teams Gold (analytics, aggregated): accessible to all analysts → Minimum threshold: no metric shown for < 1000 users (prevents re-identification)
Differential privacy for model training (Layer 3):
- Training data uses DP-SGD (Differentially Private Stochastic Gradient Descent)
- No individual user’s data can be inferred from model outputs
Right to erasure:
User
requests deletion: 1. user_id_hash added to deletion_requests table 2. Iceberg row-level deletes from bronze/silver tables (soft delete, physically removed at compaction) 3. RocksDB features for that user evicted from online store 4. Model training pipeline excludes deleted users from training sets 5. Audit log: deletion completed within 24 hours per regulatory SLA
Step 8: Data Model — The Gold Layer
The gold layer serves three consumers with different models:
For product analytics teams (Presto on Hive/Iceberg):
gold.fact_feed_engagements
(Transaction Fact) Grain: one row per engagement event (view, like, comment, share) per user per content Partitioned: by event_date, clustered by surface_type engagement_id, user_id_hash, content_id, author_id_hash, surface_type (feed/reels/stories), event_type, session_id, engagement_timestamp, content_age_min, is_organic, is_sponsored, event_date → dim_date, user_segment_sk → dim_user_segment
For ML training (Spark reads from silver):
silver.user_content_interactions
(Wide event table, schema-on-read) All raw engagement signals with full feature context Point-in-time query support via Iceberg time travel Partitioned by event_date
For A/B testing:
gold.experiment_assignments
(Transaction Fact) Grain: one row per user per experiment per day experiment_id, user_id_hash, variant_id, assignment_date, country gold.experiment_metrics (Periodic Snapshot) Grain: one row per experiment × variant × metric × day Joined with fact_feed_engagements to compute metric deltas
Step 9: Trade-offs to Articulate
Trade-off 1: Fan-out threshold (10K followers)“I chose 10K followers as the celebrity threshold for switching from write to read fan-out. Below 10K, the write amplification (10K writes per post) is acceptable given the read-latency benefit. Above 10K, the celebrity’s posts are merged at read time. This threshold is empirically tuned — Meta likely uses something similar based on write throughput and cache budget.”
Trade-off 2: Online vs offline features”I split features into online (Flink, < 30 sec) and offline (Spark, daily) based on how quickly they change and their impact on ranking accuracy. Rapidly changing signals (post virality, current session behavior) must be online — a post with 10K likes in the last hour is far more relevant than one with the same engagement over a week. Historical preferences (category affinity, friend interaction scores) change slowly — daily batch is sufficient and far cheaper to compute.”
Trade-off 3: Exactness vs scalability for engagement counts”For post_like_count_last_1hr, I’d use HyperLogLog sketches for unique user counts (COUNT DISTINCT) and exact counts for total likes. HLL introduces ~2% error but reduces memory from O(n) to O(log log n) — at 17M events/sec, this is a multi-TB vs multi-GB difference in Flink state.”
Trade-off 4: Scuba vs Presto for analytics”Scuba serves last-30-days real-time analytics for product teams who need to check feature impact immediately after a launch. Presto on Hive/Iceberg serves historical analytical queries. They’re complementary: Scuba is fast and fresh but limited in retention; Presto is flexible and historical but queries take seconds-to-minutes.”
Failure Mode Analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Flink job crashes | Online features go stale, ranking uses cached values | Flink recovers from 30-sec checkpoint. Fallback: serve last known feature values for up to 5 min |
| RocksDB node failure | Feature lookups fail, ranking degrades | Replicated RocksDB. Graceful degradation: serve simplified ranking without real-time features |
| Scribe topic lag | Events delayed, features stale | Kafka consumer lag monitoring. Alert at > 1M events lag. Scale Flink consumers. |
| Fan-out service overload | Feeds not updated for some users | Async queue. Users see slightly stale feed — acceptable for eventual consistency |
| Privacy gate bug | PII reaches pipeline | Defense-in-depth: automated PII scanner on bronze layer sends alerts + blocks downstream tables |
Interview Questions
Q1: “How does your pipeline handle the case where a post goes viral — from 100 likes to 10 million likes in 30 minutes?”
Model Answer: “The virality scenario is exactly why the online feature pipeline must be Flink-based with short windows, not daily batch. When a post starts going viral, the post_like_count_last_1hr feature in RocksDB updates every minute from Flink’s tumbling window. The ranking model sees the rapidly increasing like count and progressively boosts the post’s rank for all users — even those who haven’t opened their feed yet. The fan-out mechanism also adapts: once the post reaches the celebrity threshold on engagement velocity, it may be treated as ‘high-reach content’ and pushed more aggressively to relevant users via the content recommendation path (distinct from the friend-based social graph fan-out). The key is that the feature freshness drives the ranking update organically — I don’t need a special ‘viral content’ code path. The pipeline just needs to update features fast enough that the model sees the signal within minutes.”
Q2: “The product team wants to run an A/B test comparing ranked feeds vs chronological feeds. What data infrastructure do you need?”
Model Answer: “This requires three pipeline additions. First, experiment assignment: a deterministic assignment service assigns each user to a variant (ranked or chronological) based on user_id hash. The assignment is logged to gold.experiment_assignments — one row per user per experiment per day. Second, variant-aware feature logging: every engagement event in fact_feed_engagements must include experiment_id and variant_id columns, set at event generation time. This is the key — you can’t retroactively determine which variant a user was in for past events. Third, metric computation: a daily Spark job joins experiment_assignments with fact_feed_engagements on user_id_hash and date to compute per-variant metrics (engagement rate, session length, 7-day retention). Statistical significance computed using sequential testing (allows early stopping). Privacy consideration: the experiment metrics are aggregated — no individual user behavior is shown per-variant, only aggregate statistics. The minimum reporting threshold (1000 users per variant) applies here too.”
Self-Assessment: 5 Questions Before Moving On
-
Why does Meta use a hybrid fan-out model rather than pure fan-out on write?
-
What are the four stages of the ranking pipeline and what does each reduce?
-
What’s the difference between online and offline features — give a concrete example of each?
-
How does point-in-time correctness affect training data feature computation?
-
At what layer is PII stripped and why must it happen there (not later)?
Quick Reference: Meta News Feed Data Pipeline
- Fan-out hybrid: Regular users → write (pre-compute). Celebrities (> 10K followers) → read (merge at query time). Inactive users → pull on return.
- Four ranking stages: Candidate generation → Lightweight scoring (logistic regression) → Heavyweight scoring (deep NN, 2500 features) → Post-ranking adjustments (diversity, ads, freshness).
- Features split: Online (Flink, RocksDB, < 30 sec) for rapidly changing signals. Offline (Spark, Hive, daily) for stable historical patterns.
- Privacy layers: PII tokenized at ingestion (never enters pipeline). Access control by layer. Minimum aggregation thresholds. Right-to-deletion via Iceberg row deletes.
- The gold layer: fact_feed_engagements (transaction fact), experiment_assignments + experiment_metrics (A/B testing), wide silver table for ML training.
- Meta’s stack: Scribe → Flink → RocksDB (online features) + Hive/Iceberg (offline) → Scuba (real-time analytics) + Presto (ad-hoc).
Tomorrow’s Preview
Day 33: Netflix Data Infrastructure & Interview Patterns — Netflix’s Iceberg-first architecture, the culture of freedom and responsibility, ownership expectations, how Netflix interviews emphasize architectural trade-off fluency over memorized solutions, and the specific stack (S3, Spark, Flink, Iceberg, Trino, Druid) you must know cold going into a Netflix interview.