Day 37 — Find Median from Data Stream
Coding problem
| Problem | Find Median from Data Stream |
| LeetCode ID(s) | LeetCode #295 |
| Difficulty | Hard |
| Pattern | Two Heaps |
| Company tags | Meta, Google |
| Suggested time | 25m |
Solution outline (coding)
- Use two heaps: max-heap for lower half, min-heap for upper half; sizes differ by at most 1.
- After each insert, rebalance so max-heap has equal or one more element.
- Median = top of larger heap (or average of both tops if equal size).
Time complexity: O(log n) per addNum; O(1) for median.
Space complexity: O(n) — storing all elements.
Show Python solution
import heapq
class MedianFinder:
def __init__(self):
self.low = []
self.high = []
def addNum(self, num: int) -> None:
heapq.heappush(self.low, -num)
heapq.heappush(self.high, -heapq.heappop(self.low))
if len(self.high) > len(self.low):
heapq.heappush(self.low, -heapq.heappop(self.high))
def findMedian(self) -> float:
if len(self.low) > len(self.high):
return -self.low[0]
return (-self.low[0] + self.high[0]) / 2.0
# Input: addNum 1,2 -> median 1.5
# Output: as aboveSQL interview practice
1. Interview question
Companies / track: Meta, Google
Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.
What you are asked to write (SQL prompt):
Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
streaming_payments(user_id, ts, amount). BigQuery: maintain a daily table with median payment amount per user; design incremental query using partitions.
Tables implied by the prompt:
streaming_payments(user_id, ts, amount)
Engine: BigQuery — use its date, array, and approximate functions as documented.
2. Solution outline
- Clarify out loud: result grain (one row per what?), join keys, time zone, and any
ORDER BY/LIMIT/ tie-breakers. - Map Two Heaps to SQL: say the relational equivalent (e.g. hash map →
GROUP BY+ key; two pointers → ordered window + filter). - Filter time first: predicate on
DATE(ts)/ partition column before heavy joins; state the window in plain English. - Percentiles: approx vs exact; say so if you use
APPROX_QUANTILES. - Structure: CTEs (
WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)
Main query
SELECT user_id, DATE(ts) AS d,
APPROX_QUANTILES(amount, 100)[OFFSET(50)] AS median_amt
FROM streaming_payments
GROUP BY user_id, d;