Day 37 — Find Median from Data Stream

Coding problem

ProblemFind Median from Data Stream
LeetCode ID(s)LeetCode #295
DifficultyHard
PatternTwo Heaps
Company tagsMeta, Google
Suggested time25m

Solution outline (coding)

  • Use two heaps: max-heap for lower half, min-heap for upper half; sizes differ by at most 1.
  • After each insert, rebalance so max-heap has equal or one more element.
  • Median = top of larger heap (or average of both tops if equal size).

Time complexity: O(log n) per addNum; O(1) for median.

Space complexity: O(n) — storing all elements.

Show Python solution
import heapq


class MedianFinder:
  def __init__(self):
    self.low = []
    self.high = []

  def addNum(self, num: int) -> None:
    heapq.heappush(self.low, -num)
    heapq.heappush(self.high, -heapq.heappop(self.low))
    if len(self.high) > len(self.low):
      heapq.heappush(self.low, -heapq.heappop(self.high))

  def findMedian(self) -> float:
    if len(self.low) > len(self.high):
      return -self.low[0]
    return (-self.low[0] + self.high[0]) / 2.0


# Input:  addNum 1,2 -> median 1.5
# Output: as above

SQL interview practice

1. Interview question

Companies / track: Meta, Google

Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.

What you are asked to write (SQL prompt):

Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
streaming_payments(user_id, ts, amount). BigQuery: maintain a daily table with median payment amount per user; design incremental query using partitions.

Tables implied by the prompt:

  • streaming_payments(user_id, ts, amount)

Engine: BigQuery — use its date, array, and approximate functions as documented.

2. Solution outline

  • Clarify out loud: result grain (one row per what?), join keys, time zone, and any ORDER BY / LIMIT / tie-breakers.
  • Map Two Heaps to SQL: say the relational equivalent (e.g. hash map → GROUP BY + key; two pointers → ordered window + filter).
  • Filter time first: predicate on DATE(ts) / partition column before heavy joins; state the window in plain English.
  • Percentiles: approx vs exact; say so if you use APPROX_QUANTILES.
  • Structure: CTEs (WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)

Main query

SELECT user_id, DATE(ts) AS d,
  APPROX_QUANTILES(amount, 100)[OFFSET(50)] AS median_amt
FROM streaming_payments
GROUP BY user_id, d;