Day 66 — Median of Two Sorted Arrays

Coding problem

ProblemMedian of Two Sorted Arrays
LeetCode ID(s)LeetCode #4
DifficultyHard
PatternBinary Search
Company tagsGoogle
Suggested time25m

Solution outline (coding)

  • Binary search the partition of the two arrays so left halves contain half the total elements.
  • Ensure left max ≤ right min on both sides; derive median from boundary values.
  • Handle odd/even total length carefully.

Time complexity: O(log(min(m,n))).

Space complexity: O(1).

Show Python solution
from typing import List


class Solution:
  def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
    if len(nums1) > len(nums2):
      nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    lo, hi = 0, m
    half = (m + n + 1) // 2
    while lo <= hi:
      i = (lo + hi) // 2
      j = half - i
      left1 = float('-inf') if i == 0 else nums1[i - 1]
      right1 = float('inf') if i == m else nums1[i]
      left2 = float('-inf') if j == 0 else nums2[j - 1]
      right2 = float('inf') if j == n else nums2[j]
      if left1 <= right2 and left2 <= right1:
        if (m + n) % 2:
          return float(max(left1, left2))
        return (max(left1, left2) + min(right1, right2)) / 2.0
      if left1 > right2:
        hi = i - 1
      else:
        lo = i + 1
    return 0.0


# Input:  nums1 = [1,3], nums2 = [2]
# Output: 2.0

SQL interview practice

1. Interview question

Companies / track: Google

Google: SQL screens usually assume BigQuery, Ads / Search / YouTube-style fact tables, and talking through bytes processed and partition pruning.

What you are asked to write (SQL prompt):

Frame this as metrics work for **Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
two_sorted_series(a_index, a_value, b_index, b_value). BigQuery: join and compute median of union; ensure scalability for large series via partitioned processing.

Tables implied by the prompt:

  • two_sorted_series(a_index, a_value, b_index, b_value)

Engine: BigQuery — use its date, array, and approximate functions as documented.

2. Solution outline

  • Clarify out loud: result grain (one row per what?), join keys, time zone, and any ORDER BY / LIMIT / tie-breakers.
  • Map Binary Search to SQL: say the relational equivalent (e.g. hash map → GROUP BY + key; two pointers → ordered window + filter).
  • Filter time first: predicate on DATE(ts) / partition column before heavy joins; state the window in plain English.
  • Joins: name keys explicitly; watch fan-out from duplicate IDs.
  • Percentiles: approx vs exact; say so if you use APPROX_QUANTILES.
  • Structure: CTEs (WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)

Main query

WITH u AS (
  SELECT a_value AS v, 'a' AS src FROM two_sorted_series
  UNION ALL
  SELECT b_value, 'b' FROM two_sorted_series
),
ranked AS (
  SELECT v, ROW_NUMBER() OVER (ORDER BY v) AS rn, COUNT(*) OVER () AS cnt FROM u
)
SELECT APPROX_QUANTILES(v, 100)[OFFSET(50)] AS median_union FROM ranked;