Top K Frequent Elements · SecondChair.dev

Day 4 — Top K Frequent Elements

Coding problem

Problem	Top K Frequent Elements
LeetCode ID(s)	LeetCode #347
Difficulty	Medium
Pattern	Heap / Bucket Sort
Company tags	Meta, Google
Suggested time	20m

Solution outline (coding)

Count frequency of each distinct element with a hash map.
Either push frequencies into a min-heap of size k (keep k largest freqs) or use bucket sort by frequency.
Extract the k most frequent keys from the heap or buckets.

Time complexity: O(n log k) with a heap; O(n) with bucket sort on frequencies.

Space complexity: O(n) — frequency map plus heap or buckets.

Show Python solution

import heapq
from collections import Counter
from typing import List


class Solution:
  def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    cnt = Counter(nums)
    return heapq.nlargest(k, cnt.keys(), key=cnt.get)


# Input:  nums = [1,1,1,2,2,3], k = 2
# Output: [1,2]

SQL interview practice

1. Interview question

Companies / track: Meta, Google

Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.

What you are asked to write (SQL prompt):

Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
impressions(ad_id, user_id, ts), clicks(ad_id, user_id, ts). BigQuery: compute CTR per ad over last 7 days, require impressions >= 1000, return top 10 by CTR with tie-breaker on impressions.

Tables implied by the prompt:

impressions(ad_id, user_id, ts)
clicks(ad_id, user_id, ts)

Engine: BigQuery — use its date, array, and approximate functions as documented.

2. Solution outline

Clarify out loud: result grain (one row per what?), join keys, time zone, and any ORDER BY / LIMIT / tie-breakers.
Map Heap / Bucket Sort to SQL: say the relational equivalent (e.g. hash map → GROUP BY + key; two pointers → ordered window + filter).
Filter time first: predicate on DATE(ts) / partition column before heavy joins; state the window in plain English.
Rates: SAFE_DIVIDE or NULLIF; define numerator and denominator.
Structure: CTEs (WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).

Show SQL solution (BigQuery)

Main query

WITH imps AS (
  SELECT ad_id, COUNT(*) AS impressions
  FROM impressions
  WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY ad_id
),
clks AS (
  SELECT ad_id, COUNT(*) AS clicks
  FROM clicks
  WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY ad_id
)
SELECT
  i.ad_id,
  SAFE_DIVIDE(IFNULL(c.clicks, 0), i.impressions) AS ctr,
  i.impressions
FROM imps AS i
LEFT JOIN clks AS c USING (ad_id)
WHERE i.impressions >= 1000
ORDER BY ctr DESC, i.impressions DESC
LIMIT 10;

Day 4: Top K Frequent Elements