Day 4 — Top K Frequent Elements
Coding problem
| Problem | Top K Frequent Elements |
| LeetCode ID(s) | LeetCode #347 |
| Difficulty | Medium |
| Pattern | Heap / Bucket Sort |
| Company tags | Meta, Google |
| Suggested time | 20m |
Solution outline (coding)
- Count frequency of each distinct element with a hash map.
- Either push frequencies into a min-heap of size k (keep k largest freqs) or use bucket sort by frequency.
- Extract the k most frequent keys from the heap or buckets.
Time complexity: O(n log k) with a heap; O(n) with bucket sort on frequencies.
Space complexity: O(n) — frequency map plus heap or buckets.
Show Python solution
import heapq
from collections import Counter
from typing import List
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
cnt = Counter(nums)
return heapq.nlargest(k, cnt.keys(), key=cnt.get)
# Input: nums = [1,1,1,2,2,3], k = 2
# Output: [1,2]SQL interview practice
1. Interview question
Companies / track: Meta, Google
Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.
What you are asked to write (SQL prompt):
Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
impressions(ad_id, user_id, ts), clicks(ad_id, user_id, ts). BigQuery: compute CTR per ad over last 7 days, require impressions >= 1000, return top 10 by CTR with tie-breaker on impressions.
Tables implied by the prompt:
impressions(ad_id, user_id, ts)clicks(ad_id, user_id, ts)
Engine: BigQuery — use its date, array, and approximate functions as documented.
2. Solution outline
- Clarify out loud: result grain (one row per what?), join keys, time zone, and any
ORDER BY/LIMIT/ tie-breakers. - Map Heap / Bucket Sort to SQL: say the relational equivalent (e.g. hash map →
GROUP BY+ key; two pointers → ordered window + filter). - Filter time first: predicate on
DATE(ts)/ partition column before heavy joins; state the window in plain English. - Rates:
SAFE_DIVIDEorNULLIF; define numerator and denominator. - Structure: CTEs (
WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)
Main query
WITH imps AS (
SELECT ad_id, COUNT(*) AS impressions
FROM impressions
WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY ad_id
),
clks AS (
SELECT ad_id, COUNT(*) AS clicks
FROM clicks
WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY ad_id
)
SELECT
i.ad_id,
SAFE_DIVIDE(IFNULL(c.clicks, 0), i.impressions) AS ctr,
i.impressions
FROM imps AS i
LEFT JOIN clks AS c USING (ad_id)
WHERE i.impressions >= 1000
ORDER BY ctr DESC, i.impressions DESC
LIMIT 10;