Day 34 — Accounts Merge
Coding problem
| Problem | Accounts Merge |
| LeetCode ID(s) | LeetCode #721 |
| Difficulty | Medium |
| Pattern | Union-Find / DFS |
| Company tags | Meta, Google |
| Suggested time | 25m |
Solution outline (coding)
- Build graph: email connects accounts; DFS each component to collect emails, sort, assign id.
- Or Union-Find on email indices; merge accounts sharing any email.
- Sort each merged email list for stable output.
Time complexity: O(N α(N)) with UF; roughly O(N log N) if sorting dominates.
Space complexity: O(N).
Show Python solution
from typing import List
from collections import defaultdict
class Solution:
def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
parent = {}
def find(x: str) -> str:
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a: str, b: str) -> None:
ra, rb = find(a), find(b)
if ra != rb:
parent[rb] = ra
email_to_name = {}
for acc in accounts:
name = acc[0]
first = acc[1]
for e in acc[1:]:
if e not in parent:
parent[e] = e
email_to_name[e] = name
union(first, e)
groups = defaultdict(list)
for e in parent:
groups[find(e)].append(e)
return [[email_to_name[min(g)]] + sorted(g) for g in groups.values()]
# Input: accounts with shared emails merged by connected component
# Output: [name, ...emails sorted] per merged accountSQL interview practice
1. Interview question
Companies / track: Meta, Google
Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.
What you are asked to write (SQL prompt):
Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
accounts_merge(user_id, email). BigQuery: use connected components logic (emails shared across users) to produce canonical_account_id and member user_ids.
Tables implied by the prompt:
accounts_merge(user_id, email)logic(emails shared across users)
Engine: BigQuery — use its date, array, and approximate functions as documented.
2. Solution outline
- Clarify out loud: result grain (one row per what?), join keys, time zone, and any
ORDER BY/LIMIT/ tie-breakers. - Map Union-Find / DFS to SQL: say the relational equivalent (e.g. hash map →
GROUP BY+ key; two pointers → ordered window + filter). - Structure: CTEs (
WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)
Main query
WITH edges AS (
SELECT a.user_id AS u1, b.user_id AS u2, a.email
FROM accounts_merge AS a
JOIN accounts_merge AS b ON a.email = b.email AND a.user_id < b.user_id
)
SELECT email, MIN(u1) AS canonical_user, ARRAY_CONCAT(ARRAY_AGG(DISTINCT u1), ARRAY_AGG(DISTINCT u2)) AS members
FROM edges
GROUP BY email;