Day 34 — Accounts Merge

Coding problem

ProblemAccounts Merge
LeetCode ID(s)LeetCode #721
DifficultyMedium
PatternUnion-Find / DFS
Company tagsMeta, Google
Suggested time25m

Solution outline (coding)

  • Build graph: email connects accounts; DFS each component to collect emails, sort, assign id.
  • Or Union-Find on email indices; merge accounts sharing any email.
  • Sort each merged email list for stable output.

Time complexity: O(N α(N)) with UF; roughly O(N log N) if sorting dominates.

Space complexity: O(N).

Show Python solution
from typing import List
from collections import defaultdict


class Solution:
  def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
    parent = {}

    def find(x: str) -> str:
      if parent[x] != x:
        parent[x] = find(parent[x])
      return parent[x]

    def union(a: str, b: str) -> None:
      ra, rb = find(a), find(b)
      if ra != rb:
        parent[rb] = ra

    email_to_name = {}
    for acc in accounts:
      name = acc[0]
      first = acc[1]
      for e in acc[1:]:
        if e not in parent:
          parent[e] = e
        email_to_name[e] = name
        union(first, e)

    groups = defaultdict(list)
    for e in parent:
      groups[find(e)].append(e)
    return [[email_to_name[min(g)]] + sorted(g) for g in groups.values()]


# Input:  accounts with shared emails merged by connected component
# Output: [name, ...emails sorted] per merged account

SQL interview practice

1. Interview question

Companies / track: Meta, Google

Meta and Google both lean on ads and product surfaces: event streams, CTR-style rates, windowed metrics, and careful handling of identity joins.

What you are asked to write (SQL prompt):

Frame this as metrics work for **Meta, Google**-scale surfaces (ads, product, or engagement — as the tables suggest).
accounts_merge(user_id, email). BigQuery: use connected components logic (emails shared across users) to produce canonical_account_id and member user_ids.

Tables implied by the prompt:

  • accounts_merge(user_id, email)
  • logic(emails shared across users)

Engine: BigQuery — use its date, array, and approximate functions as documented.

2. Solution outline

  • Clarify out loud: result grain (one row per what?), join keys, time zone, and any ORDER BY / LIMIT / tie-breakers.
  • Map Union-Find / DFS to SQL: say the relational equivalent (e.g. hash map → GROUP BY + key; two pointers → ordered window + filter).
  • Structure: CTEs (WITH) — one step per CTE; validate on a tiny slice (counts, nulls, duplicates).
Show SQL solution (BigQuery)

Main query

WITH edges AS (
  SELECT a.user_id AS u1, b.user_id AS u2, a.email
  FROM accounts_merge AS a
  JOIN accounts_merge AS b ON a.email = b.email AND a.user_id < b.user_id
)
SELECT email, MIN(u1) AS canonical_user, ARRAY_CONCAT(ARRAY_AGG(DISTINCT u1), ARRAY_AGG(DISTINCT u2)) AS members
FROM edges
GROUP BY email;