Format: First Last per line. Commas, middle names, and suffixes are tolerated.
Estimate how rare a person's name is against the current US adult population.
Self-contained, single-file HTML. All scoring runs in your browser against bundled census + birth-record tables. No server, no network, no data leaves the page.
For a person whose name we know, how rare is that name among US adults alive today? Not "rare in 1985" or "rare among newborns" — rare against the pooled blob of every American currently old enough to be in our investigation set.
Equivalently: what fraction of US adults have a name more common than this one? That fraction is the score we report (high = rare).
SSA publishes one file per birth year (yob1880.txt through yob2024.txt) with the count of every name given to ≥5 babies that year. We use all 145 yearly files.
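As a rough illustration of reading one yearly file, the sketch below parses the SSA `name,sex,count` row layout into per-name totals. The function name `read_yob`, the sample rows, and the choice to sum male and female counts under one uppercased key are assumptions made for this example, not a description of what build_data.py does.

```python
import csv
import io
from collections import Counter

def read_yob(text):
    """Parse one yobYYYY.txt payload into {NAME: births}.

    Assumption: counts for the same name are summed across sexes.
    """
    counts = Counter()
    for name, _sex, n in csv.reader(io.StringIO(text)):
        counts[name.upper()] += int(n)
    return counts

# Made-up rows in the SSA layout (name,sex,count), not real figures.
sample = "Mary,F,7000\nJohn,M,9000\nJohn,F,50\n"
counts = read_yob(sample)
print(counts.most_common())
```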
Census Bureau publishes a list of every surname held by ≥100 people in each decennial. We pool all four:
Union after dedup: 167,464 unique surnames. Pooling smooths single-decade noise (Nguyen, Garcia trending up) and covers immigrants the SSA file misses.
SSA births include people who died decades ago. To approximate living adults, each year's counts are multiplied by the probability a person born that year is still alive in 2026, derived from the SSA actuarial life table:
| Born | Age in 2026 | P(alive) | What this fixes |
|---|---|---|---|
| 2010 | 16 | 0.99 | Recent names (Aiden) at full weight |
| 1985 | 41 | 0.97 | Brittany peak, nearly all alive |
| 1955 | 71 | 0.71 | Boomer names taper |
| 1935 | 91 | 0.18 | Mildred mostly gone |
| 1920 | 106 | ~0 | Old names zeroed out |
Without this step, including the full 1880-2024 window would inflate Mildred/Eugene/Gladys with millions of dead Silent Generation births. With it, they get correctly small.
Sanity: weighted total ≈ 270M, real US adult population ≈ 258M. Surnames are not survivorship-weighted (Census already snapshots living people).
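A minimal sketch of the survivorship weighting described above, assuming births are held as a `{year: {name: count}}` mapping. The `P_ALIVE` values are the five rows from the table (with "~0" approximated as 0.0); the birth counts and the helper name `weighted_counts` are hypothetical.

```python
# Survival probabilities copied from the table above; 1920 is "~0" there,
# approximated as 0.0 here.
P_ALIVE = {2010: 0.99, 1985: 0.97, 1955: 0.71, 1935: 0.18, 1920: 0.0}

def weighted_counts(births_by_year, p_alive):
    """Turn {year: {name: births}} into {name: expected number still alive}."""
    totals = {}
    for year, names in births_by_year.items():
        weight = p_alive.get(year, 0.0)  # years missing from the table count as 0
        for name, births in names.items():
            totals[name] = totals.get(name, 0.0) + births * weight
    return totals

# Hypothetical birth counts, chosen only to show the effect.
births = {
    1920: {"MILDRED": 10000},
    1935: {"MILDRED": 5000},
    2010: {"AIDEN": 8000, "MILDRED": 100},
}
living = weighted_counts(births, P_ALIVE)
print(living)
```

In this toy input Mildred has almost twice as many births as Aiden, yet ends up with a far smaller weighted count, which is the correction the table is aiming for.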
1. Sort all names by weighted count, descending
2. Walk the sorted list, keeping a running cumulative count
3. For each name N:
     own_share = N.count / total
     more_common_share = (cumulative_at_N - N.count) / total
Reported score (p) = more_common_share, range 0..1
So p=0.0 means "no name is more common" (James, Mary). p=0.99 means "99% of US adults have a name more common than this." Both first and last get their own score.
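The scoring steps above can be sketched in a few lines. This is an illustration on a toy table, with `score_names` as a hypothetical helper name. Note that in this sketch names with equal counts receive different scores depending on sort order; the source does not say how ties are handled.

```python
def score_names(counts):
    """Return {name: p}, where p is the share of people holding a more common name."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    scores = {}
    cumulative = 0
    for name, count in ranked:
        cumulative += count
        # Everything accumulated before this name belongs to more common names.
        scores[name] = (cumulative - count) / total
    return scores

# Toy weighted counts, not real data.
toy = {"JAMES": 50, "MARY": 30, "ZEBULON": 20}
p = score_names(toy)
print(p)
```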
combined = (first_p + last_p) / 2 — simple mean. Treats first and last as equally informative. If only one half is found, the missing half is currently treated as p=1.0 (max rare) — this is a known weakness: it inflates scores for typo'd or foreign names.
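A small sketch of the combining rule, including the missing-half behaviour described above. Representing a name that was not found as `None` is an assumption of this example, as is the function name `combined_score`.

```python
from typing import Optional

def combined_score(first_p: Optional[float], last_p: Optional[float]) -> Optional[float]:
    """Simple mean of the two scores; a missing half is treated as 1.0."""
    if first_p is None and last_p is None:
        return None  # nothing found at all
    f = 1.0 if first_p is None else first_p
    l = 1.0 if last_p is None else last_p
    return (f + l) / 2

print(combined_score(0.25, 0.75))  # both halves found
print(combined_score(0.5, None))   # last name missing, pulled toward "rare"
```

The second call shows the weakness mentioned above: a moderately common first name paired with an unmatched surname lands at 0.75.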
| Combined p | Verdict |
|---|---|
| < 0.30 | Common |
| 0.30 – 0.65 | Uncommon |
| 0.65 – 0.90 | Rare |
| ≥ 0.90 | Very Rare |
Cutoffs are arbitrary — easy to tune. Both halves missing → "Unknown" (sorted to top of CSV output for review).
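The verdict table can be written as a simple threshold chain. The sketch assumes each band includes its lower bound and excludes its upper bound (the table is ambiguous at exactly 0.65), and that a name with both halves missing is passed in as `None`.

```python
def verdict(combined):
    """Map a combined score to a label; None means neither half was found."""
    if combined is None:
        return "Unknown"
    if combined < 0.30:
        return "Common"
    if combined < 0.65:
        return "Uncommon"
    if combined < 0.90:
        return "Rare"
    return "Very Rare"

for value in (0.10, 0.30, 0.65, 0.95, None):
    print(value, verdict(value))
```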
Names are normalized: uppercased, stripped of everything except A-Z, hyphens, and apostrophes. Compound surnames are tried multiple ways, and the most-common hit (lowest rank) is picked:
"Yeh Liu"     → try "YEHLIU" (whole), "YEH", "LIU"
"Clark-Smith" → try "CLARKSMITH", "CLARK", "SMITH"
"Jr.", "Sr.", "II", "III"... → stripped before parsing
This raised the last-name hit rate on the LAPROB CSV from 85% to 94%.
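A hedged sketch of the compound-surname lookup described above. The helper names, the suffix set (limited to the four suffixes named in the text), the rank values, and the decision to drop apostrophes when building lookup keys are all assumptions made for illustration.

```python
import re

# Only the suffixes named in the text; the real list is longer.
SUFFIXES = {"JR", "SR", "II", "III"}

def surname_candidates(raw):
    """Return the whole-string candidate first, then each space- or hyphen-separated part."""
    pieces = re.split(r"[\s\-]+", raw.upper())
    parts = [re.sub(r"[^A-Z]", "", piece) for piece in pieces]
    parts = [part for part in parts if part and part not in SUFFIXES]
    whole = "".join(parts)
    return [whole] + [part for part in parts if part != whole]

def best_surname(raw, rank):
    """rank maps SURNAME -> rank (1 = most common). Returns the lowest-ranked hit or None."""
    hits = [c for c in surname_candidates(raw) if c in rank]
    return min(hits, key=rank.get) if hits else None

# Hypothetical ranks for the example only.
ranks = {"LIU": 200, "YEH": 3000, "SMITH": 1, "CLARK": 25}
print(surname_candidates("Yeh Liu"), best_surname("Yeh Liu", ranks))
print(surname_candidates("Clark-Smith"), best_surname("Clark-Smith", ranks))
```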
| File | Purpose |
|---|---|
| build_data.py | Builds firstnames.json + lastnames.json from raw SSA + Census |
| build_standalone.py | Inlines the JSONs into this single-file HTML |
| rank_csv.py | Headless CLI: reads a CSV, writes a sorted-ranked CSV |
| index.html | The web UI (served version, fetches JSONs) |
| unique-name-ranker-standalone.html | This file (everything bundled) |
joint rarity = 1 - (1 - first_p)(1 - last_p)
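To see how the joint-rarity expression above behaves next to a simple mean of the two scores, the sketch below evaluates both on a few hand-picked pairs. The function names and input values are illustrative only.

```python
def mean_score(first_p, last_p):
    """Simple mean of the two scores."""
    return (first_p + last_p) / 2

def joint_rarity(first_p, last_p):
    """One minus the product of the two complements."""
    return 1 - (1 - first_p) * (1 - last_p)

# Hand-picked (first_p, last_p) pairs.
for f, l in [(0.5, 0.5), (0.9, 0.1), (0.0, 0.0)]:
    print(f, l, mean_score(f, l), round(joint_rarity(f, l), 4))
```

The joint form is never lower than either individual score, so a single rare half dominates the result, whereas the mean lets a common half pull the score down.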