The palette is not a search engine

Spotlight, Alfred, the Slack quick switcher, vim’s ex-mode, the Bloomberg command line. Every tool a professional operator reaches for fifty times a day is the same shape: a text input, a list, Enter bound to a thing. They look like search boxes. They aren’t.

A search engine maximises recall. Surface anything that might be relevant, let the user pick. A palette has the opposite job: when the user types "bc" and hits Enter, the thing that happens has to be the thing they meant, every time. The first is a ranking problem. The second is a contract.

We spent two months treating Cmd+K as the first kind of problem before we noticed it was the second. The palette is now about 1,460 lines across four files, no runtime dependencies, talking to the router through a three-method interface. Almost every line is downstream of that one realisation.

Determinism, not relevance

The first version used Fuse.js. It worked well on the easy cases and badly on the hard ones. The hard cases all looked the same. A user types "bc", the dialog returns the Business Case row plus two rows whose descriptions happen to contain the letters b and c in order, all three scored within five points of each other, and the user can’t tell which one Enter will pick. The matcher is doing its job. The job is wrong.

The shape of the problem is structural, not parametric. Weighted-feature scoring (the standard search-engine trick of summing a label score, an alias score, a description score, each with a coefficient) cannot be deterministic, because every row has a non-zero score on most fields and the sum is always a near-tie. Tuning the weights moves the tie around. It doesn’t remove it.

What we wanted is the opposite of a sum: a strict hierarchy where a match in a higher band can never lose to a match in a lower one. Ten bands, first match wins; a later band can never overwrite an earlier one.

const SCORE_EXACT = 100;
const SCORE_SHORTCUT_EXACT = 95;
const SCORE_ALIAS_EXACT = 90;
const SCORE_SHORTCUT_PREFIX = 80;
const SCORE_PREFIX = 70;
const SCORE_ALIAS_PREFIX = 60;
const SCORE_WORD_BOUNDARY = 40;
const SCORE_CONTAINS = 30;
const SCORE_SUBSEQUENCE = 20;
const SCORE_DESCRIPTION = 10;

scoreString walks the bands top to bottom and returns the first hit. scoreItem runs it against the label, every alias, the shortcut, and the description, returning the max. An exact alias match (90) can never lose to a subsequence match (20), no matter how tight the subsequence. The user’s mental model (type the alias, get the thing) becomes arithmetic.
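A minimal sketch of that walk, under stated assumptions. The constants are the ones listed above; everything else is illustrative. In particular, the article doesn’t show how the field-specific bands (shortcut, alias) are selected, so a hypothetical `kind` parameter picks them here, and the `PaletteItem` shape, `startsAtWordBoundary` and `isSubsequence` are invented names. The subsequence check is deliberately the naive one.

```typescript
const SCORE_EXACT = 100;
const SCORE_SHORTCUT_EXACT = 95;
const SCORE_ALIAS_EXACT = 90;
const SCORE_SHORTCUT_PREFIX = 80;
const SCORE_PREFIX = 70;
const SCORE_ALIAS_PREFIX = 60;
const SCORE_WORD_BOUNDARY = 40;
const SCORE_CONTAINS = 30;
const SCORE_SUBSEQUENCE = 20;
const SCORE_DESCRIPTION = 10;

// Assumption: the kind of field decides which exact/prefix band applies.
type FieldKind = "label" | "shortcut" | "alias";
const EXACT: Record<FieldKind, number> = {
  label: SCORE_EXACT,
  shortcut: SCORE_SHORTCUT_EXACT,
  alias: SCORE_ALIAS_EXACT,
};
const PREFIX: Record<FieldKind, number> = {
  label: SCORE_PREFIX,
  shortcut: SCORE_SHORTCUT_PREFIX,
  alias: SCORE_ALIAS_PREFIX,
};

// Hypothetical item shape, reduced to the fields the scorer reads.
interface PaletteItem {
  label: string;
  aliases?: string[];
  shortcut?: string;
  description?: string;
}

// True when q occurs in c directly after a non-alphanumeric character.
function startsAtWordBoundary(q: string, c: string): boolean {
  for (let i = c.indexOf(q); i !== -1; i = c.indexOf(q, i + 1)) {
    if (i > 0 && !/[a-z0-9]/.test(c[i - 1])) return true;
  }
  return false;
}

// True when the characters of q appear in c in order (naive, unguarded).
function isSubsequence(q: string, c: string): boolean {
  let matched = 0;
  for (let i = 0; i < c.length && matched < q.length; i++) {
    if (c[i] === q[matched]) matched++;
  }
  return matched === q.length;
}

// Walks the bands top to bottom and returns the first hit.
function scoreString(query: string, candidate: string, kind: FieldKind = "label"): number {
  const q = query.trim().toLowerCase();
  const c = candidate.trim().toLowerCase();
  if (q.length === 0 || c.length === 0) return 0;
  if (c === q) return EXACT[kind];
  if (c.startsWith(q)) return PREFIX[kind];
  if (startsAtWordBoundary(q, c)) return SCORE_WORD_BOUNDARY;
  if (c.includes(q)) return SCORE_CONTAINS;
  if (isSubsequence(q, c)) return SCORE_SUBSEQUENCE;
  return 0;
}

// Best score across fields; the description is capped at the lowest band.
function scoreItem(query: string, item: PaletteItem): number {
  let best = scoreString(query, item.label, "label");
  for (const alias of item.aliases ?? []) {
    best = Math.max(best, scoreString(query, alias, "alias"));
  }
  if (item.shortcut) {
    best = Math.max(best, scoreString(query, item.shortcut, "shortcut"));
  }
  if (item.description) {
    const descScore = scoreString(query, item.description);
    if (descScore > 0) best = Math.max(best, Math.min(descScore, SCORE_DESCRIPTION));
  }
  return best;
}

const businessCase: PaletteItem = {
  label: "New Business Case",
  aliases: ["bc", "nbc", "projection", "npv"],
  shortcut: "n",
  description: "Create a new business case projection",
};
console.log(scoreItem("bc", businessCase)); // 90: exact alias beats the label's subsequence hit
console.log(scoreItem("n", businessCase)); // 95: exact shortcut beats the label's prefix hit
```

Because each band returns as soon as it hits, no amount of lower-band evidence can add up to a higher-band score. That is the property a weighted sum can’t offer.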

Weighted sums make every result feel probabilistic. Banding makes the top result feel chosen.

Users notice that difference long before they could articulate it. They stop hovering to confirm what Enter is about to do.

Banding alone wasn’t enough. Three bugs lived inside the bands themselves.

The first was the "a" bug. A naive subsequence check matches any candidate whose query characters appear in order. Typing a single "a" matches every candidate that contains an a, which on our set was essentially every candidate. The dialog filled with garbage. The fix is one line:

if (q.length >= 3 || c.length <= 10) { /* subsequence */ }

Short queries fall through to higher-signal bands or score zero. Short candidates still get subsequence-matched, because short strings can’t generate spurious hits. The shape is the one we’d keep writing: a sharp constraint that’s invisible until you write it down.

The second was description overrun. The naive scoreItem returns the maximum across all fields, and descriptions are long, so they match almost anything. A description like “Create a new business case projection” can contains-match a query at score 30 while the row the user actually wants only subsequence-matches on its label at 20. The wrong row wins, on the wrong field, with strong confidence. Descriptions inform; they shouldn’t compete with labels. We cap them at the lowest band:

if (item.description) {
  const descScore = scoreString(query, item.description);
  if (descScore > 0) {
    best = Math.max(best, Math.min(descScore, SCORE_DESCRIPTION));
  }
}

A description can pull a row into the result set when nothing else matches, but it can never outrank a real label hit.

The third was tightness. Within the subsequence band, "cty" matching “Country” (3 of 7 characters) and "cty" matching “Country Attractiveness Heatmap” (3 of 30) are both worth SCORE_SUBSEQUENCE, leaving the order to whatever array iteration produces. Users see the longer phrase win sometimes and the shorter phrase win other times and conclude, correctly, that the palette is unreliable.

const tightness = q.length / c.length;
return SCORE_SUBSEQUENCE + Math.round(tightness * 10);

Shorter candidates rank above longer ones within the band, deterministically.
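Put together, the length guard and the tightness bonus might look like the sketch below. It is a self-contained illustration, not the shipped code: `scoreSubsequence` and `isSubsequence` are hypothetical names, and only the guard condition and the tightness formula come from the snippets above.

```typescript
const SCORE_SUBSEQUENCE = 20;

// True when the characters of q appear in c in order.
function isSubsequence(q: string, c: string): boolean {
  let matched = 0;
  for (let i = 0; i < c.length && matched < q.length; i++) {
    if (c[i] === q[matched]) matched++;
  }
  return matched === q.length;
}

// Score for the subsequence band only; 0 means "no hit in this band".
function scoreSubsequence(query: string, candidate: string): number {
  const q = query.trim().toLowerCase();
  const c = candidate.trim().toLowerCase();
  if (q.length === 0 || c.length === 0) return 0;

  // Guard: short queries only subsequence-match short candidates.
  if (!(q.length >= 3 || c.length <= 10)) return 0;
  if (!isSubsequence(q, c)) return 0;

  // Tightness: the share of the candidate that the query accounts for.
  const tightness = q.length / c.length;
  return SCORE_SUBSEQUENCE + Math.round(tightness * 10);
}

console.log(scoreSubsequence("cty", "Country")); // 24
console.log(scoreSubsequence("cty", "Country Attractiveness Heatmap")); // 21
console.log(scoreSubsequence("a", "Country Attractiveness Heatmap")); // 0, blocked by the guard
```

One edge worth checking in a real implementation: with Math.round, a very tight match (say 29 of 30 characters) earns a bonus of 10 and lands on SCORE_CONTAINS. If the bands must stay strictly separated, Math.floor is one way to keep the bonus below 10.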

The lesson the three bugs taught us was that determinism isn’t a property of the top of the algorithm. It has to hold at every level: across bands, within bands, across fields. Anywhere two rows can tie is somewhere the user learns not to trust the result.

The matcher is downstream of the vocabulary

The second realisation is the one most palettes never reach. No scoring algorithm can match a query to a row whose label doesn’t contain the query’s intent. If the user types "npv" and the row is labelled “Business Case”, the matcher has nothing to work with. The right answer to that problem is not a better algorithm. It is telling the matcher that NPV is a synonym for Business Case.

We have sixty aliases across sixty-nine items. "bc", "nbc", "projection", and "npv" all resolve to Business Case. "market" is a synonym for Country. "reg" reaches the Regulatory workspace. Aliases live alongside the item itself:

{
  id: "modal:business-case",
  label: "New Business Case",
  aliases: ["bc", "nbc", "projection", "npv"],
  shortcut: "n",
  description: "Create a new business case projection",
  dispatch: "modal:business-case",
}

Writing the synonym list is unglamorous, repetitive product work, closer to writing taxonomy than writing code, and there is no algorithmic substitute for it. The mistake we made for two months (and the one we see most often in other people’s palettes) is to ship with no aliases and assume the matcher will figure it out. It won’t. It will feel acceptable forever and good never.

The matcher is downstream of the vocabulary. No scoring trick recovers a missing synonym.

Aliases get reviewed in PRs the same way we review copy. When a domain term enters the product vocabulary (attractiveness, COGS, what-if), we add the alias in the same PR that ships the feature, before any user files a bug about Cmd+K being “weird”. The palette config is a product surface.

Intent lives in the DOM

A palette has three latencies worth measuring, and only one is interesting. Keystroke-to-results is solved by not being asynchronous: sixty-nine static items get scored synchronously on every keystroke, no debounce, no transition, results render in the same React commit as the keystroke. Commit-to-render is solved by the router’s loader cache; navigation is a memo-and-swap.

Highlight-to-route-ready is the one worth a paragraph, because the naive approaches all fail for the same structural reason: they don’t reflect intent.

Prefetch every visible result on render. Wasteful: twelve rows, twelve requests, one selection.
Prefetch on Enter. Too late. The point is to hide the request behind the navigation.
Prefetch on hover. Half-right, but it doesn’t work for keyboard users, who are most palette users.

The right signal is "the user is leaning toward this option", and the library already exposes it as aria-selected="true" on the currently highlighted row. We piggyback:

const observer = new MutationObserver(() => {
  const selected = node.querySelector<HTMLElement>(
    '[aria-selected="true"][data-path]',
  );
  if (!selected) return;
  const path = selected.dataset.path;
  if (path && !prefetchedRef.current.has(path)) {
    prefetchedRef.current.add(path);
    router.preload(path);
  }
});

Twenty-five lines. No parallel React state, no library fork, no useEffect chasing a moving target. When the library and the DOM already agree on the source of truth, the React way is to observe it, not mirror it.

Three small bugs lived in those twenty-five lines.

Re-prefetching. Arrowing up and down ten times must not fire ten requests for the same row. prefetchedRef dedupes within a session.

Stale dedupe. If we never reset the set, a row previously prefetched in a long-lived tab is never prefetched again, even if the underlying data has since mutated. The fix is to clear the set when the dialog closes, in the same callback ref that mounts the observer. Observed state still needs a defined lifetime; dialog-open is the right one.

Fighting the library. The wrong impulse, the one we tried first, was to hoist the library’s selection model into our own React state so we could react with useEffect. The observer beat it on every axis: fewer lines, no coupling, fewer renders, no stale state. Mirroring meant we had to write a synchroniser; observing meant we didn’t.
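The dedupe set and its lifetime can be sketched without React. This is an illustration under assumptions, not the palette’s actual hook: `createHighlightPrefetcher` and `observeHighlight` are hypothetical names, the `preload` callback stands in for router.preload, and the observer options are a guess at what the wiring needs (attribute changes for arrow-key movement, child-list changes for rows re-rendered while filtering).

```typescript
// Dedupes prefetches for as long as the dialog is open.
function createHighlightPrefetcher(preload: (path: string) => void) {
  const prefetched = new Set<string>();
  return {
    // Call with the data-path of the highlighted row, if any.
    onHighlight(path: string | undefined): void {
      if (!path || prefetched.has(path)) return;
      prefetched.add(path);
      preload(path);
    },
    // Call when the dialog closes so the next session starts fresh.
    reset(): void {
      prefetched.clear();
    },
  };
}

type HighlightPrefetcher = ReturnType<typeof createHighlightPrefetcher>;

// Browser-only wiring: watches the list node and reports the highlighted row.
// Returns a cleanup function meant to run when the dialog unmounts.
function observeHighlight(node: HTMLElement, prefetcher: HighlightPrefetcher): () => void {
  const observer = new MutationObserver(() => {
    const selected = node.querySelector<HTMLElement>(
      '[aria-selected="true"][data-path]',
    );
    prefetcher.onHighlight(selected?.dataset.path);
  });
  observer.observe(node, {
    attributes: true,
    attributeFilter: ["aria-selected"],
    childList: true,
    subtree: true,
  });
  return () => {
    observer.disconnect();
    prefetcher.reset();
  };
}
```

In a React callback ref, the non-null call would invoke observeHighlight and keep the returned cleanup, and the null call would run it. That keeps the observer and the dedupe set on the same dialog-open lifetime.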

Worth being explicit: the prefetch only feels free because the router’s loader cache turns preload(path) into a shelf the navigation later picks up. Without that, the palette’s prefetch hook would be firing requests into a vacuum. The palette’s job is to fire the prefetch at the right moment. The router’s job is to honour it.

Decoupling at the event boundary

The palette is the entry point to every action in the product. The wrong architecture is to make it the owner of those actions: import every modal, hold every open/close state, pass every prop. We tried it. It scales to about ten modals before the palette is a thousand-line prop pipe.

The right architecture is a CustomEvent on window. The palette dispatches "modal:country"; the country page’s modal host listens for "modal:country" and opens itself. The palette imports zero modal components and knows zero modal states. Thirty-plus modals later the palette is still the same size.

The cost is type safety, and the cost is real:

// This used to compile.
dispatchModalEvent("modal:contry");

dispatchModalEvent("modal:contry") used to compile, dispatch, and silently do nothing. No listener matched the typo, the CustomEvent fell through, the modal never opened. Three of these shipped to master and survived code review. The failure was invisible: clicking the quick action did nothing, and nobody hit it often enough to notice for weeks. The bug class is the worst kind. Wrong, low-frequency, never throws.

The fix is module augmentation. The library declares an empty interface; each modal host augments it from its own file:

// In the library
export interface RegisteredModals {}
export type ModalEventType = keyof RegisteredModals;
export type ModalPayload<T extends ModalEventType> = RegisteredModals[T];

export function dispatchModalEvent<T extends ModalEventType>(
  type: T,
  payload?: ModalPayload<T>,
) { /* ... */ }

// In each modal host
declare module "@/hooks/use-modal-registry" {
  interface RegisteredModals {
    "modal:country": { countryId?: string };
    "modal:cogs": undefined;
  }
}

keyof RegisteredModals is now a string union narrowed to exactly the registered events. A typo is a compile error. A payload drift is a compile error. Deleting a host while leaving a dispatch call is a compile error. The runtime decoupling is unchanged. The constraint lives entirely in the types.
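To see the whole loop in one place, here is a single-file sketch under a few assumptions. Inside one file, plain interface merging stands in for the `declare module` augmentation each host would write. A bare EventTarget stands in for window so the code also runs outside a browser. The body of dispatchModalEvent is elided in the article, so the one here is a guess, and `onModalEvent` is a hypothetical listener helper.

```typescript
// "Library" side: an empty registry that hosts extend.
interface RegisteredModals {}

// "Host" side: in a real codebase this sits in the host's own file,
// wrapped in `declare module "..." { ... }`.
interface RegisteredModals {
  "modal:country": { countryId?: string };
  "modal:cogs": undefined;
}

type ModalEventType = keyof RegisteredModals;
type ModalPayload<T extends ModalEventType> = RegisteredModals[T];

// Stand-in for `window`; any EventTarget behaves the same way here.
const modalBus: EventTarget = new EventTarget();

function dispatchModalEvent<T extends ModalEventType>(
  type: T,
  payload?: ModalPayload<T>,
): void {
  modalBus.dispatchEvent(new CustomEvent(type, { detail: payload }));
}

// Hypothetical listener helper; returns an unsubscribe function.
function onModalEvent<T extends ModalEventType>(
  type: T,
  handler: (payload: ModalPayload<T> | undefined) => void,
): () => void {
  const listener = (event: Event): void => {
    // CustomEvent reports a missing detail as null; normalise to undefined.
    const detail = (event as CustomEvent<ModalPayload<T> | null>).detail;
    handler(detail ?? undefined);
  };
  modalBus.addEventListener(type, listener);
  return () => modalBus.removeEventListener(type, listener);
}

// The host opens itself; the dispatcher imports nothing from it.
const stopListening = onModalEvent("modal:country", (payload) => {
  console.log("open country modal for", payload?.countryId);
});
dispatchModalEvent("modal:country", { countryId: "de" });
stopListening();

// @ts-expect-error "modal:contry" is not a key of RegisteredModals
dispatchModalEvent("modal:contry");
```

The last call is the bug from earlier. At runtime it would still dispatch into the void, but with the registry in place the type checker rejects it before it ships.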

The reason this works is that the palette and the modals change at different rates. Palette entries are written by anyone; modal internals are owned by feature teams. A string-named event is the right shape for the boundary between two rates of change, and module augmentation gives back the compile-time safety the boundary would otherwise lose.

What TypeScript still can’t catch

Even with module augmentation, one class of bug survives: omission. Register a modal type, wire a listener, build the modal, dispatch from a button somewhere in the UI, and forget to add the entry to command-palette.config.ts. The modal works. It just isn’t reachable from Cmd+K. We had three of these in master, discovered only when we wrote the test that found them.

The test is a forty-line file-tree walker. It greps the source for every useModalState("modal:X") call, builds the set of registered palette events from the config, and fails on mismatches in either direction:

× These modal events have a useModalState listener but no command palette
  config entry. Users can't reach them via Cmd+K.

  Missing: ["modal:archive-formulation"]

× These command palette entries dispatch modal events that no component
  listens for. Selecting them silently does nothing.

  Orphaned: ["modal:legacy-thing"]

The whole guard runs in about four hundred milliseconds, on every CI build.
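The article doesn’t show the test itself, so what follows is only a sketch of how such a guard could be put together. All three function names are hypothetical, the regular expression is an assumption about how useModalState calls are written, and the set of configured events is taken as an input rather than read from command-palette.config.ts.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Pulls every event name out of useModalState("modal:...") calls in one file.
function extractListenedEvents(source: string): string[] {
  const pattern = /useModalState\(\s*["'](modal:[\w-]+)["']/g;
  const events: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    events.push(match[1]);
  }
  return events;
}

// Recursively lists .ts/.tsx files under dir, skipping node_modules.
function collectSourceFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== "node_modules") files.push(...collectSourceFiles(fullPath));
    } else if (/\.tsx?$/.test(entry.name)) {
      files.push(fullPath);
    }
  }
  return files;
}

// Gathers every listened-for event under rootDir.
function listenedEventsIn(rootDir: string): Set<string> {
  const events = new Set<string>();
  for (const file of collectSourceFiles(rootDir)) {
    for (const event of extractListenedEvents(readFileSync(file, "utf8"))) {
      events.add(event);
    }
  }
  return events;
}

// Compares both directions: listeners without entries, entries without listeners.
function comparePaletteCoverage(
  listened: Iterable<string>,
  configured: Iterable<string>,
): { missing: string[]; orphaned: string[] } {
  const listenedSet = new Set(listened);
  const configuredSet = new Set(configured);
  return {
    missing: Array.from(listenedSet).filter((e) => !configuredSet.has(e)).sort(),
    orphaned: Array.from(configuredSet).filter((e) => !listenedSet.has(e)).sort(),
  };
}
```

A test would call listenedEventsIn on the source root, collect the modal events the config dispatches, and fail if either list comes back non-empty.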

The reason this kind of test catches things the compiler can’t is that the invariant it asserts lives across files, often across layers. "Every X should have a matching Y" is an architectural claim, and architectural claims are invisible to a type system because the type system can’t see across the layers they straddle. Of the half-dozen bugs that would have caused us real pain over the year we shipped this, three were caught by module augmentation, three by the structural test, and none by code review.

What we’d tell someone building one

Decide what category of tool you’re building before you write any matcher. A relevance engine and an input device look identical and have opposite goals. The category decision determines whether you score by sums or by bands, whether you debounce or commit synchronously, whether you optimise for recall or for certainty. The implementation barely makes sense without it.

Write the synonyms. The matcher is downstream of the vocabulary. Sixty aliases across sixty-nine items is not over-engineering; it’s the actual product. Treat them as copy, review them in PRs, and don’t ship the palette assuming the algorithm will compensate. It won’t.

Put the boundaries where the rates of change live. Selection state changes on every keystroke and lives in the DOM, so observe it from the DOM rather than mirroring it. Modal contents change at feature-team cadence and palette entries change at copy cadence, so connect them with a string-named event and a typed registry, not an import. Most of the architecture of a palette is choosing the right cleavage planes; the code that follows is almost mechanical.

The palette is a fuzzy matcher and a <Dialog>. Most of the work that made it feel like a tool was the work of deciding what kind of thing it was.