How I Cut Page Load Time by 90%
I built Zugzwang, a browser-based chess puzzle app powered by Stockfish WASM. It worked, but the initial load was brutal—nearly 39 seconds on a throttled 4G connection before the board was even visible. Slower devices showed even worse numbers.
If you're shipping WASM or heavy client-side dependencies, you've probably hit this wall.
What was the holdup? Everything was on the critical path. A modal the user hadn't opened yet, CSS for themes the user hadn't selected, and a 7.3 MB chess engine the user didn't need until their first move.
Three commits later, board-visible time dropped from 38.7s to 3.7s (p50). Here's exactly what I did.
Defining the Metrics
Before making any changes, I needed clear targets.
Board-visible time is my primary metric: the moment the chessboard element (.ui-board-root) first renders with a non-zero layout box. This is custom instrumentation: a MutationObserver (a browser API) that the Playwright script injects into the page. It directly measures "when can the user see the puzzle and start thinking."
LCP (Largest Contentful Paint) is the standard Web Vital that captures when the largest content element in the viewport finishes rendering. In this app, LCP closely tracks board-visible time but includes additional paint work. I used LCP as a supporting metric via PerformanceObserver.
FCP (First Contentful Paint) measures when any content first appears. This stayed relatively stable across optimizations—the big wins came from what happened after first paint.
Cloudflare reports LCP and FCP out of the box, and Chrome DevTools and Lighthouse report both as well. For all measurements, I used a throttled 4G simulation (1.6 Mbps down, 750 ms RTT) on the same machine with Playwright automation.
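The board-visible check described above can be sketched roughly as follows. This is a minimal illustration, not the project's instrumentation: the helper names (`hasLayoutBox`, `whenBoardVisible`) and the injected `findBoard`, `observe`, and `now` parameters are assumptions, chosen so the logic can run outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface BoxLike { width: number; height: number }
interface BoardLike { getBoundingClientRect(): BoxLike }

// True once the element occupies real space on the page.
function hasLayoutBox(el: BoardLike | null): boolean {
  if (!el) return false;
  const box = el.getBoundingClientRect();
  return box.width > 0 && box.height > 0;
}

// Resolves with a timestamp the first time the board has a non-zero box.
// `observe` registers a mutation callback and returns a disconnect function.
function whenBoardVisible(
  findBoard: () => BoardLike | null,
  observe: (onMutation: () => void) => () => void,
  now: () => number,
): Promise<number> {
  return new Promise((resolve) => {
    if (hasLayoutBox(findBoard())) {
      resolve(now());
      return;
    }
    let disconnect: () => void = () => {};
    disconnect = observe(() => {
      if (hasLayoutBox(findBoard())) {
        disconnect();
        resolve(now());
      }
    });
  });
}

// Hypothetical browser wiring (assumed, not taken from the project):
//   whenBoardVisible(
//     () => document.querySelector(".ui-board-root"),
//     (cb) => {
//       const mo = new MutationObserver(cb);
//       mo.observe(document, { childList: true, subtree: true, attributes: true });
//       return () => mo.disconnect();
//     },
//     () => performance.now(),
//   );
```

One caveat with this approach: a MutationObserver reacts to DOM changes, not to layout changes, so an element that gains size purely because a stylesheet finished loading would only be noticed on the next mutation.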
Commit 1: Lazy-Load the Menu Modal
Problem: The MenuModal component (stats, settings, theme pickers) was statically imported in the main app. Its code shipped in the main bundle and was parsed on every page load, even though most users don't open the menu immediately.
Fix: React's lazy() + Suspense.
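import { lazy } from "react";

// MenuModal is a named export, so map it onto the default export that lazy() expects.
const LazyMenuModal = lazy(async () => {
  const module = await import("@/components/MenuModal");
  return { default: module.MenuModal };
});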
A shouldRenderMenu state gate ensures the component tree doesn't even mount <Suspense> until the user first opens the menu. React.lazy() only starts the import the first time the lazy component renders, so the gate keeps the chunk from being requested before it's wanted:
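import { useEffect, useState } from "react";

// Inside the parent component: flips to true the first time the menu opens.
const [shouldRenderMenu, setShouldRenderMenu] = useState(false);
useEffect(() => {
  if (isMenuOpen) setShouldRenderMenu(true);
}, [isMenuOpen]);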
Result:
| Metric | Before | After | Delta |
|---|---|---|---|
| Main JS (initial route) | 427.60 kB | 415.20 kB | -12.40 kB (-2.9%) |
| Main JS gzip | 133.74 kB | 130.99 kB | -2.75 kB (-2.1%) |
| MenuModal chunk (loaded on demand) | — | 16.88 kB (4.70 kB gzip) | — |
This change barely moved the needle. But it established the mental model for the bigger wins: if a component is behind a user interaction, it doesn't belong in your initial bundle.
Commit 2: Load Only the Selected Board and Piece Styles
Problem: The app ships multiple chessboard themes (blue, brown, gray, green) and piece sets. All of them were statically imported as CSS—meaning every user downloaded every theme on first load, even though they can only see one at a time.
Fix: A tiny style loader (chessground-style-loader.ts) that dynamically imports only the active theme's CSS:
// One dynamic import per theme gives the bundler a split point.
const boardThemeLoaders: Record<string, () => Promise<unknown>> = {
  blue: () => import("@/styles/chessground-board-theme-blue.css"),
  brown: () => import("@/styles/chessground-board-theme-brown.css"),
  // ...
};

const loadedBoardThemes = new Set<string>();
const loadingBoardThemes = new Map<string, Promise<void>>();

async function loadStyleOnce(
  name: string,
  loaded: Set<string>,
  loading: Map<string, Promise<void>>,
  loaders: Record<string, () => Promise<unknown>>
) {
  if (loaded.has(name)) return; // already applied
  if (loading.has(name)) return loading.get(name); // share the in-flight request
  const promise = loaders[name]()
    .then(() => { loaded.add(name); })
    .finally(() => { loading.delete(name); });
  loading.set(name, promise);
  return promise;
}
The Board component calls ensureBoardThemeStyles(theme) and ensurePieceSetStyles(pieceSet) in useEffect hooks whenever the theme prop changes. The loaded set and loading map prevent duplicate requests.
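To make the de-duplication concrete, here is a self-contained sketch of how ensureBoardThemeStyles could wrap loadStyleOnce. The wrapper body is an assumption (only its name appears above), the CSS import() calls are replaced by a counting stub (`fakeCssImport`, hypothetical) so the code runs outside a bundler, and the unknown-theme guard is an addition that the original snippet does not have.

```typescript
type StyleLoader = () => Promise<unknown>;

// Stand-in for the dynamic CSS imports; counts how often each theme is requested.
const requestCounts: Record<string, number> = {};
function fakeCssImport(theme: string): StyleLoader {
  return () => {
    requestCounts[theme] = (requestCounts[theme] ?? 0) + 1;
    return Promise.resolve();
  };
}

const boardThemeLoaders: Record<string, StyleLoader> = {
  blue: fakeCssImport("blue"),
  brown: fakeCssImport("brown"),
};
const loadedBoardThemes = new Set<string>();
const loadingBoardThemes = new Map<string, Promise<void>>();

function loadStyleOnce(
  styleName: string,
  loaded: Set<string>,
  loading: Map<string, Promise<void>>,
  loaders: Record<string, StyleLoader>,
): Promise<void> {
  if (loaded.has(styleName)) return Promise.resolve(); // already applied
  const inFlight = loading.get(styleName);
  if (inFlight) return inFlight; // share the pending request
  const loader = loaders[styleName];
  if (!loader) return Promise.reject(new Error(`Unknown style: ${styleName}`));
  const promise = loader()
    .then(() => { loaded.add(styleName); })
    .finally(() => { loading.delete(styleName); });
  loading.set(styleName, promise);
  return promise;
}

// Assumed shape of the wrapper the Board component calls.
function ensureBoardThemeStyles(theme: string): Promise<void> {
  return loadStyleOnce(theme, loadedBoardThemes, loadingBoardThemes, boardThemeLoaders);
}
```

Calling ensureBoardThemeStyles("blue") twice in the same tick, and again after it settles, invokes the stub loader once: the second call reuses the in-flight promise and the third hits the loaded set.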
UX guardrail: To avoid a flash of unstyled board, I load the user-selected theme's CSS synchronously in the initial bundle; only alternative themes load dynamically.
I also split the monolithic CSS file into per-theme files using CSS custom properties and gradients, which gave the bundler clean split points.
Result:
| Metric | Before | After | Delta |
|---|---|---|---|
| Initial app CSS | 159.51 kB | 81.15 kB | -78.36 kB (-49.1%) |
| Initial app CSS gzip | 33.02 kB | 12.92 kB | -20.10 kB (-60.9%) |
A 61% reduction in gzipped CSS over the wire (49% uncompressed). CSS is render-blocking by default: the browser won't paint until it has downloaded and parsed every linked stylesheet. Cutting the CSS payload roughly in half shrinks the blocking work that sits in front of first paint.
Commit 3: Decouple Puzzle Render From Stockfish Startup
This was the big one. The app loads the Stockfish AI as WASM to simulate computer moves and provide user move feedback.
Problem: The app waited for the WASM to initialize before rendering the puzzle board. Stockfish's WASM binary is ~7.3 MB. On a slow connection, the user stared at a loading spinner for 36+ seconds before seeing a single chess piece.
But here's the thing: the user needs to see the board right away. The engine is only needed to validate moves. That's a meaningfully different moment in the user flow.
Deep Dive
These waterfall charts show exactly what changed. In the baseline, the board couldn't render until the 36-second WASM download completed:
Baseline: Board visibility blocked on Stockfish WASM (36.28s)
After decoupling, the board renders in ~3.7s while WASM downloads in the background:
Current: Board visible at 3.7s; WASM download continues in parallel
The striped bar in the current waterfall shows Stockfish still in-flight when the board becomes visible. That's the critical path fix visualized.
Fix: I restructured the initialization sequence so the puzzle data and board render proceed independently of the engine:
1. Fetch puzzles and render immediately. The puzzle JSON is small (~1.4s to load on throttled 4G). Once it arrives, mount the board.
2. Initialize Stockfish in the background. A stockfishRef holds the engine instance; a createPuzzleStrategy() function lazily initializes it on the first move that actually requires engine evaluation.
3. Show engine state only when relevant. A new isAwaitingEngineMove flag drives a "Loading engine..." indicator in PuzzleInfo, but only when the user has made a move and the engine hasn't finished loading. Before that, the user sees the board and can think about the position.
// Lazy engine initialization: only when we actually need evaluation
function createPuzzleStrategy() {
  if (stockfishRef.current) return engineStrategy(stockfishRef.current);
  beginEngineWait();
  return initStockfish()
    .then(engine => {
      stockfishRef.current = engine;
      endEngineWait();
      return engineStrategy(engine);
    })
    .catch(() => {
      endEngineWait();
      return solutionBasedStrategy(); // graceful fallback
    });
}
Graceful Degradation
The fallback to solutionBasedStrategy() is a deliberate architectural choice. If Stockfish fails to load—network timeout, WASM unsupported, whatever—the app remains functional. It checks moves against the known solution line instead of running a full evaluation. Users lose engine analysis for alternative lines, but they can still solve puzzles. This matters for offline scenarios and older devices where WASM might be flaky.
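One detail worth thinking through with this pattern is what happens when two moves arrive while the engine is still loading. The sketch below is not the project's code: it is a hedged variant that caches the in-flight promise so the initializer runs once, and falls back to the solution line on failure. `Engine`, `Strategy`, `createStrategyProvider`, and `onWaitChange` are placeholder names for illustration.

```typescript
interface Engine { id: string } // placeholder for the Stockfish wrapper
type Strategy = { kind: "engine"; engine: Engine } | { kind: "solution" };

// Builds a strategy getter around any async engine initializer.
function createStrategyProvider(
  initEngine: () => Promise<Engine>,
  onWaitChange: (isWaiting: boolean) => void = () => {},
): () => Promise<Strategy> {
  let pending: Promise<Strategy> | null = null;

  return function getStrategy(): Promise<Strategy> {
    if (pending) return pending; // reuse the in-flight or settled result
    onWaitChange(true); // e.g. show "Loading engine..."
    pending = Promise.resolve()
      .then(initEngine) // also catches synchronous throws from initEngine
      .then((engine): Strategy => ({ kind: "engine", engine }))
      .catch((): Strategy => ({ kind: "solution" })) // graceful fallback
      .finally(() => onWaitChange(false));
    return pending;
  };
}
```

This version caches the fallback as well, so a failed load is not retried. Resetting `pending` to null in the failure branch would instead retry on the next move, at the cost of repeating a large download on a flaky connection.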
I also added tests for the loading states (PuzzleInfo.test.tsx) covering the three key scenarios: active play, engine loading during validation, and puzzle completion.
Result:
| Metric | Baseline p50 | Current p50 | Delta |
|---|---|---|---|
| Board visible | 38,669 ms | 3,673 ms | -34,996 ms (-90.5%) |
| LCP | 38,688 ms | 4,488 ms | -34,200 ms (-88.4%) |
| FCP | 2,376 ms | 2,268 ms | -108 ms (-4.5%) |
The board now renders as soon as the puzzle data arrives. Stockfish loads in the background. The user starts thinking about the position 35 seconds earlier.
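For anyone checking the Delta column, the arithmetic is simply (after - before) / before, rounded to one decimal place. A small sketch using the p50 values from the table above; the helper name `describeDelta` is illustrative:

```typescript
// Absolute and relative change between two timings, as used in the Delta column.
function describeDelta(beforeMs: number, afterMs: number): { deltaMs: number; pct: number } {
  const deltaMs = afterMs - beforeMs;
  const pct = Math.round((deltaMs / beforeMs) * 1000) / 10; // one decimal place
  return { deltaMs, pct };
}

const boardVisible = describeDelta(38669, 3673); // { deltaMs: -34996, pct: -90.5 }
const lcp = describeDelta(38688, 4488);          // { deltaMs: -34200, pct: -88.4 }
const fcp = describeDelta(2376, 2268);           // { deltaMs: -108, pct: -4.5 }
```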
What I Considered But Didn't Ship
I also evaluated SSR, WASM streaming, and service workers.
SSR didn't address this bottleneck. It could pre-render markup, but Stockfish still downloads and initializes on the client regardless of how the markup arrives. The dominant cost—7.3 MB of WASM—remains unchanged. SSR would add architectural complexity without fixing the actual critical path.
WASM streaming was already present. Stockfish's runtime uses instantiateStreaming with a fallback path. I tested forcing non-streaming, and it barely moved the needle: engine readiness changed by ~35 ms. Not worth pursuing.
Service workers are the one meaningful follow-up. They won't help cold-load board visibility, but they should dramatically reduce repeat-visit engine startup by caching the WASM binary. I'll try this next.
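As a rough idea of what that follow-up could look like, here is a cache-first lookup of the kind a service worker's fetch handler might use for the WASM binary. Nothing here is shipped code: the `cacheFirst` helper and the `CacheStore` and `CacheableResponse` shapes are assumptions, modeled on the `match`, `put`, and `clone` methods of the browser's Cache and Response objects so the logic can be exercised outside a worker.

```typescript
// Minimal shapes mirroring the parts of Cache and Response this sketch needs.
interface CacheableResponse<Self> { ok: boolean; clone(): Self }
interface CacheStore<Res> {
  match(url: string): Promise<Res | undefined>;
  put(url: string, response: Res): Promise<void>;
}

// Serve from cache when possible; otherwise fetch, store a copy, and return it.
async function cacheFirst<Res extends CacheableResponse<Res>>(
  url: string,
  cache: CacheStore<Res>,
  fetchFn: (url: string) => Promise<Res>,
): Promise<Res> {
  const cached = await cache.match(url);
  if (cached) return cached; // repeat visit: no network round trip
  const response = await fetchFn(url);
  if (response.ok) await cache.put(url, response.clone()); // never cache failures
  return response;
}
```

In a real worker, `cache` would come from `caches.open(...)` and the call would sit inside `event.respondWith(...)`. As noted above, this only helps repeat visits, since the first load still has to download the full binary.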
This investigation reinforced the core lesson: measure before you architect. SSR and streaming sounded like obvious wins until I traced the actual bottleneck.
Summary
| Change | What Moved Off the Critical Path | Key Savings |
|---|---|---|
| Lazy-load menu modal | 16.88 kB of JS (modal code) | -2.9% initial JS |
| Dynamic theme loading | Unused CSS themes and piece sets | -61% initial CSS (gzip) |
| Decouple Stockfish | 7.3 MB WASM binary | -90.5% board-visible time |
Full Distribution Results
| Metric | Baseline p50 | Current p50 | Delta | Baseline p95 | Current p95 |
|---|---|---|---|---|---|
| Board visible | 38,669 ms | 3,673 ms | -90.5% | 38,687 ms | 3,675 ms |
| LCP | 38,688 ms | 4,488 ms | -88.4% | 38,707 ms | 4,495 ms |
| FCP | 2,376 ms | 2,268 ms | -4.5% | 2,384 ms | 2,268 ms |
Methodology
All measurements from 7 cold-load runs per revision using Playwright with Chromium. Network throttled via CDP to simulate Slow 4G (1.6 Mbps down, 0.75 Mbps up, 750 ms RTT). Fresh browser context per run with cache disabled. Board-visible time measured via MutationObserver watching for .ui-board-root with non-zero layout box. LCP and FCP captured via PerformanceObserver.
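For readers reproducing the p50 and p95 columns, one common way to reduce a handful of runs is the nearest-rank percentile, sketched below. The exact method used for the tables is not stated, so treat this as an assumption; the sample timings are made up for illustration and are not the raw data behind the results.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile needs at least one sample");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical timings (ms) for seven cold-load runs.
const runsMs = [4100, 3900, 4000, 4050, 3950, 4200, 4020];
const p50 = percentile(runsMs, 50); // 4020 (4th of 7 sorted values)
const p95 = percentile(runsMs, 95); // 4200 (7th of 7 sorted values)
```

With only seven runs, the nearest-rank p95 is simply the slowest run, so under this method a p95 column reads as "worst observed" rather than a tail estimate.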
Takeaways
Audit your critical path, not your bundle size. Bundle size is a proxy metric. The real question is: what does the user need to see and interact with right now, and what can wait? In this case, the largest payload (Stockfish WASM) wasn't even render-blocking by nature—I had just wired it up that way.
CSS is the silent blocker. JavaScript gets all the performance discourse, but CSS is render-blocking by default. Shipping 160 kB of CSS when the user only needs 80 kB means the browser is parsing themes the user will never see before it paints anything.
Lazy loading is a spectrum. React.lazy() is the obvious tool, but the same principle applies to CSS, WASM, and any asset. The pattern is always the same: identify the trigger (user interaction, route change, first move), load the asset at that trigger, and handle the loading state gracefully.
Measure before you architect. The "obvious" optimizations (SSR, streaming) weren't the right levers for this problem. The bottleneck wasn't server rendering or download efficiency—it was blocking on resources that weren't needed yet.
The full source is at github.com/sanjitsaluja/zugzwang-puzzle-trainer. If you're building with heavy WASM dependencies, I'd be curious to hear how you've handled the initialization tradeoff.