One process owns port 443. Two pinned worker threads terminate TLS 1.3 inline, parse HTTP/1.1, dispatch through a precompiled jump table, and send prebuilt response bytes straight back to the kernel.
Picoweb is not faster because it has a clever router. It is faster because the usual route is gone. There is no nginx terminator in front, no app server behind, no second parser, no middleware chain, no runtime allocator on the serving path.
The recipe, distilled:
| Step | What it does |
|---|---|
| Compile the wwwroot | Hash every route into a flat jump table at startup. Dispatch is hash(path) → slot. |
| Store the wire form | Headers, 304s, ETags and Brotli bodies are wire-ready before the listen socket opens; rare identity clients decode from Brotli through worker scratch. |
| Keep TLS inline | Kernel socket → TLS records → HTTP parser in one address space. No terminator hop. |
| Shard by core | Two pinned workers, each with its own epoll loop, SO_REUSEPORT listener and connection table. |
| Never allocate hot | Connection tables and arenas are mmap'd, populated, touched and kept resident at startup. |
| Send resident bytes | Small bodies use cheap writes; large bodies can use MSG_ZEROCOPY so the NIC can DMA them without an intermediate copy. |
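The "Compile the wwwroot" row can be sketched as follows. This is an illustration in Python, not picoweb's code: the FNV-1a hash, the power-of-two table with linear probing, and the names `build_table` and `dispatch` are all assumptions chosen to show the hash(path) → slot idea.

```python
def fnv1a(data: bytes) -> int:
    """64-bit FNV-1a; an assumed hash, the real one is not stated."""
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def build_table(routes: dict) -> list:
    """Run once at startup: place every route in a flat, power-of-two table."""
    size = 1
    while size < 2 * len(routes):  # keep the table at most half full
        size *= 2
    table = [None] * size
    for path, response in routes.items():
        slot = fnv1a(path) & (size - 1)
        while table[slot] is not None:  # linear probing on collision
            slot = (slot + 1) & (size - 1)
        table[slot] = (path, response)
    return table


def dispatch(table: list, path: bytes):
    """Hot path: hash, mask, compare. Returns None for an unknown path."""
    mask = len(table) - 1
    slot = fnv1a(path) & mask
    while table[slot] is not None:
        key, response = table[slot]
        if key == path:
            return response
        slot = (slot + 1) & mask
    return None


table = build_table({b"/": b"<home bytes>", b"/about": b"<about bytes>"})
print(dispatch(table, b"/about"))
```

Because the route set is fixed at startup, the table never resizes, and a lookup costs one hash plus a short probe.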
The result is not a general application platform. It is a static HTTPS serving machine: minimal HTTP/1.1, a byte-driven TLS 1.3 engine, NEON/SHA/AES acceleration, SIMD parsing where it matters, and no dynamic work that can be done once at boot.
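The "Never allocate hot" row can be approximated with a bump arena over an anonymous mapping. This is a sketch under assumptions: the `Arena` class and its methods are hypothetical, and touching one byte per page here only stands in for populating and pinning the mapping, which a native server would do through the mmap and mlock system calls.

```python
import mmap

PAGE = mmap.PAGESIZE


class Arena:
    """Fixed-size bump allocator; all memory is mapped and touched up front."""

    def __init__(self, size: int):
        # Round up to whole pages and map anonymous memory once.
        self.size = (size + PAGE - 1) // PAGE * PAGE
        self.buf = mmap.mmap(-1, self.size)
        # Touch every page so first use on the serving path does not fault.
        for off in range(0, self.size, PAGE):
            self.buf[off] = 0
        self.top = 0

    def alloc(self, n: int, align: int = 16) -> memoryview:
        """Hand out n bytes by bumping an offset; align must be a power of two."""
        start = (self.top + align - 1) & ~(align - 1)
        if start + n > self.size:
            raise MemoryError("arena exhausted")
        self.top = start + n
        return memoryview(self.buf)[start:start + n]

    def reset(self) -> None:
        """Release everything at once, e.g. when a connection closes."""
        self.top = 0


arena = Arena(64 * 1024)
scratch = arena.alloc(256)
scratch[:5] = b"hello"
arena.reset()
```

Allocation is a pointer bump and release is a single store, so nothing on the request path has to call a general-purpose allocator.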
Numbers
| Run | Req/sec | p50 | p99 | RSS / RAM |
|---|---|---|---|---|
| Production picoweb Brotli, c=50 | 29,798 | 2.54 ms | 16.75 ms | 23.0 MiB RSS |
| Production picoweb Brotli, c=100 | 41,222 | 3.71 ms | 21.37 ms | 23.0 MiB RSS |
| Production picoweb identity, c=100 | 13,920 | 8.15 ms | 35.20 ms | 23.0 MiB RSS |
| Azure Blob static website, c=100 | 13,472 | 6.84 ms | 25.01 ms | managed service |
| One-worker picoweb harness | 32,091 | 0.667 ms | 8.2 ms | 13.3 MiB |
| nginx 1.27 comparison | 18,710 | 1.64 ms | 3.9 ms | 10.5 MiB |
| .NET 9 Kestrel comparison | 8,548 | 3.57 ms | 11.1 ms | 63.1 MiB |
The production and Blob rows are fresh cross-node runs from the load-test node. The harness rows are controlled one-worker comparisons on the earlier AKS Arm hardware. The current site runs on two burstable Standard_B2pls_v2 Arm nodes: one hard-tainted for picoweb, one carrying the load-test pod and the low-traffic co-tenants. The nodepool stays at two nodes because that is the cost limit.
The wild row is picoweb Brotli versus the current Azure Blob static website path: 41,222 requests/sec from one tiny picoweb pod versus 13,472 requests/sec from Blob. That is not a claim that picoweb beats Azure's storage platform on every axis. Identity versus identity is roughly tied. The win is that picoweb made the browser-native Brotli representation the resident wire form, while the Blob static website path is serving the larger identity HTML.
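The ratios behind that comparison can be checked directly from the table. The figures below are copied from the c=100 rows; the variable names are only labels for this calculation.

```python
# Requests/sec at c=100, taken from the table above.
picoweb_brotli = 41_222
picoweb_identity = 13_920
blob_identity = 13_472

brotli_vs_blob = round(picoweb_brotli / blob_identity, 2)
identity_vs_blob = round(picoweb_identity / blob_identity, 2)
brotli_vs_identity = round(picoweb_brotli / picoweb_identity, 2)

print(brotli_vs_blob)      # 3.06
print(identity_vs_blob)    # 1.03
print(brotli_vs_identity)  # 2.96
```

Brotli delivers about three times the throughput of Blob, and about three times picoweb's own identity rate. Identity against identity differs by about 3%, which is why the gap is attributed to the stored representation rather than to the platform.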
Why this matters
A typical stack buys flexibility by keeping layers alive: proxy, TLS terminator, app server, framework, router, allocator, compressor. Picoweb buys speed by refusing those layers on the serving path. The bytes are known, resident, compressed and indexed before traffic arrives; Brotli is the stored representation, not a sidecar duplicate.
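A minimal sketch of what "known, resident, compressed and indexed before traffic arrives" could look like follows. It assumes the body has already been Brotli-compressed by some build step. The helper names `prebuild` and `respond`, the SHA-256-derived ETag, and the exact header set are illustrative assumptions, not picoweb's actual layout.

```python
import hashlib


def prebuild(body_br: bytes, content_type: str):
    """Build the full 200 and the 304 once, before the listen socket opens."""
    # Assumed ETag scheme: a truncated SHA-256 of the stored bytes.
    etag = '"' + hashlib.sha256(body_br).hexdigest()[:16] + '"'
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        "Content-Encoding: br\r\n"
        f"Content-Length: {len(body_br)}\r\n"
        f"ETag: {etag}\r\n"
        "Vary: Accept-Encoding\r\n"
        "\r\n"
    ).encode("ascii")
    not_modified = (
        "HTTP/1.1 304 Not Modified\r\n"
        f"ETag: {etag}\r\n"
        "\r\n"
    ).encode("ascii")
    return etag.encode("ascii"), head + body_br, not_modified


def respond(entry, if_none_match):
    """Serving path: pick one of two prebuilt byte strings, build nothing."""
    etag, full, not_modified = entry
    # Simplification: exact match only, no ETag lists or weak comparison.
    return not_modified if if_none_match == etag else full


# Placeholder bytes standing in for a real Brotli stream.
entry = prebuild(b"placeholder-brotli-bytes", "text/html; charset=utf-8")
first = respond(entry, None)
repeat = respond(entry, entry[0])
```

On the serving path, a request then reduces to a comparison and a send of bytes that already exist.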
That does not make nginx, Apache or Kestrel bad. It means they are solving a larger problem than this page needs solved. For this site, the useful work is narrower: accept HTTPS, find a static resource, return it quickly and consistently.
The split: picoweb on the server, BareMetalJsTools in the browser
picoweb (server): TLS 1.3 inline, jump-table routing, pre-rendered heads, Brotli-primary bodies, arena allocation, zero-copy sends, NEON/SHA/AES crypto.
BareMetalJsTools (browser): Plain CSS and JavaScript primitives. No framework, no bundler runtime, no virtual DOM. The same subtractive rule on the browser side.
The system described above is what serves wavefunctionlabs.com. Take the pieces, ignore the philosophy, or steal the diagram. The whole trick is visible: remove every layer that is not doing useful work for this request.