Why I retired my custom transform codec for VP9
NVC's first six base codecs (BAS0–BAS5) were written from scratch:
- BAS0: RLE
- BAS1: 4×4 Hadamard transform + zero-run + signed varints
- BAS2: + spatial intra prediction
- BAS3: + previous-frame motion modes
- BAS4: + Huffman entropy coding
- BAS5: + packetized GOPs for HTTP range loading
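To make the BAS1 ingredients concrete, here is a minimal sketch of a 4×4 Hadamard transform and a signed varint. This is not NVC's actual code: the sequency-ordered matrix, the zigzag sign mapping, and the LEB128-style byte layout are all assumptions chosen for illustration.

```python
# Illustrative sketch only; NVC's real BAS1 layout may differ.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def _matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def _transpose(m):
    return [list(row) for row in zip(*m)]


def hadamard4x4(block):
    """Forward 2-D transform: H * X * H^T (integer, unnormalized)."""
    return _matmul4(_matmul4(H4, block), _transpose(H4))


def inverse_hadamard4x4(coeffs):
    """Inverse: H^T * Y * H / 16, exact because H^T * H = 4 * I."""
    y = _matmul4(_matmul4(_transpose(H4), coeffs), H4)
    return [[v // 16 for v in row] for row in y]


def zigzag(n):
    """Map signed to unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(u):
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)


def encode_signed_varint(n):
    """Zigzag, then emit 7 bits per byte with a continuation bit."""
    u = zigzag(n)
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_signed_varint(data):
    u, shift = 0, 0
    for byte in data:
        u |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return unzigzag(u)
```

A flat block transforms to a single DC coefficient with fifteen zeros, which is the case the zero-run stage is there to exploit.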
Each of these took real work. The codec design space is genuinely fun.
Then I ran a benchmark across five synthetic content types (testsrc2, cellauto rule 110, Conway's life, gradients, and noise), encoding each with BAS5 (NVC W1 profile) and with plain VP9 at matched byte budgets. I scored every output with VMAF, PSNR, and SSIM against the source.
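A rough sketch of how such a comparison can be wired up: turn the BAS5 file size into a target bitrate for VP9, then score the result with ffmpeg's metric filters. The function names and file names are hypothetical, and the VMAF command assumes an ffmpeg build with libvmaf enabled.

```python
def target_bitrate_kbps(byte_budget, duration_s):
    """Average bitrate (kbit/s) that spends byte_budget over duration_s."""
    return byte_budget * 8 / duration_s / 1000


def metric_cmd(distorted, reference, metric):
    """Build an ffmpeg command for one full-reference metric.

    metric is an ffmpeg filter name: "libvmaf", "psnr", or "ssim".
    The first input is the encoded clip, the second is the source.
    """
    if metric not in ("libvmaf", "psnr", "ssim"):
        raise ValueError(f"unsupported metric: {metric}")
    return [
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", metric,
        "-f", "null", "-",
    ]


if __name__ == "__main__":
    # Hypothetical numbers: a 1.59 MB budget over a 10 s clip.
    print(round(target_bitrate_kbps(1_590_000, 10)), "kbit/s")
    print(" ".join(metric_cmd("vp9.ivf", "source.y4m", "libvmaf")))
```

Each command writes its score to ffmpeg's log output; both inputs need the same resolution and frame count for the numbers to be meaningful.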
Plain VP9 won every single sample, by anywhere from 1.5 to 79 VMAF points, while using fewer bytes than BAS5.
The ugly numbers
| sample | BAS5 W1 size | BAS5 VMAF | plain VP9 VMAF (VP9 size) |
|---|---|---|---|
| testsrc2 | 1.59 MB | 85.6 | 98.6 (1.16 MB) |
| cellauto | 9.6 MB | 20.9 | 99.9 (1.68 MB) |
| life | 13.4 MB | 55.0 | 86.3 (11.8 MB) |
| gradients | 0.75 MB | 94.5 | 96.0 (0.354 MB, about half the bytes) |
| noise | 10.8 MB | 49.3 | 66.3 (9.9 MB) |
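The cellauto and life results were the wake-up call. On cellauto, BAS5 spent about 5.7× the bytes of VP9 (9.6 MB vs 1.68 MB) and still scored 79 VMAF points lower; on life it trailed by 31 points at a slightly larger size. BAS5 was designed for exactly this kind of high-frequency content.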
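A quick way to read the table is to reduce each row to two numbers: how many times more bytes BAS5 used, and how many VMAF points VP9 gained. The sketch below recomputes both from the table values, treating 354 KB as 0.354 MB (an assumption about the units).

```python
# (bas5_mb, bas5_vmaf, vp9_mb, vp9_vmaf), copied from the table above.
RESULTS = {
    "testsrc2": (1.59, 85.6, 1.16, 98.6),
    "cellauto": (9.6, 20.9, 1.68, 99.9),
    "life": (13.4, 55.0, 11.8, 86.3),
    "gradients": (0.75, 94.5, 0.354, 96.0),
    "noise": (10.8, 49.3, 9.9, 66.3),
}


def summarize(results):
    """Return {sample: (size_ratio, vmaf_gain)} for each table row."""
    summary = {}
    for name, (bas5_mb, bas5_vmaf, vp9_mb, vp9_vmaf) in results.items():
        size_ratio = bas5_mb / vp9_mb      # > 1 means BAS5 is larger
        vmaf_gain = vp9_vmaf - bas5_vmaf   # > 0 means VP9 scores higher
        summary[name] = (size_ratio, vmaf_gain)
    return summary


if __name__ == "__main__":
    for name, (ratio, gain) in summarize(RESULTS).items():
        print(f"{name:10s} BAS5 is {ratio:.1f}x the bytes, VP9 +{gain:.1f} VMAF")
```

Every row has a size ratio above 1 and a positive VMAF gain, which is the "won every sample with fewer bytes" result in numeric form.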
Why VP9 wins
VP9 has had ~15 years of human optimization across motion search, intra-prediction modes, transform variants, loop filtering, and context-adaptive entropy coding. My BAS5 had four weekends. The math wasn't going to favor me.
But the more interesting observation is why this matters less than I thought it would.
A neural codec doesn't replace the base codec — it augments it. The base just needs to encode small downscaled frames efficiently. That's exactly what VP9 is best at. So pivoting NVC's base to VP9 doesn't undermine the project; it lets the neural parts (super-resolution, per-clip distillation, side-data conditioning) carry the load they're meant to carry.
What changed
BAS6 is now NVC's default base format: a libvpx-vp9 IVF bitstream wrapped in our chunk header. The encoder shells out to ffmpeg with libvpx-vp9 (no new C dependencies), and a 56-byte header carries dimensions, fps, and CRF.
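As a sketch of what this wrapping could look like: the field order, field widths, magic value, and padding below are invented for illustration and are not the real BAS6 layout; only the 56-byte total and the three carried values (dimensions, fps, CRF) come from the description above. The ffmpeg arguments use libvpx-vp9's constant-quality mode (`-crf` with `-b:v 0`) and the IVF muxer.

```python
import struct

# Hypothetical layout: magic, width, height, fps numerator,
# fps denominator, CRF, then zero padding up to 56 bytes.
HEADER_FMT = "<4s5I32x"
assert struct.calcsize(HEADER_FMT) == 56


def pack_header(width, height, fps_num, fps_den, crf):
    return struct.pack(HEADER_FMT, b"BAS6", width, height, fps_num, fps_den, crf)


def unpack_header(data):
    magic, width, height, fps_num, fps_den, crf = struct.unpack(
        HEADER_FMT, data[:56])
    if magic != b"BAS6":
        raise ValueError("not a BAS6 header (under this sketch's layout)")
    return {"width": width, "height": height,
            "fps": (fps_num, fps_den), "crf": crf}


def vp9_encode_cmd(src, dst_ivf, crf):
    """ffmpeg arguments for a constant-quality VP9 encode into IVF."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
        "-an", "-f", "ivf", dst_ivf,
    ]


def wrap(header, ivf_bytes):
    """Chunk = fixed header followed by the untouched IVF payload."""
    return header + ivf_bytes
```

In use, the command list would be passed to `subprocess.run(..., check=True)`, and the resulting IVF bytes appended to the packed header. Keeping the payload byte-for-byte identical to ffmpeg's output means any IVF-aware tool can still read it once the header is skipped.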
After the swap:
- W1 BAS6 .nvc files are 3–13× smaller than the equivalent BAS5
- The VP9 IVF inside .nvc plays in the browser via WebCodecs VideoDecoder directly, with no custom decoder
- The neural side now does what it should: contribute quality on top of an efficient base, not paper over a leaky one
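For the playback path, the only container work left is splitting the IVF stream into frames. The sketch below parses the standard IVF layout (a 32-byte file header starting with `DKIF`, then a 12-byte size-plus-timestamp header before each frame). It is written in Python for consistency with the other examples; in the browser the same loop would hand each payload to the decoder as an `EncodedVideoChunk`. How NVC's own player does this is not shown here.

```python
import struct

IVF_FILE_HEADER = "<4sHH4sHHIIII"   # 32 bytes
IVF_FRAME_HEADER = "<IQ"            # 12 bytes: payload size, timestamp


def parse_ivf(data):
    """Split an IVF byte string into stream info and (pts, payload) frames."""
    (sig, _version, header_len, fourcc, width, height,
     tb_den, tb_num, frame_count, _unused) = struct.unpack_from(
        IVF_FILE_HEADER, data, 0)
    if sig != b"DKIF":
        raise ValueError("missing DKIF signature")

    info = {
        "fourcc": fourcc.decode("ascii"),
        "width": width,
        "height": height,
        "timebase": (tb_num, tb_den),
        "frame_count": frame_count,
    }

    frames = []
    offset = header_len
    while offset + 12 <= len(data):
        size, pts = struct.unpack_from(IVF_FRAME_HEADER, data, offset)
        offset += 12
        frames.append((pts, data[offset:offset + size]))
        offset += size
    return info, frames
```

The frame count in the file header is not always reliable for streamed output, so the loop walks the data until it runs out instead of trusting that field.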
The lesson
When a deeply optimized commodity component does the boring 80%, your novel work goes much further. NVC's interesting bits are the model bundling and the side-data composition. Those don't need a heroic custom transform codec; they need a clean place to plug in.
BAS0–BAS5 are still there in the spec — older .nvc files keep playing — but the encoder writes BAS6 by default and that's where the project's energy goes now.