Why I retired my custom transform codec for VP9

NVC's first five base codecs (BAS0–BAS5) were custom from scratch:

BAS0: RLE
BAS1: 4×4 Hadamard transform + zero-run + signed varints
BAS2: + spatial intra prediction
BAS3: + previous-frame motion modes
BAS4: + Huffman entropy coding
BAS5: + packetized GOPs for HTTP range loading

Each of these took real work. The codec design space is genuinely fun.

Then I ran a benchmark across five synthetic content types — testsrc2, cellauto rule 110, Conway's life, gradients, and noise — encoded each with BAS5 (NVC W1 profile) and with plain VP9 at matched byte budgets. Scored every output with VMAF + PSNR + SSIM against the source.

Plain VP9 won every single sample, often by 10–70 VMAF points, while using fewer bytes than BAS5.

The ugly numbers

sample	BAS5 W1 size	BAS5 VMAF	plain-VP9 matched-byte VMAF
testsrc2	1.59 MB	85.6	98.6 (1.16 MB)
cellauto	9.6 MB	20.9	99.9 (1.68 MB)
life	13.4 MB	55.0	86.3 (11.8 MB)
gradients	0.75 MB	94.5	96.0 (354 KB, half the bytes)
noise	10.8 MB	49.3	66.3 (9.9 MB)

The cellauto and life results were the wake-up call: BAS5 was 30× less efficient than VP9 on high-frequency content, despite being designed for it.

Why VP9 wins

VP9 has had ~15 years of human optimization across motion search, intra-prediction modes, transform variants, loop filtering, and context-adaptive entropy coding. My BAS5 had four weekends. The math wasn't going to favor me.

But the more interesting observation is why this matters less than I thought it would.

A neural codec doesn't replace the base codec — it augments it. The base just needs to encode small downscaled frames efficiently. That's exactly what VP9 is best at. So pivoting NVC's base to VP9 doesn't undermine the project; it lets the neural parts (super-resolution, per-clip distillation, side-data conditioning) carry the load they're meant to carry.

What changed

BAS6 is now NVC's default base format: a libvpx-vp9 IVF bitstream wrapped in our chunk header. The encoder shells to ffmpeg+libvpx-vp9 (no new C dependencies), and a 56-byte header carries dimensions, fps, and CRF.

After the swap:

W1 BAS6 .nvc files are 3–13× smaller than the equivalent BAS5
The VP9 IVF inside .nvc plays in the browser via WebCodecs VideoDecoder directly — no custom decoder
The neural side now does what it should: contribute quality on top of an efficient base, not paper over a leaky one

The lesson

When a deeply-optimized commodity component does the boring 80%, your novel work goes much further. NVC's interesting bits are the model bundling and the side-data composition. Those don't need a heroic custom transform codec; they need a clean place to plug in.

BAS0–BAS5 are still there in the spec — older .nvc files keep playing — but the encoder writes BAS6 by default and that's where the project's energy goes now.

Why I retired my custom transform codec (BAS5) for plain VP9

Why I retired my custom transform codec for VP9

The ugly numbers

Why VP9 wins

What changed

The lesson

Discussion0

Why I retired my custom transform codec for VP9

The ugly numbers

Why VP9 wins

What changed

The lesson

Related posts