May 5, 2026 · 2 min read · nvc, video, codec, vp9

Why I retired my custom transform codec (BAS5) for plain VP9

I spent weeks building a hand-rolled tiled-transform-Huffman-motion-compensated codec. A 5-content-type benchmark showed plain libvpx-vp9 was 3–13× more efficient. Here's what I learned.

NVC's first six base codecs (BAS0–BAS5) were all custom, built from scratch.

Each of these took real work. The codec design space is genuinely fun.

Then I ran a benchmark across five synthetic content types: testsrc2, cellauto rule 110, Conway's life, gradients, and noise. I encoded each with BAS5 (NVC W1 profile) and with plain VP9 at matched byte budgets, then scored every output with VMAF, PSNR, and SSIM against the source.
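For anyone who wants to reproduce a matched-byte comparison like this, here is a minimal sketch of the arithmetic and the ffmpeg invocations involved. It is not the harness used for this post: the helper names, the 10-second clip length in the usage note, and the choice of two-pass bitrate targeting are all assumptions. The VMAF command also assumes an ffmpeg build that includes the libvmaf filter.

```python
def target_bitrate_kbps(byte_budget: int, duration_s: float) -> int:
    """Average video bitrate (kbit/s) that spends byte_budget over duration_s."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return int(byte_budget * 8 / duration_s / 1000)


def vp9_two_pass_cmds(src: str, dst: str, kbps: int) -> list[list[str]]:
    """Two-pass libvpx-vp9 commands aiming at an average bitrate (Unix null sink)."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libvpx-vp9",
              "-b:v", f"{kbps}k", "-an"]
    return [
        common + ["-pass", "1", "-f", "null", "/dev/null"],  # analysis pass
        common + ["-pass", "2", dst],                        # real encode
    ]


def vmaf_cmd(distorted: str, reference: str) -> list[str]:
    """Score a decoded output against its source; libvmaf logs the VMAF score."""
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "libvmaf", "-f", "null", "-"]
```

As a usage example, `target_bitrate_kbps(1_590_000, 10.0)` gives the bitrate that would spend a 1.59 MB budget on a hypothetical 10-second clip. Each returned command list can be handed to `subprocess.run`. Two-pass targeting only approximates the budget, so the resulting file size still needs checking.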

Plain VP9 won every single sample, by 13 to 79 VMAF points on four of the five, while using fewer bytes than BAS5.

The ugly numbers

| sample | BAS5 W1 size | BAS5 VMAF | plain-VP9 matched-byte VMAF (size) |
|---|---|---|---|
| testsrc2 | 1.59 MB | 85.6 | 98.6 (1.16 MB) |
| cellauto | 9.6 MB | 20.9 | 99.9 (1.68 MB) |
| life | 13.4 MB | 55.0 | 86.3 (11.8 MB) |
| gradients | 0.75 MB | 94.5 | 96.0 (354 KB, half the bytes) |
| noise | 10.8 MB | 49.3 | 66.3 (9.9 MB) |
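To make the table easier to read at a glance, the sketch below computes two derived figures per sample: how many times more bytes BAS5 used than VP9, and how many VMAF points VP9 gained. The numbers are copied from the table. The `compare` helper is illustrative, and treating 354 KB as 0.354 MB (decimal units) is an assumption.

```python
# (BAS5 size MB, BAS5 VMAF, VP9 size MB, VP9 VMAF), copied from the table above.
ROWS = {
    "testsrc2": (1.59, 85.6, 1.16, 98.6),
    "cellauto": (9.6, 20.9, 1.68, 99.9),
    "life": (13.4, 55.0, 11.8, 86.3),
    "gradients": (0.75, 94.5, 0.354, 96.0),  # 354 KB taken as 0.354 MB
    "noise": (10.8, 49.3, 9.9, 66.3),
}


def compare(rows):
    """Map each sample to (BAS5 bytes / VP9 bytes, VP9 VMAF minus BAS5 VMAF)."""
    out = {}
    for name, (bas5_mb, bas5_vmaf, vp9_mb, vp9_vmaf) in rows.items():
        byte_ratio = round(bas5_mb / vp9_mb, 2)
        vmaf_gain = round(vp9_vmaf - bas5_vmaf, 1)
        out[name] = (byte_ratio, vmaf_gain)
    return out


if __name__ == "__main__":
    for name, (ratio, gain) in compare(ROWS).items():
        print(f"{name:10s} BAS5 used {ratio}x the bytes, VP9 gained {gain} VMAF")
```

Note that these two figures are separate axes. A byte ratio alone understates the gap whenever VP9 is also ahead on quality, which it is in every row here.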

The cellauto and life results were the wake-up call: BAS5 was 30× less efficient than VP9 on high-frequency content, despite being designed for it.

Why VP9 wins

VP9 has had ~15 years of human optimization across motion search, intra-prediction modes, transform variants, loop filtering, and context-adaptive entropy coding. My BAS5 had four weekends. The math wasn't going to favor me.

But the more interesting observation is why this matters less than I thought it would.

A neural codec doesn't replace the base codec — it augments it. The base just needs to encode small downscaled frames efficiently. That's exactly what VP9 is best at. So pivoting NVC's base to VP9 doesn't undermine the project; it lets the neural parts (super-resolution, per-clip distillation, side-data conditioning) carry the load they're meant to carry.

What changed

BAS6 is now NVC's default base format: a libvpx-vp9 IVF bitstream wrapped in our chunk header. The encoder shells out to ffmpeg with libvpx-vp9 (no new C dependencies), and a 56-byte header carries dimensions, fps, and CRF.
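Here is a rough sketch of what that wrapping could look like. The byte layout is hypothetical: the only facts taken from the format are that the header is 56 bytes and carries dimensions, fps, and CRF. The magic value, field order, field widths, rational fps, and trailing padding are assumptions for illustration, as are the function names. The ffmpeg flags themselves (`-crf` with `-b:v 0` for constant-quality VP9, `-f ivf` for the IVF container) are standard.

```python
import struct

# HYPOTHETICAL layout, little-endian, no alignment padding:
# 4-byte magic, width, height, fps numerator, fps denominator (u32 each),
# CRF (u8), then 35 reserved bytes: 4 + 16 + 1 + 35 = 56.
HEADER_FMT = "<4sIIIIB35x"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def pack_header(width: int, height: int, fps_num: int, fps_den: int, crf: int) -> bytes:
    """Serialize the illustrative 56-byte chunk header."""
    return struct.pack(HEADER_FMT, b"BAS6", width, height, fps_num, fps_den, crf)


def unpack_header(blob: bytes) -> dict:
    """Parse the header back into named fields, rejecting a wrong magic."""
    magic, width, height, fps_num, fps_den, crf = struct.unpack(
        HEADER_FMT, blob[:HEADER_SIZE]
    )
    if magic != b"BAS6":
        raise ValueError("not a BAS6 chunk")
    return {"width": width, "height": height,
            "fps": (fps_num, fps_den), "crf": crf}


def ivf_encode_cmd(src: str, dst: str, crf: int) -> list[str]:
    """ffmpeg command for a constant-quality VP9 encode into an IVF file."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libvpx-vp9",
            "-crf", str(crf), "-b:v", "0", "-an", "-f", "ivf", dst]
```

A chunk would then be `pack_header(...)` followed by the bytes of the IVF file that the command produces. Shelling out keeps the encoder free of any direct libvpx binding, at the cost of requiring ffmpeg on the PATH.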

After the swap:

The lesson

When a deeply optimized commodity component does the boring 80%, your novel work goes much further. NVC's interesting bits are the model bundling and the side-data composition. Those don't need a heroic custom transform codec; they need a clean place to plug in.

BAS0–BAS5 are still there in the spec — older .nvc files keep playing — but the encoder writes BAS6 by default and that's where the project's energy goes now.


