Sunday, March 5, 2017

Virtual global selector codebooks in Basis

Much of this information will be present in .basis's open source transcoder, so here's a little brain dump of how it works.

Selectors in DXT1 and ETC1 are 2-bits each. These values select which block color is used to represent each texel in a block. The block size in these formats is 4x4 texels, so each selector can be treated as a 16D vector. The uncompressed size of a selector vector is 32-bits (4*4*2).

One way of compressing these selector vectors is vector quantization (VQ). One problem with straightforward VQ is the expense of codebook storage. crunch stores selector codebooks in a compressed form, using order-1 delta compression on each vector's component, and it tries to rearrange the codebook entry order to minimize these deltas. With large codebooks the amount of bits needed to store the compressed selector vectors becomes prohibitive relative to the compressed texture data.

An alternative technique is to switch to a multi-level codebook scheme, where each .basis file has a "local" selector codebook which refers into a much larger constant "global" selector codebook. (This is somewhat inspired by Graham Devine's ROQ video file format used in Quake 3, which used multilevel codebooks.) Now we've got a serious memory problem, because the global (constant) selector codebook is going to require several megabytes of memory. A 1024^2 global selector codebook requires a 4MB table, which is undesirable on many platforms (i.e. WebGL).

.basis works around this problem by using a very small (256 entry) global selector vector codebook which is procedurally "amplified" using a series of small functions. These functions can rotate the vector by 90, 180, or 270 degrees, vertically flip the vector, change its contrast, add Gaussian noise to the vector, invert the vector, etc. It turns out that this method is surprisingly powerful and simple, and lends itself well to a hardware implementation.

To select a "virtual" selector codebook entry in this scheme, the encoder first selects a "seed" vector from the small global codebook (which requires 8-bits), then it specifies a series of control bits (typically 6-12) to select which procedural routines are used to modify the codebook entry. The control bits can be optionally arithmetically coded, or stored as-is. It's also possible to completely discard with the seed codebook and just use a PRNG (with some post processing) to generate the initial selector entries. In this case, the seed bits are used to prime the PRNG.

Function usage statistics in kodim18 (on each ETC1 block):

shift_x: Samples: 24576, Total: 12737.000000, Avg: 0.518270, Std Dev: 0.499666
shift_y: Samples: 24576, Total: 10472.000000, Avg: 0.426107, Std Dev: 0.494510
flip: Samples: 24576, Total: 10811.000000, Avg: 0.439901, Std Dev: 0.496375
rot: Samples: 24576, Total: 18679.000000, Avg: 0.760050, Std Dev: 0.427052
erode: Samples: 24576, Total: 6115.000000, Avg: 0.248820, Std Dev: 0.432329
dilate: Samples: 24576, Total: 7067.000000, Avg: 0.287557, Std Dev: 0.452623
high_pass: Samples: 24576, Total: 10076.000000, Avg: 0.409993, Std Dev: 0.491832
rand: Samples: 24576, Total: 12522.000000, Avg: 0.509521, Std Dev: 0.499909
div: Samples: 24576, Total: 10053.000000, Avg: 0.409058, Std Dev: 0.491660
shift: Samples: 24576, Total: 8016.000000, Avg: 0.326172, Std Dev: 0.468811
contrast: Samples: 24576, Total: 6080.000000, Avg: 0.247396, Std Dev: 0.431499
inv: Samples: 24576, Total: 12075.000000, Avg: 0.491333, Std Dev: 0.499925
median: Samples: 24576, Total: 7947.000000, Avg: 0.323364, Std Dev: 0.467760

rot's usage was ~76% because it was used 3/4 times. ~25% of the time the vector wasn't rotated. I've been continually surprised how easy it has been to find useful functions.

Here's a description of the current set of procedural functions in .basis:

  • shift_x/y: Shifts the block's selectors up or left by 1 row/column
  • flip: Vertical flip
  • rotate: Rotates by 0, 90, 180, or 270 degrees
  • erode: 3x3 erosion morphological  operator
  • dilate: 3x3 dilation morphological operator
  • high pass: 3x3 high-pass filter
  • rand: Adds Gaussian noise to the selectors, using the selectors themselves as a seed for the PRNG.
  • div: Selector remapping through table { 2, 0, 3, 1 }
  • shift: Adds 1 to the selectors with clamping
  • contrast: Boosts contrast of selectors by remapping them through 1 of 3 tables: { 0, 0, 3, 3 }, { 1, 1, 2, 2 }, { 1, 1, 3, 3 }
  • inv: Inverts the selectors
  • median: 3x3 selector median filter

The order that these functions are applied matters, and I'm still figuring out the optimal order. The control bits select which combination of the above functions is used to modify the selectors, and for a couple functions (like rot and contrast) multiple control bits are needed.

With a method like this it's possible to compress a selector vector down to 14-16 bits. Quality is extremely good. The biggest problem I'm working on solving now is how to efficiently search such large virtual codebooks during encoding without sacrificing too much quality or introducing artifacts. Full codebook searches are very slow.

No comments:

Post a Comment