XYUV File format

Stian Valentin Svedenborg edited this page Mar 2, 2016 · 5 revisions

The XYUV File Format Version 1

The core feature of libxyuv is the xyuv file format. It is designed to encode almost any YUV (YCbCr) image format you can imagine, and it can also be used for other raw formats, given the following assumptions:

  • The encoded format has at most 4 color components (including alpha).
  • Supported color spaces can be converted back to RGB using a 3x3 matrix.
  • Sample values are encoded using unorm values.

The primary goal of the library is to be able to encode all YUVA and RGBA raw images that are "out there". If you have another raw format (YUV, RGB, or possibly something else) that could fit into this library without too much redesign, feel free to drop me a comment and I'll consider supporting it in a later version of the format.

A note of caution: The details below can be daunting; the description of an image is abstract and at times hard to wrap your head around. Don't worry too much: the library already supports quite a number of formats, which can be both generated and read without having to worry about the nitty-gritty details of the file format. For a higher level view, please consult the create your own formats page.

Into the Details.

The xyuv file format contains a variable length header and has the following overall structure:

  • A .xyuv file consists of one or more frames. Each frame consists of:
    • Frame header
    • One or more plane descriptors
    • Exactly 4 channel-descriptors
    • Raw Image Data

All multi-byte fields in the file format are encoded as BIG endian.
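
Since every multi-byte field is big endian, a reader on a little-endian host cannot simply cast the file bytes to integers. The following is a minimal sketch of byte-order-independent readers; the function names (`read_be16`, `read_be32`, `read_be64`, `has_xyuv_magic`) are hypothetical helpers for illustration and are not part of the libxyuv API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: decode big-endian unsigned integers from a byte
 * buffer, independent of the byte order of the host machine. */
uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)(((uint16_t)p[0] << 8) | (uint16_t)p[1]);
}

uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint64_t read_be64(const uint8_t *p) {
    return ((uint64_t)read_be32(p) << 32) | (uint64_t)read_be32(p + 4);
}

/* Returns 1 if the first 8 bytes are the magic "XYUV_FMT" (no '\0'), else 0. */
int has_xyuv_magic(const uint8_t *p) {
    return memcmp(p, "XYUV_FMT", 8) == 0;
}
```

Building values from individual bytes with shifts, rather than using `memcpy` into an integer, keeps the result the same on both little-endian and big-endian hosts.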

The Frame Header

char magic[8];           // Must be: "XYUV_FMT" (no '\0')
uint32_t checksum;       // Currently reserved, must be 0.
uint16_t version;        // Version of the file format, currently 0x0001.
uint16_t offset_to_data; // Offset from start of frame to start of raw data.
uint64_t payload_size;   // Size of this frame, size of header + raw data.

char fourcc[4];          // FOURCC.org code, if any. This field has no semantic meaning.
uint32_t reserved;       // Reserved, should be zero.
uint8_t origin;          // Position of image coordinate (0,0), may be either 0 (Upper left) or 1 (Lower left).


uint32_t width;          // Frame width, for YUV images this is the size in luma samples. (Even if the actual data has no luma channel).
uint32_t height;         // Frame height. For YUV images, height in luma samples.
uint8_t n_planes;        // Number of separate planes. This decides how many plane descriptors follow.

// Chroma siting information is primarily present for YUV images and other sub-sampled formats.
// The macro pixel defines how many luma samples are present per chroma sample, e.g. YUV 4:2:0 will have wxh = 2x2.
uint8_t macro_px_w;      // Width of macro pixel.
uint8_t macro_px_h;      // Height of macro pixel.
float u_x;               // Horizontal position of the chroma siting of the U channel. Must be in the range [0.0, 1.0]
float u_y;               // Vertical siting position of U. Must be in [0.0, 1.0]
float v_x;               // Horizontal siting position of V. Must be in [0.0, 1.0]
float v_y;               // Vertical siting position of V. Must be in [0.0, 1.0]

// Fields describing YUV to RGB color conversion
float rgb_to_yuv[9];     // 3x3 RGB -> YUV conversion coefficients matrix, row major order.
float yuv_to_rgb[9];     // 3x3 YUV -> RGB conversion coefficients matrix, row major order.

float y_range_min;       // Minimum value of Y channel, given an RGB value in [0.0, 1.0].  
float y_range_max;       // Maximum value of Y channel, given an RGB value in [0.0, 1.0].  
float u_range_min;       // Minimum value of U channel, given an RGB value in [0.0, 1.0].  
float u_range_max;       // Maximum value of U channel, given an RGB value in [0.0, 1.0].
float v_range_min;       // Minimum value of V channel, given an RGB value in [0.0, 1.0].
float v_range_max;       // Maximum value of V channel, given an RGB value in [0.0, 1.0].

// These values are here to support 8.0, 8.2 and 8.8 bit unorm output values for narrow/studio range YUV images. In short, they define the minimum and maximum values of each encoded sample.
// e.g. For bt601 studio range, the values for Y would be {16/255, 235/255} which is {0.062745098, 0.921568627}.
// Unless you are encoding very special formats or narrow range, just set minimum to 0.0 and max to 1.0.
float y_packed_range_min; 
float y_packed_range_max;
float u_packed_range_min; 
float u_packed_range_max;
float v_packed_range_min; 
float v_packed_range_max;

That concludes the frame header. I will create further wiki pages detailing some of the issues, especially around the encoding and packing of values, but this should do for now to understand a bit more of what information is encoded.
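
To make the conversion and range fields more concrete, here is a minimal sketch of one plausible reading of them. It assumes that a channel value is first mapped linearly from its `*_range_min`/`*_range_max` interval onto the `*_packed_range_min`/`*_packed_range_max` interval and then quantized to an unsigned integer. The functions `apply_matrix3` and `pack_unorm` are hypothetical names for illustration; the library's actual implementation may differ.

```c
#include <stdint.h>

/* Multiply a row-major 3x3 matrix by a 3-component vector: out = m * in.
 * This matches the "row major order" layout of rgb_to_yuv / yuv_to_rgb. */
void apply_matrix3(const float m[9], const float in[3], float out[3]) {
    for (int row = 0; row < 3; ++row) {
        out[row] = m[3 * row + 0] * in[0]
                 + m[3 * row + 1] * in[1]
                 + m[3 * row + 2] * in[2];
    }
}

/* Assumed mapping: rescale `value` from [range_min, range_max] onto
 * [packed_min, packed_max], then quantize to a `bits`-bit unorm code.
 * `bits` must be in 1..31. */
uint32_t pack_unorm(double value, double range_min, double range_max,
                    double packed_min, double packed_max, unsigned bits) {
    double t = (value - range_min) / (range_max - range_min);
    if (t < 0.0) t = 0.0;                 /* clamp to the valid interval */
    if (t > 1.0) t = 1.0;
    double packed = packed_min + t * (packed_max - packed_min);
    double max_code = (double)((1u << bits) - 1u);
    return (uint32_t)(packed * max_code + 0.5);   /* round to nearest */
}
```

With the bt601 studio range values quoted above for Y, `pack_unorm(0.0, 0.0, 1.0, 16.0/255.0, 235.0/255.0, 8)` yields 16 and the same call with a value of 1.0 yields 235.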

The Plane Descriptor

After the frame header comes n_planes plane descriptors. These aim to describe the macroscopic view of your raw image. The idea is that if all you want to do is load a raw image to use it in an existing graphics library, all the information you need is encoded in these descriptors.

uint64_t offset_to_plane;  // Offset from the start of the raw data to this plane.
uint64_t plane_size;       // Size (in bytes) of the plane. 
uint32_t line_stride;      // The stride of a line of blocks (measured in bytes) in this plane.
uint32_t block_stride;     // The stride of a block of pixels (in bits) in this plane. 
uint8_t interleave_mode;   // Interleave mode, may be 0 (not interleaved), 1 (Line 0 then 2 then 4 ... then 1 then 3...), or 2 (Line 1 then 3 then 5 ... then 0 then 2...).
uint32_t mega_block_width; // Width of mega block, must be 1 if no reordering
uint32_t mega_block_height;// Height of mega block, must be 1 if no reordering
uint8_t x_mask[32];        // x_mask, see create your own format documentation
uint8_t y_mask[32];        // y_mask, see create your own format documentation
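
As an illustration of how these fields might be combined, the sketch below locates a block inside the raw data. It rests on several assumptions that the text above does not confirm: that `interleave_mode` reorders whole lines within the plane, that mega-block reordering is not in use (both mega block dimensions are 1), and that blocks are laid out left to right within a line. The function names are hypothetical.

```c
#include <stdint.h>

/* Storage row of logical line `y` in a plane of `height` lines.
 * Mode 1 stores lines 0, 2, 4, ... first and then 1, 3, ...;
 * mode 2 stores lines 1, 3, 5, ... first and then 0, 2, ... */
uint32_t storage_line(uint32_t y, uint32_t height, uint8_t interleave_mode) {
    uint32_t n_even = (height + 1) / 2;   /* number of even-numbered lines */
    uint32_t n_odd  = height / 2;         /* number of odd-numbered lines  */
    switch (interleave_mode) {
    case 1:  return (y % 2 == 0) ? y / 2 : n_even + y / 2;
    case 2:  return (y % 2 == 1) ? y / 2 : n_odd + y / 2;
    default: return y;                    /* 0: not interleaved */
    }
}

/* Bit offset of a block, measured from the start of the raw data.
 * line_stride is in bytes and block_stride is in bits, as in the
 * plane descriptor. Ignores mega-block reordering. */
uint64_t block_bit_offset(uint64_t offset_to_plane, uint32_t line_stride,
                          uint32_t block_stride, uint32_t block_line,
                          uint32_t block_index) {
    return 8 * (offset_to_plane + (uint64_t)block_line * line_stride)
         + (uint64_t)block_index * block_stride;
}
```

For a 5-line plane in mode 1, logical lines 0, 2, 4, 1, 3 land in storage rows 0, 1, 2, 3, 4 respectively.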

The Channel Descriptors

Next up are exactly 4 channel descriptors, one each for Y, U, V and A respectively (or R, G, B, A if that is your flavour).

The channel descriptors are here to encode the location of each sample on the bit level. The xyuv file format currently only supports samples encoded in unorm formats, that is as an unsigned integer where the largest value represents 1.0 and the lowest value represents 0.0.

NB: the attentive reader may notice that this can collide with chroma planes, which typically have a range of roughly [-0.5, 0.5]. We solve this by asserting that even these formats are unorm, offset by 0.5 (i.e. we encode -0.5 as unorm(0)). Future versions of the file format may support snorm if required.
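
The offset convention for chroma can be sketched as follows, assuming a plain integer unorm with no fractional bits. The names `encode_offset_unorm` and `decode_offset_unorm` are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Encode a chroma value in [-0.5, 0.5] as a `bits`-bit unorm code by
 * first shifting it into [0.0, 1.0]. `bits` must be in 1..31. */
uint32_t encode_offset_unorm(double value, unsigned bits) {
    double shifted = value + 0.5;         /* -0.5 -> 0.0, 0.5 -> 1.0 */
    if (shifted < 0.0) shifted = 0.0;     /* clamp out-of-range input */
    if (shifted > 1.0) shifted = 1.0;
    double max_code = (double)((1u << bits) - 1u);
    return (uint32_t)(shifted * max_code + 0.5);  /* round to nearest */
}

/* Inverse mapping: unorm code back to a value in [-0.5, 0.5]. */
double decode_offset_unorm(uint32_t code, unsigned bits) {
    double max_code = (double)((1u << bits) - 1u);
    return (double)code / max_code - 0.5;
}
```

With 8 bits, -0.5 encodes to 0 and 0.5 encodes to 255. A value of exactly 0.0 has no exact 8-bit code, since it falls halfway between 127 and 128.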

Each channel descriptor describes the layout of the samples in a "block". The "block" concept points back to the plane descriptors and can be seen as the smallest group of samples that constitutes a building block of your image. Say, for instance, that you divide your image into blocks of 4x4 pixels such that (in the data) all samples for a 4x4 pixel region come before the samples of the next 4x4 block. Then 4x4 is the size you specify here.

Another important concept is that all planes are traversed at the same time. That is, it is perfectly legal to specify that the Y channel in a 2x2 block has its first sample on plane 0, the second on plane 4 and the third and fourth on plane 2. As long as your blocks are sound, this should be no problem. You might think this is an over-complication, but believe me, there are formats out there that need this abstraction.

uint16_t block_w;                 // Width of a block, in samples.
uint16_t block_h;                 // Height of a block, in samples.
uint32_t n_continuation_samples;  // This is the number of _continuation samples_, see sample descriptor below.

Sample Descriptors

The last piece of the puzzle is the sample descriptors. Each channel descriptor is followed by exactly block_w*block_h + n_continuation_samples sample descriptors.
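
The count stated above is simple enough to write down directly. In this small sketch, `sample_descriptor_count` is a hypothetical helper name; the arithmetic is widened to 64 bits so the product cannot overflow.

```c
#include <stdint.h>

/* Number of sample descriptors that follow one channel descriptor:
 * block_w * block_h + n_continuation_samples. */
uint64_t sample_descriptor_count(uint16_t block_w, uint16_t block_h,
                                 uint32_t n_continuation_samples) {
    return (uint64_t)block_w * block_h + n_continuation_samples;
}
```

For example, a 2x2 block without continuation samples is followed by 4 sample descriptors, and a 4x4 block with 2 continuation samples by 18.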

Each sample descriptor describes where to read some bits from the target image. Samples are always listed with origin in the top-left corner, i.e. the first sample is the top left pixel in your block.

uint8_t plane;             // The plane this sample lives in.
uint8_t integer_bits;      // The number of integer bits to read from the offset. 
uint8_t fractional_bits;   // The number of fractional bits to read from the offset.
bool has_continuation;     // If this is true, then the bits in this sample will be concatenated with the bits in the next sample. This effectively allows you to split bits inside a block (and even across planes).
uint16_t offset;           // Offset to the least significant bit in the sample.

A note on the offsets: blocks in xyuv are little endian. That is, imagine the memory addresses increasing from right to left (... 0x4, 0x3, 0x2, 0x1). Each sample offset is then the number of bits from the right-hand side of your block, i.e. from the least significant bit in the byte with the largest address.
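
Once a block has been loaded into an integer, reading a sample is a shift and a mask. The sketch below assumes the block fits in 64 bits and has already been loaded so that bit 0 of the integer is the rightmost bit of the block. It does not handle continuation samples, and `extract_sample_bits` is a hypothetical name.

```c
#include <stdint.h>

/* Extract the raw bits of one sample from a block held in a 64-bit
 * integer. `offset` counts bits from the least significant end of the
 * block; the sample is integer_bits + fractional_bits wide. */
uint64_t extract_sample_bits(uint64_t block, uint16_t offset,
                             uint8_t integer_bits, uint8_t fractional_bits) {
    unsigned n_bits = (unsigned)integer_bits + (unsigned)fractional_bits;
    if (n_bits == 0 || offset >= 64) {
        return 0;                         /* nothing to read in this block */
    }
    /* Build a mask of n_bits ones without shifting by 64 or more. */
    uint64_t mask = (n_bits >= 64) ? UINT64_MAX
                                   : ((UINT64_C(1) << n_bits) - 1);
    return (block >> offset) & mask;
}
```

For a block holding `0xABCD`, an 8-bit sample at offset 0 reads `0xCD`, and an 8-bit sample at offset 8 reads `0xAB`.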