DVID Flexibility and Comparisons

In this section, we'll look at DVID's flexibility, how it works with storage systems, and how you might tune DVID to work for your needs.

Flexibility through datatypes and storage backends

DVID is a data service that requires little administration and provides extreme flexibility at two levels: the datatypes and the underlying storage systems. HTTP requests are either handled by core DVID (e.g., requests to get repo metadata or the system load) or routed to a datatype for handling. For example, a request to get a 3d subvolume of grayscale data gets routed to the imageblk datatype, which breaks the requested subvolume into a series of range query calls to the storage system underlying that particular data instance.
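The decomposition of a subvolume request into range queries can be sketched as follows. This is a minimal illustration, not DVID's actual code: the 32-voxel block size, the `BlockRange` type, and the `blockRanges` function are hypothetical, and it assumes block keys are ordered so that a run of blocks along X at a fixed (Z, Y) is contiguous in the key space.

```go
package main

import "fmt"

// blockSize is an assumed chunk edge length in voxels, chosen for illustration.
const blockSize = 32

// BlockRange describes one hypothetical range query: all blocks at block
// coordinates (Z, Y) whose X block coordinate lies in [X0, X1].
type BlockRange struct {
	Z, Y, X0, X1 int
}

// blockRanges splits a subvolume, given by its voxel offset and size in
// (x, y, z) order, into one range query per (Z, Y) row of blocks.
// It assumes non-negative offsets and positive sizes.
func blockRanges(offset, size [3]int) []BlockRange {
	// First and last block coordinate touched along each axis.
	var first, last [3]int
	for i := 0; i < 3; i++ {
		first[i] = offset[i] / blockSize
		last[i] = (offset[i] + size[i] - 1) / blockSize
	}
	var ranges []BlockRange
	for z := first[2]; z <= last[2]; z++ {
		for y := first[1]; y <= last[1]; y++ {
			ranges = append(ranges, BlockRange{Z: z, Y: y, X0: first[0], X1: last[0]})
		}
	}
	return ranges
}

func main() {
	// A 100x64x33 voxel subvolume at voxel offset (10, 0, 31).
	for _, r := range blockRanges([3]int{10, 0, 31}, [3]int{100, 64, 33}) {
		fmt.Printf("z=%d y=%d x=[%d,%d]\n", r.Z, r.Y, r.X0, r.X1)
	}
}
```

Each returned range would map to a single ordered scan in a key-value store, which is why an ordered backend suits this access pattern.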

[Figure: DVID datatypes using different storage backends]

In the figure above, the datatypes labelblk and googlevoxels implement identical HTTP APIs for overlapping functionality like retrieving a subvolume of label data. However, the labelblk implementation uses an ordered key-value database backend while the googlevoxels instance proxies to a Google data service. DVID allows assignment of a storage backend to each data instance or all instances of a datatype. This flexibility permits tailoring of storage to the type of data we are processing.

For example, suppose we have dozens of terabytes of grayscale EM images that will have a corresponding segmentation volume of 64-bit labels. We create a "grayscale" data instance of the datatype uint8blk and have it use a local store like leveldb on a local RAID-6 drive system, or a cloud store like Google Cloud Storage. Grayscale data tends to be immutable and its translation to key-value pairs via chunking is very simple, as we will explore later. For our segmentation, we create a "segmentation" data instance of the datatype labelblk and use a very fast leveldb on an NVMe SSD. This works because label data, even though it uses 64 bits rather than 8 bits per voxel, is extremely compressible and so fits on a smaller device. This arrangement allows us to use smaller, high-speed storage (NVMe SSD) for our highly compressible and mutable segmentation, and slower, large-scale storage for our less compressible and mostly immutable grayscale images.

Case Study: Image Server

Let's explore how we might implement an image data HTTP service. First, consider the case of only needing a multi-scale tile server useful for browsing data.

The simplest solution is to use a fast, tested web server like nginx and a simple HTTP API where you specify the tile file name. The nginx server then sends the contents of the tile file.

You could instead write the web server yourself in a language well suited to developing HTTP servers, like Go, where creating a production HTTP server takes little more than a few lines of code. This would allow a more flexible mapping between the HTTP API and the stored tiles, and more control over how the data is stored on disk and sent to your clients.
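A minimal sketch of such a hand-written tile server, using only the Go standard library, might look like the following. The URL pattern `/tile/{scale}/{x}/{y}/{z}`, the on-disk layout `root/<scale>/<z>/<y>_<x>.png`, and the names `tilePath` and `tileHandler` are assumptions made for illustration; they are not DVID's API.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// tilePath maps a scale and tile coordinates to a file under root.
// The layout root/<scale>/<z>/<y>_<x>.png is an assumed convention.
func tilePath(root string, scale, x, y, z int) string {
	return filepath.Join(root, fmt.Sprint(scale), fmt.Sprint(z), fmt.Sprintf("%d_%d.png", y, x))
}

// tileHandler serves requests of the assumed form /tile/{scale}/{x}/{y}/{z}
// by sending the contents of the corresponding tile file.
func tileHandler(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var scale, x, y, z int
		if _, err := fmt.Sscanf(r.URL.Path, "/tile/%d/%d/%d/%d", &scale, &x, &y, &z); err != nil {
			http.Error(w, "expected /tile/{scale}/{x}/{y}/{z}", http.StatusBadRequest)
			return
		}
		http.ServeFile(w, r, tilePath(root, scale, x, y, z))
	})
}

func main() {
	// Build a throwaway tile directory so the example runs end to end.
	root, err := os.MkdirTemp("", "tiles")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(root)
	p := tilePath(root, 0, 1, 2, 3)
	if err := os.MkdirAll(filepath.Dir(p), 0755); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(p, []byte("fake tile bytes"), 0644); err != nil {
		log.Fatal(err)
	}

	// Exercise the handler in-process with a recorded request.
	rec := httptest.NewRecorder()
	tileHandler(root).ServeHTTP(rec, httptest.NewRequest("GET", "/tile/0/1/2/3", nil))
	fmt.Println("status:", rec.Code, "bytes:", rec.Body.Len())
}
```

In a real deployment, `main` would instead call `http.ListenAndServe(":8080", tileHandler("./tiles"))`, where the port and directory are placeholders. Keeping the URL-to-path mapping in its own function is what lets the HTTP API and the disk layout change independently.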

DVID provides a tile serving implementation similar to the above Go server via its imagetile datatype, where the tile data is now stored in a database, cloud service, or in the future, even a generic file system. The imagetile implementation also lets you make arbitrarily-sized 2d image requests (within the canonical orientations) that will be automatically fulfilled by gathering intersecting tiles.

In each of the above cases, a tile generation system would store tile data into specific files or via a POST /tile call to DVID. (DVID can generate tiles directly, but for large-scale production, we recommend doing tile generation externally and then storing the results via POST /tile.)

Now, let's consider the case of accessing arbitrary image subvolumes or a cut plane at an angle. If you are using nginx, you now have to either extend the server or write some code that gets called from nginx. If you've built your own HTTP server from scratch, you can extend it to handle retrieval of all intersecting tiles and compute the requested data.

The DVID uint8blk datatype provides arbitrary image subvolumes or cut planes at the cost of ingesting the same data again, but in a slightly different format. This denormalization, where grayscale image data is stored both as blocks and tiles, is necessary for low-latency responses. If tile retrieval doesn't require extremely low latency, you can use just a uint8blk data instance and make 2D image HTTP requests that return the same data as the tiles, but computed on the fly from intersecting grayscale blocks.
