Add initial draft for gzip codec (#55)
jrbourbeau authored Mar 31, 2020
1 parent 3d4869e commit c7b83a9
Showing 4 changed files with 136 additions and 2 deletions.
11 changes: 11 additions & 0 deletions docs/codecs.rst
@@ -0,0 +1,11 @@
======
Codecs
======

Under construction.

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   codecs/gzip/v1.0
122 changes: 122 additions & 0 deletions docs/codecs/gzip/v1.0.rst
@@ -0,0 +1,122 @@
========================
Gzip Codec (version 1.0)
========================
-----------------------------
Editor's draft 31 March 2020
-----------------------------

Specification URI:
    https://purl.org/zarr/spec/codecs/gzip/1.0
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codecs-gzip-v1.0>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/codecs/gzip/v1.0.rst>`_

Copyright 2020 `Zarr core development
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
list institutions?). This work is licensed under a `Creative Commons
Attribution 3.0 Unported
License <https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

This specification defines a codec for chunk compression using Gzip.


Status of this document
=======================

This document is a **Work in Progress**. It may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.

Comments, questions or contributions to this document are very
welcome. Comments and questions should be raised via `GitHub issues
<https://github.com/zarr-developers/zarr-specs/labels/codecs-gzip-v1.0>`_. When
raising an issue, please add the label "codecs-gzip-v1.0".

This document was produced by the `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.


Document conventions
====================

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Chunk encoding/decoding with Gzip
=================================

@@TODO define how chunks are encoded and decoded
@@TODO be sure to clarify that the encoded data should conform to the gzip file format

Chunks are encoded and decoded using the compression algorithm defined in
[RFC1951]_, and encoded data should conform to the Gzip file format [RFC1952]_.
The compression level is an integer from 0 to 9 that controls the trade-off
between speed and compression ratio. A level of 1 is the fastest and produces
the least compression, while a level of 9 is the slowest and produces the most
compression. A level of 0 turns compression off completely.
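
For example, the following non-normative Python sketch round-trips a chunk
through the standard library ``gzip`` module. The function names
``encode_chunk`` and ``decode_chunk`` are hypothetical and are not defined by
this specification; the sketch only assumes that a chunk is available as a
byte string.

```python
import gzip


def encode_chunk(chunk: bytes, level: int = 1) -> bytes:
    """Compress a chunk into the Gzip file format (hypothetical helper)."""
    # The specification restricts the level to the integers 0 through 9.
    if not isinstance(level, int) or not 0 <= level <= 9:
        raise ValueError("level must be an integer from 0 to 9")
    return gzip.compress(chunk, compresslevel=level)


def decode_chunk(encoded: bytes) -> bytes:
    """Recover the original chunk bytes (hypothetical helper)."""
    return gzip.decompress(encoded)


raw = bytes(1000)  # a highly compressible chunk of 1000 zero bytes

for level in (0, 1, 9):
    encoded = encode_chunk(raw, level)
    # Every Gzip member starts with the two magic bytes 0x1f 0x8b.
    assert encoded[:2] == b"\x1f\x8b"
    assert decode_chunk(encoded) == raw
```

Decoding does not need the compression level, since the level only affects how
the encoder searches for redundancy and not the format of the encoded data.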


Configuring codec in array metadata
===================================

@@TODO define how to specify in array metadata documents.

The Gzip codec can be specified as a compressor for a Zarr array under the
``compressor`` name in the corresponding array metadata document. The URI for
the Gzip codec defined in this specification is
https://purl.org/zarr/spec/codecs/gzip/1.0.

Additionally, the compression level must be specified as the value of the
``level`` name within the ``configuration`` object. For example, the array
metadata document below specifies a Gzip codec configured with a compression
level of 1::


    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/gzip/1.0",
            "configuration": {
                "level": 1
            }
        }
    }
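
For example, an implementation might read the compression level from an array
metadata document as in the non-normative Python sketch below. The function
name ``gzip_level_from_metadata`` and the error handling are assumptions made
for illustration; only the ``compressor``, ``codec``, ``configuration`` and
``level`` names and the codec URI come from this specification.

```python
import gzip
import json

# Codec URI as given in this specification.
GZIP_CODEC_URI = "https://purl.org/zarr/spec/codecs/gzip/1.0"


def gzip_level_from_metadata(metadata_json: str) -> int:
    """Return the configured Gzip level (hypothetical helper)."""
    compressor = json.loads(metadata_json)["compressor"]
    if compressor["codec"] != GZIP_CODEC_URI:
        raise ValueError("compressor is not the Gzip codec, version 1.0")
    level = compressor["configuration"]["level"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(level, bool) or not isinstance(level, int) or not 0 <= level <= 9:
        raise ValueError("level must be an integer from 0 to 9")
    return level


EXAMPLE_METADATA = """
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codecs/gzip/1.0",
        "configuration": {
            "level": 1
        }
    }
}
"""

level = gzip_level_from_metadata(EXAMPLE_METADATA)
encoded = gzip.compress(b"example chunk bytes", compresslevel=level)
```

Checking the codec URI before reading ``configuration`` keeps the sketch from
misinterpreting the configuration of a different codec.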


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119

.. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version
   1.3. May 1996. Informational. URL:
   https://tools.ietf.org/html/rfc1951

.. [RFC1952] P. Deutsch. GZIP file format specification version 4.3.
   May 1996. Informational. URL:
   https://tools.ietf.org/html/rfc1952


Change log
==========

@@TODO
1 change: 1 addition & 0 deletions docs/index.rst
@@ -11,6 +11,7 @@ Under construction.

protocol
stores
codecs


Indices and tables
4 changes: 2 additions & 2 deletions docs/protocol/core/v3.0.rst
@@ -807,7 +807,7 @@ compressed using gzip compression prior to storage::
},
"chunk_memory_layout": "C",
"compressor": {
- "codec": "https://purl.org/zarr/spec/codec/gzip",
+ "codec": "https://purl.org/zarr/spec/codec/gzip/1.0",
"configuration": {
"level": 1
}
@@ -837,7 +837,7 @@ chunking as above, but using an extension data type::
},
"chunk_memory_layout": "C",
"compressor": {
- "codec": "https://purl.org/zarr/spec/codec/gzip",
+ "codec": "https://purl.org/zarr/spec/codec/gzip/1.0",
"configuration": {
"level": 1
}