
Requirement: Include a CRC (cyclic redundancy check) of the complete record #12

Open
krischer opened this issue Jan 3, 2018 · 15 comments


@krischer
Contributor

krischer commented Jan 3, 2018

Include a CRC (cyclic redundancy check) of the complete record.

@chad-earthscope
Member

This requirement was discussed in change proposal 6 to the 2016 strawman. The Quaterra CRC algorithm was offered for use in change proposal 25 to the 2016 strawman.

In the previous format specifications from IRIS, I suggested adopting the CRC-32C (Castagnoli) algorithm for the following reasons:

  • It is well understood, standardized, and documented in multiple RFCs (e.g. RFC 4960), and it has been adopted by standardized transmission protocols (iSCSI, SCTP), file systems (Btrfs, ext4) and beyond.
  • Support is ubiquitous across programming languages. Heck, many modern CPUs even have hardware support for it: it is part of SSE 4.2, which both Intel and AMD support.
  • It is really not that complex to code a simple version should that ever be needed; see the sketch after this list.
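To illustrate that last point, here is a minimal, unoptimized bitwise sketch of CRC-32C (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF). It is only an example of how small a portable fallback can be, not part of any proposal in this thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
   initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
   Check value: crc32c((const uint8_t *)"123456789", 9) == 0xE3069283. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A production implementation would typically use a lookup table or the SSE 4.2 crc32 instruction (exposed as the _mm_crc32_u8 / _mm_crc32_u64 intrinsics), but the bitwise form above is enough where neither is available.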

@andres-h

andres-h commented Jan 6, 2018

I think the type of CRC is an irrelevant implementation detail at this stage. The question is rather whether there should be a CRC of the complete record, of a partial record, or no record-level CRC at all.

I'm in favor of a partial-record CRC.

@krischer
Contributor Author

krischer commented Jan 8, 2018

The type of CRC should probably also be discussed at this stage, as stage 3 of the design will hopefully deal only with complete implementations, and the type of CRC is a detail that should be sorted out by then. Thus we have two things to discuss here: which type of checksum calculation to use, and which parts of the record should be checksummed.

Excerpt from @andres-h's link:

All proposals add some sort of record checksum. The problem with #1 and #2 is that adding anything to the record (for example a blockette or CBOR object describing the quality-control procedure performed at the data centre), or even increasing the data publication version just to indicate that the data passed quality-control procedures, invalidates the checksum. Due to a hardware or software glitch, it can happen that the data is corrupted but the (new) checksum is still correct.

The concept of the current proposal (#3) is that data should be modified as little as possible. In particular, adding "quality control passed" should not invalidate the primary checksum if no other changes were made to the record. Multiple checksums and hashes are supported (CRC-32, MD5, SHA, etc.).

I personally disagree with this and think that everything should be checksummed - it's cheap and, to some extent, also serves as a safety mechanism when touching any part of a record. It could also be two separate 16-bit checksums - one for the data, one for the header. The risk of false positives should still be small enough not to have to worry about it.

@andres-h

andres-h commented Jan 8, 2018

I agree that everything should be checksummed, but having a single checksum does not fit the requirement (?) of being able to modify records in the data centre (adding QC, etc.). As a user, I want to check whether the CRC matches the one generated by the digitizer. I also want to know which changes, if any, were made in the data centre. This assumes that digitizers support NGF directly, similar to mseed2/seedlink (requirement?).

@crotwell

crotwell commented Jan 8, 2018

I feel the checksum should be over the whole record, and modification of a record should force a checksum recalculation. The purpose of this checksum is simply to detect corruption of the data during transmission or storage, not to provide "provenance" of the data back to the digitizer. While provenance is an important concept, it is a much larger problem, and to do it correctly it needs to be handled separately from the time series format.

For this purpose I think simpler is better, so CRC-32 makes the most sense.

@chad-earthscope
Member

chad-earthscope commented Jan 9, 2018

I agree with @crotwell. The main value of adding a CRC is being able to check whether the record has been corrupted during transmission or storage. Even if multiple CRCs could be used in a provenance scheme, requiring a reader to calculate multiple CRCs just to answer the "has this record been corrupted?" question is not justified.

@krischer
Contributor Author

Summary

(Please let me know if I missed a point or misunderstood something)

There is agreement that we want a CRC, but not yet on the algorithm or on the "type" of CRC. Technically it is also clear that the CRC field itself must be set to some pre-determined value (or be ignored) for the actual calculation of the CRC; a minimal sketch of this follows the questions below. Thus please vote on:

  1. What should the CRC include? (complete record / partial record / no record-level CRC)
  2. What algorithms should be used? (CRC-32/other suggestion)
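As an illustration of the "CRC field set to a pre-determined value" point, verification of a whole-record CRC could look like the sketch below. This is not part of any specification text; the 28-byte offset is only a placeholder for wherever the CRC field ends up in the fixed header, and crc32c() is the earlier sketch from this thread:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRC_FIELD_OFFSET 28  /* placeholder; depends on the final header layout */

/* Check a record whose stored CRC was computed over the complete record
   with the 4-byte CRC field set to zero, reusing crc32c() from above. */
bool record_crc_ok(uint8_t *record, size_t record_length, uint32_t stored_crc)
{
    uint8_t saved[4];

    memcpy(saved, record + CRC_FIELD_OFFSET, 4);     /* keep the stored bytes */
    memset(record + CRC_FIELD_OFFSET, 0, 4);         /* zero the CRC field    */
    uint32_t computed = crc32c(record, record_length);
    memcpy(record + CRC_FIELD_OFFSET, saved, 4);     /* restore the record    */

    return computed == stored_crc;
}
```

Writers would do the mirror image: zero the field, compute the CRC over the complete record, then store the result in the field.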

@crotwell

1 complete record
2 crc-32c

@chad-earthscope
Member

  1. Complete record with the CRC field set to 0's.
  2. CRC-32C

@kaestli

kaestli commented Jan 30, 2018

  1. complete record
  2. crc-32

@ozym

ozym commented Jan 30, 2018

  1. complete record as per @chad-iris
  2. crc-32

@claudiodsf

  1. What should the CRC include? (complete record / partial record / no record-level CRC)

Complete record

  2. What algorithms should be used? (CRC-32/other suggestion)

CRC-32 or any other lightweight algorithm

@ihenson-bsl

1 complete record
2 crc-32c

@ValleeMartin

  1. Complete record
  2. CRC-32 (or similar lightweight algorithm)

@JoseAntonioJara

  1. Complete record
  2. CRC-32
