Fix reading formats 8, 310, and 311 #327

bemoody · 2021-09-24T17:06:37Z

Format 8 (not to be confused with format 80) is a format in which each sample is stored as an 8-bit difference from the previous sample. This was handled incorrectly; for example:

$ seq 1 5 | wrsamp -o x8 -O 8
$ rdsamp -r x8
              0       1
              1       2
              2       3
              3       4
              4       5
$ python3 -q
>>> import wfdb
>>> wfdb.rdrecord('x8', physical=0).d_signal
array([[0],
       [1],
       [1],
       [1],
       [1]])

This format is rarely used (should not be used) nowadays, but it remains supported by WFDB, so handling it incorrectly will give misleading results. Note that wr_dat_file doesn't support writing this format and I'm inclined to keep it that way.

Formats 310 and 311 are two different formats that store 10-bit samples as four-byte blocks of three samples each. These formats were handled correctly for the most part, but they would crash if sampto was not divisible by 3; for example:

>>> wfdb.rdrecord('sample-data/310derive', sampto=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benjamin/wfdb-python/wfdb/io/record.py", line 3542, in rdrecord
    return_res=return_res)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1010, in _rd_segment
    sampfrom, sampto, smooth_frames)[:, r_w_channel[fn]]
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1169, in _rd_dat_signals
    signal = sig_data.reshape(-1, n_sig)
ValueError: cannot reshape array of size 3 into shape (2)
>>> wfdb.rdrecord('sample-data/310derive', sampto=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benjamin/wfdb-python/wfdb/io/record.py", line 3542, in rdrecord
    return_res=return_res)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1010, in _rd_segment
    sampfrom, sampto, smooth_frames)[:, r_w_channel[fn]]
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1208, in _rd_dat_signals
    _check_sig_dims(signal, read_len, n_sig, samps_per_frame)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1643, in _check_sig_dims
    raise ValueError('Samples were not loaded correctly')
ValueError: Samples were not loaded correctly

wr_dat_file doesn't support writing either of these formats, but would be worth adding them in the future.

Mark Python source files as "diff=python" to show the correct function names, and mark various types of binary files as binary to avoid attempting to diff/merge them.

In each branch, the calls to _rd_dat_signals with and without 'no_file=True, sig_data=sig_data' were otherwise identical. sig_data is ignored if no_file is false, so this is equivalent to simply passing 'no_file=no_file, sig_data=sig_data'. Furthermore, since _rd_dat_signals takes a huge number of arguments, convert all of them to keyword style to avoid confusion.

All four of the calls to _rd_segment were identical apart from the 'no_file' and 'sig_data' arguments. Rearrange the code so there is only one call to _rd_segment, to avoid redundancy. Furthermore, since _rd_segment takes a huge number of arguments, convert all of them to keyword style to avoid confusion.

This should be a list containing the initial sample value for each signal; this is required in order to correctly read format-8 dat files.

In signal format 8, each sample is stored as an 8-bit signed difference from the previous sample. This means that after reading the raw byte values, they must be translated to absolute sample values by calling cumsum() and adding the initial value.

In formats 310 and 311, each block of three samples is written as four bytes. _rd_dat_signals will retrieve the minimum range of bytes (as determined by _dat_read_params and _required_byte_num) that are needed in order to decode the desired samples; thus, the data passed to _blocks_to_samples may include an incomplete block at the end. The previous implementation of _blocks_to_samples was meant to pad the input data to a multiple of four bytes. However, this logic was wrong: added_samps was always set to zero, so the intended extra bytes were not appended, and (if the lack of extra bytes didn't cause an error) the wrong number of samples was returned to the caller. In fact, the subsequent statements for decoding blocks into samples already worked correctly for an unpadded input array (since each input slice is correctly truncated to the length of the output slice.) So remove the padding logic entirely.

The record "binformats" contains one signal in each of the ten WFDB binary formats (8, 16, 61, 80, 160, 212, 310, 311, 24, and 32.) In this record, sample j of signal i is equal to: (i + 16843019 * j) % ((1 << adcres) - 1) + 1 - (1 << (adcres - 1))) Note that the length of the record is 499 samples, so each of the bit-packed data files ends with an incomplete data block. Use this record to test that it is possible to read all of the formats correctly, including when we skip one or two samples from the start and/or end of the record. (Skipping samples is expected to give incorrect results for format 8, so that signal is not required to match.) We do not test writing, since not all formats are currently supported by wr_dat_file.

tompollard · 2021-10-14T20:35:59Z

thanks @bemoody, looks good to me.

Benjamin Moody added 8 commits September 24, 2021 12:39

Add .gitattributes file.

f6e5179

Mark Python source files as "diff=python" to show the correct function names, and mark various types of binary files as binary to avoid attempting to diff/merge them.

_rd_segment: add init_value argument.

4d18c53

This should be a list containing the initial sample value for each signal; this is required in order to correctly read format-8 dat files.

_rd_dat_signals: add init_value argument.

ce1834d

This should be a list containing the initial sample value for each signal; this is required in order to correctly read format-8 dat files.

bemoody force-pushed the more-signal-fmts branch from d16dd96 to 14048dd Compare September 24, 2021 19:22

tompollard merged commit 8fc10ab into master Oct 14, 2021

tompollard deleted the more-signal-fmts branch October 14, 2021 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix reading formats 8, 310, and 311 #327

Fix reading formats 8, 310, and 311 #327

bemoody commented Sep 24, 2021

tompollard commented Oct 14, 2021

Fix reading formats 8, 310, and 311 #327

Fix reading formats 8, 310, and 311 #327

Conversation

bemoody commented Sep 24, 2021

tompollard commented Oct 14, 2021