Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading formats 8, 310, and 311 #327

Merged
merged 8 commits into from
Oct 14, 2021
Merged

Fix reading formats 8, 310, and 311 #327

merged 8 commits into from
Oct 14, 2021

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented Sep 24, 2021

Format 8 (not to be confused with format 80) is a format in which each sample is stored as an 8-bit difference from the previous sample. This was handled incorrectly; for example:

$ seq 1 5 | wrsamp -o x8 -O 8
$ rdsamp -r x8
              0       1
              1       2
              2       3
              3       4
              4       5
$ python3 -q
>>> import wfdb
>>> wfdb.rdrecord('x8', physical=0).d_signal
array([[0],
       [1],
       [1],
       [1],
       [1]])

This format is rarely used (should not be used) nowadays, but it remains supported by WFDB, so handling it incorrectly will give misleading results. Note that wr_dat_file doesn't support writing this format and I'm inclined to keep it that way.

Formats 310 and 311 are two different formats that store 10-bit samples as four-byte blocks of three samples each. These formats were handled correctly for the most part, but they would crash if sampto was not divisible by 3; for example:

>>> wfdb.rdrecord('sample-data/310derive', sampto=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benjamin/wfdb-python/wfdb/io/record.py", line 3542, in rdrecord
    return_res=return_res)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1010, in _rd_segment
    sampfrom, sampto, smooth_frames)[:, r_w_channel[fn]]
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1169, in _rd_dat_signals
    signal = sig_data.reshape(-1, n_sig)
ValueError: cannot reshape array of size 3 into shape (2)
>>> wfdb.rdrecord('sample-data/310derive', sampto=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benjamin/wfdb-python/wfdb/io/record.py", line 3542, in rdrecord
    return_res=return_res)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1010, in _rd_segment
    sampfrom, sampto, smooth_frames)[:, r_w_channel[fn]]
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1208, in _rd_dat_signals
    _check_sig_dims(signal, read_len, n_sig, samps_per_frame)
  File "/home/benjamin/wfdb-python/wfdb/io/_signal.py", line 1643, in _check_sig_dims
    raise ValueError('Samples were not loaded correctly')
ValueError: Samples were not loaded correctly

wr_dat_file doesn't support writing either of these formats, but would be worth adding them in the future.

Benjamin Moody added 8 commits September 24, 2021 12:39
Mark Python source files as "diff=python" to show the correct function
names, and mark various types of binary files as binary to avoid
attempting to diff/merge them.
In each branch, the calls to _rd_dat_signals with and without
'no_file=True, sig_data=sig_data' were otherwise identical.  sig_data
is ignored if no_file is false, so this is equivalent to simply
passing 'no_file=no_file, sig_data=sig_data'.

Furthermore, since _rd_dat_signals takes a huge number of arguments,
convert all of them to keyword style to avoid confusion.
All four of the calls to _rd_segment were identical apart from the
'no_file' and 'sig_data' arguments.  Rearrange the code so there is
only one call to _rd_segment, to avoid redundancy.

Furthermore, since _rd_segment takes a huge number of arguments,
convert all of them to keyword style to avoid confusion.
This should be a list containing the initial sample value for each
signal; this is required in order to correctly read format-8 dat
files.
This should be a list containing the initial sample value for each
signal; this is required in order to correctly read format-8 dat
files.
In signal format 8, each sample is stored as an 8-bit signed
difference from the previous sample.  This means that after reading
the raw byte values, they must be translated to absolute sample values
by calling cumsum() and adding the initial value.
In formats 310 and 311, each block of three samples is written as four
bytes.  _rd_dat_signals will retrieve the minimum range of bytes (as
determined by _dat_read_params and _required_byte_num) that are needed
in order to decode the desired samples; thus, the data passed to
_blocks_to_samples may include an incomplete block at the end.

The previous implementation of _blocks_to_samples was meant to pad the
input data to a multiple of four bytes.  However, this logic was
wrong: added_samps was always set to zero, so the intended extra bytes
were not appended, and (if the lack of extra bytes didn't cause an
error) the wrong number of samples was returned to the caller.

In fact, the subsequent statements for decoding blocks into samples
already worked correctly for an unpadded input array (since each input
slice is correctly truncated to the length of the output slice.)  So
remove the padding logic entirely.
The record "binformats" contains one signal in each of the ten WFDB
binary formats (8, 16, 61, 80, 160, 212, 310, 311, 24, and 32.)  In
this record, sample j of signal i is equal to:

  (i + 16843019 * j) % ((1 << adcres) - 1) + 1 - (1 << (adcres - 1)))

Note that the length of the record is 499 samples, so each of the
bit-packed data files ends with an incomplete data block.

Use this record to test that it is possible to read all of the formats
correctly, including when we skip one or two samples from the start
and/or end of the record.  (Skipping samples is expected to give
incorrect results for format 8, so that signal is not required to
match.)

We do not test writing, since not all formats are currently supported
by wr_dat_file.
@tompollard
Copy link
Member

thanks @bemoody, looks good to me.

@tompollard tompollard merged commit 8fc10ab into master Oct 14, 2021
@tompollard tompollard deleted the more-signal-fmts branch October 14, 2021 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants