Read multi-frequency multi-segment records #331

Merged on May 4, 2022 (9 commits)

bemoody (Collaborator) commented Oct 13, 2021

A multi-segment record can contain signals with multiple sampling frequencies, as in the MIMIC DB or the upcoming MIMIC-IV Waveform DB.

By default, rdrecord will convert the input to a uniform 2D array of samples; in the case of multi-frequency records it downsamples all signals to the frame frequency ("smooth frames"), and in the case of multi-segment records it stitches the input arrays together into one ("multi-to-single"). However, both these options can be turned off.

In particular, smooth frames mode is simply not a good idea; when actually analyzing a multi-frequency record we need to be able to read each signal at its original sampling frequency or better.
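To make the terminology concrete, here is a minimal NumPy sketch of what "smoothing" a multi-frequency record loses. It assumes smoothing is a plain average of the samples within each frame; the helper name `smooth` and the sample values are made up for illustration and are not taken from the wfdb source.

```python
import numpy as np

# Hypothetical layout: one signal stored at 4 samples per frame and
# another at 1 sample per frame (assumed values for illustration).
samps_per_frame = [4, 1]
n_frames = 3

# Expanded form: one 1D array per signal, each at its own rate.
expanded = [
    np.arange(n_frames * 4, dtype=float),  # 12 samples, 4 per frame
    np.array([10.0, 20.0, 30.0]),          # 3 samples, 1 per frame
]


def smooth(expanded, samps_per_frame):
    """Average the samples within each frame, giving one value per frame."""
    columns = [
        sig.reshape(-1, spf).mean(axis=1)
        for sig, spf in zip(expanded, samps_per_frame)
    ]
    # Every column now has n_frames entries, so they fit in one 2D array.
    return np.column_stack(columns)


smoothed = smooth(expanded, samps_per_frame)
print(smoothed.shape)  # (3, 2)
```

The smoothed array is uniform, but the faster signal has been reduced to one value per frame, which is the loss of resolution described above.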

Currently, rdrecord disallows this:

>>> import wfdb
>>> r = wfdb.rdrecord('041', pn_dir='mimicdb/041', sampto=10, smooth_frames=False)
...
ValueError: This package version cannot expand all samples when reading multi-segment records. Must enable frame smoothing.

It also fails in "non-multi-to-single" mode:

>>> r = wfdb.rdrecord('041', pn_dir='mimicdb/041', sampto=10, smooth_frames=False, m2s=False)
...
ValueError: This package version cannot expand all samples when reading multi-segment records. Must enable frame smoothing.

These changes fix both cases.

This is based on and depends on pull #313. It might work for mimicdb without those changes, for some value of "work", but both sets of changes are definitely needed for MIMIC-IV.

@bemoody bemoody changed the title Read multi-frequency multi-segment records [WIP] Read multi-frequency multi-segment records Oct 27, 2021
@bemoody bemoody force-pushed the multisegment-multifrequency branch from e746de3 to c4a1257 Compare April 25, 2022 15:52
bemoody (Collaborator, Author) commented Apr 25, 2022

This was a straightforward rebase, and I think the changes themselves are fairly straightforward, but it's been a while since I looked at this code, so careful reviews would be welcome.

@bemoody bemoody changed the title [WIP] Read multi-frequency multi-segment records Read multi-frequency multi-segment records Apr 25, 2022
cx1111 (Member) left a comment

Looks good, thanks for the new test cases. Please apply formatting before merge.

We really should rename the tests so that we can more easily see the various scenarios being covered.

@@ -1152,7 +1162,16 @@ def multi_to_single(self, physical, return_res=64):
reference_fields[field][ch] = item_ch
# mismatch case
elif reference_fields[field][ch] != item_ch:
if physical:
if field == 'samps_per_frame':
raise ValueError(
Member

Use f-strings please.
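As a rough illustration of the requested change, the same message can be built with `str.format` and with an f-string. The message wording and the values below are invented for this example; the actual message in the diff is not shown in this conversation.

```python
# Hypothetical values standing in for a detected mismatch.
field = "samps_per_frame"
ch = 2
expected, found = 4, 1

# Older style using str.format.
old_style = "Mismatched {} for channel {}: {} != {}".format(
    field, ch, expected, found
)

# Equivalent f-string, with the values interpolated in place.
new_style = f"Mismatched {field} for channel {ch}: {expected} != {found}"

print(new_style)
```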

physical=False, smooth_frames=False)

np.testing.assert_array_equal(sig, sig_target)
assert record.__eq__(record_named)
Member

Does assert record == record_named work? Also, please add a comment above each rdrecord call in this test to explain the intention/difference.
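For context on the question, here is a small sketch of how `==` relates to `__eq__` in Python. The class `Rec` is a hypothetical stand-in and is not the wfdb record class; whether the real class behaves the same way for unrelated types is not something this conversation confirms.

```python
class Rec:
    """Hypothetical stand-in for a record class that defines __eq__."""

    def __init__(self, fs, sig_name):
        self.fs = fs
        self.sig_name = sig_name

    def __eq__(self, other):
        if not isinstance(other, Rec):
            # Let Python fall back to its default comparison.
            return NotImplemented
        return self.__dict__ == other.__dict__


a = Rec(125, ["ECG"])
b = Rec(125, ["ECG"])
c = Rec(500, ["ECG"])

# The operator form dispatches to __eq__, so these agree for same-type operands.
same_via_method = a.__eq__(b)
same_via_operator = a == b

# For an unrelated type the two forms differ: the direct call returns
# NotImplemented, while the operator falls back and yields False.
direct_result = a.__eq__(5)
operator_result = a == 5
```

In a test, the operator form is the more conventional spelling and avoids asserting on a `NotImplemented` return value by accident.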

bemoody (Collaborator, Author) commented Apr 29, 2022 via email

Benjamin Moody added 7 commits May 3, 2022 16:01
Since commit 21764cf, the optional
parameter to wfdb.io.record.BaseRecord.check_field is called
"required_channels", not "channels".  The semantics appear equivalent.

Several callers were still referring to the old name "channels", while
most callers simply use positional arguments.  Change all callers to
use positional arguments for consistency.

When reading a multi-segment record, the samps_per_frame attribute
must be copied from the layout segment into the resulting flattened
record.

(Unlike most other signal attributes, samples per frame must be
uniform for a particular signal.  The layout header must be used to
determine the correct number of samples per frame for signals that are
not present in the selected segments.)

When reading a multi-segment record, the multi_to_single function is
used to stitch the segments together into a virtual single-segment
record.

To enable this to work in expanded (non-smooth-frames) mode, we want
to combine the 'e_p_signal' or 'e_d_signal' arrays from the individual
segments, rather than 'p_signal' or 'd_signal'.
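The following is a minimal sketch of stitching expanded signals, where each channel is a separate 1D array whose length depends on its samples per frame. It is not the wfdb implementation: the function name `stitch_expanded`, its parameters, and the use of NaN to fill gaps between segments are all assumptions made for illustration.

```python
import numpy as np


def stitch_expanded(segments, seg_starts, total_frames, samps_per_frame):
    """Combine per-segment lists of 1D arrays into one list of 1D arrays.

    segments: segments[i][ch] is the 1D array for channel ch of segment i
    seg_starts: starting frame number of each segment
    total_frames: length of the stitched record, in frames
    samps_per_frame: samples per frame for each channel
    """
    stitched = []
    for ch, spf in enumerate(samps_per_frame):
        # Allocate the full-length channel; NaN marks frames with no data.
        full = np.full(total_frames * spf, np.nan)
        for seg, start in zip(segments, seg_starts):
            data = seg[ch]
            # Frame offsets are scaled by this channel's samples per frame.
            full[start * spf : start * spf + len(data)] = data
        stitched.append(full)
    return stitched


# Two segments of 2 frames each, with a 1-frame gap between them.
seg0 = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([10.0, 20.0])]
seg1 = [np.array([5.0, 6.0, 7.0, 8.0]), np.array([30.0, 40.0])]
stitched = stitch_expanded([seg0, seg1], [0, 3], 5, [2, 1])
```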
When reading a multi-segment, multi-frequency record, we want to have
the option of reading each signal at its original sampling frequency,
which requires using smooth_frames=False.  Previously this simply
wasn't allowed, either with or without multi-to-single conversion.

To do this, we need to ensure each segment is loaded in the
appropriate (smooth or non-smooth) mode (which formerly would have
failed if certain segments *didn't* contain multiple samples per
frame.)

After loading the segments, we must invoke multi_to_single, if
desired, in the appropriate mode.

This test case uses an excerpt of record mimicdb/041/, which is a
fixed-layout record with three signals at 500 Hz (four samples per
frame) and four signals at 125 Hz (one sample per frame).
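The arithmetic behind those figures can be sketched as follows. The frame frequency of 125 Hz follows from the numbers above (125 Hz at one sample per frame); the ordering of the signals in the list is an assumption made for illustration.

```python
# 125 frames per second, derived from the 125 Hz signals at 1 sample per frame.
frame_fs = 125

# Assumed ordering: the three 500 Hz signals first, then the four 125 Hz ones.
samps_per_frame = [4, 4, 4, 1, 1, 1, 1]

# Effective sampling frequency of each signal, in Hz.
signal_fs = [frame_fs * spf for spf in samps_per_frame]

# Number of samples per signal when reading 10 frames in expanded mode.
n_frames = 10
expanded_lengths = [n_frames * spf for spf in samps_per_frame]

print(signal_fs)
print(expanded_lengths)
```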
Modify the existing test case test_multi_variable_c to check that we
can read a single-frequency variable-layout record using
smooth_frames=False (which should retrieve the same data as
smooth_frames=True, but represented as a list of arrays rather than a
single array.)
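A small NumPy sketch of why the two representations hold the same data for a single-frequency record: with one sample per frame in every channel, the list of per-channel arrays is just the columns of the 2D array. The array contents are made up for illustration.

```python
import numpy as np

# Hypothetical smooth (2D) signal: 5 frames by 3 channels.
smooth = np.arange(15).reshape(5, 3)

# Expanded representation of the same data: one 1D array per channel.
expanded = [smooth[:, ch] for ch in range(smooth.shape[1])]

# With one sample per frame everywhere, stacking the expanded channels
# as columns reproduces the 2D array exactly.
restacked = np.column_stack(expanded)

print(np.array_equal(restacked, smooth))
```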
@bemoody bemoody force-pushed the multisegment-multifrequency branch from c4a1257 to f1f805b Compare May 3, 2022 20:40
bemoody (Collaborator, Author) commented May 3, 2022

Rebased to apply black formatting. Here's how:

rm -f .git/info/attributes
git checkout multisegment-multifrequency
git rebase 1e695017fa4722c8008e02ad514086ca8eb77d76
git config --local filter.black.clean 'black -tpy37 -l80 -q -'
git config --local filter.black.smudge 'black -tpy37 -l80 -q -'
echo '*.py filter=black' > .git/info/attributes
git rebase -Xrenormalize 3b408f23f50889c7d1c6fee9329c76b739b9f667
rm -f .git/info/attributes

Commit d25923b became a no-op and was therefore automatically dropped.

@tompollard tompollard merged commit efca603 into master May 4, 2022
@tompollard tompollard deleted the multisegment-multifrequency branch May 4, 2022 14:12

3 participants