Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read FLAC signal files #372

Merged
merged 11 commits into from
Jun 22, 2022
Merged

Read FLAC signal files #372

merged 11 commits into from
Jun 22, 2022

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented May 5, 2022

WFDB 10.7 defines new signal formats 508, 516, and 524. Signal files in these formats are compressed using the FLAC algorithm.

Here, we add support for reading these formats using the soundfile package, which is a wrapper around libsndfile. As such, you'll need to have libsndfile (and libFLAC.) The soundfile wheel packages for Windows and MacOS include a copy of libsndfile that's statically linked with libFLAC. On Linux or any other platform you will need to install libsndfile yourself, though there's a good chance you already have it.

Other implementation strategies are possible but this seems simplest.

The implementation in _signal.py is messier than I would like, because the _rd_dat_signals function combines low-level details (byte offsets, format-specific transformations) with high-level semantics (deskewing, length checking.) Preferably the _skew_sig/_check_sig_dims stuff would be moved into _rd_segment instead.

Writing FLAC files is not implemented yet, and I think I'd like to try to address some of the other issues with wrsamp first (#333, #336). Of course if somebody else wants to take a stab at it, feel free ;)

@bemoody bemoody force-pushed the flac branch 2 times, most recently from 26e94ce to f71f2a5 Compare May 5, 2022 20:26
@cx1111
Copy link
Member

cx1111 commented May 13, 2022

I see that this is still WIP. Can we all add type annotations to affected code from now on?

@bemoody
Copy link
Collaborator Author

bemoody commented May 17, 2022

Can we all add type annotations to affected code from now on?

It sounds like a good idea in principle, though it adds quite a bit of visual noise and we don't have an established style for doing so. Could we take this discussion to a separate issue?

@bemoody
Copy link
Collaborator Author

bemoody commented May 26, 2022

Can we all add type annotations to affected code from now on?

See issue #378.

@bemoody bemoody changed the title [WIP] Read FLAC signal files Read FLAC signal files Jun 2, 2022
pyproject.toml Outdated
@@ -10,6 +10,7 @@ python = "^3.7"
numpy = "^1.10.1"
scipy = "^1.0.0"
pandas = "^1.0.0"
soundfile = "^0.10.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have trouble installing this with poetry install, which results in: SolverProblemError Because wfdb depends on soundfile (^0.10.0) which doesn't match any versions, version solving failed.. Running poetry add soundfile@^0.10.0 results in SoundFile = "^0.10.0" being added

https://pypi.org/project/SoundFile/

Copy link
Member

@cx1111 cx1111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clean. Please add a note in the README about installing WFDB and libsndfile.

wfdb/io/_signal.py Show resolved Hide resolved
import soundfile

for spf in samps_per_frame[1:]:
assert spf == samps_per_frame[0], "spf mismatch"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better error to use here?

else:
raise ValueError(f"unknown subtype in {fp.name} ({sf.subtype})")

max_bits = int(fmt) - 500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the WFDB dat formats not map 1:1 with the flac subtype?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this logic?

check_access=False,
):
"""
Open a database file as a random-access file object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name for this function would be nice. "Database file" doesn't convey much meaning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, perhaps this function should be in its own module, as download.py doesn't quite make sense. eg. Module coreio.py and function open_file.

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 6, 2022 via email

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 6, 2022 via email

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 6, 2022 via email

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 7, 2022

Bug: these formats need to be handled in _digi_nan.

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 7, 2022

I am completely lost when it comes to poetry.

If I check out 63c8039, run rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install, I get the error you mentioned:

Installing dependencies from lock file
Warning: The lock file is not up to date with the latest changes in pyproject.toml. You may be getting outdated dependencies. Run update to update them.

  SolverProblemError

  Because wfdb depends on soundfile (^0.10.0) which doesn't match any versions, version solving failed.

If I do v1/bin/poetry update soundfile; v1/bin/poetry install, I get this nonsense:

Updating dependencies
Resolving dependencies... (0.2s)

Writing lock file

No dependencies to install or update
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: wfdb (4.0.0a0)

(it adds a bunch of junk to poetry.lock but it doesn't actually install any of the dependencies.)

The same happens if I do git checkout pyproject.toml poetry.lock; git checkout 3b408f23f50889c7d1c6fee9329c76b739b9f667; rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install, so clearly something was broken there to begin with.

If I rebase flac onto main (which has no poetry.lock) and do the same (rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install), it spins and spins.

In conclusion, I have no idea how to make poetry work, and I have no idea whether poetry will throw a fit if SoundFile changes its official spelling to soundfile.

If you have some idea how to make poetry work, please let me know the exact commands I ought to be running.

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 7, 2022

If I rebase flac onto main (which has no poetry.lock) and do the same (rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install), it spins and spins.

Ooh, that command finally finished running after 14 minutes and it still didn't install anything.

@cx1111
Copy link
Member

cx1111 commented Jun 7, 2022

I always remove poetry.lock before running poetry install, which may indicate other issues with me or poetry... I've also experienced 20m of spinning in many occasions...

Anyway, I did manage to successfully install this config (removed big packages to save time) by running poetry install in a new virtual env without a poetry.lock file:

[tool.poetry.dependencies]
python = "^3.7"
numpy = "^1.10.1"
SoundFile = "^0.10.0"
matplotlib = "^3.3.4"
requests = "^2.8.1"

Running pip install . also worked.

I think if they change their name back to soundfile then we can update the name in our dependency file too.

@cx1111
Copy link
Member

cx1111 commented Jun 8, 2022

Oh, on M1 Mac when trying to import soundfile: OSError: sndfile library not found. Looks like a new release should support it: bastibe/python-soundfile#310
Yeah, this is an annoyance for sure. Did you try the (hot off the presses) python-soundfile 0.11.0b5?

I did not. Figured it wouldn't make a difference to this PR even if it works. I only tested your changes with the specified soundfile on amd64 Linux.

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 8, 2022

Thanks - it would be nice to check it works but not a top priority if that's difficult.

I haven't forgotten about the other issues you raised above and I'll try to address them tomorrow, but for now I've uploaded a short example of some real-world data you can play around with if you like.

Benjamin Moody added 10 commits June 10, 2022 16:01
This defines the function _open_file, which opens an input data file
as a random-access file object.

This module is intended to eventually replace the "streaming I/O"
functions in the wfdb.io.download module (_remote_file_size,
_stream_header, _stream_dat, and _stream_annotation.)

Some notes on the implementation:

- Contrary to many existing functions, I've deliberately made pn_dir
  the first argument rather than the second, since putting the prefix
  first seems more natural.

- There's no dir_name argument; instead, callers should include the
  local directory name as part of file_name.  I consider it a bug that
  some functions have a separate dir_name argument that is ignored
  when pn_dir is set.

- This function does not do automatic version number resolution for
  PhysioNet projects.  That's something the caller (still) needs to do
  and should be handled elsewhere.
The soundfile library provides a python wrapper for libsndfile, which
in turn provides an interface to libFLAC for reading and writing FLAC
files.
"build": pip install will install the Python soundfile package.  On
Windows/MacOS, this will install a binary wheel package that includes
a private copy of libsndfile; on Ubuntu, it will install a generic
package that requires us to install libsndfile separately.

"test-deb10-i386": python3-soundfile installs the Python soundfile
package and also depends on libsndfile1.
The _digi_bounds function previously did not handle formats 310, 311,
61, 160, or 8 (which are not supported by wrsamp).  Add these formats
to SAMPLE_VALUE_RANGE for consistency even though they're currently
not used.

This will also raise an exception for unknown formats, rather than
silently returning None.
This will also raise an exception for unknown formats, rather than
silently returning None.
These format codes are used to designate FLAC signal files.  Format
508 indicates 8-bit, format 516 indicates 16-bit, and format 524
indicates 24-bit resolution.  Other format codes between 500 and 532
(inclusive) are reserved.

A FLAC signal file contains between one and eight channels, which are
sampled at the same frequency.  Therefore, all signals stored in the
same signal file must have the same number of samples per frame.  (If
a record contains more than eight signals, or if they have differing
sampling frequencies, they will require more than one signal file.)

Note that the number of samples per frame has no effect on the
encoding or decoding of the FLAC signal file; there is no interleaving
like there is in a binary signal file.  There is also no requiremement
that WFDB frame boundaries match up with FLAC compression-block
boundaries.

(Perhaps confusingly, the FLAC specification uses the word "frame" to
refer to the encoded form of a block of samples; this has nothing to
do with WFDB frames, and unless otherwise noted, the word "frame"
refers to a WFDB frame.)

To simplify this initial implementation, the function
_rd_compressed_file is nearly a drop-in replacement for _rd_dat_file,
which means that this function must rearrange the input data into the
format that _rd_dat_signals expects.
The fmt argument is the format of the singular dat file.  It is not a
list and that wouldn't make any sense here.
The record "flacformats" contains one signal in each of the three WFDB
FLAC signal formats.  As with the "binformats" record, sample j of
signal i is equal to:

   (i + 16843019 * j) % ((1 << adcres) - 1) + 1 - (1 << (adcres - 1)))

Use this record to test that it is possible to read all of the formats
correctly, including when we skip one or two samples from the start
and/or end of the record.
We want to be sure that the soundfile library is compatible with
wfdb.io._url.NetFile, since wfdb.io._signal._rd_compressed_file relies
on this.

This isn't a complete test of WFDB/FLAC functionality, in part because
the sample files are very small so we can't really exercise the
seeking functionality.  However, this should be sufficient to verify
that soundfile doesn't make silly assumptions (like assuming the input
file corresponds to an OS file descriptor.)
@bemoody
Copy link
Collaborator Author

bemoody commented Jun 10, 2022

Updated with the following changes:

  • Add a note to README.md.

  • Fix capitalization of SoundFile and add compatibility with 0.11.0 (soon to be released.)

  • Rename wfdb.io.download._open_db_file to wfdb.io._coreio._open_file.

  • Rewrite _digi_nan and _digi_bounds for clarity/completeness/efficiency and add FLAC formats.

Correctly handling invalid samples makes this example look better:

>>> import wfdb
>>> r = wfdb.rdrecord('sample-data/mixedsignals', sampto=1000)
>>> wfdb.plot_wfdb(r)

It still looks a little odd because the channels aren't synchronized (see pull #390).

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 13, 2022

By the way, I was poking around the other day and I saw that both pulseaudio and pipewire-bin packages on Debian have a dependency on libsndfile1 - so you're very unlikely to find a desktop system that doesn't have libsndfile! It's still worth mentioning in the README, though, because people will want to run the package on headless servers.

README.md Outdated
@@ -35,6 +35,8 @@ pip install wfdb
poetry add wfdb
```

On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see https://https://github.com/bastibe/python-soundfile/issues/310 and https://https://github.com/bastibe/python-soundfile/issues/325).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see https://https://github.com/bastibe/python-soundfile/issues/310 and https://https://github.com/bastibe/python-soundfile/issues/325).
On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see <https://github.com/bastibe/python-soundfile/issues/310> and <https://github.com/bastibe/python-soundfile/issues/325>).

@@ -0,0 +1,7 @@
mixedsignals 6 62.4725/999.56 14400
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you haven't added a test case for these signals?

else:
raise ValueError(f"unknown subtype in {fp.name} ({sf.subtype})")

max_bits = int(fmt) - 500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this logic?

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 15, 2022 via email

@tompollard
Copy link
Member

@cx1111 are we good to merge this?

Copy link
Member

@cx1111 cx1111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if something got lost in communication about changes needing to get pushed. I did leave the note about the test cases for the multiplexed record case, and updating the README URL typo.

The main logic looks good, from what I understand. @bemoody should be good to merge when ready. Thanks!

@bemoody
Copy link
Collaborator Author

bemoody commented Jun 22, 2022

Thanks, Chen! I fixed the typo in the README. It would definitely be a good idea to add a test case for multiplexed/multi-frequency signals in the future but for right now I just want to merge the code as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants