Read FLAC signal files #372

bemoody · 2022-05-05T19:47:16Z

WFDB 10.7 defines new signal formats 508, 516, and 524. Signal files in these formats are compressed using the FLAC algorithm.

Here, we add support for reading these formats using the soundfile package, which is a wrapper around libsndfile. As such, you'll need to have libsndfile (and libFLAC.) The soundfile wheel packages for Windows and MacOS include a copy of libsndfile that's statically linked with libFLAC. On Linux or any other platform you will need to install libsndfile yourself, though there's a good chance you already have it.

Other implementation strategies are possible but this seems simplest.

The implementation in _signal.py is messier than I would like, because the _rd_dat_signals function combines low-level details (byte offsets, format-specific transformations) with high-level semantics (deskewing, length checking.) Preferably the _skew_sig/_check_sig_dims stuff would be moved into _rd_segment instead.

Writing FLAC files is not implemented yet, and I think I'd like to try to address some of the other issues with wrsamp first (#333, #336). Of course if somebody else wants to take a stab at it, feel free ;)

cx1111 · 2022-05-13T03:18:30Z

I see that this is still WIP. Can we all add type annotations to affected code from now on?

bemoody · 2022-05-17T19:05:28Z

Can we all add type annotations to affected code from now on?

It sounds like a good idea in principle, though it adds quite a bit of visual noise and we don't have an established style for doing so. Could we take this discussion to a separate issue?

bemoody · 2022-05-26T18:33:16Z

Can we all add type annotations to affected code from now on?

See issue #378.

cx1111 · 2022-06-05T03:57:26Z

pyproject.toml

@@ -10,6 +10,7 @@ python = "^3.7"
 numpy = "^1.10.1"
 scipy = "^1.0.0"
 pandas = "^1.0.0"
+soundfile = "^0.10.0"


I have trouble installing this with poetry install, which results in: SolverProblemError Because wfdb depends on soundfile (^0.10.0) which doesn't match any versions, version solving failed.. Running poetry add soundfile@^0.10.0 results in SoundFile = "^0.10.0" being added

https://pypi.org/project/SoundFile/

sample-data/flacformats.hea

cx1111

Looks clean. Please add a note in the README about installing WFDB and libsndfile.

wfdb/io/_signal.py

cx1111 · 2022-06-05T04:14:40Z

wfdb/io/_signal.py

+    import soundfile
+
+    for spf in samps_per_frame[1:]:
+        assert spf == samps_per_frame[0], "spf mismatch"


Is there a better error to use here?

cx1111 · 2022-06-05T04:46:17Z

wfdb/io/_signal.py

+            else:
+                raise ValueError(f"unknown subtype in {fp.name} ({sf.subtype})")
+
+            max_bits = int(fmt) - 500


Should the WFDB dat formats not map 1:1 with the flac subtype?

Could you explain this logic?

cx1111 · 2022-06-05T04:52:44Z

wfdb/io/download.py

+    check_access=False,
+):
+    """
+    Open a database file as a random-access file object.


A better name for this function would be nice. "Database file" doesn't convey much meaning.

Also, perhaps this function should be in its own module, as download.py doesn't quite make sense. eg. Module coreio.py and function open_file.

bemoody · 2022-06-06T16:17:44Z

I have trouble installing this with `poetry install`, which results in: `SolverProblemError Because wfdb depends on soundfile (^0.10.0) which doesn't match any versions, version solving failed.`. Running `poetry add soundfile@^0.10.0` results in `SoundFile = "^0.10.0"` being added

Yeesh. Can I just call that a bug in poetry? Also, it looks like soundfile 0.11.0b5 changed the name in setup.py from SoundFile to soundfile. I don't know if the name on PyPI follows the name in setup.py or what.

bemoody · 2022-06-06T16:19:18Z

Do you have any multiplexed signal files for testing?

That's an excellent point, I'll dig out a small real-world example and add it here.

bemoody · 2022-06-06T16:20:43Z

Oh, on M1 Mac when trying to `import soundfile`: `OSError: sndfile library not found`. Looks like a new release should support it: bastibe/python-soundfile#310

Yeah, this is an annoyance for sure. Did you try the (hot off the presses) python-soundfile 0.11.0b5?

bemoody · 2022-06-07T19:13:26Z

Bug: these formats need to be handled in _digi_nan.

bemoody · 2022-06-07T20:59:36Z

I am completely lost when it comes to poetry.

If I check out 63c8039, run rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install, I get the error you mentioned:

Installing dependencies from lock file
Warning: The lock file is not up to date with the latest changes in pyproject.toml. You may be getting outdated dependencies. Run update to update them.

  SolverProblemError

  Because wfdb depends on soundfile (^0.10.0) which doesn't match any versions, version solving failed.

If I do v1/bin/poetry update soundfile; v1/bin/poetry install, I get this nonsense:

Updating dependencies
Resolving dependencies... (0.2s)

Writing lock file

No dependencies to install or update
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: wfdb (4.0.0a0)

(it adds a bunch of junk to poetry.lock but it doesn't actually install any of the dependencies.)

The same happens if I do git checkout pyproject.toml poetry.lock; git checkout 3b408f23f50889c7d1c6fee9329c76b739b9f667; rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install, so clearly something was broken there to begin with.

If I rebase flac onto main (which has no poetry.lock) and do the same (rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install), it spins and spins.

In conclusion, I have no idea how to make poetry work, and I have no idea whether poetry will throw a fit if SoundFile changes its official spelling to soundfile.

If you have some idea how to make poetry work, please let me know the exact commands I ought to be running.

bemoody · 2022-06-07T21:08:49Z

If I rebase flac onto main (which has no poetry.lock) and do the same (rm -rf v1; virtualenv -ppython3 v1; v1/bin/pip3 install poetry; v1/bin/poetry install), it spins and spins.

Ooh, that command finally finished running after 14 minutes and it still didn't install anything.

cx1111 · 2022-06-07T21:31:50Z

I always remove poetry.lock before running poetry install, which may indicate other issues with me or poetry... I've also experienced 20m of spinning in many occasions...

Anyway, I did manage to successfully install this config (removed big packages to save time) by running poetry install in a new virtual env without a poetry.lock file:

[tool.poetry.dependencies]
python = "^3.7"
numpy = "^1.10.1"
SoundFile = "^0.10.0"
matplotlib = "^3.3.4"
requests = "^2.8.1"

Running pip install . also worked.

I think if they change their name back to soundfile then we can update the name in our dependency file too.

cx1111 · 2022-06-08T20:17:41Z

Oh, on M1 Mac when trying to import soundfile: OSError: sndfile library not found. Looks like a new release should support it: bastibe/python-soundfile#310
Yeah, this is an annoyance for sure. Did you try the (hot off the presses) python-soundfile 0.11.0b5?

I did not. Figured it wouldn't make a difference to this PR even if it works. I only tested your changes with the specified soundfile on amd64 Linux.

bemoody · 2022-06-08T21:54:56Z

Thanks - it would be nice to check it works but not a top priority if that's difficult.

I haven't forgotten about the other issues you raised above and I'll try to address them tomorrow, but for now I've uploaded a short example of some real-world data you can play around with if you like.

This defines the function _open_file, which opens an input data file as a random-access file object. This module is intended to eventually replace the "streaming I/O" functions in the wfdb.io.download module (_remote_file_size, _stream_header, _stream_dat, and _stream_annotation.) Some notes on the implementation: - Contrary to many existing functions, I've deliberately made pn_dir the first argument rather than the second, since putting the prefix first seems more natural. - There's no dir_name argument; instead, callers should include the local directory name as part of file_name. I consider it a bug that some functions have a separate dir_name argument that is ignored when pn_dir is set. - This function does not do automatic version number resolution for PhysioNet projects. That's something the caller (still) needs to do and should be handled elsewhere.

The soundfile library provides a python wrapper for libsndfile, which in turn provides an interface to libFLAC for reading and writing FLAC files.

"build": pip install will install the Python soundfile package. On Windows/MacOS, this will install a binary wheel package that includes a private copy of libsndfile; on Ubuntu, it will install a generic package that requires us to install libsndfile separately. "test-deb10-i386": python3-soundfile installs the Python soundfile package and also depends on libsndfile1.

The _digi_bounds function previously did not handle formats 310, 311, 61, 160, or 8 (which are not supported by wrsamp). Add these formats to SAMPLE_VALUE_RANGE for consistency even though they're currently not used. This will also raise an exception for unknown formats, rather than silently returning None.

This will also raise an exception for unknown formats, rather than silently returning None.

These format codes are used to designate FLAC signal files. Format 508 indicates 8-bit, format 516 indicates 16-bit, and format 524 indicates 24-bit resolution. Other format codes between 500 and 532 (inclusive) are reserved. A FLAC signal file contains between one and eight channels, which are sampled at the same frequency. Therefore, all signals stored in the same signal file must have the same number of samples per frame. (If a record contains more than eight signals, or if they have differing sampling frequencies, they will require more than one signal file.) Note that the number of samples per frame has no effect on the encoding or decoding of the FLAC signal file; there is no interleaving like there is in a binary signal file. There is also no requiremement that WFDB frame boundaries match up with FLAC compression-block boundaries. (Perhaps confusingly, the FLAC specification uses the word "frame" to refer to the encoded form of a block of samples; this has nothing to do with WFDB frames, and unless otherwise noted, the word "frame" refers to a WFDB frame.) To simplify this initial implementation, the function _rd_compressed_file is nearly a drop-in replacement for _rd_dat_file, which means that this function must rearrange the input data into the format that _rd_dat_signals expects.

The fmt argument is the format of the singular dat file. It is not a list and that wouldn't make any sense here.

The record "flacformats" contains one signal in each of the three WFDB FLAC signal formats. As with the "binformats" record, sample j of signal i is equal to: (i + 16843019 * j) % ((1 << adcres) - 1) + 1 - (1 << (adcres - 1))) Use this record to test that it is possible to read all of the formats correctly, including when we skip one or two samples from the start and/or end of the record.

We want to be sure that the soundfile library is compatible with wfdb.io._url.NetFile, since wfdb.io._signal._rd_compressed_file relies on this. This isn't a complete test of WFDB/FLAC functionality, in part because the sample files are very small so we can't really exercise the seeking functionality. However, this should be sufficient to verify that soundfile doesn't make silly assumptions (like assuming the input file corresponds to an OS file descriptor.)

bemoody · 2022-06-10T20:48:10Z

Updated with the following changes:

Add a note to README.md.
Fix capitalization of SoundFile and add compatibility with 0.11.0 (soon to be released.)
Rename wfdb.io.download._open_db_file to wfdb.io._coreio._open_file.
Rewrite _digi_nan and _digi_bounds for clarity/completeness/efficiency and add FLAC formats.

Correctly handling invalid samples makes this example look better:

>>> import wfdb
>>> r = wfdb.rdrecord('sample-data/mixedsignals', sampto=1000)
>>> wfdb.plot_wfdb(r)

It still looks a little odd because the channels aren't synchronized (see pull #390).

bemoody · 2022-06-13T21:30:51Z

By the way, I was poking around the other day and I saw that both pulseaudio and pipewire-bin packages on Debian have a dependency on libsndfile1 - so you're very unlikely to find a desktop system that doesn't have libsndfile! It's still worth mentioning in the README, though, because people will want to run the package on headless servers.

cx1111 · 2022-06-14T04:35:36Z

README.md

@@ -35,6 +35,8 @@ pip install wfdb
 poetry add wfdb
 ```

+On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`.  Support for Apple M1 systems is a work in progess (see https://https://github.com/bastibe/python-soundfile/issues/310 and https://https://github.com/bastibe/python-soundfile/issues/325).


Suggested change

On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see https://https://github.com/bastibe/python-soundfile/issues/310 and https://https://github.com/bastibe/python-soundfile/issues/325).

On Linux systems, accessing *compressed* WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see <https://github.com/bastibe/python-soundfile/issues/310> and <https://github.com/bastibe/python-soundfile/issues/325>).

cx1111 · 2022-06-14T04:40:43Z

sample-data/mixedsignals.hea

@@ -0,0 +1,7 @@
+mixedsignals 6 62.4725/999.56 14400


Looks like you haven't added a test case for these signals?

cx1111 · 2022-06-14T04:42:13Z

wfdb/io/_signal.py

+            else:
+                raise ValueError(f"unknown subtype in {fp.name} ({sf.subtype})")
+
+            max_bits = int(fmt) - 500


Could you explain this logic?

bemoody · 2022-06-15T20:28:12Z

Looks like you haven't added a test case for these signals?

No, I didn't think of that. Somebody want to add that? :P

Could you explain this logic?

Well, one answer is just "trying to replicate the behavior of libFLAC by unwrapping the layers that are added on top of it by libsndfile and SoundFile." But why? libFLAC, like WFDB, deals with samples as integer values: an "8-bit" FLAC file will be decoded as a series of integers between -128 and 127. The "full range" is regarded as kind of an implementation detail (fun fact, libFLAC doesn't enforce range limits on the input samples, which can give weird results.) Also, FLAC compression itself is deeply tied to the quantization step. SoundFile, on the other hand, conceptually regards samples as fractions of the full range, and the quantization step is thought of as an implementation detail. If you ask for 32-bit integer data, SoundFile assumes what you want is a value scaled to the range -2^31 to 2^31, regardless of the original storage format. This is what you generally want when working with audio. So what we are doing is trying to persuade SoundFile to give us the most efficient format we can get, while undoing any automatic transformation it performs (which we can infer from the subtype attribute). As for the "max_bits" thing: the idea there is simply that you may have a collection of FLAC files *not* produced by WFDB, and possibly with unknown/varying bit depths, and you want to read them using WFDB. So the format code is an upper bound and not a strict requirement.

tompollard · 2022-06-22T16:24:47Z

@cx1111 are we good to merge this?

cx1111

I wasn't sure if something got lost in communication about changes needing to get pushed. I did leave the note about the test cases for the multiplexed record case, and updating the README URL typo.

The main logic looks good, from what I understand. @bemoody should be good to merge when ready. Thanks!

bemoody · 2022-06-22T18:59:11Z

Thanks, Chen! I fixed the typo in the README. It would definitely be a good idea to add a test case for multiplexed/multi-frequency signals in the future but for right now I just want to merge the code as-is.

bemoody force-pushed the flac branch 2 times, most recently from 26e94ce to f71f2a5 Compare May 5, 2022 20:26

bemoody changed the title ~~[WIP] Read FLAC signal files~~ Read FLAC signal files Jun 2, 2022

cx1111 reviewed Jun 5, 2022

View reviewed changes

sample-data/flacformats.hea Show resolved Hide resolved

cx1111 reviewed Jun 5, 2022

View reviewed changes

Benjamin Moody added 10 commits June 10, 2022 16:01

pyproject.toml: add dependency on soundfile.

0932a35

The soundfile library provides a python wrapper for libsndfile, which in turn provides an interface to libFLAC for reading and writing FLAC files.

_digi_nan: use a dictionary instead of "if" statements.

184f4fb

This will also raise an exception for unknown formats, rather than silently returning None.

_rd_dat_signals, _rd_dat_file: fix documentation.

9d3491c

The fmt argument is the format of the singular dat file. It is not a list and that wouldn't make any sense here.

Add an example of ICU waveform data in FLAC format.

9b5df00

bemoody force-pushed the flac branch from d0985cf to 600127d Compare June 10, 2022 20:45

cx1111 reviewed Jun 14, 2022

View reviewed changes

cx1111 approved these changes Jun 22, 2022

View reviewed changes

README.md: add note about installing libsndfile.

d49a5c3

bemoody force-pushed the flac branch from 600127d to d49a5c3 Compare June 22, 2022 19:00

bemoody merged commit 78a1a86 into main Jun 22, 2022

bemoody deleted the flac branch June 22, 2022 19:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read FLAC signal files #372

Read FLAC signal files #372

bemoody commented May 5, 2022 •

edited

Loading

cx1111 commented May 13, 2022

bemoody commented May 17, 2022

bemoody commented May 26, 2022 •

edited

Loading

cx1111 Jun 5, 2022

cx1111 left a comment

cx1111 Jun 5, 2022

cx1111 Jun 5, 2022

cx1111 Jun 14, 2022

cx1111 Jun 5, 2022

cx1111 Jun 8, 2022

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 7, 2022

bemoody commented Jun 7, 2022

bemoody commented Jun 7, 2022

cx1111 commented Jun 7, 2022

cx1111 commented Jun 8, 2022

bemoody commented Jun 8, 2022

bemoody commented Jun 10, 2022

bemoody commented Jun 13, 2022

cx1111 Jun 14, 2022

cx1111 Jun 14, 2022

cx1111 Jun 14, 2022

bemoody commented Jun 15, 2022 via email

tompollard commented Jun 22, 2022

cx1111 left a comment

bemoody commented Jun 22, 2022

	On Linux systems, accessing compressed WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see https://https://github.com/bastibe/python-soundfile/issues/310 and https://https://github.com/bastibe/python-soundfile/issues/325).
	On Linux systems, accessing compressed WFDB signal files requires installing `libsndfile`, by running `sudo apt-get install libsndfile1` or `sudo yum install libsndfile`. Support for Apple M1 systems is a work in progess (see <https://github.com/bastibe/python-soundfile/issues/310> and <https://github.com/bastibe/python-soundfile/issues/325>).

Read FLAC signal files #372

Read FLAC signal files #372

Conversation

bemoody commented May 5, 2022 • edited Loading

cx1111 commented May 13, 2022

bemoody commented May 17, 2022

bemoody commented May 26, 2022 • edited Loading

Choose a reason for hiding this comment

cx1111 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 6, 2022 via email

bemoody commented Jun 7, 2022

bemoody commented Jun 7, 2022

bemoody commented Jun 7, 2022

cx1111 commented Jun 7, 2022

cx1111 commented Jun 8, 2022

bemoody commented Jun 8, 2022

bemoody commented Jun 10, 2022

bemoody commented Jun 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bemoody commented Jun 15, 2022 via email

tompollard commented Jun 22, 2022

cx1111 left a comment

Choose a reason for hiding this comment

bemoody commented Jun 22, 2022

bemoody commented May 5, 2022 •

edited

Loading

bemoody commented May 26, 2022 •

edited

Loading