
Enhancement: allow iterating signals in chunks of dataframes #436

Open
thomasdziedzic-calmwave opened this issue Nov 29, 2022 · 3 comments

Comments

@thomasdziedzic-calmwave

I'm using the new to_dataframe() function that was implemented in #380

One issue I'm seeing is that loading some of the waveform signals from https://physionet.org/content/mimic3wdb-matched/1.0/ with to_dataframe() consumes a lot of memory. Specifically, on a machine with 96 GB of RAM, reading the record and calling to_dataframe() runs out of memory.

I would like to lazily load the signal data as a sequence of dataframe chunks, so that I can process the waveform signals in parts that fit into memory rather than loading everything at once.
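For context, a rough back-of-envelope shows why a full-record dataframe can exhaust RAM. The record shape below (sample rate, duration, channel count) is hypothetical; MIMIC-III waveform records vary widely, and intermediate copies during conversion can multiply the footprint further:

```python
# Approximate memory for a dense DataFrame of float64 samples.
# Numbers below are illustrative, not taken from any specific record.

def dataframe_bytes(sig_len: int, n_sig: int, itemsize: int = 8) -> int:
    """sig_len samples x n_sig channels x itemsize bytes per value."""
    return sig_len * n_sig * itemsize

# e.g. a hypothetical 3-day record at 125 Hz with 8 channels:
samples = 125 * 60 * 60 * 24 * 3
channels = 8
gb = dataframe_bytes(samples, channels) / 1e9
print(f"~{gb:.1f} GB for the raw values, before any intermediate copies")
```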

@thomasdziedzic-calmwave (Author) commented Nov 30, 2022

I accomplished this by reading the record header to get the signal length, then building my own chunking process on top of rdrecord(sampfrom, sampto). That unblocks me, but before I close this, it might be worth discussing what the maintainers think the solution should be, and whether there should be a documented approach to this problem.
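The workaround above can be sketched as a small generator over `(sampfrom, sampto)` windows. The bounds logic is self-contained and testable; the wfdb calls are shown in comments, with a placeholder record name, on the assumption of wfdb >= 4.0 where Record.to_dataframe() exists:

```python
# Chunked reading: fetch only the header, then read the record window by window.
from typing import Iterator, Tuple

def chunk_bounds(sig_len: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (sampfrom, sampto) windows covering [0, sig_len)."""
    for start in range(0, sig_len, chunk_size):
        yield start, min(start + chunk_size, sig_len)

# Usage against a real record (record name is a placeholder):
#
# import wfdb
# header = wfdb.rdheader("some/record")          # cheap: header only
# for sampfrom, sampto in chunk_bounds(header.sig_len, chunk_size=1_000_000):
#     rec = wfdb.rdrecord("some/record", sampfrom=sampfrom, sampto=sampto)
#     process(rec.to_dataframe())                # one chunk in memory at a time
```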

@tompollard (Member)
Thanks @thomasdziedzic-calmwave, let's keep this issue open. I think it would be good to try to address the problem directly, perhaps as an argument to to_dataframe().
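One way such an argument could behave (a hypothetical API, not something wfdb currently provides) is to yield one dataframe per window instead of materializing the whole record. The sketch below takes the reader as a callable so the chunking behavior is testable with a stub; in real use the callable would wrap `wfdb.rdrecord(...).to_dataframe()`:

```python
# Hypothetical chunked-dataframe generator; `read` abstracts the actual I/O.
from typing import Callable, Iterator
import pandas as pd

def to_dataframe_chunks(
    read: Callable[[int, int], pd.DataFrame],
    sig_len: int,
    chunksize: int,
) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per (start, stop) window of the signal."""
    for start in range(0, sig_len, chunksize):
        yield read(start, min(start + chunksize, sig_len))

# With a stub reader that fabricates one row per sample:
frames = list(to_dataframe_chunks(
    lambda a, b: pd.DataFrame({"sig": range(a, b)}), sig_len=10, chunksize=4))
print([len(f) for f in frames])  # three chunks of 4, 4, and 2 rows
```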

@tompollard (Member)
Should we consider adopting Dask dataframes? My understanding is that they are better able to handle datasets that are too large for RAM: https://docs.dask.org/en/stable/
