-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xr.concat consuming too much resources #1379
Comments
Also, reading all Datasets into a list and then trying to concatenate this list of Datasets at once also blows memory up. |
I realised that some of the Datasets I was trying to concatenate had different coordinate values (for coordinates that I was assuming to be the same) so I guess xr.concat was trying to align these coordinates before concatenating and the resultant Dataset ended up being much larger than it should have been. When I ensure I only concatenate Datasets with consistent coordinates, I can do it. However still resource consumption is quite high compared to when I so the same thing with numpy arrays. The memory increased by 42% using xr.concat (against 6% using np.concatenate) and the whole processing took about 4 times longer. |
Alignment and broadcasting means that My guess is that some combination of automatic alignment and/or broadcasting in |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the |
Hi,
I am reading in several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension but I eventually blow my memory up. The file glob is not huge (~700M, my computer has ~16G) and I can do it fine if I only read in the Datasets appending them to a list without concatenating them (my memory increases by 5% only or so by the time I had read them all).
However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speeds drastically reduce before I have read 10% of the files or so and my memory usage keeps going up until it eventually blows up before I read and concatenate 30% of these files (the screenshot below was taken before it blew up, the memory usage was under 20% by the start of the processing).
I was wondering if this is expected, or if there something that could be improved to make that work more efficiently please. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays and defining the final Dataset only at the end.
Thanks.
The text was updated successfully, but these errors were encountered: