Reduce memory usage for metric calculations #269

Closed
3 tasks
jbteves opened this issue Apr 22, 2019 · 14 comments
Labels
enhancement (issues describing possible enhancements to the project), help wanted (issues where we're particularly excited for contributions)

Comments

@jbteves
Collaborator

jbteves commented Apr 22, 2019

Summary

RAM usage appears very high during metric calculations. In order to reduce memory usage, we should look for places where we can write images to disk and free the memory they were using.

Additional Detail

There are actually two places where memory usage is high, per a quick test by @dowdlelt:

  1. PCA
  2. Metric Calculation.

However, as @tsalo has pointed out on Gitter, PCA is implemented in sklearn, so its memory usage may be beyond our control. Since we have more control over the metric calculation, we should try to curb its memory usage wherever possible, especially because, as of issue #254, there is at least one user who cannot run tedana to completion because its memory usage is so large (see the figure linked there).

Next Steps

  • Identify where files can be written to disk and reloaded later (see the sketch after this list)
  • Implement these changes in code
  • Determine new memory requirements per these changes
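
As a starting point for the first bullet, here is a minimal sketch of the write-to-disk-and-reload pattern, assuming nibabel-based I/O; the array, affine, and filename are hypothetical, not tedana's actual variables:

    import numpy as np
    import nibabel as nib

    # Hypothetical intermediate result (e.g., an echo-wise metric map).
    data = np.random.rand(64, 64, 32, 100).astype(np.float32)
    affine = np.eye(4)

    # Write the intermediate image to disk, then drop the in-memory copy.
    nib.save(nib.Nifti1Image(data, affine), "intermediate.nii.gz")
    del data  # the buffer can now be garbage-collected

    # ... later, reload it only when it is actually needed ...
    data = nib.load("intermediate.nii.gz").get_fdata(dtype=np.float32)
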
@jbteves added the enhancement and help wanted labels on Apr 22, 2019
@tsalo
Member

tsalo commented Jun 18, 2019

We've had another user report a memory problem (see here), so I'd like to move this issue up in terms of priority. I've asked him to give some information about his dataset so that we can simulate some data that are large enough to replicate the issue. Once I have that, I'll dig into this.

@dowdlelt
Collaborator

I know tedana has masking and unmasking features - is it operating on a vector of values within the mask, or is it still holding all the ~zero values outside of the mask/brain in memory? In the original MEICA, the data were actually cropped around the edges to remove anything outside of the brain. It was annoying (output dims differed from input), but effective in reducing memory usage.

I know that the clustering steps require that the brain is organized as 3D, but perhaps the heaviest-lifting steps could be done on voxels × time data, if they are not already. I think the tradeoff of internally flipping the data back and forth would be worth it for the lower memory usage.
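
For reference, a minimal sketch of that voxels × time round trip (the array names and shapes are hypothetical):

    import numpy as np

    data_4d = np.random.rand(64, 64, 32, 200)   # X x Y x Z x time
    mask = data_4d[..., 0] > 0.5                # boolean brain mask, X x Y x Z

    # Operate on voxels x time; only in-mask voxels are carried around.
    data_2d = data_4d[mask]                     # shape (n_voxels, n_time)

    # Flip back to 4D for steps (e.g., clustering) that need spatial structure.
    data_4d_again = np.zeros(data_4d.shape, dtype=data_4d.dtype)
    data_4d_again[mask] = data_2d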

We could always reproduce the original MEICA implementation: cut away the slabs of data that lie outside of the brain mask, then add zero-filled slabs back at the end so that the data have the same dimensions again.
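
And a rough sketch of that MEICA-style crop-and-refill idea (hypothetical names; assumes a boolean mask):

    import numpy as np

    data = np.random.rand(64, 64, 32, 200)      # hypothetical 4D dataset
    mask = np.zeros((64, 64, 32), dtype=bool)
    mask[10:50, 12:48, 5:28] = True             # hypothetical brain mask

    # Bounding box of the mask: drop the empty slabs on each axis.
    xs = np.flatnonzero(mask.any(axis=(1, 2)))
    ys = np.flatnonzero(mask.any(axis=(0, 2)))
    zs = np.flatnonzero(mask.any(axis=(0, 1)))
    box = (slice(xs[0], xs[-1] + 1),
           slice(ys[0], ys[-1] + 1),
           slice(zs[0], zs[-1] + 1))
    cropped = data[box]

    # ... run the memory-heavy steps on the smaller array ...

    # Pad zero slabs back so the output matches the input dimensions.
    restored = np.zeros_like(data)
    restored[box] = cropped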

@tsalo
Member

tsalo commented Jun 19, 2019

I believe that it operates on the masked data already. The mask is then used to unmask the data before writing it out, so that shouldn't be a problem here (and we shouldn't have to crop the unmasked images).

@jbteves
Collaborator Author

jbteves commented Jun 19, 2019

From here it looks like it's just a simple 2D array, and it should be masked from here in the optimal combination step. So that's as memory-efficient as we can get, unless Python is passing copies; but I don't think NumPy arrays work that way.
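
(For anyone curious, a toy check that NumPy passes references rather than copies:)

    import numpy as np

    def scale_in_place(arr):
        arr *= 2   # mutates the caller's array: only a reference was passed

    a = np.ones(3)
    b = a                          # second reference to the same buffer
    scale_in_place(a)
    print(a)                       # [2. 2. 2.]
    print(np.shares_memory(a, b))  # True: no hidden copy was made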

@tsalo
Member

tsalo commented Jun 19, 2019

True, although I guess we could split arrays up for voxel-wise metric calculation, or maybe do a better job of deleting unused variables. Plus, we should make sure that int and bool arrays are actually stored as such (see the sketch below). I'm not sure what else there is to do.
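
For instance, a quick sketch of that dtype check (the arrays here are made up):

    import numpy as np

    mask = np.array([1.0, 0.0, 1.0])     # float64 mask: 8 bytes per element
    mask = mask.astype(bool)             # 1 byte per element

    counts = np.arange(300)              # default int64 on most platforms
    counts = counts.astype(np.int16)     # 2 bytes per element is plenty here

    print(mask.dtype, counts.dtype)      # bool int16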

@jbteves
Collaborator Author

jbteves commented Jun 19, 2019

I just quadruple-checked, and it looks like NumPy arrays are indeed passed by reference, so that's not introducing any overhead; pointers are cheap. So the only options remaining are basically:

  1. See if the typing is correct; make it a smaller type if not
  2. Try to free memory as quickly as possible
  3. See if we can architect away some of the more expensive calculations (very difficult)
  4. Rewrite intensive parts in C/C++ (extremely difficult)

I'll see if I can figure out some ways to free up the memory, but since I'm not as familiar with the code, I'm not sure how well that will go.

One thing worth noting is that we should try to keep copying to a minimum. I bet it takes a lot of time for the interpreter to find a nice big block of contiguous memory to copy into. Working in-place where possible will save both memory and time (a sketch follows below). I'm not sure that we can improve on what we have already, but it's good to keep in mind for future development.

@tsalo do you know if the user had the latest version, where the automask is used by default? That saves a lot of memory.
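
To illustrate option 2 and the copy-minimization note, a minimal sketch (not tedana code; the arrays are made up):

    import numpy as np

    data = np.random.rand(1000, 1000)

    # In-place arithmetic reuses the existing buffer instead of allocating
    # a new contiguous block for the result.
    data -= data.mean(axis=0)          # rather than data = data - data.mean(axis=0)
    np.multiply(data, 2.0, out=data)   # explicit out= avoids a temporary too

    # Drop references to intermediates as soon as possible so the allocator
    # can reclaim their memory.
    gram = data @ data.T
    diag = gram.diagonal().copy()      # copy out the small piece we need
    del gram                           # the big intermediate can now be freed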

@emdupre
Member

emdupre commented Jun 19, 2019

Let's not think about C/C++ quite yet, if possible 😸 Although Cython is lovely, it'll increase the barrier to entry pretty dramatically.

I wonder if we could focus on the masking first. I think passing arrays around will help quite a bit, particularly if we're zeroing everything outside the brain. This will interact with the integration of post-tedana AROMA, but I think we can worry about that after memory usage is down.

If PCA is a problem, we can think about alternative decompositions. The simplest first step might just be setting copy=False in the PCA call.
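
For reference, a minimal sketch of that change; copy=False is a real sklearn parameter, but the shapes and component count here are made up:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(5000, 300).astype(np.float32)  # e.g., voxels x time

    # copy=False lets fit work on X directly instead of duplicating it first.
    pca = PCA(n_components=50, copy=False)
    comps = pca.fit_transform(X)
    # Caveat: X may be overwritten by fit, so it should not be reused as-is.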

@rmarkello
Member

I agree with @emdupre that setting copy=False should help somewhat and that it's definitely worth trying to find areas of the code where copies, rather than views, of the imaging data are accidentally being made before diving into anything too crazy.

It's also worth noting that the imaging files in question from the Neurostars issue @tsalo linked to, when loaded into memory, are going to be ~4.25 GB each:

>>> import numpy as np
>>> np.zeros((92, 92, 50, 1350)).nbytes / (1024 ** 3)
4.2566657066345215

(Using np.zeros generates an accurate estimate of memory usage without actually consuming the memory, since the zeroed pages are never touched.)

That's already ~13 GB for all three time series loaded into memory in the catd array. If even just one copy of that array is made, you've basically hit your maximum on a 32 GB machine... These are pretty hefty data that are definitely pushing traditional use cases.

@jbteves
Collaborator Author

jbteves commented Jun 19, 2019

I guess that didn't translate well; the C/C++ suggestion was 99% joking. Even if we had a bunch of programmers lying around, communally maintaining C/C++ is pretty difficult, and testing it effectively is even harder.

Masking already works pretty well: either the user supplies a mask, or a substantial amount of data gets "ignored" by the automask, so it's quite efficient. The fact that the user's files are big points back to the related issue #267: users should understand that files this large will require beefy machines. We can't really compress the data further.

PCA is a pretty memory-expensive operation, if I recall the details correctly, but it's also a pretty widely used approach. I don't see a reason to ditch it without a very attractive alternative.

@emdupre
Member

emdupre commented Jun 19, 2019

Sorry for being unclear -- I don't think we need to ditch it just yet. The copy=False option prevents a copy from being created in memory to run the PCA on. I vote we try that first.

Definitely think the tie-in with #267 is important, too!

@jbteves
Collaborator Author

jbteves commented Jun 19, 2019

I didn't think you were suggesting dumping it right away; I'm just stating for the record that there are many compelling reasons to stick with PCA.

Trying copy=False sounds good. Documentation here for reference.

@jbteves
Collaborator Author

jbteves commented Jun 19, 2019

Note: another option would be to detect whether RAM usage will be high (or to catch memory errors) and fall back to memory-mapped arrays, which I think is the strategy that AFNI employs when it runs into similar problems.
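
A minimal sketch of that fallback (the shape is taken from the Neurostars example above; the temp-file handling is an assumption):

    import tempfile
    import numpy as np

    shape, dtype = (92, 92, 50, 1350), np.float32

    try:
        data = np.empty(shape, dtype=dtype)   # keep it in RAM when possible
    except MemoryError:
        # Fall back to a disk-backed array with the same ndarray interface.
        tmp = tempfile.NamedTemporaryFile(suffix=".dat", delete=False)
        data = np.memmap(tmp.name, dtype=dtype, mode="w+", shape=shape)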

@eurunuela
Collaborator

eurunuela commented Sep 15, 2019

I've been pretty busy and haven't taken the time to properly read the code; I hope I can do that before the hackathon. May I ask whether the calculation of the metrics is done voxel-wise, or is it a matrix computation?

In the development of my own algorithms, I've had some memory issues with Python that I managed to solve by using memmaps. I believe this approach could be helpful here. Basically, the idea is to keep big matrices on disk and access them from there, rather than loading them into Python. This can have a big impact when doing voxel-wise analysis, given that one can access the data of a single voxel in that huge matrix with little to no impact on memory. The easiest way to work with this approach is to save the matrix with NumPy's save function and then load it with the mmap_mode option in load. The files that get stored are actually not very big, from what I have seen so far.
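
A minimal example of that save/load pattern (hypothetical sizes and filename):

    import numpy as np

    big = np.random.rand(100000, 1350).astype(np.float32)
    np.save("big_matrix.npy", big)
    del big                                  # nothing large left in RAM

    # Reopen as a memmap: data are read lazily from disk, page by page.
    big = np.load("big_matrix.npy", mmap_mode="r")
    one_voxel = np.asarray(big[42])          # only this row is actually read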

See the documentation here.

I hope it can be helpful!


@tsalo changed the title from "Reduce Memory Usage for Metric Calculations" to "Reduce memory usage for metric calculations" on Nov 18, 2019
@stale

stale bot commented Feb 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana :tada:!
