Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have kernel classes return Dataset instances #41

Closed
tyarkoni opened this issue Jan 14, 2019 · 2 comments · Fixed by #320
Closed

Have kernel classes return Dataset instances #41

tyarkoni opened this issue Jan 14, 2019 · 2 comments · Fixed by #320

Comments

@tyarkoni
Copy link
Contributor

The API might be more intuitive if the Kernel classes returned a Dataset instance, with the resulting images appended to the .images list of every Contrast. Per discussion with @tsalo, the internal logic of the .fit calls could move to module level for efficiency (e.g., if one needs to build up a null distribution of 10,000 sets off images without copying the Dataset that many times in memory).

@tsalo
Copy link
Member

tsalo commented Aug 14, 2020

This hasn't been a major concern up to this point, because in most cases the kernel convolution is (1) fast and (2) unique (i.e., MA maps cannot be recycled for Monte Carlo methods), but @Julio-a-yanes is coming up against an issue with the CorrelationDecoder when running it with the Neurosynth dataset. It looks like that is one case where having the MA maps precomputed is critical for speed.

Some thoughts on what this will take:

  • New return_type value for KernelTransformers. In addition to array and image, we'll have dataset.
  • For purely coordinate-based datasets, we'll need a better way to set the dataset path (i.e., location of images). We typically infer this information from the relative paths to the images, but we'll probably need a dedicated path attribute.
    • On the bright side, we could drop the "absolute path" columns from Dataset.images. The assumption would be that every absolute path is just Dataset.path + the image's relative path.
  • MA map filenames will need to include as many parameters as possible so we don't overwrite them or anything. This will entail a filename/column name construction method from KernelTransformer type and parameters.
  • Meta-analysis methods need to check for MA maps matching their criteria in the Dataset.images DataFrame before attempting convolution. This won't affect the Monte Carlo permutation code though.

@tsalo
Copy link
Member

tsalo commented Aug 27, 2020

Incorporating parameters works out well. The thing I've gotten stuck on is the masker. The masker doesn't have a unique identifier that can transform well to a filename for the MA map. One solution would be to generate the files without any masking, inferring the image template from... the Dataset.space maybe? I don't think the space can be inferred from the masker... unless we make a hash from the affine, I guess.

The filenames would be something like:

study-{id}_{param1}-{val1}_{param2}-{val2}_hash-{hex digest}_{TransformerName}.nii.gz

So an example with the pain dataset and a KDA kernel:
study-pain_01.nidm-1_r-5.0_value-1_hash-34d2ff913320e14f04a4746cfa875fcd_KDAKernel.nii.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants