Have kernel classes return `Dataset` instances #41

tyarkoni · 2019-01-14T21:13:44Z

The API might be more intuitive if the Kernel classes returned a Dataset instance, with the resulting images appended to the .images list of every Contrast. Per discussion with @tsalo, the internal logic of the .fit calls could move to module level for efficiency (e.g., if one needs to build up a null distribution of 10,000 sets of images without copying the Dataset that many times in memory).


tsalo · 2020-08-14T15:58:27Z

This hasn't been a major concern up to this point, because in most cases the kernel convolution is (1) fast and (2) unique (i.e., MA maps cannot be recycled for Monte Carlo methods), but @Julio-a-yanes is coming up against an issue with the CorrelationDecoder when running it with the Neurosynth dataset. It looks like that is one case where having the MA maps precomputed is critical for speed.

Some thoughts on what this will take:

- New return_type value for KernelTransformers. In addition to array and image, we'll have dataset.
- For purely coordinate-based datasets, we'll need a better way to set the dataset path (i.e., location of images). We typically infer this information from the relative paths to the images, but we'll probably need a dedicated path attribute.
  - On the bright side, we could drop the "absolute path" columns from Dataset.images. The assumption would be that every absolute path is just Dataset.path + the image's relative path.
- MA map filenames will need to include as many parameters as possible so we don't overwrite them or anything. This will entail a filename/column name construction method from KernelTransformer type and parameters.
- Meta-analysis methods need to check for MA maps matching their criteria in the Dataset.images DataFrame before attempting convolution. This won't affect the Monte Carlo permutation code though.
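To make the lookup step in the list above concrete, here is a rough sketch of how a method might check for precomputed MA maps before convolving. Everything in it is an assumption for illustration: `ma_column` and `precomputed_ma_paths` are hypothetical helpers, not existing NiMARE functions, and a plain list of dicts stands in for the Dataset.images DataFrame.

```python
import posixpath


def ma_column(kernel_name, params):
    """Build a hypothetical column name from the kernel type and its parameters."""
    # Sort parameters so the same settings always map to the same column.
    parts = [f"{key}-{params[key]}" for key in sorted(params)]
    return "_".join([kernel_name] + parts)


def precomputed_ma_paths(image_rows, dataset_path, column):
    """Return absolute MA map paths, or None if any study lacks a precomputed map.

    image_rows is a list of dicts standing in for Dataset.images; each value
    under `column` is assumed to be a path relative to dataset_path.
    """
    paths = []
    for row in image_rows:
        relative = row.get(column)
        if not relative:
            # At least one map is missing, so the caller should convolve instead.
            return None
        paths.append(posixpath.join(dataset_path, relative))
    return paths


column = ma_column("KDAKernel", {"value": 1, "r": 5.0})
rows = [
    {"id": "study1-1", column: "study-study1-1_KDAKernel.nii.gz"},
    {"id": "study2-1", column: "study-study2-1_KDAKernel.nii.gz"},
]
found = precomputed_ma_paths(rows, "/data/ma_maps", column)
missing = precomputed_ma_paths(rows + [{"id": "study3-1"}], "/data/ma_maps", column)
```

Here `found` holds the joined paths, while `missing` is `None`, which would signal a fallback to running the kernel as usual.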

tsalo · 2020-08-27T18:18:57Z

Incorporating parameters works out well. The thing I've gotten stuck on is the masker. The masker doesn't have a unique identifier that can transform well to a filename for the MA map. One solution would be to generate the files without any masking, inferring the image template from... the Dataset.space maybe? I don't think the space can be inferred from the masker... unless we make a hash from the affine, I guess.

The filenames would be something like:

study-{id}_{param1}-{val1}_{param2}-{val2}_hash-{hex digest}_{TransformerName}.nii.gz

So an example with the pain dataset and a KDA kernel:
study-pain_01.nidm-1_r-5.0_value-1_hash-34d2ff913320e14f04a4746cfa875fcd_KDAKernel.nii.gz
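A minimal sketch of how such a filename could be assembled, with the hash derived from the affine as floated above. The helper names (`affine_hash`, `ma_filename`) are hypothetical, and using MD5 over the packed affine values is an assumption chosen only because it yields a 32-character hex digest like the one in the example; it will not reproduce that exact digest.

```python
import hashlib
import struct


def affine_hash(affine):
    """Hash a 4x4 affine (nested sequence of numbers) into a hex digest.

    Assumption: MD5 over little-endian doubles; any stable hash would do.
    """
    flat = [float(value) for row in affine for value in row]
    packed = struct.pack(f"<{len(flat)}d", *flat)
    return hashlib.md5(packed).hexdigest()


def ma_filename(study_id, kernel_name, params, affine):
    """Build study-{id}_{param}-{val}..._hash-{digest}_{Kernel}.nii.gz."""
    parts = [f"study-{study_id}"]
    # Parameters appear in the order given; dicts preserve insertion order.
    parts += [f"{name}-{value}" for name, value in params.items()]
    parts.append(f"hash-{affine_hash(affine)}")
    parts.append(kernel_name)
    return "_".join(parts) + ".nii.gz"


# Illustrative 2 mm affine; the values are made up for this example.
example_affine = [
    [-2.0, 0.0, 0.0, 90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]
name = ma_filename("pain_01.nidm-1", "KDAKernel", {"r": 5.0, "value": 1}, example_affine)
```

Two maskers with the same affine would collide under this scheme, so it only identifies the image space, not the mask itself.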

tsalo mentioned this issue Dec 7, 2019

Clean up masking/preprocessing in CBMAEstimator hierarchy #195

Closed

tsalo mentioned this issue Aug 14, 2020

Add path attribute to Dataset #307

Closed

tsalo mentioned this issue Aug 27, 2020

[ENH] Support Dataset transformations in kernel transformers #320

Merged


tsalo closed this as completed in #320 Aug 29, 2020
