
Clean up masking/preprocessing in CBMAEstimator hierarchy #195

Closed
tyarkoni opened this issue Dec 5, 2019 · 4 comments
Labels
cbma (Issues/PRs pertaining to coordinate-based meta-analysis), refactoring (Requesting changes to the code which do not impact behavior)

Comments

tyarkoni (Contributor) commented Dec 5, 2019

There's some redundancy in the masking/preprocessing done inside the CBMAEstimator subclasses that makes maintenance more difficult than it needs to be. We should clean this up and push as much of this logic as possible into the base CBMAEstimator class. I realize there's some heterogeneity in the fit() signatures that makes things a bit more complex, but we can probably handle that with either mixins or by adding something like a CBMAPairwiseEstimator child class that expects two inputs.
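A minimal sketch of what the proposed hierarchy could look like. The class and method names here (CBMAPairwiseEstimator, _preprocess, _fit) are hypothetical illustrations of the idea, not NiMARE's actual API:

```python
# Hypothetical sketch: shared preprocessing moves into the base class,
# and a pairwise child handles the two-input fit() signature.
class CBMAEstimator:
    def fit(self, dataset):
        # Shared masking/preprocessing would live here, once.
        coords = self._preprocess(dataset)
        return self._fit(coords)

    def _preprocess(self, dataset):
        # Placeholder for the common masking logic.
        return dataset

    def _fit(self, coords):
        raise NotImplementedError


class CBMAPairwiseEstimator(CBMAEstimator):
    def fit(self, dataset1, dataset2):
        # Pairwise estimators take two inputs but reuse the same preprocessing.
        return self._fit(self._preprocess(dataset1), self._preprocess(dataset2))
```

Subclasses would then only implement `_fit()`, while all masking lives in one place.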

Another thing I'm not super clear on is why we have a lot of calls of the form self.kernel_estimator.transform(self.dataset, mask=self.mask, masked=True). IMO the masking parameters should be set on the KernelTransformer at initialization time, as with other classes. I think we should apply the same principle as elsewhere: if the KernelTransformer isn't initialized with a masker, then it always takes it from the Dataset. Since the dataset is always passed through from the CBMAEstimator, the above call can then be reduced to self.kernel_estimator.transform(self.dataset), which is a much cleaner API and will cut down on a lot of redundancy. (As a minor aside, shouldn't it be CBMAEstimator.kernel_transformer rather than CBMAEstimator.kernel_estimator, for consistency with the KernelTransformer class?)
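The proposed fallback could be sketched roughly like this. The attribute and method names (masker, _apply) are hypothetical, not NiMARE's actual implementation:

```python
import types

# Hypothetical sketch of the proposed masker fallback: masking is configured
# once at initialization, and transform() falls back to the Dataset's masker.
class KernelTransformer:
    def __init__(self, masker=None):
        self.masker = masker

    def transform(self, dataset):
        # If no masker was provided at init, take it from the Dataset,
        # so callers can simply write kernel_transformer.transform(dataset).
        masker = self.masker if self.masker is not None else dataset.masker
        return self._apply(dataset, masker)

    def _apply(self, dataset, masker):
        # Subclasses would generate modeled-activation maps here;
        # returning the masker just makes the fallback visible.
        return masker
```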

This probably shouldn't be touched until we merge #163 (which I'm working on right now), but just bringing it up here for comment.

tsalo (Member) commented Dec 7, 2019

> I realize there's some heterogeneity in fit() signature that makes things a bit more complex, but probably we can handle that with either mixins or by adding something like a CBMAPairwiseEstimator child that expects two inputs.

Sounds good to me.

> Another thing I'm not super clear on is why we have a lot of calls of the form self.kernel_estimator.transform(self.dataset, mask=self.mask, masked=True). IMO the masking parameters should be set on the KernelTransformer at initialization time, as with other classes.

KernelEstimators can return either lists of images (masked=False) or numpy arrays (masked=True); the array option exists mostly for speed, which matters for the permutation-based FWE correction methods. The kernel estimator is initialized and stored as a parameter in the Estimator's initialization. We could move the masked argument from KernelTransformer.transform() to KernelTransformer.__init__(), but then we'd need to commit to returning arrays throughout the Estimator rather than just at the FWE-correction stage.
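Schematically, the two output modes look like this. The function signature, shapes, and dummy data are illustrative, not NiMARE's exact behavior:

```python
import numpy as np

# Illustrative sketch of the masked/unmasked output modes described above.
def transform(study_ids, n_voxels, masked=True):
    # One "modeled activation" map per study (zeros as stand-in data).
    ma_maps = [np.zeros(n_voxels) for _ in study_ids]
    if masked:
        # Stacked 2D array: cheap to index and permute repeatedly
        # during permutation-based FWE correction.
        return np.vstack(ma_maps)  # shape (n_studies, n_voxels)
    # Unmasked mode would wrap each array in a Nifti image; a list stands in here.
    return ma_maps
```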

Ultimately, I know you also want to allow KernelTransformers to return Datasets by default (per #41), which I think would be a good replacement for the list-of-imgs output type.

> Since the dataset is always passed through from the CBMAEstimator, I think the above call can then be reduced to self.kernel_estimator.transform(self.dataset)

That is one way of calling the transformer right now. When a Dataset is passed in, the masker from the Dataset is used automatically. When a DataFrame is passed in, the mask must be provided as a separate argument because there's no way to infer it.

The problem is that, in the CBMAEstimators' permutation-based FWE correctors, we replace the coordinates in the dataset with randomized ones. At that point, it's faster and less memory-intensive to work with DataFrames than with full Datasets.
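A schematic version of that permutation step, assuming foci live in a coordinates DataFrame (column names and the null statistic are illustrative, not NiMARE's actual code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy coordinates table: one row per focus, grouped by study id.
coords = pd.DataFrame(
    {"id": [1, 1, 2], "x": [0, 2, 4], "y": [1, 3, 5], "z": [2, 4, 6]}
)
in_mask_vals = np.arange(10)  # stand-in for valid in-mask voxel coordinates

def permute_coords(df):
    # Swap in random in-mask coordinates without rebuilding a full Dataset.
    df = df.copy()
    df[["x", "y", "z"]] = rng.choice(in_mask_vals, size=(len(df), 3))
    return df

# Each iteration yields one null statistic (here, a dummy max over coordinates).
null_dist = [
    permute_coords(coords)[["x", "y", "z"]].to_numpy().max() for _ in range(100)
]
```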

> As a minor aside, shouldn't it be CBMAEstimator.kernel_transformer rather than CBMAEstimator.kernel_estimator, for consistency with the KernelTransformer class?

I can do that.

tsalo (Member) commented May 24, 2020

Minor update: I changed the masked parameter to return_type in #225 to clarify its purpose and to match it up with MetaResult.get_map(name, return_type).

@tsalo added the refactoring label May 29, 2020
tsalo (Member) commented Aug 27, 2020

I think I can set the masker at initialization, although it will still need to be inferred from the dataset in cases where the kernel transformer isn't initialized with one; that inference happens in transform().

I still don't want to set the masking procedure at initialization, for the reasons above.

Other than that, I think everything else that's been requested will be handled in #320.

@tsalo added the cbma label Dec 28, 2020
tsalo (Member) commented Nov 6, 2021

I think the codebase has evolved enough that this issue no longer applies, though no single PR is responsible for resolving it, so I'm going to close it. We can reopen if necessary.

@tsalo closed this as completed Nov 6, 2021