
Clean up masking/preprocessing in CBMAEstimator hierarchy #195

Closed
tyarkoni opened this issue Dec 5, 2019 · 4 comments
Labels
cbma (Issues/PRs pertaining to coordinate-based meta-analysis), refactoring (Requesting changes to the code which do not impact behavior)

Comments

tyarkoni (Contributor) commented Dec 5, 2019

There's some redundancy in the masking/preprocessing done inside the CBMAEstimator subclasses that makes maintenance more difficult than it needs to be. We should clean this up and push as much of this logic as possible into the base CBMAEstimator class. I realize there's some heterogeneity in the fit() signatures that makes things a bit more complex, but we can probably handle that with either mixins or by adding something like a CBMAPairwiseEstimator child class that expects two inputs.
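A minimal sketch of what the proposed hierarchy could look like. The class and method names here (CBMAPairwiseEstimator, _preprocess, _fit) are hypothetical illustrations of the idea, not NiMARE's actual API:

```python
# Hypothetical sketch: shared preprocessing moves into the base class,
# and a pairwise child handles the two-input fit() signature.
class CBMAEstimator:
    def fit(self, dataset):
        # Shared masking/preprocessing would live here, once.
        coords = self._preprocess(dataset)
        return self._fit(coords)

    def _preprocess(self, dataset):
        # Placeholder for the common masking logic.
        return dataset

    def _fit(self, coords):
        raise NotImplementedError


class CBMAPairwiseEstimator(CBMAEstimator):
    def fit(self, dataset1, dataset2):
        # Pairwise estimators take two inputs but reuse the same preprocessing.
        return self._fit(self._preprocess(dataset1), self._preprocess(dataset2))
```

Subclasses would then only implement `_fit()`, while all masking lives in one place.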

Another thing I'm not super clear on is why we have a lot of calls of the form self.kernel_estimator.transform(self.dataset, mask=self.mask, masked=True). IMO the masking parameters should be set on the KernelTransformer at initialization time, as with other classes. I think we should apply the same principle as elsewhere: if the KernelTransformer isn't initialized with a masker, then it always takes it from the Dataset. Since the dataset is always passed through from the CBMAEstimator, the above call can then be reduced to self.kernel_estimator.transform(self.dataset), which is a much cleaner API and will cut down on a lot of redundancy. (As a minor aside, shouldn't it be CBMAEstimator.kernel_transformer rather than CBMAEstimator.kernel_estimator, for consistency with the KernelTransformer class?)
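The proposed fallback could be sketched roughly like this. The attribute and method names (masker, _apply) are hypothetical, not NiMARE's actual implementation:

```python
import types

# Hypothetical sketch of the proposed masker fallback: masking is configured
# once at initialization, and transform() falls back to the Dataset's masker.
class KernelTransformer:
    def __init__(self, masker=None):
        self.masker = masker

    def transform(self, dataset):
        # If no masker was provided at init, take it from the Dataset,
        # so callers can simply write kernel_transformer.transform(dataset).
        masker = self.masker if self.masker is not None else dataset.masker
        return self._apply(dataset, masker)

    def _apply(self, dataset, masker):
        # Subclasses would generate modeled-activation maps here;
        # returning the masker just makes the fallback visible.
        return masker
```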

This probably shouldn't be touched until we merge #163 (which I'm working on right now), but just bringing it up here for comment.

tsalo (Member) commented Dec 7, 2019

> I realize there's some heterogeneity in fit() signature that makes things a bit more complex, but probably we can handle that with either mixins or by adding something like a CBMAPairwiseEstimator child that expects two inputs.

Sounds good to me.

> Another thing I'm not super clear on is why we have a lot of calls of the form self.kernel_estimator.transform(self.dataset, mask=self.mask, masked=True). IMO the masking parameters should be set on the KernelTransformer at initialization time, as with other classes.

KernelEstimators can return either lists of images (masked=False) or numpy arrays (masked=True); the array option exists mostly for speed, which matters for the permutation-based FWE correction methods. The kernel estimator is initialized and stored as a parameter in the Estimator's initialization. We could move the masked argument from KernelTransformer.transform() to KernelTransformer.__init__(), but then we'd need to commit to returning arrays throughout the Estimator rather than just at the FWE-correction stage.
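Schematically, the two output modes look like this. The function signature, shapes, and dummy data are illustrative, not NiMARE's exact behavior:

```python
import numpy as np

# Illustrative sketch of the masked/unmasked output modes described above.
def transform(study_ids, n_voxels, masked=True):
    # One "modeled activation" map per study (zeros as stand-in data).
    ma_maps = [np.zeros(n_voxels) for _ in study_ids]
    if masked:
        # Stacked 2D array: cheap to index and permute repeatedly
        # during permutation-based FWE correction.
        return np.vstack(ma_maps)  # shape (n_studies, n_voxels)
    # Unmasked mode would wrap each array in a Nifti image; a list stands in here.
    return ma_maps
```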

Ultimately, I know you also want to allow KernelTransformers to return Datasets by default (per #41), which I think would be a good replacement for the list-of-imgs output type.

> Since the dataset is always passed through from the CBMAEstimator, I think the above call can then be reduced to self.kernel_estimator.transform(self.dataset)

That is one way of calling the transformer right now. When a Dataset is passed in, the masker from the Dataset is used automatically. When a DataFrame is passed in, the mask must be provided as a separate argument because there's no way to infer it.

The problem is that, in the CBMAEstimators' permutation-based FWE correctors, we replace the coordinates in the dataset with randomized ones. At that point, it's faster and less memory-intensive to work with DataFrames than with full Datasets.
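A schematic version of that permutation step, assuming foci live in a coordinates DataFrame (column names and the null statistic are illustrative, not NiMARE's actual code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy coordinates table: one row per focus, grouped by study id.
coords = pd.DataFrame(
    {"id": [1, 1, 2], "x": [0, 2, 4], "y": [1, 3, 5], "z": [2, 4, 6]}
)
in_mask_vals = np.arange(10)  # stand-in for valid in-mask voxel coordinates

def permute_coords(df):
    # Swap in random in-mask coordinates without rebuilding a full Dataset.
    df = df.copy()
    df[["x", "y", "z"]] = rng.choice(in_mask_vals, size=(len(df), 3))
    return df

# Each iteration yields one null statistic (here, a dummy max over coordinates).
null_dist = [
    permute_coords(coords)[["x", "y", "z"]].to_numpy().max() for _ in range(100)
]
```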

> As a minor aside, shouldn't it be CBMAEstimator.kernel_transformer rather than CBMAEstimator.kernel_estimator, for consistency with the KernelTransformer class?

I can do that.

tsalo (Member) commented May 24, 2020

Minor update: I changed the masked parameter to return_type in #225 to clarify its purpose and to match it up with MetaResult.get_map(name, return_type).

@tsalo added the refactoring label May 29, 2020
tsalo (Member) commented Aug 27, 2020

I think I can set the masker at initialization, although it will still need to be inferred from the dataset in cases where the kernel transformer isn't initialized with one; that inference happens in transform().

I still don't want to set the masking procedure at initialization, for the reasons above.

Other than that, I think everything else that's been requested will be handled in #320.

@tsalo added the cbma label Dec 28, 2020
tsalo (Member) commented Nov 6, 2021

I think the codebase has evolved enough that this issue no longer applies, though no single PR is responsible for resolving it, so I'm going to close it. We can reopen if necessary.

@tsalo closed this as completed Nov 6, 2021