
Handling null findings in coordinate-based meta-analyses #294

Open
tsalo opened this issue Aug 6, 2020 · 5 comments
Labels: cbma (Issues/PRs pertaining to coordinate-based meta-analysis), enhancement (New feature or request), question (Further information is requested)

Comments

tsalo (Member) commented Aug 6, 2020

I've been talking with @jessicabartley about how to handle experiments with null findings in coordinate-based meta-analyses.

There are two concerns:

  1. NiMARE's coordinate storage (i.e., a DataFrame with one coordinate per row) is not compatible with null findings. We could do a few things about this, I think:
    • Add one row per null experiment with NaNs in the x, y, z, i, j, and k columns of the DataFrame. We would need to refactor a number of things to work around this, but it would allow some algorithms, like MKDA Chi2, to include the null experiments as empty maps.
    • Add one row per null experiment with a coordinate somewhere outside the brain. This would trick the convergence-based approaches like ALE and MKDA, but is also dependent on the mask used for a given meta-analysis.
    • Assume any experiment under Dataset.ids without corresponding representation in Dataset.coordinates had null findings. Given that we plan to support mixed image- and coordinate-based datasets as NeuroStore is built, this seems like a bad assumption to make.
  2. Convergence-based methods (e.g., ALE and MKDA) are not designed to take null findings into account. I don't know if that's a major issue for the algorithms themselves. We can trick them (as described above) with coordinates outside the brain mask. As long as the kernels for those coordinates also don't touch the brain mask (since coordinates outside the mask aren't masked out before convolution; see "ALEKernel masks data after convolution", #37), that seems like a workable hack. Of course, there's also the question of how many dummy coordinates should be used.
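The first option above can be sketched with a toy pandas DataFrame. This is an illustration of the proposed representation, not NiMARE's actual storage code; the column names follow the issue text, and the study IDs are made up:

```python
import numpy as np
import pandas as pd

# One row per coordinate; a null-finding experiment ("study-02")
# contributes a single all-NaN row so it stays visible in the table.
coordinates = pd.DataFrame(
    {
        "id": ["study-01", "study-01", "study-02"],
        "x": [-38.0, 40.0, np.nan],
        "y": [22.0, -14.0, np.nan],
        "z": [6.0, 10.0, np.nan],
    }
)

# Every experiment remains represented in the DataFrame...
all_ids = coordinates["id"].unique().tolist()

# ...while algorithms that cannot handle NaNs can still drop those rows.
usable = coordinates.dropna(subset=["x", "y", "z"])
```

The point is that the null experiment is distinguishable from an experiment that was simply never entered, which is what option 3 would lose.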
tsalo added the enhancement and question labels on Aug 6, 2020
jdkent (Member) commented Oct 30, 2020

I support the first option:

Add one row per null experiment with NaNs in the x, y, z, i, j, and k columns of the DataFrame. We would need to refactor a number of things to work around this, but it would allow some algorithms, like MKDA Chi2, to include the null experiments as empty maps.

I suspect _validate_input would need to be modified, perhaps by adding an extra parameter like drop_na that drops null studies from consideration, since the estimators (e.g., ALE, KDA, MKDADensity) just ignore null studies AFAIK.
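As a rough sketch of that idea (the drop_na parameter is hypothetical and _validate_input here is a standalone stand-in, not NiMARE's actual method):

```python
import numpy as np
import pandas as pd

def _validate_input(coordinates, drop_na=True):
    """Hypothetical sketch: optionally filter out experiments whose only
    coordinate rows are all-NaN before an estimator that ignores null
    studies runs, and report which IDs were dropped."""
    is_null = coordinates[["x", "y", "z"]].isna().all(axis=1)
    if drop_na:
        dropped = coordinates.loc[is_null, "id"].unique().tolist()
        return coordinates.loc[~is_null], dropped
    return coordinates, []

coords = pd.DataFrame(
    {
        "id": ["a", "a", "b"],
        "x": [1.0, 2.0, np.nan],
        "y": [3.0, 4.0, np.nan],
        "z": [5.0, 6.0, np.nan],
    }
)
kept, dropped = _validate_input(coords)
```

Returning the dropped IDs would also make it easy to warn users about which studies were excluded.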

I worry about this option:

Add one row per null experiment with a coordinate somewhere outside the brain. This would trick the convergence-based approaches like ALE and MKDA, but is also dependent on the mask used for a given meta-analysis.

since people may choose a large radius/fwhm/mask (as you noted) and those values may have some influence on the final statistic.

I don't think I have enough context for option 3:

Assume any experiment under Dataset.ids without corresponding representation in Dataset.coordinates had null findings. Given that we plan to support mixed image- and coordinate-based datasets as NeuroStore is built, this seems like a bad assumption to make.

But if a statistic image could theoretically be all zeros, I think it's fair for there to be a representation of "no peaks".

tyarkoni (Contributor) commented

Regarding (1), long term, I think the cleanest way to handle this internally would be to refactor the code so that we're passing around a list of Study containers instead of a single pandas DF. Then a Study can have an empty coordinates DF to represent null results (as opposed to None, which would imply the absence of any data). Admittedly this is much more work than the solutions proposed above, but we've talked about it in the past, and I think refactoring into a data model that more closely adheres to the (eventual) NIMADS spec would have some other things going for it (e.g., it would be easy to interact with individual chunks of data without a lot of getter logic, and we could make use of an eventual NeuroStore client library to do the data representation). I'm not suggesting we work on this right away, but more that we should probably not worry about it too much for now and just do the quickest, hackiest thing until we re-architect along the above lines.
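The container idea can be sketched minimally. The Study class and its attributes below are illustrative only, not the eventual NIMADS/NiMARE data model; the key distinction it encodes is "empty DataFrame = null finding" versus "None = no coordinate data at all":

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd

@dataclass
class Study:
    """Hypothetical per-study container.

    An empty coordinates DataFrame means "analyzed, no significant
    peaks" (a null finding); None means the absence of any data.
    """

    id: str
    coordinates: Optional[pd.DataFrame] = None

    @property
    def has_coordinate_data(self) -> bool:
        return self.coordinates is not None

    @property
    def is_null_finding(self) -> bool:
        return self.coordinates is not None and self.coordinates.empty

null_study = Study("study-01", pd.DataFrame(columns=["x", "y", "z"]))
missing_study = Study("study-02", None)
```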

WRT (2), what's the point of failure for CBMA methods? At least for MKDA, it doesn't seem like there should be a problem handling empty maps... computing the observed statistic (just the sum/mean of studies at each voxel) isn't a problem, and nulls also aren't a problem for computing p-values via permutation. For the histogram-based analytical approximation, I also don't think it's an issue, as you just have a single bin (for value=0) with P=1. It seems plausible that adding empty maps has such a negligible effect that they may as well be ignored (which would be the best-case scenario), but I don't see why that would require qualitatively different handling. And it's not obvious to me why ALE should have any more trouble. But probably I'm missing something.
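The "negligible effect" intuition for a sum/mean-of-studies statistic is easy to check with toy arrays. These arrays just stand in for voxelwise modeled-activation maps; this is not NiMARE code:

```python
import numpy as np

# 20 "active" study maps over 100 voxels, plus 80 all-zero ("empty")
# maps representing null-finding studies.
rng = np.random.default_rng(0)
active_maps = (rng.random((20, 100)) > 0.8).astype(float)
empty_maps = np.zeros((80, 100))

# Adding empty maps leaves the voxelwise sum unchanged...
sum_without = active_maps.sum(axis=0)
sum_with = np.vstack([active_maps, empty_maps]).sum(axis=0)

# ...and only rescales the voxelwise mean (by 20/100 here).
mean_without = active_maps.mean(axis=0)
mean_with = np.vstack([active_maps, empty_maps]).mean(axis=0)
```

So for a sum-based observed statistic the empty maps are literally no-ops, while a mean-based one shrinks uniformly, which is exactly the "may as well be ignored" scenario described above.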

tsalo (Member, Author) commented Dec 17, 2020

Regarding (1), long term, I think the cleanest way to handle this internally would be to refactor the code so that we're passing around a list of Study containers instead of a single pandas DF.

I agree that refactoring how data are stored in the Dataset object would fix the storage issue, although I'm reluctant to do so ahead of the full NIMADS integration. I have a feeling that the shift is going to mean major changes to wide swaths of the codebase, and I'd hate to do that twice.

WRT (2), what's the point of failure for CBMA methods?

I think that the issue is that, as you said, the effect is so negligible (or perhaps entirely nonexistent) that including null findings just doesn't impact the results. If you have a meta-analysis of 100 studies, with 80 of them being null findings, and the result is the same as if you just analyzed the 20 significant studies, then that might be a major source of confusion for users.

I can probably determine whether that's the case at some point soon though.

tyarkoni (Contributor) commented

I think that the issue is that, as you said, the effect is so negligible (or perhaps entirely nonexistent) that including null findings just doesn't impact the results.

That's fine, I don't see that as a problem—we can just document the behavior. E.g., we could issue a gentle warning any time empty coordinate sets are detected, without doing anything differently. E.g., "X studies with empty coordinate lists were detected. No remedial steps need to be taken unless you think the absence of coordinates is an error. Note, however, that this coordinate-based meta-analysis method is (by design) unaffected by the addition of empty maps, and the inclusion of these studies will not change the results. (We nevertheless recommend keeping such studies in the dataset for informational purposes.)"
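A minimal version of that warning could look like the following. The function name and its signature are made up for illustration; only the idea (detect, warn, change nothing) comes from the comment above:

```python
import warnings

import pandas as pd

def warn_on_empty_coordinate_sets(ids, coordinates):
    """Hypothetical sketch: detect experiments with no (or all-NaN)
    coordinate rows and issue a gentle warning without altering
    estimator behavior."""
    with_coords = coordinates.dropna(subset=["x", "y", "z"])["id"].unique()
    empty_ids = sorted(set(ids) - set(with_coords))
    if empty_ids:
        warnings.warn(
            f"{len(empty_ids)} studies with empty coordinate lists were "
            "detected. No remedial steps are needed unless the absence of "
            "coordinates is an error; this coordinate-based meta-analysis "
            "method is unaffected by empty maps, so including these "
            "studies will not change the results.",
            UserWarning,
        )
    return empty_ids

coords = pd.DataFrame({"id": ["a"], "x": [1.0], "y": [2.0], "z": [3.0]})
empty = warn_on_empty_coordinate_sets(["a", "b"], coords)
```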

tsalo added the cbma (Issues/PRs pertaining to coordinate-based meta-analysis) label on Dec 28, 2020
tsalo (Member, Author) commented Dec 18, 2021

I just came across a repo that implements something like what we've been talking about here, but in R: https://github.com/NeuroStat/GenerateNull. Just wanted to post it here for posterity in case we ever circle back to this idea.
