Added F statistics and options for different types of tests #14
base: correlation-metric
Conversation
I am very much sold on using F-statistics, but I think the regressor definitions should be merged into the decision tree configs. The main concern about putting regressor definitions into decision trees is that the column names may vary by individual, but I think users can deal with that either by modifying the decision trees' necessary-metrics definitions or by writing regular expressions into the metric definitions.
@@ -0,0 +1,7 @@
{
I'd still rather merge these configs into the decision tree. Decision tree configs define necessary metrics, so we should be determining what metrics to calculate from them anyway (ME-ICA#921). If you are concerned about column names, we could include regular expressions to capture common cases, or just let users update the "info" field to say that they modified the tree to match their column names but didn't change the underlying logic of the tree.
Still thinking about this & leaning towards agreeing with you. I think we might still need an option for a regressor column config file, but if a decision tree is looking for specific column classes in order to function, then that info could be in the decision tree json. I'll play around to see how this looks in practice.
This is now done and works with regular expressions.
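To illustrate the idea rather than the actual implementation in this PR, here is a minimal Python sketch of matching subject-specific regressor column names against regex patterns that a decision tree or regressor config might declare; the pattern names, column names, and grouping logic are all hypothetical.

```python
# Hypothetical sketch of regex-based matching of external regressor columns;
# the pattern names, column names, and grouping logic are illustrative only.
import re

# Subject-specific column names in an external regressor file
columns = ["trans_x", "trans_y", "rot_z", "csf_mean", "task_press_button"]

# Patterns a config could declare for each regressor class
patterns = {
    "motion": r"(trans|rot)_.*",
    "csf": r"csf.*",
    "task": r"task_.*",
}

def match_regressor_columns(columns, patterns):
    """Group column names by the first regex pattern they fully match."""
    groups = {name: [] for name in patterns}
    for col in columns:
        for name, pattern in patterns.items():
            if re.fullmatch(pattern, col):
                groups[name].append(col)
                break
    return groups

print(match_regressor_columns(columns, patterns))
# {'motion': ['trans_x', 'trans_y', 'rot_z'], 'csf': ['csf_mean'], 'task': ['task_press_button']}
```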
component_table,
cross_component_metrics={},
status_table=None,
external_regressor_dict=None,
I don't think the external regressor config needs to be passed in here.
The reason it's passed here is so that the LGR.report line for the regressor model appears along with the tree description. This wouldn't be an issue if the regressor model was in the decision tree json, but I think that would cause other issues... still thinking about this.
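As a rough sketch of the intent (the logger setup, dict keys, and message wording are my assumptions, not tedana's actual code), passing the config through is what lets the selector emit a report line alongside the tree description:

```python
# Illustrative only: emitting a report line for the regressor model when the
# component selector is described. Logger setup, dict keys, and message text
# are assumptions for this sketch.
import logging

logging.basicConfig(level=logging.INFO)
LGR = logging.getLogger(__name__)

def describe_selection(tree_name, external_regressor_dict=None):
    """Log the decision tree plus, if provided, the external regressor model."""
    LGR.info("Component selection will use the %s decision tree.", tree_name)
    if external_regressor_dict is not None:
        LGR.info(
            "External regressors will be fit with a %s model using %d regressors.",
            external_regressor_dict.get("calc_stats", "F"),
            len(external_regressor_dict.get("regressors", [])),
        )

describe_selection(
    "demo_tree",
    {"calc_stats": "F", "regressors": ["trans_x", "rot_z", "csf_mean"]},
)
```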
To move everything into the decision tree json file, I think I'd need to move a light version of ME-ICA#969 into this PR. That isn't a bad thing, but the only way to get this to work would be to initialize the component selector and read in the external regressor fields much earlier in the code. There are benefits to this, but it would expand this PR beyond just adding correlations. I might still give it a try, but wanted to bring this up.
Done
return comptable

def make_detrend_regressors(n_time, polort=None):
Aren't some of the largest-variance components typically characterized by drifts? Are there some cases where we'd want to detrend regressors, but other cases where we wouldn't?
Thinking about this, I think I'm doing something non-ideal here and in my earlier, non-tedana version of this analysis. We need to model slow drifts because they can be a large source of variance, and our external regressors will less often model a significant portion of the variance if we don't model the drift. That said, if we're calculating fits based on the full F-statistic model, then tedana is detrending the data by default. This isn't good, because a benefit of tedana is that it can potentially separate and retain T2*-weighted slow drifts in learning or pharmacology studies.
I think the better approach is to output a partial model with all nuisance regressors except the slow drifts and use that in the decision tree. Based on feedback from Logan & Gang Chen, I might also give an option to model task regressors so decision trees can skew towards keeping them.
I was confused in my above comment. The full F test compares a model with all the external regressors plus the detrending regressors to a model with just the detrending regressors. That is, it detrends the data and tests whether the remaining regressors model the signal above and beyond what is detrended. As such, this is what I think we want. A component won't be flagged and removed because of a large signal drift unless that component is flagged for rejection based on kappa/rho criteria. If a drift component is T2*-weighted, tedana should not remove it (though a user can remove it separately).
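For readers following along, here is a minimal numpy/scipy sketch of that nested-model comparison. The function names mirror the diff above, but the bodies, the Legendre-polynomial detrend basis, and the simulated data are my own illustrative choices, not tedana's actual implementation.

```python
# Sketch of the nested-model F test described above: do the external regressors
# explain a component time series beyond the detrending regressors alone?
# Names and the Legendre detrend basis are illustrative choices.
import numpy as np
from scipy import stats

def make_detrend_regressors(n_time, polort=3):
    """Legendre polynomials up to order `polort`, evaluated on [-1, 1]."""
    x = np.linspace(-1, 1, n_time)
    return np.column_stack(
        [np.polynomial.legendre.Legendre.basis(deg)(x) for deg in range(polort + 1)]
    )

def nested_f_test(y, x_reduced, x_full):
    """F statistic, p value, and partial R2 for the full vs. nested reduced model."""
    def rss(x):
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        return np.sum((y - x @ beta) ** 2)

    rss_reduced, rss_full = rss(x_reduced), rss(x_full)
    df_num = x_full.shape[1] - x_reduced.shape[1]
    df_den = len(y) - x_full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    p_val = stats.f.sf(f_stat, df_num, df_den)
    partial_r2 = (rss_reduced - rss_full) / rss_reduced
    return f_stat, p_val, partial_r2

rng = np.random.default_rng(0)
n_time = 200
motion = rng.standard_normal((n_time, 6))          # stand-in external regressors
component_ts = motion @ rng.standard_normal(6) + rng.standard_normal(n_time)

x_detrend = make_detrend_regressors(n_time)        # reduced model: detrend only
x_full = np.column_stack([x_detrend, motion])      # full model: detrend + external
print(nested_f_test(component_ts, x_detrend, x_full))
```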
I'm going to get back to working on this after ME-ICA#952 is merged (assuming no big concerns with that one). Do you want me to keep working on this PR against your PR, or should you just close your PR and I'll open a new one? (It depends on how much I'm taking the lead on this PR going forward vs. both of us continuing to contribute.)
I think I'll change the target for my PR to a new branch on ME-ICA, like we did with doc-tree. That way we can both contribute to it as needed before we merge into main.
I've addressed almost all the comments above. The main thing left to test and fix is that I'm fairly sure I've messed up how it handles individually named regressors in the partial model. With the new system, I think I'll need to use

Based on a suggestion from Logan, I also added an option for task regressors, which are excluded from the other model but calculated separately, so that it will be possible to protect against rejecting components that follow the task design.

As for branching, when you made a new branch on

Also, when the big changes settle down, I'll add tests and documentation.
@tsalo There's still some work to do, but I've addressed all of your above concerns and was just about to open a new PR. I intended to merge into

FWIW, the command I used was

Either way, we should delete whichever one we aren't using. (There are also a bunch of
Proposes changes to ME-ICA#1021.

I merged my approach using F statistics for the external regressors into this PR.

Changes proposed in this pull request:

- `external_regressor_dict`, which defines what test to use for fitting external regressors.
- `corr_no_detrend.json` runs @tsalo's correlation method.
- `corr_detrend.json` runs @tsalo's correlation method after detrending.
- `Fmodel.json` runs a single F test using all external regressors and adds metrics for F, R2, and p.
- `Mot12_CSF.json` is an example of running the F test on a full model and also outputting stats for partial models, like all motion parameters and a CSF ROI.
- `minimal_external3.json`, which is the same as extern1 but uses a full F test model with p<0.05 and R2>0.5 (i.e., statistically significant and over half the variance in a component). As expected, this rejects a component that was substantially modeled by two external regressors, but not by each separately.
- These files are prefixed with `demo_` so that we can merge them and make clear they're demonstrations of functionality without recommending immediate usage.
- `external_regressor_fits.py` is expanded to parse the above files and calculate the correlation and F metrics. As set up, it will be relatively easy to add more metrics for external regressors.
- `external_regressor_dict` is being passed to the component selector class so that a descriptive line is added to the report log.

Stuff left to do:

- `LGR.info()` messages

This passes all linting and seems fairly stable when I've tried a bunch of different things, but it's still a work in progress. Your choice if you want to merge into your PR so we're working on only one version, or wait until this is cleaner. We can discuss on the dev call tomorrow.
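To make the list above more concrete, here is a rough hypothetical sketch (as a plain Python dict) of the kind of information such a config could carry: which test to run, regex-defined partial models, separately fit task regressors, and the p<0.05 / R2>0.5 rejection thresholds. The key names are invented for illustration and are not the schema added in this PR.

```python
# Invented, illustrative config structure; key names are not the actual schema
# introduced in this PR.
demo_external_regressor_config = {
    "info": "Full F test on all external regressors, plus partial models.",
    "detrend": True,                  # include detrending regressors in both models
    "calc_stats": "F",                # full-model F test rather than correlation
    "partial_models": {               # regexes matched against regressor column names
        "Motion": ["(trans|rot)_.*"],
        "CSF": ["csf.*"],
    },
    "task_keep": ["task_.*"],         # task regressors fit separately, not used for rejection
}

# A decision tree node could then reject components where the full model is both
# statistically significant and explains over half of the component's variance:
rejection_thresholds = {"pval": 0.05, "r2": 0.5}
```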