
[REF] Modularize metric calculation #436

Closed
wants to merge 15 commits

Conversation


@tsalo (Member) commented Nov 8, 2019

Closes #320.

Changes proposed in this pull request:

  • Move individual metric calculations from one large-scale function into their own smaller functions.
  • Add a wrapper function that flexibly allows users (or, more likely, decision trees) to request any set of metrics and get back a component table with those metrics.

@rmarkello (Member) left a comment:

Initial pass (from my phone!). Will continue from the computer but didn't want to lose these initial comments. Overall it's looking good!!

'data_optcom ({1}), and mixing ({2}) do not '
'match.'.format(data_cat.shape[2], data_optcom.shape[1], mixing.shape[0]))

mixing = mixing.copy()
Member:

I think this copy is unnecessary here, since the next time mixing is used it's being passed to flip_components(), which will trigger a copy.

Member:

But we should add a check: if mixing_z is None, assign mixing to mixing_z.



def generate_metrics(comptable, data_cat, data_optcom, mixing, mask, tes, ref_img, mixing_z=None,
                     metrics=['kappa', 'rho'], sort_by='kappa', ascending=False):
Member:

I think it's generally best to not have a mutable object as a default parameter. Can we set metrics to None and then do a check in the function?
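The usual pattern looks like this (a minimal sketch, not the final tedana signature):

```python
def generate_metrics_sketch(metrics=None):
    """Avoid a shared mutable default by resolving None inside the function."""
    if metrics is None:
        metrics = ['kappa', 'rho']  # fresh list on every call
    metrics.append('variance explained')  # mutating is now safe
    return metrics
```

With a literal list default, the same list object would be shared and mutated across calls; with None, each call gets its own fresh list.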



def calculate_betas(data_optcom, mixing):
    assert data_optcom.shape[1] == mixing.shape[0]
Member:

Can we make this a ValueError with a brief message instead of an assertion error?
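One way to phrase that check (a sketch: the message wording is illustrative, and the least-squares step is only an assumption about what calculate_betas computes):

```python
import numpy as np


def calculate_betas_checked(data_optcom, mixing):
    """Validate shapes with a ValueError, then compute least-squares betas."""
    if data_optcom.shape[1] != mixing.shape[0]:
        raise ValueError(
            'Second dimension of data_optcom ({0}) does not match first '
            'dimension of mixing ({1}).'.format(
                data_optcom.shape[1], mixing.shape[0]))
    # data_optcom is (voxels x time), mixing is (time x components)
    betas, *_ = np.linalg.lstsq(mixing, data_optcom.T, rcond=None)
    return betas.T  # (voxels x components)
```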

Member Author:

I'll add more informative checks (and tests) later on. I just wanted to throw a couple of checks in here as an example.



def calculate_psc(data_optcom, optcom_betas):
    assert data_optcom.shape[1] == optcom_betas.shape[0]
Member:

See comment above re: ValueError instead of an assert.

# index voxels significantly loading on component but not from clusters
comp_noise_sel = ((np.abs(Z_maps[:, i_comp]) > z_thresh) &
                  (Z_clmaps[:, i_comp] == 0))
countnoise[i_comp] = np.array(comp_noise_sel, dtype=np.int).sum()
Member:

I don't think this needs to be a for loop. From what I can tell this would broadcast without any issue:

comp_noise_sel = (np.abs(Z_maps) > z_thresh) & (Z_clmaps == 0)
countnoise = comp_noise_sel.sum(axis=0)

Summing the boolean should automatically cast it to int, too. Let me know if I'm misunderstanding the dimensions of Z_maps and Z_clmaps though!
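The vectorized version could be sketched like this (assuming both maps are voxels x components arrays, and an illustrative threshold):

```python
import numpy as np


def count_noise_voxels(Z_maps, Z_clmaps, z_thresh=1.95):
    """Count voxels loading on each component but outside its clusters.

    Loop-free sketch of the suggestion above; Z_maps and Z_clmaps are
    assumed to be (n_voxels, n_components) arrays.
    """
    comp_noise_sel = (np.abs(Z_maps) > z_thresh) & (Z_clmaps == 0)
    # Summing the booleans over axis 0 yields one integer count per component.
    return comp_noise_sel.sum(axis=0)
```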

Member Author:

Nice catch! This was originally part of a much larger for loop, so I just copied the loop for all of the smaller functions. This makes things much cleaner.

Member:

Also, the comp_noise_sel and countnoise calculations appear in this function and the following two functions. If we consolidate this (removing the for loop, as above), what do you think of explicitly calling this in compute_signal_minus_noise_z() and compute_signal_minus_noise_t()?

Member Author:

I need the noise index in those two more than the counts, so I'll be duplicating most of the work there anyway.

tedana/metrics/dependence.py (resolved)
Z_maps = np.zeros(weights.shape)
for i_comp in range(n_components):
    # compute weights as Z-values
    weights_z = (weights[:, i_comp] - weights[:, i_comp].mean()) / weights[:, i_comp].std()
Member:

Why not scipy.stats.zscore instead? Also, should we be using ddof=1 for the standard deviation calculation?

Member Author:

I don't think this step should even be done. computefeats2 should return z-values (Fisher's R-to-Z performed on parameter estimates like they were Pearson r-values), but not z-statistics. To get from the Fisher's R-to-Z values to test statistics, we would then divide by sqrt(n-3), rather than z-scoring the resulting values. Although, again, that requires r-values going in and not parameter estimates from a multiple regression.
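The distinction being drawn could be sketched as follows (assuming the inputs really are Pearson r values, which, as noted, is itself in question here):

```python
import numpy as np


def r_to_z_stat(r_values, n_samples):
    """Convert Pearson r values to z-statistics via Fisher's r-to-z.

    np.arctanh gives Fisher z *values*; dividing by the standard error
    1 / sqrt(n - 3) turns them into z *statistics*. This only makes sense
    for true correlations, not multiple-regression parameter estimates.
    """
    z_values = np.arctanh(r_values)            # Fisher's r-to-z transform
    return z_values * np.sqrt(n_samples - 3)   # standardize by SE = 1/sqrt(n-3)
```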

Comment on lines 205 to 212
def calculate_varex_norm(weights):
    n_components = weights.shape[1]
    totvar_norm = (weights**2).sum()

    varex_norm = np.zeros(n_components)
    for i_comp in range(n_components):
        varex_norm[i_comp] = (weights[:, i_comp]**2).sum() / totvar_norm
    return varex_norm
Member:

This is the same as calculate_varex() but just doesn't multiply by 100? We should consolidate these functions.
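A consolidated, loop-free version might look like this (a sketch; the `normalized` flag is a hypothetical way to merge the two functions):

```python
import numpy as np


def calculate_varex(weights, normalized=False):
    """Variance explained per component, as a percentage or a fraction.

    Consolidates calculate_varex() and calculate_varex_norm(): the only
    difference between them is the factor of 100.
    """
    varex = (weights ** 2).sum(axis=0) / (weights ** 2).sum()
    return varex if normalized else 100 * varex
```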

Member Author:

I'll clean it up like calculate_varex(), but I'd rather wait on merging until I believe in the variance explained calculations a bit more (per #317).

Comment on lines 146 to 147
for i_comp in range(n_components):
    dice_values[i_comp] = utils.dice(Br_clmaps[:, i_comp], F_clmaps[:, i_comp])
Member:

Any way we could make utils.dice broadcastable? I haven't looked at the function in a while but if it's possible it would be really nice!
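A column-wise Dice coefficient that avoids the Python loop could be sketched like so (assuming binary cluster maps of shape voxels x components; this is an illustration, not the actual utils.dice implementation):

```python
import numpy as np


def dice_columns(maps_a, maps_b):
    """Dice coefficient per component, computed across all columns at once.

    Inputs are boolean/binary arrays of shape (n_voxels, n_components).
    Returns 0 for components where both maps are empty.
    """
    a = maps_a.astype(bool)
    b = maps_b.astype(bool)
    intersection = (a & b).sum(axis=0)
    sizes = a.sum(axis=0) + b.sum(axis=0)
    # Guard the denominator so empty components yield 0 rather than NaN.
    return np.where(sizes > 0, 2 * intersection / np.maximum(sizes, 1), 0.0)
```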

Member Author:

Should be good now.

Lots of great refactoring by @rmarkello.
tedana/utils.py (resolved)
@tsalo changed the title [WIP, REF] Modularize metric calculation → [REF] Modularize metric calculation on Nov 10, 2019
@tsalo (Member Author) commented Nov 10, 2019

Closing this WIP and reopening as a draft. @rmarkello I'm sorry to dismiss your review by doing this, but I think I've addressed almost everything you've brought up so far.

@tsalo closed this on Nov 10, 2019
Successfully merging this pull request may close these issues.

Move component sign determination into a separate function