
[REF] Modularize metric calculation #436

Closed
wants to merge 15 commits

Conversation


@tsalo (Member) commented Nov 8, 2019

Closes #320.

Changes proposed in this pull request:

  • Move individual metric calculations from one large-scale function into their own smaller functions.
  • Add a wrapper function that flexibly allows users (or, more likely, decision trees) to request any set of metrics and get back a component table with those metrics.

@rmarkello (Member) left a comment:

Initial pass (from my phone!). Will continue from the computer but didn't want to lose these initial comments. Overall it's looking good!!

'data_optcom ({1}), and mixing ({2}) do not '
'match.'.format(data_cat.shape[2], data_optcom.shape[1], mixing.shape[0]))

mixing = mixing.copy()
Member:

I think this copy is unnecessary here, since the next time mixing is used it's being passed to flip_components(), which will trigger a copy.

Member:

But we should add a check: if mixing_z is None, assign mixing to mixing_z.



def generate_metrics(comptable, data_cat, data_optcom, mixing, mask, tes, ref_img, mixing_z=None,
                     metrics=['kappa', 'rho'], sort_by='kappa', ascending=False):
Member:

I think it's generally best to not have a mutable object as a default parameter. Can we set metrics to None and then do a check in the function?
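The usual pattern looks like this (a minimal sketch, not the final tedana signature):

```python
def generate_metrics_sketch(metrics=None):
    """Avoid a shared mutable default by resolving None inside the function."""
    if metrics is None:
        metrics = ['kappa', 'rho']  # fresh list on every call
    metrics.append('variance explained')  # mutating is now safe
    return metrics
```

With a literal list default, the same list object would be shared and mutated across calls; with None, each call gets its own fresh list.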



def calculate_betas(data_optcom, mixing):
    assert data_optcom.shape[1] == mixing.shape[0]
Member:

Can we make this a ValueError with a brief message instead of an assertion error?
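One way to phrase that check (a sketch: the message wording is illustrative, and the least-squares step is only an assumption about what calculate_betas computes):

```python
import numpy as np


def calculate_betas_checked(data_optcom, mixing):
    """Validate shapes with a ValueError, then compute least-squares betas."""
    if data_optcom.shape[1] != mixing.shape[0]:
        raise ValueError(
            'Second dimension of data_optcom ({0}) does not match first '
            'dimension of mixing ({1}).'.format(
                data_optcom.shape[1], mixing.shape[0]))
    # data_optcom is (voxels x time), mixing is (time x components)
    betas, *_ = np.linalg.lstsq(mixing, data_optcom.T, rcond=None)
    return betas.T  # (voxels x components)
```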

Member Author:

I'll add more informative checks (and tests) later on. I just wanted to throw a couple of checks in here as an example.



def calculate_psc(data_optcom, optcom_betas):
    assert data_optcom.shape[1] == optcom_betas.shape[0]
Member:

See comment above re: ValueError instead of an assert.

# index voxels significantly loading on component but not from clusters
comp_noise_sel = ((np.abs(Z_maps[:, i_comp]) > z_thresh) &
                  (Z_clmaps[:, i_comp] == 0))
countnoise[i_comp] = np.array(comp_noise_sel, dtype=np.int).sum()
Member:

I don't think this needs to be a for loop. From what I can tell this would broadcast without any issue:

comp_noise_sel = (np.abs(Z_maps) > z_thresh) & (Z_clmaps == 0)
countnoise = comp_noise_sel.sum(axis=0)

Summing the boolean should automatically cast it to int, too. Let me know if I'm misunderstanding the dimensions of Z_maps and Z_clmaps though!
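The vectorized version could be sketched like this (assuming both maps are voxels x components arrays, and an illustrative threshold):

```python
import numpy as np


def count_noise_voxels(Z_maps, Z_clmaps, z_thresh=1.95):
    """Count voxels loading on each component but outside its clusters.

    Loop-free sketch of the suggestion above; Z_maps and Z_clmaps are
    assumed to be (n_voxels, n_components) arrays.
    """
    comp_noise_sel = (np.abs(Z_maps) > z_thresh) & (Z_clmaps == 0)
    # Summing the booleans over axis 0 yields one integer count per component.
    return comp_noise_sel.sum(axis=0)
```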

Member Author:

Nice catch! This was originally part of a much larger for loop, so I just copied the loop for all of the smaller functions. This makes things much cleaner.

Member:

Also, the comp_noise_sel and countnoise calculations appear in this function and the following two functions. If we consolidate this (removing the for loop, as above), what do you think of explicitly calling this in compute_signal_minus_noise_z() and compute_signal_minus_noise_t()?

Member Author:

I need the noise index in those two more than the counts, so I'll be duplicating most of the work there anyway.

tedana/metrics/dependence.py (resolved)
Z_maps = np.zeros(weights.shape)
for i_comp in range(n_components):
    # compute weights as Z-values
    weights_z = (weights[:, i_comp] - weights[:, i_comp].mean()) / weights[:, i_comp].std()
Member:

Why not scipy.stats.zscore instead? Also, should we be using ddof=1 for the standard deviation calculation?

Member Author:

I don't think this step should even be done. computefeats2 should return z-values (Fisher's R-to-Z performed on parameter estimates like they were Pearson r-values), but not z-statistics. To get from the Fisher's R-to-Z values to test statistics, we would then divide by sqrt(n-3), rather than z-scoring the resulting values. Although, again, that requires r-values going in and not parameter estimates from a multiple regression.
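The distinction being drawn could be sketched as follows (assuming the inputs really are Pearson r values, which, as noted, is itself in question here):

```python
import numpy as np


def r_to_z_stat(r_values, n_samples):
    """Convert Pearson r values to z-statistics via Fisher's r-to-z.

    np.arctanh gives Fisher z *values*; dividing by the standard error
    1 / sqrt(n - 3) turns them into z *statistics*. This only makes sense
    for true correlations, not multiple-regression parameter estimates.
    """
    z_values = np.arctanh(r_values)            # Fisher's r-to-z transform
    return z_values * np.sqrt(n_samples - 3)   # standardize by SE = 1/sqrt(n-3)
```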

Comment on lines 205 to 212
def calculate_varex_norm(weights):
    n_components = weights.shape[1]
    totvar_norm = (weights**2).sum()

    varex_norm = np.zeros(n_components)
    for i_comp in range(n_components):
        varex_norm[i_comp] = (weights[:, i_comp]**2).sum() / totvar_norm
    return varex_norm
Member:

This is the same as calculate_varex() but just doesn't multiply by 100? We should consolidate these functions.
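A consolidated, loop-free version might look like this (a sketch; the `normalized` flag is a hypothetical way to merge the two functions):

```python
import numpy as np


def calculate_varex(weights, normalized=False):
    """Variance explained per component, as a percentage or a fraction.

    Consolidates calculate_varex() and calculate_varex_norm(): the only
    difference between them is the factor of 100.
    """
    varex = (weights ** 2).sum(axis=0) / (weights ** 2).sum()
    return varex if normalized else 100 * varex
```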

Member Author:

I'll clean it up like calculate_varex(), but I'd rather wait on merging until I believe in the variance explained calculations a bit more (per #317).

Comment on lines 146 to 147
for i_comp in range(n_components):
    dice_values[i_comp] = utils.dice(Br_clmaps[:, i_comp], F_clmaps[:, i_comp])
Member:

Any way we could make utils.dice broadcastable? I haven't looked at the function in a while but if it's possible it would be really nice!
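A column-wise Dice coefficient that avoids the Python loop could be sketched like so (assuming binary cluster maps of shape voxels x components; this is an illustration, not the actual utils.dice implementation):

```python
import numpy as np


def dice_columns(maps_a, maps_b):
    """Dice coefficient per component, computed across all columns at once.

    Inputs are boolean/binary arrays of shape (n_voxels, n_components).
    Returns 0 for components where both maps are empty.
    """
    a = maps_a.astype(bool)
    b = maps_b.astype(bool)
    intersection = (a & b).sum(axis=0)
    sizes = a.sum(axis=0) + b.sum(axis=0)
    # Guard the denominator so empty components yield 0 rather than NaN.
    return np.where(sizes > 0, 2 * intersection / np.maximum(sizes, 1), 0.0)
```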

Member Author:

Should be good now.

Lots of great refactoring by @rmarkello.
tedana/utils.py (resolved)
@tsalo changed the title [WIP, REF] Modularize metric calculation → [REF] Modularize metric calculation on Nov 10, 2019
@tsalo (Member Author) commented Nov 10, 2019

Closing this WIP and reopening as a draft. @rmarkello I'm sorry to dismiss your review by doing this, but I think I've addressed almost everything you've brought up so far.

@tsalo closed this on Nov 10, 2019
Successfully merging this pull request may close these issues.

Move component sign determination into a separate function