-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REF] Modularize metric calculation #436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial pass (from my phone!). Will continue from the computer but didn't want to lose these initial comments. Overall it's looking good!!
'data_optcom ({1}), and mixing ({2}) do not ' | ||
'match.'.format(data_cat.shape[2], data_optcom.shape[1], mixing.shape[0])) | ||
|
||
mixing = mixing.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this copy is unnecessary, here, since the next time it's used is providing mixing
to flip_components ()
which will trigger a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we should add a check for if mixing_z is None
and assign mixing
to mixing_z
.
tedana/metrics/collect.py
Outdated
|
||
|
||
def generate_metrics(comptable, data_cat, data_optcom, mixing, mask, tes, ref_img, mixing_z=None, | ||
metrics=['kappa', 'rho'], sort_by='kappa', ascending=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's generally best to not have a mutable object as a default parameter. Can we set metrics
to None
and then do a check in the function?
|
||
|
||
def calculate_betas(data_optcom, mixing): | ||
assert data_optcom.shape[1] == mixing.shape[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a ValueError
with a brief message instead of an assertion error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add more informative checks (and tests) later on. I just wanted to throw a couple of checks in here as an example.
|
||
|
||
def calculate_psc(data_optcom, optcom_betas): | ||
assert data_optcom.shape[1] == optcom_betas.shape[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above re: ValueError
instead of an assert.
tedana/metrics/dependence.py
Outdated
# index voxels significantly loading on component but not from clusters | ||
comp_noise_sel = ((np.abs(Z_maps[:, i_comp]) > z_thresh) & | ||
(Z_clmaps[:, i_comp] == 0)) | ||
countnoise[i_comp] = np.array(comp_noise_sel, dtype=np.int).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be a for loop. From what I can tell this would broadcast without any issue:
comp_noise_sel = (np.abs(Z_maps) > z_thresh) & (Z_clmaps == 0)
countnoise = np.sum(comp_noise_sel).sum(axis=0)
Summing the boolean should automatically cast it to int, too. Let me know if I'm misunderstanding the dimensions of Z_maps
and Z_clmaps
though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! This was originally part of a much larger for loop, so I just copied the loop for all of the smaller functions. This makes things much cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the comp_noise_sel
and countnoise
calculation appear in this function and the following two functions. If we consolidate this (removing the for loop, as above), what do you think of explicitly calling this in compute_signal_minus_noise_z()
and compute_signal_minus_noise_t()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the noise index in those two more than the counts, so I'll be duplicating most of the work there anyway.
tedana/metrics/dependence.py
Outdated
Z_maps = np.zeros(weights.shape) | ||
for i_comp in range(n_components): | ||
# compute weights as Z-values | ||
weights_z = (weights[:, i_comp] - weights[:, i_comp].mean()) / weights[:, i_comp].std() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not scipy.stats.zscore
instead? Also, should we be using ddof=1
for the standard deviation calculation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this step should even be done. computefeats2
should return z-values (Fisher's R-to-Z performed on parameter estimates like they were Pearson r-values), but not z-statistics. To get from the Fisher's R-to-Z values to test statistics, we would then divide by sqrt(n-3), rather than z-scoring the resulting values. Although, again, that requires r-values going in and not parameter estimates from a multiple regression.
tedana/metrics/dependence.py
Outdated
def calculate_varex_norm(weights): | ||
n_components = weights.shape[1] | ||
totvar_norm = (weights**2).sum() | ||
|
||
varex_norm = np.zeros(n_components) | ||
for i_comp in range(n_components): | ||
varex_norm[i_comp] = (weights[:, i_comp]**2).sum() / totvar_norm | ||
return varex_norm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as calculate_varex()
but just doesn't multiply by 100? We should consolidate these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean it up like calculate_varex()
, but I'd rather wait on merging until I believe in the variance explained calculations a bit more (per #317).
tedana/metrics/dependence.py
Outdated
for i_comp in range(n_components): | ||
dice_values[i_comp] = utils.dice(Br_clmaps[:, i_comp], F_clmaps[:, i_comp]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way we could make utils.dice
broadcastable? I haven't looked at the function in a while but if it's possible it would be really nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good now.
Lots of great refactoring by @rmarkello.
Closing this WIP and reopening as a draft. @rmarkello I'm sorry to dismiss your review by doing this, but I think I've addressed almost everything you've brought up so far. |
Closes #320.
Changes proposed in this pull request: