Track selection criteria provenance across workflows #58

Closed
handwerkerd opened this issue May 17, 2018 · 15 comments
Labels: enhancement (issues describing possible enhancements to the project)

@handwerkerd
Member

This has been directly or indirectly discussed in a few other issues (#16, #35, #50), but I think it's worth its own issue. A problem I've repeatedly come across when trying to understand the output of the original tedana is: when a component is rejected or placed in midK for non-obvious reasons, it is sometimes very hard to figure out which selection criterion drove that placement without digging into the code. For example, see "Reasons for Fork" at https://bitbucket.org/BenGutierrez/me-ica, a fork made because the default high-variance selection criterion in meica meant that a component representing the visual cortex response to a block-design flashing checkerboard was regularly rejected not because of its TE dependence but because its variance was above a hard-coded threshold.

As part of the #16 effort to break up the component selection into multiple functions, it would be useful to create a data structure (or add fields to an existing data structure) that tracks which selection criteria are applied and where each component falls within each criterion. This might get a bit tricky because, for some selection methods, including the one in the original tedana, several selection criteria are combined across a decision tree, so it might be necessary to store the decision tree itself in some way.

I think there is a separate issue regarding how to give this information to the end user in an easily digestible way, but I wanted to put the idea out there as something to consider including at a low level rather than trying to retrofit it into the code later.

I will comment here, and I may be able to personally work on this code, but don't count on me for substantive code changes before late June.

@emdupre added the enhancement label May 27, 2018
@tsalo
Member

tsalo commented Jul 24, 2018

@handwerkerd @emdupre I'm really not familiar with the component selection code (yet), but can we make comp_table.txt a csv or tab-delimited text file with this information? I think that a spreadsheet makes more sense to track metrics and selection criteria than the current text file format.

The header contains information, such as the number of rejected and accepted components, that can mostly be derived from the statistics table itself. The columns in the table at the bottom of the file are "comp", "Kappa", "Rho", "Var", and "Var (norm)". If we add another column (say, "label" or "classification status" or whatever) with values of "accepted", "rejected", and "ignored", along with a column for the rationale, we could log all of the necessary information in a more concise and parsable format (especially if more metrics are ever added). Tracking the rationale will take some time to figure out and is probably better handled by someone other than me, but I'd be happy to get the rest of it working and to lay the groundwork for next steps. WDYT?
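To make that concrete, here is a minimal sketch of the kind of table I have in mind; the column names, values, and rationale strings are purely illustrative, not an existing tedana format:

```python
import pandas as pd

# Hypothetical component table with classification and rationale columns.
comp_table = pd.DataFrame({
    "component": [0, 1, 2],
    "kappa": [45.2, 18.7, 102.3],
    "rho": [12.1, 40.5, 15.8],
    "variance explained": [3.4, 1.2, 22.9],
    "classification": ["accepted", "rejected", "rejected"],
    "rationale": ["", "Rho greater than Kappa", "high variance explained"],
})

# Tab-delimited output keeps the file both human- and machine-readable.
comp_table.to_csv("comp_table.txt", sep="\t", index=False)
```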

@handwerkerd
Member Author

I see this as two separate parts. The more significant part is how do we set up a data structure to keep track of this information across multiple functions and workflows? Once that is figured out, deciding how to output the information will require discussion, but will be relatively easy.

I could see some of this information ending up in comp_table.txt. What makes this really tricky is that the current default workflow is a fairly complex decision tree where some selection decisions depend on earlier conditional clauses. Without a major reworking of the code, a variable could track where in the code a decision happened and that information could be output, but only power users would be able to interpret it.

If I were starting from scratch, I'd want every decision point to be a separate function with a unique label. A selcomp-like function would then contain only the decision tree, with calls to those separate functions. The program output would include the decision tree. In the simplest form, this would mean copying the decision-tree function into the TED output directory, or there could be a wrapper function that turns the decision-tree function into something slightly more human readable. At that point, comp_table.txt or some other document could record where in the decision tree each selection was made using a single identifier per component.
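A rough sketch of what I mean, with made-up node names and thresholds (this is not the existing selcomps logic):

```python
def check_rho_greater_than_kappa(kappa, rho):
    """Decision node 'rho_gt_kappa': reject if rho exceeds kappa."""
    return "rejected" if rho > kappa else None


def check_high_variance(varex, threshold=35.0):
    """Decision node 'high_variance': reject if variance explained is above a hard threshold."""
    return "rejected" if varex > threshold else None


def select_component(kappa, rho, varex):
    """Walk the decision tree and return (classification, node label) for provenance."""
    checks = [
        ("rho_gt_kappa", check_rho_greater_than_kappa(kappa, rho)),
        ("high_variance", check_high_variance(varex)),
    ]
    for label, decision in checks:
        if decision is not None:
            return decision, label
    return "accepted", "default_accept"


print(select_component(kappa=50.0, rho=60.0, varex=4.0))  # ('rejected', 'rho_gt_kappa')
```

The point is that the label returned alongside each classification could be written straight into the component table as provenance.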

Also "comp", "Kappa", "Rho", "Var", and "Var (norm)" are what's saved in the output, but there are over 10 more very distinct metrics that are included in the current decision tree. I could see benefits to adding these to comp_table.txt. That said, since the calculated metrics may vary across workflows, including all these data would make comp_table.txt less consistent and it would require anyone reading comp_table.txt to use column labels rather than just taking the values from column number X. I'd lean towards adding these values to comp_table.txt, but if they make that file get too messy, it might be worth re-considering.

@tsalo
Member

tsalo commented Aug 1, 2018

As a start, I've been working on converting the component tables from tedpca and tedica to pandas DataFrames. I haven't started working on incorporating things from selcomps, but I plan to incorporate metrics computed there as well. I'm attaching two example files here. What do you think?

At minimum, it should make it easier to read in the component tables if they're spreadsheets. Selecting columns by name is also, in my opinion, more robust than using column numbers: with positional indexing, a reordered column doesn't raise an error (it just silently pairs the wrong metrics with the wrong criteria), whereas with name-based selection, a renamed or missing column breaks the workflow loudly.
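For example (toy data, just to illustrate the two failure modes):

```python
import pandas as pd

table = pd.DataFrame({"kappa": [45.2, 18.7], "rho": [12.1, 40.5]})

kappas_by_position = table.iloc[:, 0]  # silently wrong if the columns ever get reordered
kappas_by_name = table["kappa"]        # raises a KeyError if the column is renamed or missing
```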

@handwerkerd
Member Author

This is probably something you already know, but there is currently a variable, seldict, that contains some of the selection criteria information. That variable should be getting written out to compseldata.pklbz, so some of this information is already being saved.
seldict is initialized here:

tedana/tedana/model/fit.py
Lines 230 to 235 in 0c5f93d

```python
seldict = {}
selvars = ['Kappas', 'Rhos', 'WTS', 'varex', 'Z_maps', 'F_R2_maps',
           'F_S0_maps', 'Z_clmaps', 'F_R2_clmaps', 'F_S0_clmaps',
           'tsoc_B', 'Br_clmaps_R2', 'Br_clmaps_S0', 'PSC']
for vv in selvars:
    seldict[vv] = eval(vv)
```

As of now it's a dictionary, but I can see the benefits of turning it into a pandas DataFrame to allow more structured manipulation. That said, seldict contains both single values per component, like Kappa, Rho, and varex, and matrices of voxelwise information, like Z_maps and F_R2_maps.
One option would be to keep seldict as-is and make a separate DataFrame that copies the per-component values from seldict (and adds others); another would be to break seldict into a DataFrame with just the single-value metrics and a separate structure with the volumes.
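The second option might look something like this, sketched with random placeholder data (the real arrays would of course come from the fitting code):

```python
import numpy as np
import pandas as pd

n_components, n_voxels = 5, 1000
seldict = {
    "Kappas": np.random.rand(n_components),
    "Rhos": np.random.rand(n_components),
    "varex": np.random.rand(n_components),
    "Z_maps": np.random.rand(n_voxels, n_components),
    "F_R2_maps": np.random.rand(n_voxels, n_components),
}

# Per-component scalars go into a DataFrame; voxelwise maps stay in a separate dict.
scalar_keys = ["Kappas", "Rhos", "varex"]
component_metrics = pd.DataFrame({key: seldict[key] for key in scalar_keys})
voxelwise_maps = {key: val for key, val in seldict.items() if key not in scalar_keys}
```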

As for how they're saved, the current comp_table.txt is similar to your comp_table_ica.txt except it includes some comments at the top with added information. I think both approaches could be fine.

@tsalo
Member

tsalo commented Aug 4, 2018

I'm not sure how to handle seldict, though I suspect that a spreadsheet would be a lot easier for users to make sense of than a dictionary. I don't know how to deal with the voxelwise information; perhaps it could be saved to NIfTI files in a subdirectory? I certainly like your second suggestion (i.e., split seldict into metrics in a DataFrame and maps in some other structure).

Yeah, for the ICA component table the exact selection criteria were a bit harder to track (as you know), so unfortunately I haven't incorporated something like "rationale" yet. All I really did was drop the header and move header-specific information (e.g., component classifications) to the table. If there's any important information in the header that I missed I can incorporate it into the DataFrame.

However, the PCA component table does include the rationale as a semicolon-separated list of reasons in a single column (although I did somehow manage to mess things up there, which is why I haven't opened a PR for my changes yet).
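The idea is roughly this (the codes here are hypothetical, just to show the format):

```python
import pandas as pd

# Accumulated rejection reasons per component, joined into one rationale column.
reasons = {0: [], 1: ["P001"], 2: ["P001", "P007"]}
comp_table = pd.DataFrame({"component": sorted(reasons)})
comp_table["rationale"] = comp_table["component"].map(lambda c: ";".join(reasons[c]))
```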

@handwerkerd
Member Author

As the efforts to track selection criteria move forward, I want to push to remove all references to "MidK" and "Ignored" components from the code. Those terms are dangerously confusing: MidK essentially means "reject, for reasons besides the kappa/rho relationship," and Ignore means "keep, because the component's variance is too low to matter." This is probably the most common unnecessary question I've had to answer from end users.

As per the discussion above, components should be classified only based on whether they were accepted or rejected along with provenance information on why.

@tsalo
Member

tsalo commented Oct 29, 2018

I totally agree with this, and will make sure that #122 labels the components appropriately.

However, I believe that we do need to keep some kind of distinction between accepted and ignored, because the ignored (but not accepted) components should be retained during the minimum image regression GSR here. I have a PR open (#125) to fix an issue with the way it's currently implemented, but as far as I can tell, the ignored components should be treated differently from the accepted components.

@handwerkerd
Member Author

However "ignored" are retained, they should not be called "ignored." I've seen too many situations where people very reasonably assume that something that is ignored is removed from the data. "LowVarianceRetained" is wordy, but more accurate. I'm very open to thoughts on less wordy labels. There's a longer discussion on whether anything should be retained merely because it's low variance, but I don't think we're quite at the point where that discussion is needed.

@tsalo mentioned this issue Nov 20, 2018
@tsalo
Member

tsalo commented Dec 2, 2018

Okay, so we need at least three labels: BOLD-like components (accepted), non-BOLD components (rejected), and low-variance components that are retained in the denoised, but not the high-Kappa, results (currently ignored). However, I wonder if this list is comprehensive. Could other classification schemes require more than these three labels?

What about "retained"?

One thing that we can do is to improve the documentation for the component tables. @emdupre brought up a good point in #119 here, that the rationales are hard to parse. We can replace my long-winded human-readable expressions with short codes, which we can document on the RTD site. If we want to build a table like the following, then that's also a great place to talk about what the different classifications (accepted, rejected, ignored/retained/LowVarianceRetained) mean.

| Code | Description | Classification | Algorithm(s) |
|------|-------------|----------------|--------------|
| R001 | S0 dice is higher than R2 dice and high variance explained | rejected | Kundu v2.5 TEDICA |
| R002 | Rho > Kappa or more significant voxels in S0 model than R2 model | rejected | Kundu v2.5 TEDICA |
| R003 | Physiological artifact | ignored/LowVarianceRetained | Kundu v3.2 TEDICA |
| R004 | Rho below fmin threshold | rejected | Kundu stabilized decision tree TEDPCA |

| Classification | Description |
|----------------|-------------|
| accepted | BOLD-like components retained in denoised and high-Kappa data |
| rejected | non-BOLD components removed from denoised and high-Kappa data |
| ignored/LowVarianceRetained | low-variance components retained in denoised, but not high-Kappa, data |

@emdupre
Member

emdupre commented Dec 2, 2018

@tsalo can you add in the screenshots you showed me here? They look lovely, and I think they show that these codes are really nice to have!

It might make sub-selecting components for visual inspection much easier later on. One idea for now: we could also think about what the best code-naming scheme is, if any.

@tsalo
Member

tsalo commented Dec 2, 2018

I've added the screenshots below.

The naming convention I came up with uses the prefix P for TEDPCA rationales and the prefix I for TEDICA rationales. Algorithm-specific codes are offset by 100, so the codes for rationales corresponding to v2.5 start at I001 (or I000, I guess) and could conceivably go to I099, while the ones for v3.2 start at I101 (I100) and could go to I199. Codes that are shared across methods are defined once, at the first method that uses them, so that a rejection rationale like mid-Kappa, which can come from v2.5 or v3.2, is labeled I006.

We can obviously change that naming convention if anyone has any other preferences.
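In code, the scheme amounts to a lookup table along these lines (all descriptions except mid-Kappa are placeholders, not the documented rationales):

```python
RATIONALE_CODES = {
    # I0xx block: Kundu v2.5 TEDICA
    "I001": "example v2.5 rejection criterion",
    "I006": "mid-Kappa component (shared by v2.5 and v3.2, so defined once here)",
    # I1xx block: Kundu v3.2 TEDICA
    "I101": "example v3.2 rejection criterion",
    # Pxxx block: TEDPCA
    "P001": "example TEDPCA rejection criterion",
}


def describe(code):
    """Look up the human-readable description for a rationale code."""
    return RATIONALE_CODES.get(code, "unknown code")
```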

[Screenshot: screen shot 2018-12-02 at 12 24 49 pm]

[Screenshot: screen shot 2018-12-02 at 12 24 57 pm]

@tsalo
Member

tsalo commented Mar 20, 2019

These rationales and their codes were implemented in #164. Is this issue ready to close?

@emdupre
Member

emdupre commented Mar 20, 2019

I'm ready to close if you and @handwerkerd are!

@handwerkerd
Member Author

Looks good to me. I don't have a chance to dig into this part of the code right now, but the visual output looks really useful for trying to figure out what happened during a specific run. My one Q is whether this info is also stored in a way that facilitates easy comparisons across runs. For example, is there a saved dataframe or a readable CSV file with this info?

@tsalo
Member

tsalo commented Mar 20, 2019

The component tables are currently output as CSVs (but will be replaced with JSONs in #152 to follow BIDS convention). They have a rationale column with the code associated with a given reason for being rejected or ignored, so it should be pretty easy for users to compare across runs.
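For example, something along these lines would work (the file paths and column names here are just placeholders for whatever a given run produced):

```python
import pandas as pd

run1 = pd.read_csv("run-01/comp_table_ica.csv")
run2 = pd.read_csv("run-02/comp_table_ica.csv")

# Compare how many components landed in each class, and which rationale codes
# drove the rejections, between the two runs.
print(run1["classification"].value_counts())
print(run2["classification"].value_counts())
print(run1.loc[run1["classification"] == "rejected", "rationale"].value_counts())
```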

@tsalo closed this as completed Mar 20, 2019