Track selection criteria provenance across workflows #58
Comments
@handwerkerd @emdupre I'm really not familiar with the component selection code (yet), but can we make the component table more parsable? The header contains information, such as the number of rejected and accepted components, that can mostly be extrapolated from the statistics table itself. The columns at the bottom of the file are "comp", "Kappa", "Rho", "Var", and "Var (norm)". If we add another column (say, "label" or "classification status" or whatever) with values of "accepted", "rejected", and "ignored", along with a column for the rationale, we could log all of the necessary information in a more concise and parsable format (especially if more metrics are ever added). Tracking the rationale will take some time to figure out and is probably better handled by someone other than me, but I'd be happy to get the rest of it working and to lay the groundwork for next steps. WDYT?
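Roughly what I have in mind, just as a sketch (the column names and values here are placeholders, not what's in the code):

```python
import pandas as pd

# Hypothetical flattened component table with "classification" and
# "rationale" columns; values are made up for illustration
comp_table = pd.DataFrame({
    "comp": [0, 1, 2],
    "Kappa": [45.2, 20.1, 18.7],
    "Rho": [12.3, 35.8, 15.0],
    "Var": [10.5, 8.2, 0.4],
    "Var (norm)": [0.11, 0.08, 0.004],
    "classification": ["accepted", "rejected", "ignored"],
    "rationale": ["", "Rho greater than Kappa", "low variance"],
})
comp_table.to_csv("comp_table.txt", sep="\t", index=False)
```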
I see this as two separate parts. The more significant part is: how do we set up a data structure to keep track of this information across multiple functions and workflows? Once that is figured out, deciding how to output the information will require discussion, but will be relatively easy. I could see some of the information ending up in comp_table.txt.

What makes this really tricky is that the current default workflow is a fairly complex decision tree where some selection decisions depend on earlier conditional clauses. Without a major re-working of the code, a variable could track where in the code a decision happened and that information could be output, but only power users would be able to interpret it. If I were starting from scratch, I'd want every decision point to be a separate function with a unique label. Then a selcomp-like function would only contain a decision tree with calls to those separate functions, and the program output would include the decision tree. In the simplest form, this would mean copying the decision-tree function into the TED directory output, or having a wrapper function that turns the decision-tree function into something slightly more human readable. At that point, comp_table.txt or some other document could list where in the decision tree a selection was made using only a single identifier per component.

Also, "comp", "Kappa", "Rho", "Var", and "Var (norm)" are what's saved in the output, but there are over 10 more very distinct metrics that are used in the current decision tree. I could see benefits to adding these to comp_table.txt. That said, since the calculated metrics may vary across workflows, including all of these data would make comp_table.txt less consistent, and it would require anyone reading comp_table.txt to use column labels rather than just taking the values from column number X. I'd lean towards adding these values to comp_table.txt, but if they make that file too messy, it might be worth re-considering.
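To make the decision-tree idea a bit more concrete, here's a rough sketch; all function names, node labels, and thresholds are invented for illustration, not proposals for the actual API:

```python
import pandas as pd

def check_kappa_rho(comp, metrics):
    """Decision node 'node01': reject if Rho dominates Kappa."""
    if metrics.loc[comp, "Rho"] > metrics.loc[comp, "Kappa"]:
        return "rejected", "node01"
    return None, None

def check_high_variance(comp, metrics, thresh=9.0):
    """Decision node 'node02': reject very-high-variance components."""
    if metrics.loc[comp, "Var"] > thresh:
        return "rejected", "node02"
    return None, None

def select_components(metrics):
    """selcomp-like wrapper: walk the nodes in order, recording provenance."""
    provenance = {}
    for comp in metrics.index:
        for node in (check_kappa_rho, check_high_variance):
            decision, label = node(comp, metrics)
            if decision is not None:
                provenance[comp] = (decision, label)
                break
        else:
            provenance[comp] = ("accepted", "default")
    return provenance

metrics = pd.DataFrame({"Kappa": [45.2, 20.1], "Rho": [12.3, 35.8], "Var": [10.5, 8.2]})
print(select_components(metrics))
```

The point is just that each node carries its own label, so the per-component provenance (decision plus node identifier) falls out of the tree traversal for free.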
As a start, I've been working on converting the component tables from their current format into something more spreadsheet-like. At minimum, it should make it easier to read in the component tables if they're spreadsheets, plus selecting columns based on names is, in my opinion, more robust than just using the column numbers. With column numbers, the column order can be changed without raising an error (resulting in the wrong metrics being used with the wrong criteria); with column names, if a name changes (or differs between methods), the workflow will break loudly instead.
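For example (assuming a tab-separated file; the file name here is just illustrative):

```python
import pandas as pd

comp_table = pd.read_csv("comp_table.txt", sep="\t")

# By position: silently wrong if someone reorders the columns upstream
kappa_by_position = comp_table.iloc[:, 1]

# By name: immune to reordering, and raises a KeyError if the name changes
kappa_by_name = comp_table["Kappa"]
```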
This is probably something you already know, but there is currently a variable, seldict, that collects this information (see lines 230 to 235 in 0c5f93d). As of now it's a dictionary, but I can see the benefits of turning it into a pandas DataFrame to allow some more structured manipulations. That said, seldict contains both single values per component, like Kappa, Rho, and varex, and matrices of voxelwise information, like Z_maps and F_R2_maps. One option would be to keep seldict as-is and then make a separate DataFrame that copies stuff from seldict and also includes other information; another would be to break seldict into a DataFrame that just includes the single-value metrics and a separate structure that includes the volumes.
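A minimal sketch of the second option; the key names and shapes below are stand-ins and may not match the actual keys in seldict:

```python
import numpy as np
import pandas as pd

# Toy stand-in for seldict: per-component scalars plus voxelwise maps
n_voxels, n_comps = 1000, 3
seldict = {
    "Kappa": np.random.rand(n_comps),
    "Rho": np.random.rand(n_comps),
    "varex": np.random.rand(n_comps),
    "Z_maps": np.random.rand(n_voxels, n_comps),
    "F_R2_maps": np.random.rand(n_voxels, n_comps),
}

# Per-component scalars go into a DataFrame (one row per component),
# while the voxelwise maps stay in a plain dict keyed by metric name
metrics_df = pd.DataFrame({k: seldict[k] for k in ("Kappa", "Rho", "varex")})
maps = {k: seldict[k] for k in ("Z_maps", "F_R2_maps")}
```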
As for how they're saved, the current comp_table.txt is similar to your comp_table_ica.txt except it includes some comments at the top with added information. I think both approaches could be fine.
> I'm not sure how to handle …

Yeah, for the ICA component table the exact selection criteria were a bit harder to track (as you know), so unfortunately I haven't incorporated something like "rationale" yet. All I really did was drop the header and move header-specific information (e.g., component classifications) into the table. If there's any important information in the header that I missed, I can incorporate it into the DataFrame. However, the PCA component table does include the rationale (although I did somehow manage to mess things up there, which is why I haven't opened a PR for my changes) as a semicolon-separated list of reasons in a single column.
As the efforts to track selection criteria move forward, I want to make a push to remove all references to "MidK" and "Ignored" components from the code. Those terms are dangerously confusing: MidK essentially means reject, for reasons besides the kappa/rho relationship, and Ignore means keep, because the component's variance is "too low to matter." This is probably the main unneeded question I've had to answer from too many end users. As per the discussion above, components should be classified only based on whether they were accepted or rejected, along with provenance information on why.
I totally agree with this, and will make sure that #122 labels the components appropriately. However, I believe that we do need to keep some kind of distinction between accepted and ignored, because the ignored (but not accepted) components should be retained during the minimum image regression GSR here. I have a PR open (#125) to fix an issue with the way it's currently implemented, but as far as I can tell, the ignored components should be treated differently from the accepted components.
However "ignored" are retained, they should not be called "ignored." I've seen too many situations where people very reasonably assume that something that is ignored is removed from the data. "LowVarianceRetained" is wordy, but more accurate. I'm very open to thoughts on less wordy labels. There's a longer discussion on whether anything should be retained merely because it's low variance, but I don't think we're quite at the point where that discussion is needed. |
Okay, so we need at least three labels: BOLD-like components (accepted), non-BOLD components (rejected), and low-variance components that are retained in the denoised, but not the high-Kappa, results (currently ignored). However, I wonder if this list is comprehensive. Is it possible for other people to come up with classification schemes that require more than these three labels? What about "retained"? One thing that we can do is to improve the documentation for the component tables. @emdupre brought up a good point in #119 here, that the rationales are hard to parse. We can replace my long-winded human-readable expressions with short codes, which we can document on the RTD site. If we want to build a table like the following, then that's also a great place to talk about what the different classifications (accepted, rejected, ignored/retained/LowVarianceRetained) mean.
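As a rough sketch of what a code-to-description mapping could look like (the codes and descriptions below are made up for illustration, not the actual values):

```python
# Hypothetical rationale codes; the real mapping would live in the RTD docs
RATIONALE_CODES = {
    "R001": "Rho greater than Kappa",
    "R002": "More significant voxels in the S0 model than the R2 model",
    "I001": "Low variance; retained in denoised but not high-Kappa outputs",
}

def expand_rationale(code_string):
    """Expand a semicolon-separated string of codes into readable text."""
    codes = [c.strip() for c in code_string.split(";") if c.strip()]
    return "; ".join(RATIONALE_CODES.get(c, c) for c in codes)

print(expand_rationale("R001;I001"))
```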
@tsalo can you add in the screenshots you showed me here? They look lovely, and I think they show that these codes are really nice to have!! It might make sub-selecting components for visual inspection much easier later. One idea for now: we could also have a think on what the best code-naming scheme is, if any.
I've added the screenshots below. The naming convention I came up with is to use a short prefix for the codes. We can obviously change that naming convention if anyone has any other preferences.
These rationales and their codes were implemented in #164. Is this issue ready to close?
I'm ready to close if you and @handwerkerd are!
Looks good to me. I don't have a chance to dig into this part of the code right now, but the visual output looks really useful for trying to figure out what happened during a specific run. My one Q is whether this info is also stored in a way that facilitates easy comparisons across runs. For example, is there a saved dataframe or a readable CSV file with this info?
The component tables are currently output as CSVs (but will be replaced with JSONs in #152 to follow the BIDS convention). They have a rationale column with the code associated with a given reason for being rejected or ignored. It should be pretty easy for users to compare across runs.
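For comparing across runs, something along these lines should already work with the CSV output (the file paths are just examples; column names follow the discussion above):

```python
import pandas as pd

run1 = pd.read_csv("run-01/comp_table_ica.csv")
run2 = pd.read_csv("run-02/comp_table_ica.csv")

# How many components got each classification in each run?
print(run1["classification"].value_counts())
print(run2["classification"].value_counts())

# Which rationale codes show up for rejected components in run 1?
rejected = run1.loc[run1["classification"] == "rejected", "rationale"]
print(rejected.value_counts())
```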
This has been directly or indirectly discussed in a few other issues (#16, #35, #50), but I think it's worth its own issue. A problem I've repeatedly come across when trying to understand the output of the original tedana is: when a component is rejected or put in midK for non-obvious reasons, it is sometimes very hard to figure out what selection criterion was used for its placement without digging into the code. For example, see "Reasons for Fork" at https://bitbucket.org/BenGutierrez/me-ica, which was made because the default high-variance selection criteria in meica meant that a component representing the visual cortex response to a block-design flashing checkerboard was regularly rejected, not because of its TE dependence, but because its variance was above a hard-coded threshold.
As part of the #16 efforts to break up component selection into multiple functions, it would be useful to create a data structure (or add fields to an existing data structure) that tracks which selection criteria are applied and where each component falls within each criterion. This might get a bit tricky because, for some selection methods, including the one in the original tedana, several selection criteria are combined across a decision tree, so it might be necessary to store the decision tree in some way.
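One possible shape for such a structure, just to make the idea concrete (the names, thresholds, and values below are invented):

```python
# Running log of which criterion was applied, the threshold used, the
# per-component values it saw, and which components it rejected; entries
# would be appended as the decision tree is traversed
selection_log = []

def log_criterion(name, threshold, values, rejected_comps):
    selection_log.append({
        "criterion": name,
        "threshold": threshold,
        "values": dict(values),        # component -> metric value
        "rejected": sorted(rejected_comps),
    })

# Example entry recorded while walking the tree
log_criterion("varex_upper", 9.0, {0: 10.5, 1: 8.2, 2: 0.4}, rejected_comps={0})
print(selection_log)
```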
I think there is a separate issue regarding how to give this information to the end user in an easily digestible way, but I wanted to put this idea out there as something to consider including at a low level, rather than trying to retrofit it into the code later.
I will comment here, and I may be able to personally work on this code, but don't count on me for substantive code changes before late June.