-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Improve CAPS reader #760
[ENH] Improve CAPS reader #760
Conversation
Great ! I'll try to test this afternoon and get back to you. |
9db0ce9
to
8807b0f
Compare
@omar-rifai After merging #752 and rebasing I realized there were a few small issues due to the fact that the I propose here to split the Still unsure this is the best implementation, but I think it makes things a bit clearer. I'm open to suggestions though ! |
Any development that gets us closer to the Single Responsibility Principle is a good thing 👍 |
31b53e2
to
e729a9d
Compare
Anyone's up for doing a review ? |
I'm on it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided a few suggestions where I could. Perhaps, some more explanation in the opening post as to what the situation was, how it was changed and what the benefits are would help.
As it stands right now, I need further guidelines to see where the pieces of improvement are.
I know you added this but it still was not enough for me.
EDIT: reference previous comment
clinica/utils/input_files.py
Outdated
raise ValueError(f"Arguments must have the same length.") | ||
if len(arg_sizes) == 0: | ||
return func(*args, **kwargs) | ||
arg_size = arg_sizes[0] | ||
new_args = [] | ||
for arg in args: | ||
if not isinstance(arg, Iterable): | ||
new_args.append((arg,) * arg_size) | ||
else: | ||
new_args.append(arg) | ||
new_kwargs = [{} for _ in range(arg_size)] | ||
for k, arg in kwargs.items(): | ||
for i in range(len(new_kwargs)): | ||
if not isinstance(arg, Iterable): | ||
new_kwargs[i][k] = arg | ||
else: | ||
new_kwargs[i][k] = arg[i] | ||
if len(new_args) == 0: | ||
return [func(**x) for x in new_kwargs] | ||
elif len(new_kwargs) == 0: | ||
return [func(*x) for x in zip(*new_args)] | ||
return [func(*x, **y) for x, y in zip(zip(*new_args), new_kwargs)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is painful to read without comments to dissociate the cases covered by this decorator and their intermediate steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not proud of this function and I totally get that it is very painful to read.
I initially thought there was a more elegant/clever way to do this but couldn't find one. If you have suggestions/ideas I'll take them.
To try answering your points, I added a few things in 80b8d88:
- Basic unit tests for this function
- Examples and basic explanation of why we need this
- A few comments to break and explain the chunks of code
Hopefully this is enough to understand what this decorator does and how to use it.
I am thinking, perhaps what we are doing with BIDS, CAPS and CAPS group queries is a good use case for using typed dictionaries. At the end of the day, the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree with Ghislain on the constructor default which could be simplified, but otherwise looks good. Thanks.
Co-authored-by: Ghislain Vaillant <[email protected]>
I see your point but I don't know if we can get the same behavior with |
Another code smell that makes me think the design is not optimal is that the Liskov substitution principle is broken for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consensus is to merge this as is, and maybe improve on it later. Removing my request for changes to allow a clean merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consensus is to merge this as is, and maybe improve on it later. Removing my request for changes to allow a clean merge.
Ok GitHub won't remove my request for changes for some reasons, but that's ok 😅 |
This PR extracts from #729 the work related to the improvements of the CAPS reader which was necessary for making
PETVolume
work. It also does some refactoring as is explained below.Definition and addition of a new decorator
aggregator
inclinica.utils.input_files
.This decorator allows to indifferently pass scalars and iterables to functions retrieving file information.
This enables the CAPS reading engine to call functions from
clinica.utils.input_files
without the knowledge of what these functions expect. More precisely, it can just runsome_func(*args, **kwargs)
and gets either a dictionary or a list of dictionaries if some arguments were iterables.Decouple the reading strategies
By introducing the
CAPSFileDataGrabber
andCAPSGroupDataGrabber
, the file reading and group reading modes are separated which is clearer, closer to the single responsibility principle, and avoid the ugly hardcoding of the reader type inside the query (solution currently implemented).Similarly, input reading tasks are separated in three: BIDS, CAPSFile, and CAPSGroup but rely on a common generic implementation
add_input_reading_task
.Previously, the function
caps_query
was responsible for parsing the core workflow's input specs for reading CAPS which is of the following form:into :
caps_query
was holding the mapping between the label keys and these functions. Based on the key labels, it was calling the appropriate function with the passed kwargs and was setting the type of reader (file vs group) to be used by the CAPSDataGrabber to execute the formatted query. Because of that, a bids query was a dictionary{"label": dict_query}
while a caps query was `{"query": dict_query, "reader": "file/group"}.This PR proposes to rely on different query classes (
Query
,BIDSQuery(Query)
,CAPSQuery(Query)
,CAPSFileQuery(CAPSQuery)
andCAPSGroupQuery(CAPSQuery)
).The
bids_query
andcaps_query
functions are removed and each query class is responsible for maintaining its own mapping and formatting the queries in a suitable form for the data grabbers. The functionbids_reader
expects aBIDSQuery
, whilecaps_reader
expects aCAPSQuery
and builds the proper CAPSDataGrabber based on the type of the caps query passed.Example
Here is a small example of manually defining a CAPS file query and using the
caps_reader
to retrieve the corresponding files (note that we are querying files for 3 different tissue types):