[ENH] Improve CAPS reader #760

Merged
merged 16 commits into from
Oct 17, 2022

NicolasGensollen
Member

@NicolasGensollen NicolasGensollen commented Sep 30, 2022

This PR extracts from #729 the CAPS reader improvements that were necessary to make PETVolume work. It also includes some refactoring, explained below.

Definition and addition of a new decorator aggregator in clinica.utils.input_files.

This decorator lets callers pass either scalars or iterables to the functions that retrieve file information.

>>> [t1_volume_native_tpm_in_mni(tissue, mod) for tissue, mod in zip(tissues, modulations)] # before
>>> t1_volume_native_tpm_in_mni((1, 2), (True, False)) # after

This enables the CAPS reading engine to call functions from clinica.utils.input_files without knowing what these functions expect. More precisely, it can simply run some_func(*args, **kwargs) and get either a dictionary, or a list of dictionaries if some of the arguments were iterables.
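To illustrate the idea, here is a minimal sketch of such a decorator. It is not the implementation from this PR: the helper `_is_sequence`, the choice to treat only lists and tuples as iterables, and the function `describe_tissue` are assumptions made for the example.

```python
from functools import wraps


def _is_sequence(value):
    # Assumption of this sketch: only lists and tuples are expanded,
    # so strings are always treated as scalars.
    return isinstance(value, (list, tuple))


def aggregator(func):
    """Call ``func`` once per element when some arguments are sequences."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        values = list(args) + list(kwargs.values())
        sizes = {len(v) for v in values if _is_sequence(v)}
        if len(sizes) > 1:
            raise ValueError("Arguments must have the same length.")
        if not sizes:
            # Only scalars were passed: behave like the undecorated function.
            return func(*args, **kwargs)
        size = sizes.pop()

        def pick(value, i):
            # Scalars are repeated for every call, sequences are indexed.
            return value[i] if _is_sequence(value) else value

        return [
            func(
                *(pick(a, i) for a in args),
                **{k: pick(v, i) for k, v in kwargs.items()}
            )
            for i in range(size)
        ]

    return wrapper


@aggregator
def describe_tissue(tissue_number, modulation):
    # Hypothetical stand-in for a function such as t1_volume_native_tpm_in_mni.
    return {"tissue_number": tissue_number, "modulation": modulation}
```

With this sketch, `describe_tissue(1, True)` returns a single dictionary, while `describe_tissue((1, 2, 3), modulation=False)` returns a list of three dictionaries that all share the same `modulation` value.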

Decouple the reading strategies

Introducing the CAPSFileDataGrabber and CAPSGroupDataGrabber separates the file reading and group reading modes. This is clearer, closer to the single responsibility principle, and avoids hardcoding the reader type inside the query (the solution currently implemented).

Similarly, input reading tasks are split into three (BIDS, CAPSFile, and CAPSGroup), all of which rely on a common generic implementation, add_input_reading_task.

Previously, the function caps_query was responsible for parsing the core workflow's input specs for reading CAPS, which have the following form:

{ "label" : kwargs to be passed to some function from clinica.utils.input_files }

into:

{ "label" : dictionary query that can be passed to CAPSDataGrabber }

caps_query held the mapping between the label keys and these functions. Based on the label, it called the appropriate function with the passed kwargs and set the type of reader (file vs. group) to be used by the CAPSDataGrabber to execute the formatted query. Because of that, a BIDS query was a dictionary {"label": dict_query} while a CAPS query was {"query": dict_query, "reader": "file/group"}.

This PR proposes to rely on different query classes (Query, BIDSQuery(Query), CAPSQuery(Query), CAPSFileQuery(CAPSQuery) and CAPSGroupQuery(CAPSQuery)).

The bids_query and caps_query functions are removed and each query class is responsible for maintaining its own mapping and formatting the queries in a suitable form for the data grabbers. The function bids_reader expects a BIDSQuery, while caps_reader expects a CAPSQuery and builds the proper CAPSDataGrabber based on the type of the caps query passed.
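A rough sketch of this design is shown here. The class names come from this PR, but everything else is an assumption made for illustration: the `format` method, the `mapping` attribute, the `dartel_template` label, the returned patterns, and the helper `grabber_name_for` are hypothetical and do not reflect the actual code in clinica.pydra.query.

```python
from abc import ABC, abstractmethod


class Query(ABC):
    """Turns {"label": kwargs} specs into {"label": formatted query}."""

    def __init__(self, specs=None):
        self.query = {
            label: self.format(label, kwargs)
            for label, kwargs in (specs or {}).items()
        }

    @abstractmethod
    def format(self, label, kwargs):
        """Return the query dictionary for one label."""


class BIDSQuery(Query):
    def format(self, label, kwargs):
        return dict(kwargs)


class CAPSQuery(Query):
    # Each concrete CAPS query class owns its label -> function mapping.
    mapping = {}

    def format(self, label, kwargs):
        if label not in self.mapping:
            raise KeyError(f"Unknown label: {label}")
        return self.mapping[label](**kwargs)


def _fake_mask_tissues(tissue_number, modulation):
    # Hypothetical stand-in for a function from clinica.utils.input_files.
    return {"pattern": f"c{tissue_number}*", "modulation": modulation}


def _fake_template(group_label):
    # Hypothetical stand-in for a group-level function.
    return {"pattern": f"group-{group_label}_template*"}


class CAPSFileQuery(CAPSQuery):
    mapping = {"mask_tissues": _fake_mask_tissues}


class CAPSGroupQuery(CAPSQuery):
    mapping = {"dartel_template": _fake_template}


def grabber_name_for(query):
    """Pick a data grabber from the query type instead of a "reader" key."""
    if isinstance(query, CAPSFileQuery):
        return "CAPSFileDataGrabber"
    if isinstance(query, CAPSGroupQuery):
        return "CAPSGroupDataGrabber"
    raise TypeError(f"Unsupported query type: {type(query).__name__}")
```

The point of the sketch is that the reader type is carried by the class of the query object, so the query dictionary itself no longer needs a "reader" entry.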

Example

Here is a small example of manually defining a CAPS file query and using the caps_reader to retrieve the corresponding files (note that we are querying files for 3 different tissue types):

>>> caps_dir = "/Users/nicolas.gensollen/GitRepos/clinica_data_ci/data_ci/PETVolume/in/caps"
>>> from clinica.pydra.query import CAPSFileQuery
>>> query = CAPSFileQuery({"mask_tissues": {"tissue_number": (1, 2, 3), "modulation": False}})
>>> from clinica.pydra.interfaces import caps_reader
>>> from clinica.pydra.engine_utils import run
>>> run(caps_reader(query, caps_dir))
A newer version (0.19) of nipype/pydra is available. You are using 0+untagged.1697.g67f7e72.dirty
221012-15:24:31,306 nipype.workflow INFO:
	 [Node] Setting-up "caps_reader_task" in "/private/var/folders/zk/9_vr9pfn69xcvgyvhkj59lpr000z9t/T/tmpd1a6ykdi/Nipype1Task_a64706badf031ecb0827b5d39c10e3aaff998c27c5008801b8217219147589cf/caps_reader_task".
221012-15:24:31,308 nipype.workflow INFO:
	 [Node] Executing "caps_reader_task" <clinica.pydra.interfaces.CAPSFileDataGrabber>
221012-15:24:38,81 nipype.workflow INFO:
	 [Node] Finished "caps_reader_task", elapsed time 6.708054s.
"Result(output=Inputs(mask_tissues=[['/Users/nicolas.gensollen/GitRepos/clinica_data_ci/data_ci/PETVolume/in/caps/subjects/...

@NicolasGensollen NicolasGensollen marked this pull request as ready for review September 30, 2022 17:21
@omar-rifai
Contributor

Great ! I'll try to test this afternoon and get back to you.

@NicolasGensollen NicolasGensollen marked this pull request as draft October 6, 2022 09:21
@NicolasGensollen NicolasGensollen marked this pull request as ready for review October 7, 2022 08:48
@NicolasGensollen
Member Author

@omar-rifai After merging #752 and rebasing, I realized there were a few small issues caused by the CAPSReader trying to do too many things (i.e., reading with either clinica_file_reader or clinica_group_reader depending on the query).

I propose here to split the CAPSReader in two: the CAPSFileReader which only reads "standard" CAPS files, and the CAPSGroupReader which only reads from groups folders. I also created classes for queries instead of relying on dicts of dicts.

Still unsure this is the best implementation, but I think it makes things a bit clearer. I'm open to suggestions though !

@ghisvail
Collaborator

ghisvail commented Oct 7, 2022

I propose here to split the CAPSReader in two: the CAPSFileReader which only reads "standard" CAPS files, and the CAPSGroupReader which only reads from groups folders.

Any development that gets us closer to the Single Responsibility Principle is a good thing 👍

@NicolasGensollen
Member Author

Anyone's up for doing a review ?

@ghisvail ghisvail self-requested a review October 11, 2022 11:27
@ghisvail
Collaborator

Anyone's up for doing a review ?

I'm on it

Collaborator

@ghisvail ghisvail left a comment

I provided a few suggestions where I could. Perhaps, some more explanation in the opening post as to what the situation was, how it was changed and what the benefits are would help.

As it stands right now, I need further guidelines to see where the pieces of improvement are.

I know you added this but it still was not enough for me.

EDIT: reference previous comment

clinica/utils/input_files.py
Comment on lines 217 to 238
        raise ValueError(f"Arguments must have the same length.")
    if len(arg_sizes) == 0:
        return func(*args, **kwargs)
    arg_size = arg_sizes[0]
    new_args = []
    for arg in args:
        if not isinstance(arg, Iterable):
            new_args.append((arg,) * arg_size)
        else:
            new_args.append(arg)
    new_kwargs = [{} for _ in range(arg_size)]
    for k, arg in kwargs.items():
        for i in range(len(new_kwargs)):
            if not isinstance(arg, Iterable):
                new_kwargs[i][k] = arg
            else:
                new_kwargs[i][k] = arg[i]
    if len(new_args) == 0:
        return [func(**x) for x in new_kwargs]
    elif len(new_kwargs) == 0:
        return [func(*x) for x in zip(*new_args)]
    return [func(*x, **y) for x, y in zip(zip(*new_args), new_kwargs)]
Collaborator

This is painful to read without comments to dissociate the cases covered by this decorator and their intermediate steps.

Member Author

Yes, I'm not proud of this function and I totally get that it is very painful to read.

I initially thought there was a more elegant/clever way to do this but couldn't find one. If you have suggestions/ideas I'll take them.

To try answering your points, I added a few things in 80b8d88:

  • Basic unit tests for this function
  • Examples and basic explanation of why we need this
  • A few comments to break and explain the chunks of code

Hopefully this is enough to understand what this decorator does and how to use it.

@ghisvail
Collaborator

I am thinking, perhaps what we are doing with BIDS, CAPS and CAPS group queries is a good use case for using typed dictionaries.

At the end of the day, the current *Query classes only act as proxies to an internal dict without specifying its own encapsulation protocol. Since those classes are used to distinguish different kinds of queries, but we kind of want to keep access to the same dictionary protocol underneath, I think that's exactly what TypedDict + isinstanceof can give us with a clearer intent and limited overhead.

Contributor

@omar-rifai omar-rifai left a comment

I kind of agree with Ghislain on the constructor default which could be simplified, but otherwise looks good. Thanks.

@NicolasGensollen
Member Author

I am thinking, perhaps what we are doing with BIDS, CAPS and CAPS group queries is a good use case for using typed dictionaries.

At the end of the day, the current *Query classes only act as proxies to an internal dict without specifying its own encapsulation protocol. Since those classes are used to distinguish different kinds of queries, but we kind of want to keep access to the same dictionary protocol underneath, I think that's exactly what TypedDict + isinstanceof can give us with a clearer intent and limited overhead.

I see your point but I don't know if we can get the same behavior with TypedDict as the one implemented here. I'll have a closer look.
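As a starting point for that closer look, here is a small sketch of how TypedDict behaves at runtime in Python 3.8+. The class `FileQuerySpec` and its fields are hypothetical and only serve the example. Two facts matter for the discussion: an instance of a TypedDict is a plain dict at runtime, and isinstance checks against a TypedDict class raise a TypeError, so distinguishing query kinds at runtime would need another mechanism.

```python
from typing import TypedDict


class FileQuerySpec(TypedDict):
    # Hypothetical fields, chosen only for this example.
    pattern: str
    description: str


spec = FileQuerySpec(pattern="*_T1w.nii*", description="T1w MRI")

# At runtime, the object is an ordinary dict with no extra behavior.
is_plain_dict = type(spec) is dict

# TypedDict classes do not support runtime instance checks.
try:
    isinstance(spec, FileQuerySpec)
    isinstance_supported = True
except TypeError:
    isinstance_supported = False
```

In other words, TypedDict mainly helps static type checkers; the runtime dispatch between file and group readers would still need something like separate classes or an explicit tag.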

@ghisvail
Collaborator

Another code smell that makes me think the design is not optimal is that the Liskov substitution principle is broken for CAPSQuery w.r.t. the base Query on the abstract method combine_queries. The function signatures should not mismatch.

Collaborator

@ghisvail ghisvail left a comment

Consensus is to merge this as is, and maybe improve on it later. Removing my request for changes to allow a clean merge.

@ghisvail ghisvail self-requested a review October 17, 2022 08:35
@ghisvail
Collaborator

OK, GitHub won't remove my request for changes for some reason, but that's ok 😅

@NicolasGensollen NicolasGensollen merged commit a124170 into aramis-lab:dev Oct 17, 2022
@NicolasGensollen NicolasGensollen deleted the caps-reader-enh branch October 17, 2022 08:46