[ENH] Improve CAPS reader #760

NicolasGensollen · 2022-09-30T17:20:00Z

This PR extracts from #729 the work on the CAPS reader improvements that were necessary to make PETVolume work. It also includes some refactoring, explained below.

Definition and addition of a new decorator `aggregator` in `clinica.utils.input_files`.

This decorator lets callers pass either scalars or iterables to the functions that retrieve file information.

>>> [t1_volume_native_tpm_in_mni(tissue, mod) for tissue, mod in zip(tissues, modulations)] # before
>>> t1_volume_native_tpm_in_mni((1, 2), (True, False)) # after

This enables the CAPS reading engine to call functions from clinica.utils.input_files without knowing what these functions expect. More precisely, it can just run some_func(*args, **kwargs) and get back either a single dictionary or, if some arguments were iterables, a list of dictionaries.
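To make the behavior concrete, here is a minimal sketch of such a decorator. It is not the implementation from this PR: the helper `_is_sequence` and the decorated function `tissue_query` are hypothetical, and the sketch assumes that only lists and tuples count as iterables (so strings are treated as scalars).

```python
from functools import wraps


def _is_sequence(value):
    # Assumption of this sketch: only lists and tuples are "iterables".
    return isinstance(value, (list, tuple))


def aggregator(func):
    """Call ``func`` once per element when any argument is a list or tuple."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        sizes = {len(v) for v in (*args, *kwargs.values()) if _is_sequence(v)}
        if not sizes:
            # Only scalars were passed: plain call, single result.
            return func(*args, **kwargs)
        if len(sizes) > 1:
            raise ValueError("Arguments must have the same length.")
        size = sizes.pop()

        def pick(value, i):
            # Broadcast scalars, index into sequences.
            return value[i] if _is_sequence(value) else value

        return [
            func(
                *(pick(a, i) for a in args),
                **{k: pick(v, i) for k, v in kwargs.items()},
            )
            for i in range(size)
        ]

    return wrapper


@aggregator
def tissue_query(tissue_number, modulation):
    # Hypothetical stand-in for a function from clinica.utils.input_files.
    return {"tissue": tissue_number, "modulated": modulation}


single = tissue_query(1, True)                    # one dictionary
several = tissue_query((1, 2), modulation=False)  # list of two dictionaries
```

The caller never needs to know whether the wrapped function takes one tissue or several: scalars are broadcast and sequences are walked element by element.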

Decouple the reading strategies

By introducing the CAPSFileDataGrabber and CAPSGroupDataGrabber, the file reading and group reading modes are separated. This is clearer, closer to the single responsibility principle, and avoids the ugly hardcoding of the reader type inside the query (the solution currently implemented).

Similarly, input reading tasks are separated into three (BIDS, CAPSFile, and CAPSGroup), but they all rely on a common generic implementation, add_input_reading_task.

Previously, the function caps_query was responsible for parsing the core workflow's input specs for reading CAPS, which are of the following form:

{ "label" : kwargs to be passed to some function from clinica.utils.input_files }

into:

{ "label" : dictionary query that can be passed to CAPSDataGrabber }

caps_query held the mapping between the label keys and these functions. Based on the label, it called the appropriate function with the passed kwargs and set the type of reader (file vs. group) to be used by the CAPSDataGrabber to execute the formatted query. Because of that, a BIDS query was a dictionary {"label": dict_query}, while a CAPS query was {"query": dict_query, "reader": "file/group"}.

This PR proposes to rely on different query classes (Query, BIDSQuery(Query), CAPSQuery(Query), CAPSFileQuery(CAPSQuery) and CAPSGroupQuery(CAPSQuery)).

The bids_query and caps_query functions are removed and each query class is responsible for maintaining its own mapping and formatting the queries in a suitable form for the data grabbers. The function bids_reader expects a BIDSQuery, while caps_reader expects a CAPSQuery and builds the proper CAPSDataGrabber based on the type of the caps query passed.
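As a rough illustration of the dispatch described above, the sketch below mirrors the class hierarchy and picks a grabber from the query type. Only the class names come from this PR; the constructor, the `query` property, and the `select_grabber` helper are assumptions made for the example, and the grabbers are represented by their names rather than by real interfaces.

```python
class Query:
    """Base class wrapping a {label: query dict} mapping (simplified)."""

    def __init__(self, query=None):
        self._query = dict(query or {})

    @property
    def query(self):
        return self._query


class BIDSQuery(Query):
    pass


class CAPSQuery(Query):
    pass


class CAPSFileQuery(CAPSQuery):
    pass


class CAPSGroupQuery(CAPSQuery):
    pass


def select_grabber(query):
    """Hypothetical helper: choose the grabber from the type of the query."""
    if isinstance(query, CAPSFileQuery):
        return "CAPSFileDataGrabber"
    if isinstance(query, CAPSGroupQuery):
        return "CAPSGroupDataGrabber"
    raise TypeError(
        f"Expected a CAPSFileQuery or CAPSGroupQuery, got {type(query).__name__}."
    )


file_query = CAPSFileQuery({"mask_tissues": {"tissue_number": (1, 2, 3)}})
grabber_name = select_grabber(file_query)
```

With this shape the reader type no longer has to be stored inside the query dictionary: the type of the query object carries that information.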

Example

Here is a small example of manually defining a CAPS file query and using the caps_reader to retrieve the corresponding files (note that we are querying files for 3 different tissue types):

>>> caps_dir = "/Users/nicolas.gensollen/GitRepos/clinica_data_ci/data_ci/PETVolume/in/caps"
>>> from clinica.pydra.query import CAPSFileQuery
>>> query = CAPSFileQuery({"mask_tissues": {"tissue_number": (1, 2, 3), "modulation": False}})
>>> from clinica.pydra.interfaces import caps_reader
>>> from clinica.pydra.engine_utils import run
>>> run(caps_reader(query, caps_dir))
A newer version (0.19) of nipype/pydra is available. You are using 0+untagged.1697.g67f7e72.dirty
221012-15:24:31,306 nipype.workflow INFO:
	 [Node] Setting-up "caps_reader_task" in "/private/var/folders/zk/9_vr9pfn69xcvgyvhkj59lpr000z9t/T/tmpd1a6ykdi/Nipype1Task_a64706badf031ecb0827b5d39c10e3aaff998c27c5008801b8217219147589cf/caps_reader_task".
221012-15:24:31,308 nipype.workflow INFO:
	 [Node] Executing "caps_reader_task" <clinica.pydra.interfaces.CAPSFileDataGrabber>
221012-15:24:38,81 nipype.workflow INFO:
	 [Node] Finished "caps_reader_task", elapsed time 6.708054s.
"Result(output=Inputs(mask_tissues=[['/Users/nicolas.gensollen/GitRepos/clinica_data_ci/data_ci/PETVolume/in/caps/subjects/...

omar-rifai · 2022-10-05T09:34:35Z

Great ! I'll try to test this afternoon and get back to you.

NicolasGensollen · 2022-10-07T08:57:23Z

@omar-rifai After merging #752 and rebasing, I realized there were a few small issues because the CAPSReader was trying to do too many things (i.e. reading with both clinica_file_reader and clinica_group_reader, depending on the query).

I propose here to split the CAPSReader in two: the CAPSFileReader which only reads "standard" CAPS files, and the CAPSGroupReader which only reads from groups folders. I also created classes for queries instead of relying on dicts of dicts.

Still unsure this is the best implementation, but I think it makes things a bit clearer. I'm open to suggestions though !

ghisvail · 2022-10-07T08:59:54Z

> I propose here to split the CAPSReader in two: the CAPSFileReader which only reads "standard" CAPS files, and the CAPSGroupReader which only reads from groups folders.

Any development that gets us closer to the Single Responsibility Principle is a good thing 👍

NicolasGensollen · 2022-10-11T09:42:10Z

Anyone's up for doing a review ?

ghisvail · 2022-10-11T11:27:34Z

> Anyone's up for doing a review ?

I'm on it

clinica/pydra/interfaces.py

ghisvail

I provided a few suggestions where I could. Perhaps, some more explanation in the opening post as to what the situation was, how it was changed and what the benefits are would help.

As it stands right now, I need further guidelines to see where the pieces of improvement are.

I know you added this but it still was not enough for me.

EDIT: reference previous comment

clinica/utils/input_files.py

ghisvail · 2022-10-11T14:21:30Z

clinica/utils/input_files.py

+            raise ValueError(f"Arguments must have the same length.")
+        if len(arg_sizes) == 0:
+            return func(*args, **kwargs)
+        arg_size = arg_sizes[0]
+        new_args = []
+        for arg in args:
+            if not isinstance(arg, Iterable):
+                new_args.append((arg,) * arg_size)
+            else:
+                new_args.append(arg)
+        new_kwargs = [{} for _ in range(arg_size)]
+        for k, arg in kwargs.items():
+            for i in range(len(new_kwargs)):
+                if not isinstance(arg, Iterable):
+                    new_kwargs[i][k] = arg
+                else:
+                    new_kwargs[i][k] = arg[i]
+        if len(new_args) == 0:
+            return [func(**x) for x in new_kwargs]
+        elif len(new_kwargs) == 0:
+            return [func(*x) for x in zip(*new_args)]
+        return [func(*x, **y) for x, y in zip(zip(*new_args), new_kwargs)]


This is painful to read without comments to dissociate the cases covered by this decorator and their intermediate steps.

NicolasGensollen · 2022-10-12

Yes, I'm not proud of this function and I totally get that it is very painful to read.

I initially thought there was a more elegant/clever way to do this but couldn't find one. If you have suggestions/ideas I'll take them.

To try answering your points, I added a few things in 80b8d88:

- Basic unit tests for this function
- Examples and basic explanation of why we need this
- A few comments to break and explain the chunks of code

Hopefully this is enough to understand what this decorator does and how to use it.

test/unittests/pydra/test_interfaces.py

ghisvail · 2022-10-11T15:18:13Z

I am thinking, perhaps what we are doing with BIDS, CAPS and CAPS group queries is a good use case for using typed dictionaries.

At the end of the day, the current *Query classes only act as proxies to an internal dict without specifying their own encapsulation protocol. Since those classes are used to distinguish different kinds of queries, but we kind of want to keep access to the same dictionary protocol underneath, I think that's exactly what TypedDict + isinstance can give us with a clearer intent and limited overhead.
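For reference, a minimal sketch of what a TypedDict-based variant might look like. One caveat: Python's TypedDict classes do not support runtime isinstance checks (they raise TypeError), so this sketch distinguishes query kinds with a literal `kind` key instead. The `kind` and `query` keys and the `grabber_for` helper are assumptions made for the example, not part of this PR.

```python
from typing import Literal, TypedDict, Union


class CAPSFileQuery(TypedDict):
    kind: Literal["caps_file"]
    query: dict


class CAPSGroupQuery(TypedDict):
    kind: Literal["caps_group"]
    query: dict


CAPSQuery = Union[CAPSFileQuery, CAPSGroupQuery]


def grabber_for(q: CAPSQuery) -> str:
    # Dispatch on the tag, since isinstance() is not available for TypedDict.
    names = {
        "caps_file": "CAPSFileDataGrabber",
        "caps_group": "CAPSGroupDataGrabber",
    }
    return names[q["kind"]]


file_query: CAPSFileQuery = {
    "kind": "caps_file",
    "query": {"mask_tissues": {"tissue_number": (1, 2, 3), "modulation": False}},
}
chosen = grabber_for(file_query)
```

At runtime these queries are plain dictionaries, so the full dict protocol stays available; the distinction between kinds is enforced by a static type checker and by the `kind` tag.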

omar-rifai

I kind of agree with Ghislain on the constructor default which could be simplified, but otherwise looks good. Thanks.

clinica/pydra/engine.py

Co-authored-by: Ghislain Vaillant

NicolasGensollen · 2022-10-12T09:09:02Z

> I am thinking, perhaps what we are doing with BIDS, CAPS and CAPS group queries is a good use case for using typed dictionaries.
>
> At the end of the day, the current *Query classes only act as proxies to an internal dict without specifying its own encapsulation protocol. Since those classes are used to distinguish different kinds of queries, but we kind of want to keep access to the same dictionary protocol underneath, I think that's exactly what TypedDict + isinstance can give us with a clearer intent and limited overhead.

I see your point but I don't know if we can get the same behavior with TypedDict as the one implemented here. I'll have a closer look.

ghisvail · 2022-10-12T09:28:34Z

Another code smell that makes me think the design is not optimal is that the Liskov substitution principle is broken for CAPSQuery w.r.t. the base Query on the abstract method combine_queries. The function signatures should not mismatch.
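To illustrate the concern in general terms, the sketch below shows how a subclass that adds a required parameter to an abstract method breaks callers written against the base class. The signatures are hypothetical and are not the ones from this PR; only the method name combine_queries is taken from the discussion.

```python
import abc


class Query(abc.ABC):
    @abc.abstractmethod
    def combine_queries(self, queries):
        """Merge a list of query dicts into a single dict."""


class CompliantQuery(Query):
    # Same signature as the base class: substitutable everywhere.
    def combine_queries(self, queries):
        merged = {}
        for q in queries:
            merged.update(q)
        return merged


class MismatchedQuery(Query):
    # Extra required parameter: callers that only know Query will fail.
    def combine_queries(self, queries, reader):
        merged = {"reader": reader}
        for q in queries:
            merged.update(q)
        return merged


def merge_all(query, parts):
    # Written against the base-class contract only.
    return query.combine_queries(parts)


merged = merge_all(CompliantQuery(), [{"a": 1}, {"b": 2}])
# merge_all(MismatchedQuery(), [{"a": 1}]) raises TypeError (missing 'reader').
```

Keeping the signatures aligned, for instance by moving the extra information into the constructor, lets any Query subclass be used wherever a Query is expected.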

ghisvail

Consensus is to merge this as is, and maybe improve on it later. Removing my request for changes to allow a clean merge.

ghisvail · 2022-10-17T08:36:47Z

Ok, GitHub won't remove my request for changes for some reason, but that's ok 😅

NicolasGensollen marked this pull request as ready for review September 30, 2022 17:21

NicolasGensollen requested a review from omar-rifai September 30, 2022 17:22

Extract relevant work from PETVolume PR

8807b0f

NicolasGensollen force-pushed the caps-reader-enh branch from 9db0ce9 to 8807b0f October 6, 2022 08:33

fix rebase mistake

deb9487

NicolasGensollen marked this pull request as draft October 6, 2022 09:21

NicolasGensollen added 4 commits October 6, 2022 16:16

Refactor - test - unstable

3543b1d

Add some docs

b860657

Add some basic tests

fb44b12

Finish testing query

816bca3

NicolasGensollen marked this pull request as ready for review October 7, 2022 08:48

NicolasGensollen added 3 commits October 7, 2022 11:06

forgot to add error message in caps_reader...

b877ba8

Add unit tests for bids_reader and caps_reader functions

8f35635

Simplify CAPSDataGrabber classes

e729a9d

NicolasGensollen force-pushed the caps-reader-enh branch from 31b53e2 to e729a9d October 7, 2022 14:15

Simplify code

75e3f88

ghisvail self-requested a review October 11, 2022 11:27

ghisvail reviewed Oct 11, 2022

clinica/pydra/interfaces.py (outdated, resolved)

ghisvail reviewed Oct 11, 2022

clinica/pydra/interfaces.py (outdated, resolved)

ghisvail requested changes Oct 11, 2022

ghisvail reviewed Oct 11, 2022

test/unittests/pydra/test_interfaces.py (outdated, resolved)

omar-rifai approved these changes Oct 11, 2022

clinica/pydra/engine.py (resolved)

NicolasGensollen and others added 2 commits October 12, 2022 09:38

Apply suggestions from code review

8d0c430

Co-authored-by: Ghislain Vaillant

Dict->dict, List[str]->list

1fcb458

NicolasGensollen added 2 commits October 12, 2022 10:00

Simplify Query constructor

0d26e5b

Add tests and docs for aggregator

80b8d88

NicolasGensollen added 2 commits October 12, 2022 16:00

remove value trait

87d6a7e

fix some design issues

3a9211c

ghisvail reviewed Oct 17, 2022

ghisvail self-requested a review October 17, 2022 08:35

ghisvail reviewed Oct 17, 2022

NicolasGensollen merged commit a124170 into aramis-lab:dev Oct 17, 2022

NicolasGensollen deleted the caps-reader-enh branch October 17, 2022 08:46
