[WIP] Parse ADNI XML metadata #587

Merged: 23 commits merged into aramis-lab:dev from parse-adni-xml-metadata on Mar 30, 2022

Conversation

@NicolasGensollen (Member) commented on Mar 1, 2022

This PR continues the work started in #431 for adding metadata from XML files to the output of the ADNI-to-BIDS converter. (Note: I did a lot of refactoring, so I opened a new PR since the diff would be huge anyway.)

  • Refactor adni_json.py: shorter functions (especially parse_xml_file), more explicit variable and function names, docstrings, and all functions private except create_json_metadata (a rough sketch follows this list)
  • Change output column names to lowercase (this seems to better match the BIDS specs for tabular files)
  • Add a JSON file to better describe the column names?
  • Add tests
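
For orientation, here is a rough, hypothetical sketch of the layout described in the first item above. Apart from adni_json.py, parse_xml_file, and create_json_metadata, every name, type hint, and helper body below is illustrative, not taken from the actual code:

    # Hypothetical outline only: the helper body is a placeholder,
    # not the converter's real implementation.
    import xml.etree.ElementTree as ET
    from pathlib import Path
    from typing import Dict, Iterable, List

    def _parse_xml_file(xml_path: Path) -> Dict[str, str]:
        """Parse a single ADNI XML file into a flat dictionary of metadata."""
        root = ET.parse(xml_path).getroot()
        # Select the attributes of interest; here we simply flatten all leaf elements.
        return {elem.tag: (elem.text or "") for elem in root.iter() if len(elem) == 0}

    def create_json_metadata(xml_paths: Iterable[Path]) -> List[Dict[str, str]]:
        """Public entry point: collect the metadata of every XML file."""
        return [_parse_xml_file(path) for path in xml_paths]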

@pep8speaks commented on Mar 1, 2022

Hello @NicolasGensollen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-03-30 08:20:43 UTC

@ghisvail (Collaborator) commented on Mar 1, 2022

Back when the first PR was proposed, I was wondering whether we could leverage the new pandas.read_xml to parse the XML metadata instead of raw xml.etree manipulations.

My thinking is: could we just let pandas do the actual XML parsing, and only maintain the code that selects and combines the metadata attributes we want? The hypothesis is that this would greatly simplify the resulting code and the future review of this PR.
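
For illustration, a minimal sketch of the pandas.read_xml approach suggested here (pandas >= 1.3 assumed); the xpath and the renaming step are hypothetical placeholders, not the actual ADNI schema or converter code:

    # Sketch only: let pandas parse the XML, keep only the selection/renaming logic.
    import pandas as pd

    def read_image_metadata(xml_path: str) -> pd.DataFrame:
        # parser="etree" uses the standard library and avoids the lxml dependency;
        # the xpath below is a placeholder for the elements of interest.
        df = pd.read_xml(xml_path, xpath=".//imagingProtocol", parser="etree")
        # Normalize column names (e.g. to lowercase) before further selection.
        return df.rename(columns=str.lower)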

@omar-rifai (Contributor) commented on Mar 1, 2022

Thanks @NicolasGensollen for picking this up. Also, for the record, the two main elements blocking the merge of #431 were (in order of priority):

  • Defining which metadata to include (cf. discussion #293) and where they should be written in the BIDS hierarchy
  • Eventually parallelizing with an appropriate library (not a priority)

@omar-rifai (Contributor) commented on Mar 8, 2022

@NicolasGensollen, before finalizing, can you make sure to add @emaheux as a co-author of the first commit, as he provided the draft code for this? Thanks!

@NicolasGensollen (Member Author):

@omar-rifai Of course! I can also add you as the author of another one, since I started from your work!

@NicolasGensollen force-pushed the parse-adni-xml-metadata branch 2 times, most recently from d7784cb to 8b2147f, on March 11, 2022 at 10:14.
@omar-rifai (Contributor):

Hello @NicolasGensollen, let us know if this is finalized so that we can review and merge. Thanks!

@NicolasGensollen marked this pull request as ready for review on March 23, 2022 at 15:11.
@NicolasGensollen (Member Author):

Hi @omar-rifai
I believe this is ready for review.
I re-ran the converter this afternoon (see folder data_ci_2022_xml, the output is in ref).

@omar-rifai (Contributor) left a comment:

Looks good, thanks! I'll merge after the rebase.


.. note::
    Use multiprocessing / multithreading for parsing the files?
    Not sure we will get a huge performance boost by doing that tbh.
Contributor:

What is the rationale behind this comment? Threading seems to be designed for that purpose.

Member Author:

As discussed, we can remove the comment.

Member Author:

Done in d8b7d59.
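
For reference, a hypothetical sketch of the threaded parsing discussed in this thread, using only the standard library; parse_one stands in for the converter's own per-file parser and is not the real signature:

    # Sketch only: threads mostly overlap file I/O, so the speed-up may indeed
    # be modest for CPU-bound XML parsing, as the note above suggests.
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path
    from typing import Callable, Dict, List

    def parse_all_xml(xml_dir: Path, parse_one: Callable[[Path], Dict]) -> List[Dict]:
        """Parse every XML file found in `xml_dir` using a small thread pool."""
        paths = sorted(xml_dir.glob("*.xml"))
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(parse_one, paths))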

@omar-rifai (Contributor):

Probably related to the test itself, but I'm getting a TypeError on test_get_existing_scan_dataframe:

    def test_get_existing_scan_dataframe(tmp_path):
        """Test function `_get_existing_scan_dataframe`."""
        from clinica.iotools.converters.adni_to_bids.adni_json import _get_existing_scan_dataframe
        from pandas.testing import assert_frame_equal
        subj_path = tmp_path / "sub-01"
        subj_path.mkdir()
>       with pytest.warns(match="No scan tsv file for subject sub-01 and session foo"):
E       TypeError: warns() missing 1 required positional argument: 'expected_warning'
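
The likely cause is that, in the pytest version used here, pytest.warns() requires the expected warning class as a positional argument, so passing only match raises this TypeError. A minimal, self-contained example of the fixed call (the warning message is taken from the test above; _emit_missing_scan_warning is a stand-in, not the converter's code):

    import warnings

    import pytest

    def _emit_missing_scan_warning():
        warnings.warn("No scan tsv file for subject sub-01 and session foo", UserWarning)

    def test_warns_usage():
        # Pass the expected warning class explicitly; `match` alone is not
        # enough on older pytest versions.
        with pytest.warns(UserWarning, match="No scan tsv file for subject sub-01"):
            _emit_missing_scan_warning()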

@omar-rifai (Contributor):

@NicolasGensollen, can you also please run make format (again?) so that it passes the lint checks?

@NicolasGensollen (Member Author):

> can you also please run make format (again?)

Done in 6034162.

> I'm getting a TypeError on test_get_existing_scan_dataframe

Oops, missed your comment on Friday.
Did you manage to run the tests since then?
Running pytest -v works on my end.

@omar-rifai (Contributor):

> > can you also please run make format (again?)
>
> Done in 6034162.
>
> > I'm getting a TypeError on test_get_existing_scan_dataframe
>
> Oops, missed your comment on Friday. Did you manage to run the tests since then? Running pytest -v works on my end.

Yes, it seems to work now, thanks! I also fixed some issues with the data_ci, and the converters succeed now. However, it seems that there is still an issue with merge_tsv in iotools. Can you check that everything is merged? I had a similar issue last week from a recent change in dev.

@omar-rifai merged commit 7d9232d into aramis-lab:dev on Mar 30, 2022.
@NicolasGensollen deleted the parse-adni-xml-metadata branch on March 30, 2022 at 12:35.