Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding PCA script for dimension reduction of metrics #673

Merged
merged 22 commits into from
Mar 7, 2023

Conversation

gagnonanthony
Copy link
Contributor

No description provided.

@arnaudbore arnaudbore self-requested a review February 2, 2023 21:32
@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

Copy link
Contributor

@arnaudbore arnaudbore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really quick review. LGTM. We would need @mdesco to give it a try and test this as well.

scripts/scil_compute_pca.py Outdated Show resolved Hide resolved
scripts/scil_compute_pca.py Outdated Show resolved Hide resolved
scripts/scil_compute_pca.py Outdated Show resolved Hide resolved
@arnaudbore arnaudbore changed the title [WIP] Adding PCA script for dimension reduction of metrics Adding PCA script for dimension reduction of metrics Mar 1, 2023
@arnaudbore arnaudbore requested review from mdesco and arnaudbore March 1, 2023 21:46
@arnaudbore
Copy link
Contributor

Build passed ! Good Job 🍻 !

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

@arnaudbore
Copy link
Contributor

Build passed ! Good Job 🍻 !

@frheault
Copy link
Member

frheault commented Mar 6, 2023

@gagnonanthony Could you link us some real data we can test for ourselves?
(Like what you are using this script with in your project?)

@gagnonanthony
Copy link
Contributor Author

Yes @frheault, here's a link to the data I used in my project. It's a connectoflow output.

https://usherbrooke-my.sharepoint.com/:u:/g/personal/gaga3016_usherbrooke_ca/EU-469zr-CdKq0_G1elZkCcBUxU1TtFALgqgYCGZICD3sA?e=ySzPpQ

Copy link
Member

@frheault frheault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a first pass, I think we can talk about a few things in-person, I will be in the lab the whole day.


The script can take directly as input a connectoflow output folder. Simply use the --connectoflow flag. For
other type of folder input, the script expects a single folder containing all matrices for all subjects. Example:
input_folder/sub-01_ad.npy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use this structure?
image

/...

Output connectivity matrix will be saved next to the other metrics in the input folder. The plots and tables
will be outputted in the designated folder from the <output> argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit how to start interpretation? I ran your test data with the command:
scil_compute_pca.py ./ ../lol --list_ids test.text --metrics ad rd fa md --connectoflow

and my first column is : PC1 0.516503256371792 0.496327739954754 0.473191960436531 0.512808472323706 (in one sentence how do you interpret PCA, you can use chatGPT or stack overflow, but the user needs a starts or a least a link to a good resource for beginner (simpler than the paper, what is PCA)

Also, can you fix this kind of display overlap ?
image
image

/sub-02_md.npy
/...

Output connectivity matrix will be saved next to the other metrics in the input folder. The plots and tables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another name than connectivity matrix? Since the input is also named connectivity matrices...

# Import required libraries.
import argparse
import logging
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the built-in from third-party and scilpy should be alone in a third block

help='Path to the input folder.')
p.add_argument('out_folder',
help='Path to the output folder to export graphs and tables. \n'
'*** Please note, PC connectivity matrix will be outputted in the original input folder'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that, why not save this in the output?

'*** Please note, PC connectivity matrix will be outputted in the original input folder'
'next to all other metrics ***')
p.add_argument('--metrics', nargs='+', required=True,
help='List of all metrics to include in PCA analysis.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you specify these are expected to be suffixes and the extension must be immediately following and be .npy

p.add_argument('--metrics', nargs='+', required=True,
help='List of all metrics to include in PCA analysis.')
p.add_argument('--list_ids', required=True,
help='List containing all ids to use in PCA computation.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear, this is not a like --metrics is a list, it is a file containing a list.

Adding a metavar argument like metavar=FILE, could help.
I think the help should say path to a file containing a list of all ids is also crucial.

help='List of all metrics to include in PCA analysis.')
p.add_argument('--list_ids', required=True,
help='List containing all ids to use in PCA computation.')
p.add_argument('--common', choices=['true', 'false'], default='true',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, you should use action=store_true, which store directly true or false without having to type it.

Renaming it to --only_common would be clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to --not_only_common with action=store_true since I believe the common option should be the default one.

d = {f'{m}': [load_matrix_in_any_format(f'{args.in_folder}/{a}_{m}.npy') for a in subjects]
for m in args.metrics}
# Assert that all metrics have the same number of subjects.
nb_sub = [len(d[f'{m}']) for m in args.metrics]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think the whole f'{m}' is required since m is already a string

@gagnonanthony
Copy link
Contributor Author

@frheault, I updated the script according to your comments. Should be better now!

@arnaudbore
Copy link
Contributor

Build passed ! Good Job 🍻 !

Copy link
Member

@frheault frheault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnaudbore this answered my comments

@arnaudbore arnaudbore merged commit 0179bc0 into scilus:master Mar 7, 2023
@gagnonanthony gagnonanthony deleted the pcadwi branch March 8, 2023 03:07
@gagnonanthony gagnonanthony restored the pcadwi branch March 8, 2023 03:07
@gagnonanthony gagnonanthony deleted the pcadwi branch March 8, 2023 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants