Adding PCA script for dimension reduction of metrics #673
Build Failed 💥 |
Really quick review. LGTM. We would need @mdesco to give it a try and test this as well.
Build passed ! Good Job 🍻 ! |
Build Failed 💥 |
Build passed ! Good Job 🍻 ! |
@gagnonanthony Could you link us some real data we can test for ourselves?
Yes @frheault, here's a link to the data I used in my project. It's a connectoflow output.
Here is a first pass. I think we can talk about a few things in person; I will be in the lab the whole day.
scripts/scil_compute_pca.py
Outdated
The script can take directly as input a connectoflow output folder. Simply use the --connectoflow flag. For
other type of folder input, the script expects a single folder containing all matrices for all subjects. Example:
input_folder/sub-01_ad.npy
scripts/scil_compute_pca.py
Outdated
/...

Output connectivity matrix will be saved next to the other metrics in the input folder. The plots and tables
will be outputted in the designated folder from the <output> argument.
Could you explain a bit how to start interpretation? I ran your test data with the command:
scil_compute_pca.py ./ ../lol --list_ids test.text --metrics ad rd fa md --connectoflow
and my first column is: PC1 0.516503256371792 0.496327739954754 0.473191960436531 0.512808472323706. In one sentence, how do you interpret the PCA? You can use ChatGPT or Stack Overflow, but the user needs a starting point, or at least a link to a good resource for beginners (something simpler than the paper, e.g. what PCA is).
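As one possible starting point for that interpretation, here is a minimal Python sketch that only looks at the four PC1 values quoted above. It assumes they are the coefficients of the first principal axis and that they map to the metrics in command-line order (ad, rd, fa, md); neither assumption is confirmed in this thread, and the sketch is not taken from the script under review.

```python
import math

# PC1 values quoted in the review comment. Pairing them with the metrics in
# command-line order (ad, rd, fa, md) is an assumption, not confirmed here.
metrics = ["ad", "rd", "fa", "md"]
pc1 = [0.516503256371792, 0.496327739954754,
       0.473191960436531, 0.512808472323706]

# If these are the coefficients of a principal axis, the vector has unit
# length, so the squared values sum to about 1 and can be read as each
# metric's share of the component.
norm = math.sqrt(math.fsum(w * w for w in pc1))
shares = [w * w / norm ** 2 for w in pc1]

for name, w, s in zip(metrics, pc1, shares):
    print(f"{name}: weight={w:.3f}, share of PC1={s:.1%}")
```

The quoted values do have a norm very close to 1, which is consistent with that reading. All four weights are positive and close to 0.5, so under these assumptions PC1 behaves roughly like an equally weighted combination of the four input metrics.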
scripts/scil_compute_pca.py
Outdated
/sub-02_md.npy
/...

Output connectivity matrix will be saved next to the other metrics in the input folder. The plots and tables
Is there another name than connectivity matrix? Since the input is also named connectivity matrices...
# Import required libraries.
import argparse
import logging
import numpy as np
Split the built-in imports from the third-party ones; scilpy should be alone in a third block.
scripts/scil_compute_pca.py
Outdated
help='Path to the input folder.')
p.add_argument('out_folder',
help='Path to the output folder to export graphs and tables. \n'
'*** Please note, PC connectivity matrix will be outputted in the original input folder'
I don't like that, why not save this in the output?
scripts/scil_compute_pca.py
Outdated
'*** Please note, PC connectivity matrix will be outputted in the original input folder'
'next to all other metrics ***')
p.add_argument('--metrics', nargs='+', required=True,
help='List of all metrics to include in PCA analysis.')
Could you specify that these are expected to be filename suffixes, and that the .npy extension must immediately follow them?
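To illustrate the naming convention being asked about, here is a small Python sketch. It assumes the `<id>_<metric>.npy` pattern seen in the docstring example (`sub-01_ad.npy`); the helper names `expected_filename` and `matches_metric` are hypothetical and are not part of the script.

```python
def expected_filename(subject_id, metric):
    """Build the filename the script is assumed to look for."""
    return f"{subject_id}_{metric}.npy"


def matches_metric(filename, metric):
    """True only if the metric is the suffix right before the .npy extension."""
    return filename.endswith(f"_{metric}.npy")


print(expected_filename("sub-01", "ad"))
print(matches_metric("sub-01_ad.npy", "ad"))
print(matches_metric("sub-01_ad_mean.npy", "ad"))
```

Under this convention a file such as `sub-01_ad_mean.npy` would not be picked up for the metric `ad`, which is the kind of detail the help text could spell out.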
scripts/scil_compute_pca.py
Outdated
p.add_argument('--metrics', nargs='+', required=True,
help='List of all metrics to include in PCA analysis.')
p.add_argument('--list_ids', required=True,
help='List containing all ids to use in PCA computation.')
This is unclear: unlike --metrics, this is not a list, it is a file containing a list.
Adding a metavar argument, like metavar=FILE, could help.
I think it is also crucial that the help say this is the path to a file containing the list of all ids.
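A minimal sketch of what this suggestion could look like with argparse follows. The help wording, the one-id-per-line file layout, and the `build_parser` and `read_ids` helpers are assumptions made for illustration, not the script's actual code.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(description="Sketch of the --list_ids option.")
    # metavar='FILE' makes the usage line show "--list_ids FILE".
    p.add_argument('--list_ids', required=True, metavar='FILE',
                   help='Path to a text file containing the list of all ids '
                        'to use in PCA computation (assumed: one id per line).')
    return p


def read_ids(path):
    """Read non-empty lines from the ids file (hypothetical helper)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

With this in place, `--help` shows `--list_ids FILE`, which signals that a path is expected rather than a list of values typed on the command line.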
scripts/scil_compute_pca.py
Outdated
help='List of all metrics to include in PCA analysis.')
p.add_argument('--list_ids', required=True,
help='List containing all ids to use in PCA computation.')
p.add_argument('--common', choices=['true', 'false'], default='true',
Instead, you should use action=store_true, which stores True or False directly without the user having to type it.
Renaming it to --only_common would be clearer.
I changed it to --not_only_common with action=store_true since I believe the common option should be the default one.
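For reference, a minimal sketch of how a `store_true` flag with this name behaves in argparse. The help text and the derived `only_common` variable are illustrative assumptions, since the updated code is not shown in this thread.

```python
import argparse

p = argparse.ArgumentParser(description="Sketch of the --not_only_common flag.")
# store_true: the attribute is False unless the flag is passed.
p.add_argument('--not_only_common', action='store_true',
               help='Disable the default "common" behaviour '
                    '(illustrative wording).')

default_args = p.parse_args([])
flag_args = p.parse_args(['--not_only_common'])

# The "common" behaviour stays on by default and is turned off by the flag.
only_common_default = not default_args.not_only_common
only_common_with_flag = not flag_args.not_only_common
print(only_common_default, only_common_with_flag)
```

This keeps the common option as the default while letting users opt out without typing `true` or `false`.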
scripts/scil_compute_pca.py
Outdated
d = {f'{m}': [load_matrix_in_any_format(f'{args.in_folder}/{a}_{m}.npy') for a in subjects] | ||
for m in args.metrics} | ||
# Assert that all metrics have the same number of subjects. | ||
nb_sub = [len(d[f'{m}']) for m in args.metrics] |
I don't think the whole f'{m}' is required, since m is already a string.
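A short sketch of the simplification being suggested, using made-up subject and metric lists and a hypothetical `fake_load` function standing in for `load_matrix_in_any_format` so that it runs without any data on disk:

```python
metrics = ['ad', 'rd', 'fa', 'md']   # example values, as in the test command
subjects = ['sub-01', 'sub-02']      # made-up ids for illustration
in_folder = 'input_folder'


def fake_load(path):
    """Hypothetical stand-in for load_matrix_in_any_format; returns the path."""
    return path


# m is already a str, so it can be used directly as the dictionary key.
d = {m: [fake_load(f'{in_folder}/{a}_{m}.npy') for a in subjects]
     for m in metrics}
nb_sub = [len(d[m]) for m in metrics]
print(nb_sub)
```

The f-string is still useful for building the path, where several values are combined; it is only redundant when it wraps a single string.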
@frheault, I updated the script according to your comments. Should be better now!
Build passed ! Good Job 🍻 ! |
@arnaudbore this answered my comments
No description provided.