Rf parallel #20

Merged 20 commits into reads_filtration_change_seq_length on May 19, 2021

yael1994
No description provided.

yael1994 requested a review from shacharmo on March 14, 2021, 15:38
@shacharmo shacharmo left a comment

Was this thoroughly tested? Are we getting the same or better results than before, and getting them more quickly?
Are the results reproducible, i.e., does running the same experiments lead to the same results? (Hint: check for a seed in train_random_forest.)
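
To make the reproducibility question concrete, here is a minimal sketch of one way to keep parallel runs deterministic: derive one seed per task from a single base seed before any work is dispatched. The names `fit_one` and `run_experiment` are hypothetical and are not taken from train_random_forest; the "fit" is a stand-in that only draws random numbers.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fit_one(task_seed):
    # Stand-in for fitting one model (hypothetical): all randomness comes
    # from a private generator seeded for this task only.
    rng = random.Random(task_seed)
    return [rng.random() for _ in range(3)]


def run_experiment(base_seed, n_tasks=4, workers=2):
    # Draw every task seed up front, so results do not depend on how the
    # tasks are scheduled across workers.
    seed_rng = random.Random(base_seed)
    task_seeds = [seed_rng.randrange(2**32) for _ in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, not completion order.
        return list(pool.map(fit_one, task_seeds))
```

With this layout, calling `run_experiment` twice with the same `base_seed` should give identical output, whatever the worker count. That is the property the question above asks about.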

model_fitting/random_forest.py: 3 outdated review threads (resolved)
@shacharmo shacharmo left a comment

Also, fix the merge conflicts.

IgOmeProfiling_pipeline.py: 6 outdated review threads (resolved)
model_fitting/random_forest.py: 1 outdated review thread (resolved)


def generate_heat_map(df, number_of_features, hits_data, number_of_samples, output_path):
    train_data = np.log2(df+1) if hits_data else df

Member

Same comment as above: this doesn't handle p-values correctly.

Author

I thought that, after we showed the logs to the client, it was decided to keep this only for hits.

Member

Even if this is true (and I'm not sure they wanted it for p-values), the p-values run in the opposite direction, i.e., 0 is the "best" value.
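
The point about p-values running opposite to hit counts can be illustrated with a small sketch. It assumes p-values lie in [0, 1] and flips them so that larger always means a stronger signal. `orient_scores` is a hypothetical helper, not code from this PR, and plain lists stand in for the DataFrame passed to generate_heat_map.

```python
def orient_scores(values, is_hits):
    """Return scores where a larger number always means a stronger signal."""
    if is_hits:
        # Hit counts already point the right way: more hits is better.
        return list(values)
    # Assumption: p-values are in [0, 1] and 0 is the best value, so flip them.
    return [1.0 - p for p in values]
```

Other orientations (for example a negative log of the p-value) would serve the same purpose; which one suits the heat map is a decision for the authors.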

model_fitting/train_random_forest.py: 2 outdated review threads (resolved)
@shacharmo shacharmo left a comment

The p-values still need to be addressed properly.
Note that the values are opposite, i.e., a value of 0 means the number of hits is greater than in all shuffles.

Regarding log scale, maybe add it as a controllable parameter.
If so, do it in another PR.
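
As a rough sketch of what a controllable log scale might look like, the following function applies log2(x + 1) only when asked to. `scale_values` and its `log_scale` flag are hypothetical names, not part of this PR, and a plain list stands in for the DataFrame.

```python
import math


def scale_values(values, log_scale=True):
    """Optionally apply log2(x + 1) to non-negative values (hypothetical helper)."""
    if log_scale:
        # log2(x + 1) keeps zero counts at zero while compressing large counts.
        return [math.log2(v + 1) for v in values]
    return list(values)
```

A caller could then pass `log_scale=False` for data where the transform is not wanted, instead of tying the choice to the hits flag.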

IgOmeProfiling_pipeline.py: 1 outdated review thread (resolved)
model_fitting/random_forest.py: 1 outdated review thread (resolved)
@shacharmo shacharmo merged commit f29afc4 into reads_filtration_change_seq_length May 19, 2021