Rf parallel #20
Was this thoroughly tested? Are we getting the same or better results than before, and faster?
Are the results reproducible, i.e. does running the same experiments lead to the same results? (Hint: check for a seed in train_random_forest.)
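One way to check the reproducibility question is to run the parallel training twice with the same seed and compare the outputs. The sketch below is only an illustration: `train_one_tree` and `train_forest` are hypothetical stand-ins, not the PR's `train_random_forest`, and deriving per-task seeds as `base_seed + i` is an assumption made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def train_one_tree(seed):
    """Hypothetical stand-in for fitting one tree: output depends only on the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


def train_forest(base_seed, n_trees=8, n_workers=4):
    """Fit the toy 'trees' in parallel, giving each its own seed derived from base_seed."""
    seeds = [base_seed + i for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in input order, whichever worker finishes first.
        return list(pool.map(train_one_tree, seeds))


def is_reproducible(base_seed):
    """Train twice with the same seed and report whether the results match exactly."""
    return train_forest(base_seed) == train_forest(base_seed)
```

If each task draws from a shared global random state instead, the result can depend on worker scheduling, which is the kind of issue the seed hint is pointing at.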
Also, please resolve the merge conflicts.
def generate_heat_map(df, number_of_features, hits_data, number_of_samples, output_path):
    train_data = np.log2(df+1) if hits_data else df
Same comment as above: this doesn't handle p-values correctly.
I thought we decided, after showing the logs to the client, to keep it only for hits.
Even if this is true (and I'm not sure they wanted it for p-val), the p-val scale runs in the opposite direction, i.e. 0 is the "best" value.
We still need to address p-val properly.
Note that the values run in the opposite direction, i.e. a value of 0 means the number of hits is greater than in all shuffles.
Regarding the log scale, maybe add it as a controllable parameter.
If so, do it in another PR.
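To make the two points above concrete, here is a rough sketch of a transform that orients both data types so that larger means "better" and exposes the log scale as a parameter. Everything in it is an assumption for discussion: the name `to_heatmap_scale`, the `log_scale` and `eps` parameters, and the use of `-log10(p)` or `1 - p` for p-values are not taken from the PR, and it works on plain lists rather than the DataFrame used in `generate_heat_map`.

```python
import math


def to_heatmap_scale(values, hits_data, log_scale=True, eps=1e-300):
    """Return values oriented so that larger always means 'better' (hypothetical helper)."""
    if hits_data:
        # Hits: optional log2(x + 1), matching the transform currently in the PR.
        return [math.log2(v + 1) if log_scale else v for v in values]
    if log_scale:
        # P-values: 0 is best, so negate the log; eps (an assumption) avoids log10(0).
        return [-math.log10(max(v, eps)) for v in values]
    # P-values without log scale: flip linearly so that 0 maps to the top of the scale.
    return [1.0 - v for v in values]
```

For example, `to_heatmap_scale([0, 1, 3], hits_data=True)` gives `[0.0, 1.0, 2.0]`, while p-values of 0.0, 0.05 and 1.0 come out in strictly decreasing order, so a single colour map direction could serve both inputs.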
No description provided.