Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor documentation changes #19

Open
topepo opened this issue Mar 25, 2022 · 4 comments
Open

Minor documentation changes #19

topepo opened this issue Mar 25, 2022 · 4 comments

Comments

@topepo
Copy link

topepo commented Mar 25, 2022

The package looks good. Although I'm not a fan of automated outlier removals, I like what you've done.

In the docs, could you:

  • use bake(rec, new_data = NULL) instead of juice()
  • have steps use selectors for type and role? So all_numeric_predictors() instead of all_numeric(), -all_outcomes()
  • have steps do some sort of normalization procedures before outlier detection (e.g. maybe Yeo-Johnson)? I feel that most identified outliers are just the edge of healthy skewed distributions.
@brunocarlin
Copy link
Owner

Point 1:
Ok, I can do that this weekend.

Point 2:
Sure no problem will be done this weekend.

Point 3:
Are you suggesting changing the examples to implement some form of normalization already present on recipes or implementing some form of normalization on the methods themselves?

If it is just the example
Yes, I feel that using all examples with mtcars was not the best idea... Can you please suggest a dataset for the examples?

If it is the methods
I can add the normalization process to all the outlier detection algorithms that lack it, keep in mind that some methods already implement some form of normalization inside.

@brunocarlin
Copy link
Owner

About the automatic outlier detection, I would market it as metric-based outlier removal... Because the idea is that you test it just like the rest of the pipeline on workflows, so you can try a bunch of outliers methods and their combination and see if it improves the metric on the test set... But that is not even close to the common usage relying on statistics or domain-based reasons for the decision on whether to implement some form of outlier removal, I guess some documentation to make that point clear could help. I was trying to be greedy and keep both aspects on a single package.

@brunocarlin
Copy link
Owner

brunocarlin commented Apr 5, 2022

@topepo as soon as #20 finishes it's ci/cd I will be merging it, I am still not sure on the best way to proceed on your third point

done

@brunocarlin
Copy link
Owner

@topepo Only place with juice is now on tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants