add MAPE to regression metrics (fixes #691) #822
Conversation
tests/metrics/test_regression.py (Outdated)

```diff
@@ -7,12 +7,13 @@
 import dask_ml.metrics


-@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "r2_score"])
+@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])
```
Suggested change:

```python
@pytest.fixture(
    params=[
        "mean_squared_error",
        "mean_absolute_error",
        "mean_absolute_percentage_error",
        "r2_score",
    ]
)
```
Looks like black==19.10b0 isn't happy about the line length here.
Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.
> Looks like black==19.10b0 isn't happy about the line length here.

Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable, since setup.py (line 29 in f5e5bb4) lists an unpinned black:

```python
"black",
```

Once I switched to the conda instructions there, I got the expected diff. Updated in 1142fcc.
> Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.

Can you clarify what you want me to change? As far as I can tell, that is exactly what happens by adding mean_absolute_percentage_error to the metric_pairs fixture. Every metric in that fixture is tested against its scikit-learn equivalent by tests/metrics/test_regression.py (line 37 in f5e5bb4):

```python
assert abs(result - expected) < 1e-5
```
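The sanity-check pattern this reply describes can be sketched as follows. This is an illustrative stand-in, not the actual test file: `mae_candidate` and `mae_reference` are hypothetical names, and the real metric_pairs fixture compares dask-ml metrics against their scikit-learn counterparts on dask arrays.

```python
import numpy as np

# Hypothetical pair mirroring one entry of the metric_pairs fixture:
# a candidate implementation checked against an independent reference.
def mae_candidate(y_true, y_pred):
    # vectorized mean absolute error
    return float(np.abs(y_true - y_pred).mean())

def mae_reference(y_true, y_pred):
    # naive loop-based reference implementation
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

rng = np.random.RandomState(0)
y_true = rng.uniform(size=50)
y_pred = rng.uniform(size=50)

result = mae_candidate(y_true, y_pred)
expected = mae_reference(y_true, y_pred)
# Same tolerance as line 37 of tests/metrics/test_regression.py:
assert abs(result - expected) < 1e-5
```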
> Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable
You're absolutely right! I've got a PR over at #813 waiting to be reviewed (for a couple of weeks now), and subsequently merged. It should improve the static-checking situation.
> Every metric in that fixture is tested against its scikit-learn equivalent by

Indeed, ignore me about this one, please! I had mistakenly thought we should also introduce extra tests like the test_mean_squared_log_error one.
I'll bring up the question of whether the setup.py versions of the linters should be pinned, too, in #813.
I'm grateful for the suggestion! I just pushed b05213b to attempt to address it.
Looks good, thanks!
This PR proposes adding mean_absolute_percentage_error() ("MAPE"), as originally suggested in #691. It follows the implementation from scikit-learn (https://github.com/scikit-learn/scikit-learn/blob/9cfacf1540a991461b91617c779c69753a1ee4c0/sklearn/metrics/_regression.py#L280), including the use of np.finfo(np.float64).eps in the denominator to prevent divide-by-zero errors.

Notes for reviewers

This PR adds a bit of test coverage by adding mean_absolute_percentage_error() to the metric_pairs fixture in tests. It would automatically get more specific coverage (like for combinations of multioutput and compute) if #820 is accepted.

Thanks for your time and consideration.