-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[waiting for author] Add stubs for the statistics module #546
Conversation
Since typeshed currently does not have working stubs for the fractions module, this commit removes all references to Fraction within the statistics module so that this pull request can be evaluated and accepted independently of the other pull requests I've made recently to improve various numbers related modules.
_TNum = TypeVar('_TNum', int, float, Decimal) | ||
_T = TypeVar('_T') | ||
|
||
def mean(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> _TNum: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that this makes mypy think that the return type for mean([1, 2, 4])
is an int. It actually returns a float. Similarly, it allows mixing ints with Decimal (but not float with Decimal). I think the only way to express this reasonably may be a complex set of overloads...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, hmm -- when I did mean([1, 2, 3])
, I got back an int, but after some poking, that turned out to be only because 1 + 2 + 3 is actually evenly divisible by 3.
Perhaps I could get rid of int within the TypeVar
and rely on mypy to understand that ints should be auto-promoted to float
?
Regarding mixing ints with Decimals/mixing floats with Decimals: the documentation for statistics stated that mixing together different types is "undefined and implementation-dependent", and even recommends running map(float, input_data)
first if you have mixed numeric types.
So, if mixing together things like Decimal and int is undefined, then I think it's ok and actually for the best if typeshed forbids that altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the worry is mostly about float vs. Decimal, where the user has to
decide. int can be mixed with each of the others. But probably we can start
by making the type variable allow float and Decimal only.
On Wed, Sep 14, 2016 at 8:36 AM, Michael Lee [email protected]
wrote:
In stdlib/3.4/statistics.pyi
#546 (comment):+from typing import Iterable, Iterator, Sequence, TypeVar, Union, overload
+from decimal import Decimal
+
+# Note: according to the docs, statistics only explicitly supports
+# int, float, Decimal, and Fraction. Other types within the numeric
+# tower are not supported. It also states mixing together different
+# types results in undefined behavior, so the type signatures below
+# deliberately enforce that numeric types may not be mixed.
+#
+# TODO: Once either #545 or
+# #94 is accepted, add support
+# for fractions.
+_TNum = TypeVar('_TNum', int, float, Decimal)
+_T = TypeVar('_T')
+
+def mean(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> _TNum: ...Oh, hmm -- when I did mean([1, 2, 3]), I got back an int, but after some
poking, that turned out to be only because 1 + 2 + 3 is actually evenly
divisible by 3.Perhaps I could get rid of int within the TypeVar and rely on mypy to
understand that ints should be auto-promoted to float?Regarding mixing ints with Decimals/mixing floats with Decimals: the documentation
for statistics https://docs.python.org/3/library/statistics.html stated
that mixing together different types is "undefined and
implementation-dependent", and even recommends running map(float,
input_data) first if you have mixed numeric types.So, if mixing together things like Decimal and int is undefined, then I
think it's ok and actually for the best if typeshed forbids that altogether?—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/python/typeshed/pull/546/files/98161406c69d8f31f4b848dffb889c04a5e49098#r78774231,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACwrMiPTW4ogrthvoNIOaKJ3XE9k9XwIks5qqBSTgaJpZM4J8bk1
.
--Guido van Rossum (python.org/~guido)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is that the statistics functions should accept any of the following:
- just ints, just floats, just Decimal, or just Fraction
- mixed int + float, mixed int + Decimal, or mixed int + Fraction
- but not mixed float + Decimal
I'm on the fence about mixing float + Fraction -- that likely works due to the automatic coercion of Fraction to float, but I'm not confident enough to rely on that at the moment. Hence that's currently officially unsupported, even if it happens to work.
If you think the docs need improving to make that more clear, then let me know.
As far as mean() is concerned, mean(ints) may return either an int or a float depending on whether the sum divides evenly by the number of items. E.g mean([1, 2]) is 1.5 but mean([1, 3]) is 2 (an int). Does that complicate the type-checking too much?
I don't know this module very well, I've asked its author to help with the review. |
def pstdev(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> Union[float, Decimal]: ... | ||
def stdev(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> Union[float, Decimal]: ... | ||
def pvariance(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> _TNum: ... | ||
def variance(data: Union[Iterator[_TNum], Sequence[_TNum]]) -> _TNum: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signatures for the four functions pvariance, variance, pstdev and stdev are incomplete. They take an optional argument representing the mean of the data. I think something like this should be right:
def pstdev(data: Union[Iterator[_TNum], Sequence[_TNum]],
mu: Optional[_TNum]) -> Union[float, Decimal]: ...
def stdev(data: Union[Iterator[_TNum], Sequence[_TNum]],
xbar: Optional[_TNum]) -> Union[float, Decimal]: ...
def pvariance(data: Union[Iterator[_TNum], Sequence[_TNum]]
mu: Optional[_TNum]) -> _TNum: ...
def variance(data: Union[Iterator[_TNum], Sequence[_TNum]]
xbar: Optional[_TNum]) -> _TNum: ...
The difference in parameter name is intentional: mu is the mean of the entire population, and xbar is the mean of the sample.
# float as an implementation detail. In the interests of not breaking code | ||
# that relies on this implementation detail, the return type is set to the | ||
# Union of float and the contained numeric type. | ||
def median(data: Iterable[_TNum]) -> _TNum: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the return type of median is right. median() may return the same type as the data _TNum, but if it does an averaging step, it may return a float. For instance:
>>> statistics.median([2, 4])
3.0
>>> statistics.median([2, 4, 6])
4
I'm not sure how to write that as a type hint. Would that be Union[_TNum, float]
?
median_low and median_high look correct, as they always return one of the data points and don't interpolate or average between them.
@Michael0x2a Do you still care about this? |
Is there anything further I should be doing to help with this?
|
Now that this PR is about 8 weeks without a response I'm going to close it, sorry. @Michael0x2a if you want to work on this again please just open a new PR! Or if @stevendaprano wants to work on it feel free to copy form Michael's fork and we'll take it from there. |
This pull request adds stubs for the statistics module, including the
geometric_mean
andharmonic_mean
functions that will be added to the upcoming Python 3.6 release.In order to reduce the number of interconnected pull requests I'm making, I've deliberately omitted using the
fractions
module in any way so that this pull request can be evaluated and merged independently of my other pending pull requests.Once either #544 or #94 is merged, I'll submit a new pull request to add back support for
fractions.Fraction
to the statistics module. (The former requires #545 to first be accepted, and the latter appear to be indefinitely stuck until issue python/mypy#1264 is resolved).