Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split Quantity into scalar and sequence classes #764

Closed
wants to merge 47 commits into from

Conversation

andrewgsavage
Copy link
Collaborator

No description provided.

@andrewgsavage
Copy link
Collaborator Author

So that's with BaseQuantity having the methods/attributes shared across sequence and scalar, with a mixin (is that the right name?) having the sequence specific ones.

Quantity inherits from BaseQuantity and returns either a QuantityScalar or QuantitySequence when initiated, depending on the value.

QuantitySequence and QuantityScalar both inherit from Quantity, with QuantitySequence also inheriting the QuantitySequenceMixin

@hgrecco
Copy link
Owner

hgrecco commented Feb 13, 2019

Great start. I do have a lot of questions (I put them within the code).

@@ -78,7 +78,7 @@ def wrapped(self, *args, **kwargs):


@fix_str_conversions
class _Quantity(PrettyIPython, SharedRegistryObject):
class BaseQuantity(PrettyIPython, SharedRegistryObject):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this classe "public" (i.e. no underscore)? Is this the class users should check against when using isinstance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadn't thought of this when going through it!

It should be used when the user does know the UnitRegistry. In the current version users should be checking with _Quantity (as done in pint-pandas)
Using isinstance with a Quantity from one UnitRegistry and another UnitRegistry yields false:
'''
ureg=pint.UnitRegistry()

ureg2 = pint.UnitRegistry()

isinstance(my_array,ureg2.Quantity)
False
'''
Should this be public?


def build_quantity_class(registry, force_ndarray=False):

class Quantity(BaseQuantity):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or this is class to be used when calling isinstance

@hgrecco
Copy link
Owner

hgrecco commented Feb 13, 2019

  1. Shall we split format between scalar and array code here
  2. Shall we change all the inplace code to be really in place for the array as well (e.g. here
  3. Is all these needed for scalar here? I guess yes because you can still do numpy.cos(34). Is there another, cleaner, way?
  4. Can we move fill, clip, etc to QuantitySequence?
  5. Will this be reached at all? code
  6. Is __iter__ the only thing to test for? code

I guess that all the questions point to the same thing: How much can we simplify the code and making it more maintainable with this change?

@andrewgsavage
Copy link
Collaborator Author

So this was very much a 'get it behaving as you expect' before spending time on those more specific things - I was just moving lines around before spending time working out what paths change/need changing.

1, 2, If the behavior for ndarrays is what's expected from a sequence then yes. I guess this applies everywhere ndarrays are treated differently.

  1. Haven't looked at this much. I think array_function may serve us better for some functions, eg add, but won't be all that different when units aren't relevant

4.Yea, missed them - are there any others?

  1. Haha no.

  2. The main thing that slips through that is strings, which are handled previously.

There should be little difference with this change, from here onwards I can only see moving things around / removing checks like 2.

@hgrecco
Copy link
Owner

hgrecco commented Feb 13, 2019

And how about the connection to pandas ... will this make pint-pandas easier to implement?

@andrewgsavage
Copy link
Collaborator Author

Came across this when looking for numpy functions
#484

y = np.arange(0, 1 * ureg.meter, 0.1 * ureg.meter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.

The main reason I'd like this was because of that error message - I hit it many times in pint-pandas and spent a while working around it, so it'll make pint-pandas easier, along with checking for scalar/sequence.
Although I note at present to do means comparing to QuantitySequenceMixin, which I guess needs renaming if it's intended for that use..

Back to that error message - I think it's because somewhere in numpy there's a check for iter or getitem, which makes it think it's a sequence and take a wrong path. Splitting into QuantityScalar fails that check so takes a different path:

y = np.arange(0, 1 * ureg.meter, 0.1 * ureg.meter)
y
array([0, <Quantity(0.1, 'meter')>, <Quantity(0.2, 'meter')>,
       <Quantity(0.30000000000000004, 'meter')>, <Quantity(0.4, 'meter')>,
       <Quantity(0.5, 'meter')>, <Quantity(0.6, 'meter')>,
       <Quantity(0.7, 'meter')>, <Quantity(0.7999999999999999, 'meter')>,
       <Quantity(0.8999999999999999, 'meter')>], dtype=object)

I'm surprised that works with 0 and not 0*ureg.m

@andrewgsavage
Copy link
Collaborator Author

So that's the tests passing again, except the uncertainties ones. What's the plan there - are they being split into another module?

I'm in two minds as to how to go forward with __array_function. Functions like arrange
numpy.arange([start, ]stop, [step, ]dtype=None)
look tricky to implement quantity conversions/checks accounting for the various optional args/ optional first arg.

Alternatively we could convert only the args that are quantities to consistent/root units. This would make it easy to implement many functions https://github.com/hgrecco/pint/pull/764/files#diff-d9924213798d0fc092b8cff13928d747R115 but prevents function specific checks.

The other thing I've noticed is the double underscored attributes, eg __used have become _BaseQuantity__used with the inheritance. Not sure if this is an issue.

@andrewgsavage
Copy link
Collaborator Author

andrewgsavage commented Sep 3, 2019

Overall, I would think maintaining version and configuration compatibility is worth the complexity of supporting the differing interfaces, especially since that's what other libraries in the community seem to be doing.

Are there any libraries that see such a change between supporting/not supporting __array_function__, where users will have to change their code?

This is looking really good. Maybe we can try to define a roadmap and put a date for a release.

Assuming people are happy to only support numpy >=1.16 in the future, I'd suggest the following roadmap:

0.10 - Split Quantity into scalar and sequence classes (would split this PR into one splitting classes and another implementing __array_function__. )
0.11 - Support __array_function__, extas_require numpy>=1.16
We could backport changes to 0.10.x if necessary to maintain support for numpy <1.16

@andrewgsavage
Copy link
Collaborator Author

Based on tests in pydata/xarray#3238, it looks like there is inconsistent behavior in pint between numpy functions that go through __array_function__ and those that go through __array_ufunc__, especially with regards to non-explicitly-handled functions. In that case, the __array_function__ implementation returns NotImplemented, whereas the __array_ufunc__ implementation returns a result but strips units without a warning.

Should __array_ufunc__ be brought into consistency with how __array_function__ is implemented here by only operating on explicitly-handled ufuncs?

Maybe. This will break many functions that work but aren't explicitly handled. Are people likely to submit PRs to handle these?

@jthielen
Copy link
Contributor

jthielen commented Sep 3, 2019

Are there any libraries that see such a change between supporting/not supporting __array_function__, where users will have to change their code?

I think it depends on how much you consider "such a change." With any numpy function that internally calls np.asarray() or np.asanyarray(), an overriding __array_function__ implementation that preserves non-subclass type will be a breaking change in any library supporting __array_function__. Where pint may be different from other non-ndarray-subclass implementers I've seen (dask, sparse, cupy, etc.) is that casting to a plain ndarray and reconstructing afterwards is not an expensive operation, so downstream users are more likely to try working around numpy functions that cast to ndarray by assuming the casting occurs and correcting afterwards, rather than implementing a custom function (this is what MetPy assumed and is now trying to fix, xref Unidata/MetPy#1144).

Assuming people are happy to only support numpy >=1.16 in the future, I'd suggest the following roadmap:

0.10 - Split Quantity into scalar and sequence classes (would split this PR into one splitting classes and another implementing __array_function__. )
0.11 - Support __array_function__, extas_require numpy>=1.16
We could backport changes to 0.10.x if necessary to maintain support for numpy <1.16

Now that you mention it, I like the idea of splitting this PR into one splitting classes and another implementing __array_function__. Right now all the complications relating to __array_function__ do seem to be derailing and holding back the original titled focus of the PR 😁. Let me know if you'd like me to submit the new __array_function__ PR based on my changes in andrewgsavage#6 (which I can close) after you remove the __array_function__ implementation here.

My one concern with putting off __array_function__ until v0.11 is the delay in the __array_function__ implementation making a release. Based on engagement with pydata/xarray#525, #849, and related issues, it seems people are anxious to see this in action (I know I'm definitely one of them). For example, it seems like this would really help MetPy for its upcoming v1.0 release, which is scheduled to be out by the end of 2019. So, if v0.11 is going to be the target release for __array_function__ support, would it be okay to commit to it being released on a relatively short timeline (within the next couple months)? I'd be glad to do what I can to help make it happen.

Otherwise, this roadmap makes sense to me!

Maybe. This will break many functions that work but aren't explicitly handled. Are people likely to submit PRs to handle these?

I would think this will break non-explicitly-handled ufuncs just like the __array_function__ implementation breaks non-explicitly-handled functions. I believe it's crucial that the behavior be consistent for the sake of downstream libraries/users (xref pydata/xarray#3238). What that consistent behavior is (fail if not implemented or strip units and raise UnitStrippedWarning), however, I could be convinced either way. As far as PRs to handle these, I think it is the same situation between numpy array ufuncs and functions, and I don't think it would be difficult for people to submit implementations where any are missing.

@keewis
Copy link
Contributor

keewis commented Oct 30, 2019

what is the status on this? Now that both pydata/xarray#3238 and pydata/xarray#3447 are either merged or close to being merged, the support for __array_function__ in pint is one of the major blocks to getting pint to work with xarray.

I would be interested in and/or willing to help with getting this moving again or starting a new PR that only works on __array_function__ support.

@hgrecco
Copy link
Owner

hgrecco commented Oct 30, 2019

If all the interested parts are sattisfied with the currenbt solution, I think we should merge it. (after fixing the pending conflict)

@jthielen
Copy link
Contributor

There are a couple reasons I think this shouldn't be merged yet:

Instead, I'd be in favor of taking the __array_function__ implementation here and in andrewgsavage#6 and putting in a new PR separate from the Quantity class split. However, to avoid merge conflicts, I've been waiting on #880 to be merged into master before finalizing my implementation of it and submitting that PR.

@shoyer
Copy link

shoyer commented Oct 30, 2019 via email

@hgrecco
Copy link
Owner

hgrecco commented Oct 30, 2019

I just made my comments on #880 Short version, I am happy to merge it after a minor change.

@jthielen
Copy link
Contributor

jthielen commented Oct 30, 2019

In case it is useful to anyone at this point, a branch I have with a rough implementation of __array_function__ based on #880 and what was done here is up at https://github.com/jthielen/pint/tree/nep18-1.

Current To-Do List:

@keewis
Copy link
Contributor

keewis commented Nov 6, 2019

given that #880 has been merged, how do we want to continue with this? Would it be better to wait on a decision on #753 and #875 or should we go ahead and implement support for __array_function__ and worry about compatibility with QuantityScalar / QuantitySequence afterwards should that become necessary? If implementing directly is the consent: @jthielen, do you have the time to convert your branch to a PR?

@jthielen
Copy link
Contributor

jthielen commented Nov 6, 2019

@keewis My apologies for the delays this past week! I'd like to submit the branch I have as a PR, I just haven't had the time to complete the remaining checklist items yet. I should though have the time in the next few days (just probably not today).

@keewis
Copy link
Contributor

keewis commented Nov 6, 2019

I don't think you have to have completed those items when you create the PR (but good to hear you are working on this)

@jthielen
Copy link
Contributor

A status update: unfortunately, the current implementation of this PR (with implement_func) has not been amendable to handling mixed argument types, so I am still working on re-implementing alternatives that can be more signature-aware. So, it is probably a good thing I didn't submit the PR with what I had, since much of it is still in flux. I've also been finding small errors in function behavior that I've been slowly correcting along the way as well.

Also, a couple side questions:

  • While I've been making do without it by writing many unique @implements implementations, it would be nice to try using inspect.signature to reduce repeated code that only differs by the signature. Would this be enough justification to drop Python 2.7 now, given that it still hasn't been dropped in master yet?
  • Some of the NumPy functions of interest have signatures that vary across versions, so before I get too deep into ensuring compatibility for those, what should pint target as its minimum supported NumPy version? I would think following NEP-29 would be best (so, v1.14 until January 7, 2020), but I wanted to ask before jumping to any conclusions.

@keewis
Copy link
Contributor

keewis commented Nov 27, 2019

@jthielen, I tried to use your branch to run the tests from xarray and ran into some issues (among others something about having "dtype" as dtype or unexpected kwargs). What do you think about opening a PR using the current state so we can discuss these and other issues? I think it doesn't matter if you decide to iterate on the code design, but it would allow others to provide feedback on the choices (which could potentially save you some effort). I'd also like to help with investigating and squashing bugs, but right now I'm not sure where to suggest changes (should I create PRs against your branch?).

Considering official python 2.7 support expires in a little more than a month, I'd say let's drop it, at least on master (but that's only my personal opinion). @hgrecco, what do you think?

@jthielen jthielen mentioned this pull request Nov 29, 2019
@jthielen
Copy link
Contributor

@keewis I'm sorry for the continued delays! I've opened #905 as a draft PR with my current status and some notes about what I see as remaining high-level tasks for it. I really need to be making progress on it (if there is still to be hope of having this in a Pint release by the end of the year), but I'm not sure when I'll get the time needed with all my other end-of-semester tasks...hopefully soon though.

As far as suggesting changes, in absence of a better system, I think PRs to my branch would be best for large changes, whereas suggested changes in a review to #905 would be best for small changes?

@hgrecco
Copy link
Owner

hgrecco commented Dec 2, 2019

See #906

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants