Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Modifying base to play nicely with physical units #15394

Closed
wants to merge 2 commits into from

Conversation

ajkeller34
Copy link
Contributor

I'm interested and personally invested in getting Julia to support physical units. There are some places in base that could be modified to suppress the numerous override warnings that appear when using my physical unit package for Julia 0.5.0-dev, Unitful.jl. Advertisements aside, I believe some changes to base are going to be necessary for any units package that fully supports Ranges (UnitRange, FloatRange, etc.) to have a chance of integrating with Julia without emitting warnings. Here's a TL;DR version: one(x) is not logically the same as oftype(x,1) when working with quantities that have units (one(x) is defined as the multiplicative identity, which should not have units!), and sometimes we need to strip units for certain numerical operations and comparisons. This is especially obvious in range.jl.

Encouraged by @StefanKarpinski in this julia-users thread, here are some changes to base I'd appreciate having comments on. The change to complex.jl is not as important as the changes to number.jl and range.jl. I built Julia using a few-day-old version of master and find no test regressions with my modifications. I could use some help with benchmarking, however. This is my first PR for Julia and while I have read the guidelines for contributions, please advise if I should be doing something differently.

Thanks!

@StefanKarpinski
Copy link
Member

cc @Keno – who has some experience with units, as the author of SIUnits.

@andreasnoack
Copy link
Member

I'm a bit worried that all number definitions would need to consider units explicitly. It seems wrong that we cannot handle it implicitly but since you have opened this PR you have probably had some issues.

I'd like to see some of the specific issues that you have had with the existing definition in order to understand how long we can get in making the base code generic instead of explicitly handling units. (It wasn't obvious to me from the julia-users conversation.) You are mentioning one, but why would that have to be handled by modifying base instead of just having a special one method for your unit type?

@@ -62,21 +62,29 @@ complex(z::Complex) = z
flipsign(x::Complex, y::Real) = ifelse(signbit(y), -x, x)

function show(io::IO, z::Complex)
r, i = reim(z)
if unitless(z) != z
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case here is to print Complex{Unit} differently? Why not just define a method show{T<:Unit}(io::IO, z::Complex{T}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course you're right, thanks for protecting me from myself...

@JeffBezanson
Copy link
Member

My initial impression is that calling unitless in so many places is not going to be scalable --- will every library that handles numbers need to do the same thing?

Also, a different point, but I'm not sure it makes sense to allow converting numbers to units. It seems you should need n * Meter instead of oftype(1Meter, n).

@ajkeller34
Copy link
Contributor Author

@JeffBezanson Thank you for taking a look. Some replies to your concerns:

  1. Regarding scalability, do you mean that it would be a lot of work to put unitless in the appropriate places? Or, do you mean that this approach would degrade performance appreciably? I can understand that people who don't care about units would not want to take the additional effort to write code that is logically sound when units are considered, but that's their call... I hope base can provide some minimally invasive support for units at least.

    It's not the case that every method that handles numbers will need to call unitless. This would be crazy, and indeed many methods will work fine as is. Methods that are logically unsound when units are considered will of course fail. The problems I've encountered are mainly when:

    • A number appears explicitly (like x + 1, which is arguably bad if x has units.)
    • Quantities are compared after multiplication or division (obviously x*y will have different units than either x or y, so we cannot compare x*y and x, for example.)
  2. I see no problem with allowing conversions between numbers and unitful quantities, though I agree that a user would want to call n * Meter in the REPL. I think it could be syntactically convenient to construct new unitful quantities with the same units as existing unitful quantities, without explicitly digging into the type signatures to find out what the units were. That's what I'm doing with oftype here.

    On the other hand, I do think that automatic promotion is dangerous. Adding 1Meter + 1 instead of 1Meter + 1Meter is inconsistent, and I would rather that such promotion fail. Also, conversion between units with different dimensions (e.g. length and mass) should certainly fail.

@@ -124,16 +125,16 @@ function rat(x)
a, c = f*a + c, a
b, d = f*b + d, b
max(abs(a),abs(b)) <= convert(Int,m) || return c, d
oftype(x,a)/oftype(x,b) == x && break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use oftype(x,a)/oftype(x,b) == x/oftype(x, 1) instead of unitless?

@mbauman
Copy link
Member

mbauman commented Mar 8, 2016

It'd be really great to collect some use-cases where we need the meaning of unit(x) = oftype(x, 1). I've run into this with Gadfly and SIUnits (GiovineItalia/Gadfly.jl#655 and GiovineItalia/Gadfly.jl@1cb02d2), and it'd be nice to be able to write something like #15021, but that definition isn't the multiplicative identity, either.

Edit: It'd be interesting to see what breaks if we change Dates to comply with the multiplicative-identity meaning of one, too: https://github.com/JuliaLang/julia/blob/master/base/dates/periods.jl#L28

@@ -82,7 +82,8 @@ colon(a::Real, b::Real) = colon(promote(a,b)...)

colon{T<:Real}(start::T, stop::T) = UnitRange{T}(start, stop)

range(a::Real, len::Integer) = UnitRange{typeof(a)}(a, oftype(a, a+len-1))
range(a::Real, len::Integer) =
UnitRange{typeof(a)}(a, oftype(a, oftype(a, unitless(a)+len-1)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe UnitRange{typeof(a)}(a, a + oftype(a, len-1)))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is certainly better.

@mbauman
Copy link
Member

mbauman commented Mar 8, 2016

This is a really great PR, @ajkeller34. It's great to see you pushing Base in this direction. I think, though, that almost all of it can be accomplished without unitless or unit. In general, it seems like it's easier to keep the computations in the domain of the input instead of trying to strip units.

@StefanKarpinski
Copy link
Member

@mbauman – that's a really good point. I agree that making more operations work for quantities with units seems to be the less troublesome route here. I think that also dovetails with @JeffBezanson's comment about scaling. The scalability issue is that if we take the route of stripping units off before doing any computation then any code that wants to work with units needs to do this. If we take the route of making more core operations work with units, then anything that's coded with operations that work with units will automatically also work with units. So the hard part is thinking about how operations in the standard library should work with units – but once that hard work is done, everyone benefits.

@JeffBezanson
Copy link
Member

Exactly. In fact I remember making ranges work with dates at one point, which was done by carefully using generic constructs like start - one(stop-start), but not really introducing anything new.

@StefanKarpinski
Copy link
Member

Should probably be tested by introducing mock unit-like types and making sure operations work.

@ajkeller34
Copy link
Contributor Author

The points raised here make a lot of sense. Just to give rationale, I went with unitless and unit originally because I liked that I could inline some statements to keep the code in base extremely close to what it was before my PR. I'll revise according to the discussion here, and we can see where we are after that.

@StefanKarpinski I have some tests in Unitful.jl (probably not enough) that cover when units are used. I'd suggest we just add tests there—as far as base is concerned, the changes I want to make are really just tweaks that should yield the same behavior as before for unitless numbers. So I don't see a need for mock unit testing in base, especially if I get rid of unit and unitless. Please let me know though if you feel strongly about that. I see that regressions in base could result in failures for units packages without failures for unitless numbers, so I guess I'll think about how to do this.

@mbauman Thanks for the nice words, which do matter for a first-time contributor!

@timholy
Copy link
Member

timholy commented Mar 8, 2016

Agreed that it's great to having you working on this, @ajkeller34! Like you, I am aware that we still have a ways to go before all is well in unit-land.

I also agree that unitful quantities are often a good test to see if we've gotten our abstractions right. In test/, the MeterUnits and RootInt objects are basically mock-units, so there's some precedent.

@StefanKarpinski
Copy link
Member

@ajkeller34: to elaborate on my testing proposal, I meant that we should have bare bones types that mimic relative and absolute unit quantities enough to make sure that the code in Base is doing what it should with respect to them. Otherwise changes will keep getting made accidentally in Base that will break Unitful. Unitful can do more thorough testing of actual unit functionality, of course.

@ajkeller34
Copy link
Contributor Author

I've played around with the code a bit more, trying to get it to work without defining unitless and unit. I could use some input before proceeding.

  • It would be good to clarify if the len field of a LinSpace should really be an AbstractFloat, or if it should be an Integer, as asked by @timholy. Likewise for FloatRange.
  • With LinSpace and FloatRange, the divisor field can be problematic if units are considered. There are two ways to proceed:
    1. Make the divisor field a different parametric AbstractFloat type. This way, the constructor can be modified to guarantee that the divisor field will not have units. Many other methods will then work as-is, since dividing by divisor will not give different result units. The downside here is that most of the time, when units are not used, there will be redundant information in the type signature for LinSpace and FloatRange, since start and divisor will usually be the same type.
    2. Keep the type of the divisor field as is. Then step and divisor have the same type, and therefore the same units. We might need to write code in numerous places to deal with unit annoyances, e.g.: step(r::FloatRange) = r.step / (r.divisor / oftype(r.divisor, 1)), which strips the units off of divisor without introducing new functions. I'm of the opinion that this is equally bad as just defining a unitless function, since you'd need to write this sort of thing all over the place.

@timholy
Copy link
Member

timholy commented Mar 9, 2016

I'd feel more comfortable deferring to @StefanKarpinski, but in looking at the code, I can't help but come to the conclusions that (1) the len field should be Int (we expect length to return Int, so there is no point supporting other Integer types in the internal representation...of course, one should support any Integer in the constructor arguments), and (2) divisor has to be unitless, and hence cannot be expected to be of the same type as start and step.

More speculatively, I wonder about replacing divisor by 1/divisor and using multiplication in the iteration routines. Division is slow. But multiplying by (1/x) does not always yield the exact same answer (down to the ulp level) as dividing by x, which is why I call this speculative: the challenge would be in making sure that all the careful work that Stefan did on the precision is not lost. (Presumably, if you pass the tests, you're golden!)

References and related work:

@ajkeller34
Copy link
Contributor Author

Thank you @timholy for the comment and references. The following is not a reply regarding linspace and FloatRange in particular, but rather some generic concerns I'm having.

Let's consider the following unit-unsafe but straightforward method from range.jl:
range(a::Real, len::Integer) = UnitRange{typeof(a)}(a, oftype(a, a+len-1))

This code is not unit-friendly because a could have units whereas len and 1 do not, and you can't very well add unitful and unitless numbers. A suggestion to fix this put forward by @mbauman earlier in this thread was to replace this with:

range(a::Real, len::Integer) = UnitRange{typeof(a)}(a, a + oftype(a, len-1))

This fixes the unit math, and initially I agreed that this would work fine. However, I think we are now conflating the unit type and the numeric type. In the unit-unsafe method, if we try range(0xff,2) an InexactError() will be thrown. If we instead tried the alternative, unit-friendly implementation, the behavior with regards to the numeric types changes, and an empty range is returned instead. This is because we are now adding two unitful numbers of the same numeric type and no inexact conversion is requested—overflows happen silently.

Another example. Consider rat(x):

function rat(x)
    y = x
    a = d = 1
    b = c = 0
    m = maxintfloat(Float32)
    while abs(y) <= m
        f = trunc(Int,y)
        y -= f
        a, c = f*a + c, a
        b, d = f*b + d, b
        max(abs(a),abs(b)) <= convert(Int,m) || return c, d
        oftype(x,a)/oftype(x,b) == x && break
        y = inv(y)
    end
    return a, b
end

At some point, if x has unit X, we ought to make a have unit X. However, we don't want to do anything to the numeric type of a. If I wanted to strip off the units of x, I could just do x/oftype(x,1), maintaining the numeric type (provided it is a floating-point number, which it mercifully happens to be in this particular instance...). But I don't see a way to just give a the unit of x using a method in Base, and I think you want that for this particular function.

I was trying to avoid these kinds of issues with unit and unitless. There would be no mistaking the intent of calls to unit and unitless, and therefore fewer inadvertent code regressions compared to some type gymnastics where the original intent is unclear from looking at the code. I think it is better to have a method that dispatches differently depending on whether the number has units or not, than to play tricks with existing methods like oftype, or to strip units from a unitful quantity x by dividing by oftype(x,1) (which only preserves the numeric type for floating-point numbers). That's really what we're doing when dividing by oftype(x,1): stripping units, without admitting we really want a function for that.

Can anyone address the points I raised? I hope I don't come across as overly defensive of my original idea, but how else can you treat the numeric and unit types separately when it is necessary or at least extremely convenient to do so? I don't think the scaling would be so bad... many methods in Base will work just fine with unitful quantities, without unit and unitless appearing all over the place.

Thanks again for reading.

@toivoh
Copy link
Contributor

toivoh commented Mar 15, 2016

Regarding ranges: I don't think that UnitRange makes sense for unitful quantities. The way that I see it, a big part of the point with using units is that you should get the same physical results regardless of whether you are using e.g. kgs or pounds to measure something, so there is no default step size that makes sense for unitful quantities.

You should always need to specify three things for a unitful range. In this way, there's some mathematical constructions, and some things in Base, that only make sense when the unit is 1. I think it's important to not make those let through units.

Considering the second example, rat tries to find a rational representation of a floating point number, right? It's hard to see the meaning of a unitful integer, so I guess that this is motivated from a compensation perspective (used in the ranges code, I guess?). In this case I would be inclined to say that the caller knows which numerical representation it wants to use and should be able to provide a unitless input to rat, but I don't know who the caller is.

In the rat example, I'm sure there will be a place where you will have to do something like oftype, but let's find the proper place.

One important reason that I see to avoid stripping off units is that it's very easy to mistakenly conclude that x*y or x+1 has the same units as x, eg you're circumventing exactly the kind of safety barrier that you are trying to set up.

@toivoh
Copy link
Contributor

toivoh commented Mar 15, 2016

Thinking about it a little more, in the rare cases like our current implementation of float ranges, you need something that gives a hint of the underlying representation.

The operation that might be most natural in such cases is to be able to factorize a quantity into a unitless number and a unit. The caller would than have to take care to use both parts appropriately.

@ajkeller34
Copy link
Contributor Author

What is the explicit definition of a UnitRange? If I may think of UnitRange as a StepRange with step size of 1, it's not too big of a stretch to me to think of it as having a step size of T(1), where T is the type parameter of the UnitRange. In this case the default step size is just one of the unit. 1m:5m has a step size of 1m, 1kg:5kg has a step size of 1kg, etc.

Having unitful integers can be very helpful, e.g. when working with unit standards and exact conversions. 1inch is exactly 254//100 cm, etc. When you only allow floating point numbers it is harder to write code respecting exact conversions.

@ajkeller34
Copy link
Contributor Author

@StefanKarpinski @Keno I am eager to work on this PR but I am waiting for response on some points I raised earlier. I guess I remain unconvinced that unit and unitless are the wrong way to go here and I've made some detailed arguments why, but here are my key messages:

  • The only assumption I make in base regarding how units are implemented is that they can be inferred from type parameters somehow. This is true for SIUnits as well as my package, and presumably would have to be true for any unit implementation that is fast at runtime.
  • In atypical but important cases you want to distinguish numeric and unit types. This requires additional methods unit and unitless unless stronger assumptions are made regarding how units should be implemented. Otherwise, conversion unavoidably conflates numeric and unit types.
  • Some atypical but important cases arise in range.jl. For instance, adding the length of a range to the starting value of a range: the length has no units but the starting value may have units. These invalid operations are avoided by stripping units. Note that I don't think it would be possible to strip units off an integer numeric type without unitless... you couldn't do something like (3m)/oftype(3m,1) because you'd mess with the numeric type.
  • When the mathematical operations are sound—adding unitless numbers, or adding unitful numbers with the same dimension (e.g. length + length)—the operations should proceed transparently and do not require putting these unit and unitful functions everywhere. I think maybe the initial reaction to my PR was that we'd be seeing these functions appear all over the place in base (@andreasnoack). They may yet appear in a few other places, but it will not be a disaster where you need to call these functions every other line.

I'm looking forward to when I can use units in Julia as part of my superconducting qubit measurement and design code, but I've been holding back until I know I have some support on this. I want to argue my case for why unit and unitless are needed. Please cc anyone who I should be communicating with to make this happen. Many thanks.

(I also recognize that tests are needed before this PR has even a chance of being merged.)

@ssfrr
Copy link
Contributor

ssfrr commented Mar 22, 2016

First, thanks @ajkeller34 for pushing this, I think that making units more first-class is an awesome improvement.

tl;dr - I'm not certain that we'll get to the point where whole packages will work with units without modification, but I think that making it easy for package authors to support units is huge (and we're pretty close). There are always going to be times when package authors need an escape hatch that converts between unitful and unitless values, so I think there's a use for unit and unitless. I think that library code should generally avoid stripping units in nonphysical ways.

In more depth:

Having just gone through the process of designing a package that should work well with units (I used SIUnits.jl because it's 0.4 compatible), maybe I can give some more data points.

Here are the places where I needed to take special care to handle unitful sampling rates and times:

  1. special-casing indexing for unitful values, because foo[1.1s] multiplies by the sample rate, but foo[1.1] is an error.
  2. interfacing with other libraries (C libs in my case) that can't handle units, so I needed to strip them out for those function calls
  3. checking for mis-matched units to give better error messages instead of NoMethod errors
  4. I ended up with more type parameters, because any value in a type field needed to be parametric if it might store a unitful value, where I might otherwise just use a Float64.

So the cases I had to be unit-aware were either when I wanted some escape hatch into non-unit land (1-3), or when I needed to store a value in a type field (4). The former seems like a good use-case for unit and unitless, but the latter seems like it might be a big problem for expecting non-unit-aware code to Just Work with units.

This example seems to be a nice example of some code that I know works with or without units. I had to do some gymnastics to get the types correct for the SinSource fields, but I don't know if there's any way around that, aside from allowing unitful and unitless numbers to be intermixed in physically-incorrect ways, which seems like badness.

Regarding when library code should strip units and proceed as if the given values were unitless, I think that it's usually indicative that maybe the operation you're doing isn't physically meaningful and it should be up to the user to handle unit stripping. It seems to me that you should be able to replace 1inch with 2.54cm in your arguments and the result should have the same physical meaning.

@timholy
Copy link
Member

timholy commented Apr 1, 2016

👍 to @ssfrr's final paragraph. @ajkeller34, I've certainly experienced similar frustrations in working with units, but I think the best answer is to get as far as you can without explicit unit-stripping and -adding functions. Then, file bugs against julia for those cases where you can't figure out a proper solution.

This has by far the best chance of a delightful outcome: by pointing out problems in specific places, and giving everyone a chance to scratch their collective heads, either (1) some bright person will come up with brilliant solutions that don't require code-uglification, or (2) everyone will give up and agree with you. Either way, you win. What's not to like? 😄

@ajkeller34
Copy link
Contributor Author

Thanks @timholy, I'll think about some alternatives. One can get pretty far without explicit unit-stripping and -adding functions, but ranges are tricky. And yes, @ssfrr has the right idea: I was only using unit and unitless when I was trying to do physically unmeaningful operations like adding a unitless and a unitful number. This can be helpful in getting all the Range types to work with the fewest changes. I agree that it is a totally legitimate position to be squeamish about putting such operations in Base. I just don't think that implicit unit stripping makes sense either: e.g. dividing by oftype(T,1) to strip units, where T is a unitful numeric type, is not only opaque but actually wrong for integers, since a floating point number results. So any solution the community will agree upon should probably avoid these sorts of operations altogether and come at it from a different angle. Or maybe we'll conclude that certain kinds of ranges should not support units.

@toivoh
Copy link
Contributor

toivoh commented Apr 1, 2016

I also agree that the most likely approach to be fruitful with the non-straightforward cases is to consider one at a time.

Also, I would like to question the use of oftype to strip or add units. I believe that oftype should satisfy

oftype(1.0cm, 1inch) = 2.54cm

and not oftype(1, 1inch) = oftype(1, 1cm) = 1. If you actually need to strip units there should be another (dedicated) construction, such as factoring a quantity into a number and a unit.

@TotalVerb
Copy link
Contributor

I agree with @tolvoh here. Unit conversion is compatible with the meaning of conversion, but unit change is not.

These days I have been leaning towards simply disallowing UnitRange for unitful quantities, because as mentioned above it is not really meaningful. One might imagine having to figure out if UnitRange on the B type should use the B or the more common dB, and I think it would be good to avoid that.

@timholy
Copy link
Member

timholy commented May 4, 2016

(Seems like there was some oddity with @TotalVerb's comment, I deleted a bunch of duplicates.)

Right, UnitRange doesn't make sense for unitful quantities, though StepRange does. It may also be fair to say that sometimes one might use a Range as a substitute for an Interval type, but in that case it would be much better simply to have the Interval.

@StefanKarpinski
Copy link
Member

Just to comment on the situation here for any casual observers: this is still very much something we want but essentially requires appropriate separation of one(x) into with and without units versions, which then must be distinguished throughout the standard library, which is a fair bit of work, but which we do still intend to get done for 1.0 (hopefully).

@tkelman
Copy link
Contributor

tkelman commented Feb 1, 2017

x-ref #20268

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.