-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: Go 2: Add dimensionality to enrich type system and strengthen compile-type checking #57973
Comments
Historically Go has encouraged programming by writing code rather than coding by designing types. This change seems to make the type system more complicated in order to simplify the actual code. That's not a path that Go usually takes. I'm also not completely clear on how this works. If velocity is distance divided by time, then distance is time multiplied by velocity. If I write |
Perhaps the clarity and usefulness of this would be more clear in a more complicated, real-world example involving more variables and dimensions?
|
I'm unclear how often something like would be useful. Velocity, time, and distance are pretty obvious, but in a program, how often can types be related like this? Can this handle more complicated relationships that map to more programming-related problems, rather than physics? In other words, can you give an example of something that is more likely to come up in an actual project? |
I would encapsulate the type conversions in a function like so:
Is that insufficient? |
First, thank you for your engagement on this and testing whether the idea is robust. That's as close as I could get too. (I tried doing the equivalent with an anonymous function, but for some reason couldn't get the typing to pass the compiler...). The main difference here is that the compiler is no longer checking dimensions for us, because we have not told it explicitly that there is a relationship between velocity, distance, and time. So if we made a coding error in the function (unlikely in this simple example, more likely in realistic examples), such as mistakenly writing
the compiler would not be able to detect it. So I think I would say that this is considerably more verbose than being able to write So one way to think of this is that we are in a sense "inlining" the type checking that function parameters perform and putting it into the type checking of the variables directly in the formula; and in addition we gain the dimensional analysis that guards against incorrect formulas. Would it help if I were to show some more extended examples? |
@DeedleFake I have spent the last decade primarily working with people doing scientific and engineering projects so I admit I have a kind of population bias. However, with those people dimensional analysis is something they do all the time as verification of their formulas; so it would be very valuable to that population for the compiler to verify their formulas. However, I do think lots of other domains have the potential to benefit although in less obvious ways than physics dimensions - although it might take a bit of a mental shift to realize that their variables have dimensions. For example, I have also been working with people in distributed storage-defined software and here they have many integers being juggled in calculation such as buffer size (bytes); bytes per block; blocks per network packet; overhead (bytes); queue lengths; etc. Consequently, it is easy and far from unknown to mix up what different integers represent when calculating buffer sizes, overhead allowances, etc. So for one simple example, if we want to compute how many bytes fit in a network packet, we calculate bytes per block * blocks per network packet, which very naturally has units of bytes per network packet. Another example comes from own experience in a factory automation system, one of the gnarliest I dealt with as field engineer. At a canned food factory there was a production line subsystem that counted cans off the line, and a warehouse management subsystem that counted them into the warehouse. However, the latter subsystem was crashing periodically with an integer overflow. Eventually I found the problem at the interface between the two subsystems: the warehouse subsystem was expecting to receive a count of the number of pallets, which it then multiplied by 16 to get the number of cases of product in stock. However, the programmer of the production line system was not passing the number of pallets, but rather the number of cases, which when multiplied by 16 caused the crash. Judicious use of definitions (dimensions) such as countCansPerCase, countCasesPerPallet, countCases, and countPallets could have prevented the confusion. Lots of errors of this class happen at the interface between subsystems (e.g. the Mars mission that was lost because of a confusion between Metric and Customary units). For another domain example, consider project tracking/planning software. You might have lots of integer values such as number of FTEs; story points for each story; story points per day for each programmer; burn down rates; etc. As integers they can be combined in many ways, most of them meaningless. By defining dimensions such as FTEs, story points, days, etc. we can ensure that we are only combining them in meaningful ways. |
@DeedleFake P.S. if you have an existing codebase I could look at and see how it might work with this proposal, I am willing to do so. I think that is a fairer test than my making up an arbitrary or hypothetical example. |
I really like this idea. It brings something that we naturally do every day and makes it a part of the language. In banking and insurance domains there are many dimensions that would only help to ensure the calculations are correct. It seems to me that this is a mechanical check that is actually very hard for humans to do. But it would be easy for a machine to do. Make it a basic feature of function parameters. Then it can easily be applied to the special case of operators. |
Dimension checking is something that gets drilled into us when dabbling in physics but from a comp.sci point of view, how would someone implement this? All I can think of seems to rely on operator overloading at its basis (which is not allowed in Go for also good reasons) . Edit: more specifically, a subset of operation overloading restricted to subtypes. |
I am familiar with this sort of dimensional-analysis-driven programming in some high-level languages, but I don't think any general-purpose language I've used has included it as a first-class feature. Instead, I've typically seen it implemented by starting in a language that offers operator overloading and then using a library which allows constructing quantity types which include suitable overloads to make the arithmetic operators work correctly with them while also propagating the dimensions (and sometimes also converting units). For example:
Of course I don't claim that the languages I'm familiar with are the only languages worth considering, but I would consider the three languages I used for the examples above to all have some intended use-cases in common with Go. (If you are thinking of some specific other languages that do have dimensions and quantities as first-class language primitives rather than as libraries, I'd love to learn about them!) Assuming that it were valuable to model dimensions and quantities in Go, might it be more appropriate to follow the lead of these other languages and implement operator overloading as the language-level primitive, and then leave it to libraries to deal with the specific domain of engineering and scientific applications that would benefit most from dimensions and quantities? |
The proposal here is not for operator overloading. It is to extend the type system to permit certain type combinations that are currently not permitted. Right now, the language does not permit dividing a variable of type |
I understand the proposal as allowing to define relationships specifically between types whose underlying types support the Go today does allow multiplying two values of a single named type whose underlying type supports multiplication. For example: type velocity float64
v1 := velocity(1.0)
v2 := velocity(2.0)
result := v1 * v2
fmt.Printf("Result is %#v, a %T", v1*v2, result)
In a dimensional analysis system I think this is an incorrect result, because the result should be I'm not sure if this one hole where the types don't work out correctly invalidates benefits of making new expressions valid, but it does give me pause. Particularly when considering one of the examples in the proposal which already includes a "velocity squared": type energy (mass * velocity * velocity) This seems to suggest that products of the same type are considered for dimensional analysis when combined with another type, but what about this one? type velocitySquared (velocity * velocity) Would a declaration like the above change the result of my example above to be |
The last example in my previous comment -- where defining By that I mean that if Although it was the self-product case that made me think of this, I think the same seems to be true for the other examples. If It's also possible that two different packages might define different types involving Perhaps it would be reasonable to limit this only to relationships between types in the same package, but I don't think anything else in Go behaves like that. This also reminds me a little of a feature of the Rust programming language that some developers find frustrating (based on the number of grumpy questions I can find about it online): in order to call a method of a trait on a type that implements that trait, you must import the trait itself into scope even though the expression relying on it doesn't include the trait name anywhere in it. Rust traits also deal with the problem of potentially-conflicting definitions using the so-called "orphan rules", which (roughly) require that the relationship between a type and a trait must be defined in either the crate that defined the type xor the crate that defined the trait. That means that the compiler only needs to consider two possible locations for a definition (so no global analysis) and the program is invalid if the two locations conflict with one another. Of course this dimensional analysis problem is far less general than Rust traits, but it does seem to still have the same question of which behaviors are in scope in which locations, and of potentially-conflicting definitions in different packages affecting the same type. I'm not sure if Rust's solutions to this problem really make sense for Go. |
Not specific to go, I don't quite like this kind of types in general.
Because of 1) and 3), I actually have never used this kind of types for numerical simulation. Also, not to nitpick, velocity generally also includes the direction information - and that's 3 values for a 3D velocity, plus the coordinate system. I am unsure how to properly design a system to account for those. |
@apparentlymart I believe the result would have to be explicitly typed: type velocity float64
type velocitySquared float64
var result velocitySquared
v1 := velocity(1.0)
v2 := velocity(2.0)
result = v1 * v2
velocityDoubled := velocity(2.0) * v1 // type velocity
result2 := velocitySquared(v1 * v2) @czetie thanks. I can see the benefit of the compiler checking the formula is correct. For non-trivial formulas I would cover it by a test, but I can see how it makes a library package easier to use correctly. I wonder if there could be some proof of concept linter that could parse a comment to start with and validate a program for correctness.
This could help work out things like what was raised by @fardream before commiting to a language change |
No overloading required (other than what already exists for * and /). In its simplest form it just requires lifting the restriction against mixing types in multiplication and division provided that their underlying types are numeric (while retaining it for + and - because it strongly makes sense there). |
Actually I agree with you, but I did not want to complicate the simple examples.
Again, yes, and real world dimensional analysis typically does this, but I didn't want to complicate the examples here. (Remember the infamous lost Mars Lander due to a Newtons vs. customary units mixup?). Regarding the float32/float64 versions of libraries, again this makes sense. If I were designing a language from scratch I would allow function parameter overloading so that functions could be written with the same names and different parameters (32 and 64 bit versions) and the compiler would distinguish them as distinct functions, calling the matching function based on the types of the parameters. In other words, functions are distinguished not merely by name but their entire signature: name, return type, and parameter type. I suspect that may not be very Go-like, but as is probably clear by now, I like making the compiler do the work! For the memory reuse, not to sound flippant, but that sounds more like Fortran than Go :-) which of course is what a lot of people in your kind of domain are using. I would suggest that some kind of union type would be appropriate? |
Very good point. I also suspect that for some applications development teams will want to have a common, shared set of definitions for common dimensions and relationships (much like good old header files in C). Perhaps something that mirrors the Public/private distinction for functions would be appropriate, so that any given developer can make use of team-shared dimensionality as well as private definitions of their own? FWIW I like the idea of reusing the existing mechanism for the Public/private distinction. |
This is a very interesting idea. To make this work you would still need to lift the restriction on mixing different types in division or multiplication, or equivalently implicitly casting down [am I saying that right?] to the underlying types to allow it to compile, but obviously that is a much smaller work effort than implementing the analyzer. |
Yes, and I suspect you would still want tests anyway to validate the calculations. My former customers in the scientific realm used to run massive amounts of tests whenever we changed our software to ensure that previous results still came out the same, only faster. One way to think of this is that it is essentially "folding in" a certain amount of test - which is a huge help to people who, unlike you, are less than diligent in creating or maintaining tests. (I remember seeing some astonishingly high numbers for the amount of tests that are dead code, obsolete, or trivial "return true" stubs because people see it as a burden to write.) |
#20757 is closely related. |
@ianlancetaylor, I think the type-inference problem gets simpler (in some sense) if we allow variables to carry arbitrary combinations of units, and only collapse down to individual types when assigning to a variable of that type: an expression of type |
That said, I think the A better motivating example might be the SI base units and their derived units: In such a system, I think it would make more sense to separate the notion of “units” from “numeric representation”: it is more meaningful to speak of “an |
@bcmills Interestingly, in my early drafts of this idea I started from the same kind of description that you have here. And yes, strictly speaking dimension is orthogonal to type in the usual sense. I even thought about having a keyword of dimension or dim, so you might write something like:
I also had thoughts about, as mentioned upthread, about the distinction between dimensions (length, time, etc.) and units (meters, seconds, etc.). After noodling with various designs for the syntax for a while I decided that for strictly practical purposes it was less disruptive to the language (and therefore less to learn) to collapse everything into the type concept. The question would be, is it likely that in a given project we would want to have more than one variable type ( I also had in mind the kind of non-scientific realms where dimensions are largely "counts" rather than metrics. For example in my hypothetical project planning tool we might have something like
|
@ianlancetaylor @czetie @apparentlymart I still can't help but wonder if there isn't a general treatment somewhere to deal with time and temperature (and other) where deltas can be added but not the values themselves in most cases (unless we want to compute an average but a generic average function could deal with that easily). Seems that restricting related definitions to a same package could perhaps work. Would have to check. @bcmills |
The general treatment is to not provide arithmetic operators for such types. ( Note that most of the problems with |
This long article seems interesting here: https://gmpreussner.com/research/dimensional-analysis-in-programming-languages Now that we have generics perhaps they can be used for this looking at the examples from other languages? |
I just finished that article and was coming here to share it. The article is a good survey of what has been done in other languages. It is from 2018. The end is good where the author presets some alternatives and their pros and cons. https://gmpreussner.com/research/dimensional-analysis-in-programming-languages |
@czetie just wondering. Isn't there still the question of what the return type should be if we just relax rules around operations as you imply? (@apparentlymart original question I believe) @beoran @OwlSaver nice find! type distance[M unit] float64
type duration[S unit] float64
type speed[T, U unit] = distance[T] / duration[U]
type speedSI = speed[meter, second]
// where meter and second are types that satisfy the unit interface Don't know if it would make sense, purely brainstorming. |
I'm not I'm understanding the question, but I'll take a shot and if I have misunderstood please let me know I have been writing code like this:
I believe you suggesting that the last line here should be explicitly type-converted to the type of the result:
If so, I think you are right. The proposal is essentially the same, except that what we are now saying is instead of "the expression must have the same dimension as the variable to be assigned to", we are saying "the variable must have the correct dimensions for the type conversion (which in this case it does). So in effect this becomes an extension of the type conversion rules:
Does that help? |
Yes, overall the idea is there. I'm still unsure about whether that would just be a change in conversion rules. But other than that, yes :) |
The general idea of most of my questions so far has been to try to make the proposal more concrete/complete by describing it as precisely as I can and then noting aspects that seem to not yet be fully specified. In a few cases I also made some observations about what I might expect based on similar decisions elsewhere in the language, but some of the things I've asked just seem like tricky design edge cases where the answer isn't clear to me, so I'm asking in an attempt to understand what you'd expect to see (as someone who has more experience with programming with dimensional analysis in other languages) and fit that with what seems possible to achieve and what interacts well or less well with existing language features. With all of that in mind, I'm now thinking about this idea of implementing in two steps:
It's important to define for any operation what its return type should be, so splitting this into two steps requires step 1 to answer the question I previously posed about what type a multiplication or division of two different named types should return. Concretely, the question is what the following program should print, and why: type A float64
type B float64
a := A(1.2)
b := B(2.5)
fmt.Printf("%T", a * b) It seems like in this two-step proposal the second step would give an answer to this question if there were a type defined as the product of A and B, but for this to be implemented in two steps the first step must itself be fully specified. I think some earlier questions still apply for step 2:
I think doing this in two steps also adds at least one additional question:
I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer. I don't know the answers to these questions myself. I'm curious about your thought process leading to the answers at least as much as the answers themselves, because that can help to understand the shape of the problem and the scope of the proposed solutions. |
@apparentlymart > I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer. Absolutely clear, and I'm happy that you're as invested in this proposal as you are, and these are exactly the right questions to ask if this concept is to become an implementable design. Many of them are simply things I had not thought of myself.
This in particular is an important question, and not one I have worked through because it wasn't originally part of my thought process. If the answer is No then to me that implies that Step 1 should be done in Go v1, and step 2 in Go v2, because it will existing code no longer compile (although the fix is probably automatable). If the answer is Yes then it leaves a hanging question about the type of the result and what it can be validly assigned to, especially as the Go philosophy is averse to implicit type conversions. |
@czetie I just want to point out that it's highly unlikely there will ever be a v2 of Go. "Go 2" was a conceptual name under which major new features such as modules, error inspection and generics were introduced. But after it turned out that they could be introduced without major breakage, that name is no longer to be taken literally. The idea of introducing a new major version of Go with breaking changes has now been discarded as far as I know. In fact, the plan is to take backward compatibility even more seriously. |
@gophun Thanks for the heads up. I saw a lot of references to it and had no idea it had gone away. That makes it even more important to work through in design the implications of a two stage approach, if indeed that is considered desirable. |
Thanks for the proposal. This is an interesting idea. However, it might be more appropriate for an analysis tool, not in the language proper. The analysis tool could interpret special comments describing the type relationships, and then verify that all conversions were applied appropriately. That might be even better than this proposal because the tool could also verify units (acre foot vs. cubic meters, or whatever). In general Go's type system is quite simple, and this makes it more complex without providing a clear enough gain. This new functionality is also not generalized across the type system, in that it only works for scalar types. Also the emoji voting is not in favor. Therefore, this is a likely decline. Leaving it open for four weeks for final comments. |
@ianlancetaylor Thank you for your thoughts. A couple of final thoughts for your consideration before final close:
|
@ianlancetaylor Almost but not entirely orthogonal to this proposal, it has provoked another very high level thought in me that relates to the overall philosophy of Go. Is there a way for me to raise this more privately than in this forum? It is a potentially outlandish thought and this does not seem like the right place for it, at least until it has undergone more scrutiny. |
You can always write directly to me (iant at golang.org) but I can't promise a useful or timely reply. |
It is unlikely that we would consider permitting multiplication and division on values of different types. That kind of special case makes the language more complicated and harder to understand. Simple rules are highly desirable. |
Simple rules are certainly desirable, but not if their the wrong rules,
which I think this one is. "As simple as possible but no simpler",
allegedly said by Einstein. It makes perfect sense to forbid + and - on
aliased types because they are incommensurable. It makes equally perfect
sense to allow it for * and / as endless real-word examples show. Imagine
teaching this in a Go class:
"You can't add a speed and a duration, that makes no sense. But you can
multiply a speed and a duration, that gives you a distance."
(My final word on this proposal.)
…On Thu, Feb 2, 2023 at 5:50 PM Ian Lance Taylor ***@***.***> wrote:
It is unlikely that we would consider permitting multiplication and
division on values of different types. That kind of special case makes the
language more complicated and harder to understand. Simple rules are highly
desirable.
—
Reply to this email directly, view it on GitHub
<#57973 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABFOUAO7EDUOVD2O6PZGBVLWVQ24DANCNFSM6AAAAAAUFFFMGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Sent from a full-size standard keyboard. I blame auto-carrot for any typos.
|
@czetie I know it's your final word but just wondering if you had thought about how one could build a domain specific language based on go's syntax? As I understand, the rule about operations is kept simple in a way that doesn't allow direct translation of distance, speed etc into types. Simply defining d=v*t without unit parameterization of these types seems limiting too? Anyway, the main point is that if one can piggyback on the language and build a DSL that can be transpiled to it, it might be preferable, make more sense and much more flexible and comprehensive? |
Definitely thinking about it. My even-bigger thought is a series of DSLs
for various things that are useful but not acceptable in the language
itself (whether for reasons of simplicity, avoiding bloat, etc.) that can
be transpiled into valid Go code. And along with that is the the thought:
what features would a language need to have in order to better support
being the target of such transpilation?
Another such DSL might be one for specifying UIs, which is generally messy.
(I said earlier in this thread that I'm not a huge fan of generation where
it can be avoided, but it might be the way to get the benefits of "minority
interest" capabilities such as this into a language whose philosophy is to
remain lean at its core.)
…On Fri, Feb 3, 2023 at 7:35 AM Awad Diar ***@***.***> wrote:
@czetie <https://github.com/czetie> I know it's your final word but just
wondering if you had thought about how one could build a domain specific
language based on go's syntax?
It's more work but I assume that would eschew complexity from being
inserted in the type system with some added benefits (vector/tensor
handling for quaternion math for instance.., handling of + and - (adding
deltas would be possible then, etc)
As I understand, the rule about operations is kept simple in a way that
doesn't allow direct translation of distance, speed etc into types.
At best, it describes that only the same *kind* of types can be used in
operations which is a lower level concept.
Simply defining d=v*t without unit parameterization of these types seems
limiting too?
Anyway, the main point is that if one can piggyback on the language and
build a DSL that can be transpiled to it, it might be preferable, make more
sense and much more flexible and comprehensive?
—
Reply to this email directly, view it on GitHub
<#57973 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABFOUAI3QDGLMXFWDZLFCHDWVT3RFANCNFSM6AAAAAAUFFFMGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Sent from a full-size standard keyboard. I blame auto-carrot for any typos.
|
@czetie A good example of a Go based programming language is Go+ https://github.com/goplus/gop. You could probably do something similar to make a Go derivative language. |
You could bypass operators entirely and require methods. For example, type Velocity struct {
val float64
}
type Time struct {
val float64
}
type Distance struct {
val float64
}
func (v Velocity) MulTime(t Time) Distance { return Distance{v.val * t.val} }
func (t Time) MulVelocity(v Velocity) Distance { return v.MulTime(t) } It's incredibly unergonomic, but it would allow you to completely control the interactions of units. |
Something like what @DeedleFake wrote was what I was imaging generating in the code generation-based solution I mentioned back in #57973 (comment) . Although I would probably have just used I think generating these boilerplate methods for all of the possible relationships implied by some declared dimensions would at least avoid the non-ergonomic nature of implementing the functions. Of course it still means that using the generated functions requires method call syntax instead of operator syntax, but in a language that doesn't offer operator overloading that is exactly the tradeoff I would expect: built-in operators are for built-in operations only, and more specialized operations use method or function call syntax. d := Distance(10)
t := Time(2)
v := d.DivTime(t) // "v" has type Velocity due to the generated method signature The remaining snafu for this is that it is already valid to multiply two values of the same named type without any explicit conversions, and that result contradicts the dimensional analysis rules: v1 := Velocity(2)
v2 := Velocity(10)
// The following is a nonsense operation, but if it _were_ meaningful
// then it ought to produce a "velocity squared" result, but it actually
// produces just Velocity.
v3 := v1 * v2 Nothing to stop there being a generated This doesn't seem like a big show-stopper, though. In a program where the normal way to manipulate these values is via method calls hopefully someone is already expecting to get a "velocity squared" value using a method anyway, and will not think to try the normal multiplication operator. |
I thought that of that and was going to mention it at the bottom of my post, but I didn't because I thought the issue with the ability to, for example, multiply two velocities that you mentioned later would be enough of a problem to make that approach not worth the extra convenience. |
I think it's a shame things are going to pan out this way. It's a great feature for DSP pipeline (which there a fair few go libraries/applications for) verification too. In that vein, I don't think it's any less "clear" than complex numbers, and if done well could even serve as a new basis for that and other number systems. Also think it's a shame to overlook the significance of compile-time correctness guarantees. To that end it would be great if this thread were left open for discussion after the issue is closed. I think a potentially good implementation of this feature comes from what you might call a/an arithmetic domain/s. package domains
// Using upper/lower case for public/private
type Complex[T ordered & number] domain {
Real[T] = T
Imaginary = Real[T] + I[T]
} // binary operator set is {+}
type SI[T number] domain {
Speed[T] = Meter[T] / Second[T]
Hz[T] = 1 / Second[T]
Acceleration[T] = Speed[T] / Second
Force[T] = Mass[T] * Acceleration[T] & Mass[T] / (Second[T] * Second[T]) // here & means the expressions on either side should be equivalent
Energy[T] = (PlanckConstant /* const defined elsewhere */ * Frequency[T]) & (Force[T] * Acceleration[T] | 1/2 * Mass[T] * Speed[T] * Speed // here | means either of the expressions on either side are valid
} // operator set is {*, +}
type Audio[N, R number] domain {
Hz[R] = 1/Second[R]
SampleRate = R(Samples[N]) * Hz[R] // samples sh/c-ould be an integer
Freq[R] = Complex[R] | Hz[R] + Phase[R]
}
type (
Velocity3d[T number] domain {
X[T] + Y[T] + Z[T]
}
Velocity4d[U number] domain {
Velocity3d[U] + T[U]
}
) package main
import ."domains"
func Mass[T float32|float64](f SI.Force[T], a SI.Acceleration[T]) SI.Mass[T] {
return f/a
}
There is a concern about how to specify that you do not want the resulting value to be "reduced" ie, a vector has multiple parts but a scalar is reduced to one part. It is possible that there are systems whose arithmetic is best modeled by addition/subtraction but warrants "reduction". There are also "terminal" units AFAIU, all of this could be implemented without making use of operator overloading by using generics. Essentially, the units should behave "like" |
https://github.com/aurora-opensource/au is a library that provides compile-time dimensional unit safety in C++14 (as opposed to runtime unit manipulation which is more common in libraries in various languages). While their implementation is entirely different from what is proposed here (a library instead of a change to the language), a lot of the discussion in their documentation and on their repository is relevant here. |
@sparr While we won't be able to do as much compile type checking in Go as that library uses in C++, I think it should be possible to port this approach to Go generics. I recommend people who are interested in units in Go to try this approach. |
More discussion but no change in consensus. |
Author background
Would you consider yourself a novice, intermediate, or experienced Go programmer?
Novice with Go, experienced with other languages esp. C
What other languages do you have experience with?
Fortran, Pascal, Algol, C, Rust, Python, PL/M, various assembler
Related proposals
Has this idea, or one like it, been proposed before?
Not to my knowledge or that of other Go programmers I spoke with
Does this affect error handling?
No
Is this about generics?
No
Proposal
Golang does not allow mathematical operations between operands of different types, even if they have compatible underlying types. This provides highly useful protection for addition and subtraction and is helpful in avoiding a certain class of logical errors. For example, it is almost certainly a logical error to attempt to calculate
speed + elapsedTime
, (assuming their types have been appropriately defined as float64, for example). However, this is too restrictive for multiplication and division:distance = speed * elapsedTime
is a perfectly reasonable calculation. In order to perform this calculation today, the programmer must reduce everything to compatible types, typically the underlying types, this sacrificing the benefits of type checking. For example:A small enhancement to the rules for type mixing in expressions would allow useful constructs as mentioned above such as
distance = speed * elapsedTime
, as well as other more complex calculations, while still preventing mistakes such asdistance = speed + elapsedTime
. The characteristic of these calculations is that they mix values of different "dimension", in this case velocity, which in turn can be defined as having dimension Length/Time; and Time. And the resulting value has dimension different from either, i.e. Length in this case. While adding values of different dimension makes no sense, multiplying or dividing them can be perfectly meaningful and extremely useful.This proposal would allow the developer to specify, by using the type system, dimensions and valid combinations of dimensions together with the new dimension that they result in.
This proposal would help any programmer working with data that has dimensions, e.g. physical quantities of Length, Time, Distance, or combinations of them, i.e. essentially all scientific and engineering data, by providing an added compile-time check that only quantities of the correct dimension are being mixed.
It could also help programmers in commercial applications where dimension might be defined more abstractly than in physics. For example, to estimate project cost we multiply "number of FTEs" by "fully loaded FTE cost" to get a result in dollars (or appropriate currency). We should not be able to use any old number defined in dollars in this calculation, but only one defined as an FTE cost. Or we might estimate project duration by dividing "number of story points" by "story points per FTE per day" and multiplying by "number of FTEs" to get a result with dimension measured in Days. Other examples of common classes of commercial mistake that can be caught with appropriate definitions include mixing quantities in different currencies; or multiplying by a fraction (0.0 to 1.0) instead of a percentage (0.0 to 100.0).
The more complex the calculation, the more valuable the checking of dimension. And to reiterate, this checking is lost today even if the programmer defines types, because in order to write calculations that mix types, they must be reduced to the underlying type, e.g.
float64
.This proposal would ADD a new form of type definition as an extension to existing syntax to define a type as the product or quotient of other types.
It would also MODIFY the restriction on mixing types in arithmetic expressions: it would still be forbidden to mix types in addition and subtraction, but it would be permitted to mix types in multiplication and division provide they have compatible underlying types.
Such type definitions could be compounded to any (reasonable) degree necessary for the domain of the application.
Consider the following code fragment that attempts to calculate a velocity from the division of distance by time.
The compiler will reject the assignment because the quotient of
d
andt
has a different type fromv
, even though they have the same underlying types. Specifically,d / t
has type (or dimension)distance / time
, which is not the same asvelocity
- at least as far as the compiler knows! We could make it compile by the following explicit type conversions:v = velocity(float64(d) / float64(t))
However, now we have sacrificed the type checking that we had hoped for by creating types for
distance
andtime
;d
andt
could mistakenly be variables of any type whose underlying type isfloat64
.A better approach which allows us to retain the type checking is to define the type
velocity
not asfloat64
but as the quotient of typesdistance
andtime
:Of course, this example is very simple; real scientific and engineering formulas will often have many more variables and types.
Is this change backward compatible?
Yes. This does not remove or change any existing syntax; nor change the semantics of any existing compilable code. It extends the allowed forms for calculations (type mixing) and for specifying types in terms of other types.
Orthogonality: how does this change interact or overlap with existing features?
The concept of dimension is complementary to existing syntax for types, but as far as other language features are concerned, e.g. parameter passing, the end result behaves just like existing types.
Is the goal of this change a performance improvement?
No.
Costs
Would this change make Go easier or harder to learn, and why?
Probably minimal change. One small addition to the syntax for types is added, and a potential confusion about mixing types is removed.
What is the cost of this proposal? (Every language change has a cost).
Modest. Existing code for determining the underlying types of variables and expressions and for converting between types, particularly to and from arithmetic types, will be relevant and can probably be reused. Additional code will be required in the code for parsing arithmetic expressions (see below under proposed implementation).
How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
Compiler only.
What is the compile time cost?
Modest. Additional checking for type compatibility in multiplications and divisions involving defined types whereas today those expressions are quickly rejected as errors.
What is the run time cost?
Nil. After compile-time checking, the code reduces to the underlying types like any other calculation.
Can you describe a possible implementation?
The compiler builds a dependency tree of types defined in terms of other types all the way down to underlying types. When it encounters an expression that multiplies or divides types it traverses the tree down to one level above underlying types, and verifies that dimension matches on both sides. For example, with the definitions
a variable of type
energy
has dimension(mass * velocity * velocity)
which expands to(mass * (distance / time) * (distance / time))
. The compiler will also expand the types of the variables to ensure that they have the same base dimensions, e.g.(force * distance)
expands to(mass * acceleration * distance)
and finally to(mass * distance / time / time * distance)
. Note that in order to see that these two are the same dimension, they must be rearranged into a common canonical form. However, this can be as simple as a list of base dimensions and for each a list of their indexes, e.g.mass^1
distance^2
time^-2
Therefore, verifying compatibility requires only traversing the dependency tree to the bottom and accumulating a count of the index of each base dimension.
No.
Long prose description of proposal:
Enhancement request_ Dimensionality of Types.docx
The text was updated successfully, but these errors were encountered: