Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Another proposal for a standardized object construction/initialization. #48

Closed
zah opened this issue May 16, 2018 · 37 comments · Fixed by nim-lang/Nim#20480
Closed

Comments

@zah
Copy link
Member

zah commented May 16, 2018

Currently, the approach of initializing object used in the standard library is to offer a proc following a naming convention built around the type name (e.g. initTable[string, int]()). This has the following drawbacks:

1. You cannot write generic code that constructs an arbitrary type given as a parameter.

This is the most significant drawback and it should be obvious.

2. There is an inconvenient DRY violation when initializing fields of record types

type Foo = object
  tbl: Table[int, string]

proc makeFoo: Foo =
  result.tbl = initTable[int, string]() # I had to repeat the type parameters here

3. Another DRY violation when using type aliases

type
  MyCache = Table[int, string]
...
var cache = initTable[int, string]() # I had to repeat the type parameters here

Proposal 1

All of the above problems can be solved by switching to the following convention:

EDIT: After some discussions in the comments below (https://github.com/nim-lang/Nim/issues/7832#issuecomment-403488670), I'm revising the proposal:

  1. The canonical init proc for a type is defined like this:
proc init*(T: type Foo, arg1: Bar, arg2: Baz): Foo =
  result = Foo(field1: arg1, field2: arg2)
  1. A generic init template can be used to initialize existing locations by inferring their type:
template init*[T](val: var T, args: varargs[typed]) =
  val = init(T, args)
Previous Proposal 1) The canonical `init` proc for a type is defined like this:
proc init(f: var Foo, arg1: Bar, arg2: Baz) =
   f.field1 = arg1
   f.field2 = arg2
  1. There is a generic init template accepting a type name and forwarding the initialization parameters
proc init(T: typedesc, args: varargs[typed]) =
   var res: T
   init(res, args)
   res

The previously problematic code is now beautiful:

type Foo = object
  tbl: Table[int, string]

proc makeFoo: Foo =
  init result.tbl

type
  MyCache = Table[int, string]
...
var cache = init MyCache
var anotherTable = init Table[int, string] # existing code now requires fewer keystrokes

Proposal 2

Make init a well-known proc similar to items and pairs and treat the currently invalid expressions such as Foo(x, y, z) as init(Foo, x, y, z) (Foo here is assumed to be a type).

Some further notes:

Nim is gradually moving to user-defined memory management schemes. The initialization scheme can be augmented in the future to play well with various memory allocation schemes. Consider the following helper:

template myAlloc(x: typed) = 
  type T = type(x)
  var mem = myAllocImpl(sizeof(T))
  mem[] = x # this will have to be only slightly more complicated in reality

# the usage will look like this:
var f = myAlloc Foo(10, 20)

For supporting arenas and other similar schemes, it would be easy to extend this syntax with additional niceties such as the following:

var f = alloc Foo(10, 20) @ myArena
@zah
Copy link
Member Author

zah commented May 16, 2018

@cooldome
Copy link
Member

While I like the approach a question to consider: init template above will cause extra copy, need a way to avoid it

@zah
Copy link
Member Author

zah commented May 16, 2018

Well, the compiler is free to perform copy elision. This might be a good time to put some effort into implementing it.

@metagn
Copy link
Contributor

metagn commented May 16, 2018

This is probably a terrible idea but:

type Foo = object
  tbl: Table[int, string] {.init.}

proc `=init`[K, V](tbl: var Table[K, V]) =
  tbl = initTable[K, V]()

var foo: Foo
foo.tbl["key"] = "value"

@jangko
Copy link

jangko commented May 17, 2018

currently, generics, varargs, aliases, and statics are fragile. perhaps they are less explored, or because they are buggy so less people use/fix them.
it is generally a good idea to put them into use and convince more people to use them.
in the end not only we have beautiful and generic code but also more resilient generics, varargs, aliases, and statics.

@mratsim
Copy link
Collaborator

mratsim commented May 17, 2018

I would add distinct and ranges to that list.

Regarding the proposal, I really like it.

That would bring init more in line with the current new like let a = new ref array[2, int] or let a = new Tree.

New can also be refined the same way (except for the finalizer):

proc new(f: var Foo, arg1: Bar, arg2: Baz) =
   f.field1 = arg1
   f.field2 = arg2

proc new(T: typedesc, args: varargs[typed]): T =
   new result # what to do with the finalizer here?)
   new(result, args)

So that we can do new(seq[int], 10) instead of newSeq[int](10)

@Araq
Copy link
Member

Araq commented May 18, 2018

generics, varargs, aliases, and statics are fragile.

Ouch, I would only list 'statics' here... :-)

@andreaferretti
Copy link

@Araq Agreed :-)

@Araq
Copy link
Member

Araq commented Jul 1, 2018

Proposal 1 is clearly superior in my book as it doesn't require yet another language feature. (Btw it needs to use macros.unpackVarargs for this to work.)

@zah
Copy link
Member Author

zah commented Jul 3, 2018

I should clarify that proposal 2 is an extension of proposal 1, not an alternative. It's intended to make the language a bit more convenient after Proposal 1 is in place.

@Araq
Copy link
Member

Araq commented Jul 9, 2018

Any proposal that transforms var x = f() into var x; f(x) is deeply problematic. We want to warn about uninitialized variables (and turn them into an error eventually) and if f takes a var T we don't know whether f really wrote to x or not. In other words, init is statement based, but we need an expression based solution, so that var x = MyObject(field: init, parentObject: ParentObject()) can work.

@zah
Copy link
Member Author

zah commented Jul 9, 2018

This goes back to some of our old discussions about constructors. In Nim, every single proc can be considered a constructor, so it's a problem that the compiler is not able to check that all the fields of the result are being initialized. And we'll probably need some kind of control-flow-aware analysis to do this properly.

If we have such control-flow based analysis, I can imagine the init procs suggested here also taking advantage of it (they can have an uninitialized constraint on the main parameter which requires all the fields to be initialized).

In any case, before turning this down, we'll have to come up with an alternative. To achieve equal convenience in all the examples I have provided, I can imagine two types of schemes:

  1. Using return-type overloading and some well-known "init" proc.

  2. Rewriting the init statements into something that explicitly passes the desired return type as a parameter:

template init(x: var T, args: varargs[typed]) =
   x = init(T, args)

proc init(T: type, x, y: ...): T =
   result.x = x
   result.y = y

As you can see, this rewrite is roughly equivalent, it doesn't suffer from the var x: T problem you mentioned, but ultimately it has the same problem that you must ensure that the result variable is fully populated.

@Araq
Copy link
Member

Araq commented Jul 9, 2018

it doesn't suffer from the var x: T problem you mentioned,

Yes, good.

but ultimately it has the same problem that you must ensure that the result variable is fully populated.

This should be solved by embracing the object construction syntax Foo(x: 0, y: 1) -- needs to check all fields are populated, but doesn't require control flow to do so. (This is what Rust does too and it works for them...)

@andreaferretti
Copy link

andreaferretti commented Jul 10, 2018

@Araq The problem with this approach is that whenever constructing one of this fields requires some long and complicated code, people will move this code above the definition of Foo, and then assigning to a field may trigger a copy. In Rust, this does not happen because their control flow analysis guarantees that no copy will happen.

let foo = Foo(x: somethingWhichIsComplicatedToWriteInline())
# becomes
let x = somethingWhichIsComplicatedToWriteInline()
let foo = Foo(x: x)

By the way, forcing this pattern makes result completely unuseful - if I have to write result = Foo(...), I may write return Foo(...) just as easily. The advantage of result was just that one could populate it one field at a time. I don't care one way or the other, but if we disallow this pattern, we might as well remove the special result variable altogether

@Araq
Copy link
Member

Araq commented Jul 10, 2018

In Rust, this does not happen because their control flow analysis guarantees that no copy will happen.

It gets turned into a move in Nim too.

By the way, forcing this pattern makes result completely unuseful - if I have to write result = Foo(...), I may write return Foo(...) just as easily.

No, result is as useful as before, result = @[]; for x ...: result.add x

The advantage of result was just that one could populate it one field at a time.

That never was its purpose.

I don't care one way or the other, but if we disallow this pattern, we might as well remove the special result variable altogether

No, result is not about object construction.

@zah
Copy link
Member Author

zah commented Jul 10, 2018

So, are we happy enough with the revised proposal? I can edit the spec at the top of this issue.

@Araq
Copy link
Member

Araq commented Jul 10, 2018

Yeah please do so. Not sure what the revised proposal now looks like. ;-)

@andreaferretti
Copy link

It gets turned into a move in Nim too.

Great to know, I was not sure about this!

No, result is as useful as before, result = @[]; for x ...: result.add x

Well, yes, but hardly more convient than var result = @[]; ...

In my experience, result was mostly used for this case where you want to construct some immutable object. In the constructor function, one just uses result and populates it one field at a time, while the returned result is often used with let and never mutated.

This achieved most of the benefits of immutability, with some convenience: for many kind of objects, the only moment you want to mutate them is at construction time, and the use of the special variable result allowed to do this conveniently. Then, only few var remain, and you can inspect them more closely. A lot of the advantges of a functional approach without the need to populate all fields together, which can be cumbersome.

Maybe this was never the intended use of result, but certainly it is the only practical advantage I had found for it, so I always justified its use in this light

@Araq
Copy link
Member

Araq commented Jul 10, 2018

Well, yes, but hardly more convient than var result = @[]; ...

It's exactly as convenient as your use case which uses result for construction.

Then, only few var remain, and you can inspect them more closely.

That's all still true when construction turns from result.x = 3; result.y = 4 to result = MyObject(x: 3, y: 4).

A lot of the advantges of a functional approach without the need to populate all fields together, which can be cumbersome.

It isn't "cumbersome" in my experience. shrug

@zah
Copy link
Member Author

zah commented Jul 10, 2018

I've updated the proposal. Only points 1) and 2) of Proposal 1 have been changed.

@andreaferretti
Copy link

I think I failed to highlight my point. The special variable result is an unusual language feature, and a pretty surprising one for new users. The only reason I saw it had an advantage was the pattern where contructing an object amount to populating the fields of result one at a time.

If the plan is to disallow this pattern, I don't think that result carries its weight.

It's exactly as convenient as your use case

Exactly. The point is that having result should, at least in some cases, give something more convenient than not having it.

The following pattern

type Foo = object
  a, b, c: int

proc makeFoo(x: int): Foo =
  result.a = x
  result.b = x + 1
  result.c = x * x

proc useFoo() =
  let foo = makeFoo(5)
  echo foo

allows to avoid var most of the time, which kind of makes me aware of where I use mutability, while allowing to mutate an object in the only moment where I want it: when I construct it, in order to construct it piecewise.

If this pattern cannot be used, what is the advantage of having the result variable? Not typing return?

@Araq
Copy link
Member

Araq commented Jul 10, 2018

I think I failed to highlight my point.

No, but I completely disagree with your point.

The special variable result is an unusual language feature, and a pretty surprising one for new users. The only reason I saw it had an advantage was the pattern where contructing an object amount to populating the fields of result one at a time.If the plan is to disallow this pattern, I don't think that result carries its weight.

result gives us a guaranteed "named return value" optimization. Also both Pascal/Delphi and Eiffel have result, it's not unusual at all. ;-)

And again, every time you use some kind of accumulation to compute the 'result', result does carry its weight. It was never its primary purpose to allow some "nice" unstructured object initialization.

@andreaferretti
Copy link

Ok, sorry, maybe it was not its use, but I had misunderstood it for one of its main use cases :-)

Guaranteed "named return value" optimization is a much better reason for the existence of result - maybe this should be mentioned more clearly in the manual, since it is an important feature!

@alehander92
Copy link

What's the status of this? having a smarter init for stuff like Tables/JsAssoc etc would be a big ergonomic win

@arnetheduck
Copy link

just had a bit of discussion about this with @zah - one important point this proposal does not address is the special status that constructors have in C++: they allow you to limit construction to a specific set of functions per type, meaning you can be guaranteed that no "rogue" instances will be created.

With nim gaining destructors, having that kind of control over construction is crucial in order to be able to write efficient destructors - you can make assumptions about what states the object can be in - it brings balance to construction and destruction - one function is guaranteed to be called when instances spring to life, and another when they pass.

@Araq
Copy link
Member

Araq commented Sep 25, 2018

With nim gaining destructors, having that kind of control over construction is crucial in order to be able to write efficient destructors - you can make assumptions about what states the object can be in - it brings balance to construction and destruction - one function is guaranteed to be called when instances spring to life, and another when they pass.

This deserves a more elaborate answer but in a nutshell: Constructors that require parameters beyond the self parameter are problematic:

  • More language complexity, not obvious how to deal with containers of T when T only has constructors that take parameters.
  • Language asymmetry: Destruction can be and should be implicit and only take the self, construction must be explicit and takes more parameters?
  • Unclear: If intermediate results produced by every function f are subject to deconstruction (and they are in Nim and in C++!) this means every f constructs an object. What is the point of a dedicated constructor then? That every object constructions starts off from a fixed set of special operations. Ok, but that is covered by basic API design if we can enforce a simple control flow property: Prove that every path sets a variable before it is used (I wrote an RFC to cover that aspect). This doesn't require a dedicated language construct.

@arnetheduck
Copy link

More language complexity, not obvious how to deal with containers of T when T only has constructors that take parameters.

disallow, initially at least - this means the resize operation is not callable, but the add operation is. later, one can add a special factory-based resize etc. basically, offer the functionality that is meaningful in a context where construction is limited, but disallow the rest.

Language asymmetry: Destruction can be and should be implicit and only take the self, construction must be explicit and takes more parameters?

only self - by default, parameters, optionally? it's actually an interesting feature - to have the ability to get a compiler error whenever an instance is projected to go out of scope - that means you can force the user to explicitly think about destruction - not a bad idea actually - similar to discard - could be used to enforce certain coding standards or error checks where it's appropriate to do so

What is the point of a dedicated constructor then? .. Prove that every path sets a variable before it is used...

this is not sufficient - the constructor is used in c++ to limit the possible states of the object beyond simply being set or not. examples:

  • enforce non-nil reference on construction, of some resource - this way, one can elide if-not-nil checks throughout.
  • state machine starts in a particular state - allows you to elide meaningless state transitions safely.

I wonder if this can be enforced in some other way however - ie special annotations, tricks with types and concepts etc

@Araq
Copy link
Member

Araq commented Nov 7, 2018

this is not sufficient - the constructor is used in c++ to limit the possible states of the object beyond simply being set or not.

I don't understand how this and your following examples differ from my description.

@arnetheduck
Copy link

type X = object
  v: int

initX(): X(v: 42)

let's say that I want X.v to always start at 42. what prevents someone in the above example from doing var x = X(v: 33)? this is nothing I can do with the current tools available to me, within a module (cross-module, I guess * helps, but that's not enough while implementing the module itself - it's when implementing the module that I want the compiler's help to guarantee that I haven't accidentally constructed the object poorly)

@Araq
Copy link
Member

Araq commented Nov 8, 2018

That idea came up before and it was suggested the type could be annotated with .requiresInit, which is at least far easier to add then full-blown constructors.

@krux02
Copy link
Contributor

krux02 commented Nov 8, 2018

@arnetheduck with current Nim your objects always start zeroed out. That is actually a quite nice property to have. Introducing constructors that allow what you want would destroy this property.

@zah
Copy link
Member Author

zah commented Dec 17, 2018

So, why aren't we making progress with this proposal?

I've already included the suggested init template in several of our libraries and I'd like to have it available in system.nim instead.

template init*[T](val: var T, args: varargs[typed]) =
  mixin init
  val = init(T, args)

@arnetheduck
Copy link

arnetheduck commented Dec 17, 2018

@krux02

with current Nim your objects always start zeroed out. That is actually a quite nice property to have. Introducing constructors that allow what you want would destroy this property.

yes - that is exactly what I'm asking for tools to avoid. as a library author, I would like to create beautiful libraries where the defaults are meaningful for the type that I create, and not an arbitrary choice made by the language. I think Nim recognizes how important it is - in fact it is so important that a massively backwards incompatible change was introduced in the language to special-case two types: strings and sequences - these two behave like is natural for their type - there is now an empty string and not nil waiting for you when you create it. I'd like to have that power also, without extending the language!

template init*[T](val: var T, args: varargs[typed]) =
mixin init
val = init(T, args)

@zah - with that function, you make it very easy to shoot yourself in the foot - for example, there's nothing preventing that the function is called with a non-zeroed object, meaning you'll start seeing safety guides surrounding its use: "only call on previously uninited objects" etc / or you'll have to implement it defensively and make up for this incredibly easy mistake (if file is open: close; now-do-init)

edit: the proposed function replaces instance - this is not an issue (thanks @zah for pointing out)

@arnetheduck
Copy link

arnetheduck commented Dec 17, 2018

initTable

the main example is another great case for why you would want this: to meaningfully use the Table type, you first have to call a magic function - if you don't, it breaks. this is really bad UX - it's incredibly easy to forget it unless the compiler enforces it, and you end up with the exact same issues you used to have when you forgot about initializing a ref to something not nil.

right now for example, you can put things into a Table if you forget to call initTable, but you can't get them out again - this is a side effect of the language special-casing sequences to not nil (it used to crash!):

import tables
  
var x: Table[int, int]
x[44] = 42

echo x[44] # IndexError on this line

very hard to correlate this issue / bug to "you forgot to call initTable" - there's just nothing to help you ("read the docs" aka "blame the user" is not a helpful option).

@zah
Copy link
Member Author

zah commented Dec 17, 2018

with that function, you make it very easy to shoot yourself in the foot - for example, there's nothing preventing that the function is called with a non-zeroed object

@arnetheduck, your statement is not true. Take a look at the definition carefully. It assigns a fresh copy to the memory location which should be always safe.

@zah
Copy link
Member Author

zah commented Dec 17, 2018

Otherwise, my proposal can be extended further to cover "default construction" too. But with the current destructors and move handling plans, Nim needs to assume that all-zero locations are always safe to use. The main reason for this is that an object may be moved out from an array, which have to zero out the array location in order to defuse the destructor that will be called when the array is destroyed. The only possible alternative to this would be allocating a bitmask with every array/seq specifying which slots have been already destroyed. This solution has been considered too expensive by Araq.

Well, to clarify, you need to reset the location which may put it another default safe state (not necessarily all-zeros).

@arnetheduck
Copy link

@PMunch just pointed out another interesting case:

var myRange: range[10..20]
echo myRange # echos 0, because everything is instantiated as a zero

clearly, 0 is not an expected value in this case..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants