Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[superseded] ref syntax for lvalue expressions: byRef: myref=x[1].foo[2] #11824

Closed
wants to merge 14 commits into from

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Jul 25, 2019

this PR introduces a ref syntax for lvalue expressions, similar to what you can do in C++ with auto& a = someExpr()

compared to #11686 and prior attempts, there are 4 advantages:

  • better syntax: byRef: foo = expr vs alias(foo, expr)
  • side effect safe: the lvalue expression is evaluated only once, at declaration time, so any side effect will be evaluated only once. This avoids the pitfall mentioned here Alias sugar #11686 (comment)
  • type safe: CT errors are caught at declaration time, not on 1st use
  • allows export

See example2 vs example 3 below for the comparison

By design, it only works with lvalues and will give a CT error on non-lvalues (for which ref makes no sense).
It is a pure library solution.

example 1: simple example

import std/macros
var x = @[1,2,3]
byRef: x1=x[1]
x1+=10
doAssert type(x1) is int
doAssert x == @[1,12,3]

example 2 illustrating side effect safety, type safety

  import std/macros
  var count = 0
  proc identity(a: int): auto =
    block: count.inc; a # introduces a side effect (or an expensive computation etc)
  var x = @[1,2,3]
  byRef: x1=x[identity(1)] # the lvalue expression is evaluated only here
  doAssert count == 1 
  x1 += 10
  doAssert type(x1) is int # use x1 just like a normal variable
  doAssert x == @[1,12,3]
  x1 += 100
  doAssert count == 1 # count has not changed
  # byRef: x2=y[0] # correctly gives CT error: undeclared identifier: 'y'

example 3: comparison vs #11686

  var count = 0
  proc identity(a: int): auto =
    block: count.inc; a
  var x = @[1,2,3]
  alias(x1, x[identity(1)])
  doAssert count == 0
  x1 += 10
  doAssert count == 1 # count has changed
  x1 += 100
  doAssert count == 2 # count has changed again => bad
  doAssert x == @[1,2+10+100,3]
  alias(x2, y[identity(1)]) # doesn't give CT error even though `y` undefined

note

  • see also [superseded] alias: myecho=echo to alias any symbol #11822 which defines an alias syntax that works with symbols instead of lvalues, it's a different concept
  • a byPtr is analogously defined in this PR for cases where unsafeAddr is needed instead of addr, eg for unsafe access to non-var params; it is useful in some situations

[EDIT]

  • this PR provides a safer pattern than directly calling addr or unsafeAddr since the address doesn't escape the scope where it's defined
  • also allows export, eg: byRef: foo*=bar

@ghost
Copy link

ghost commented Jul 25, 2019

These seem out of place in macros. sugar would be more fitting.

@zah
Copy link
Member

zah commented Jul 25, 2019

There is a way to implement this in the compiler while preserving memory-safety. It would be a larger reform that would also allow a more relaxed usage of the var, lent and openarray types as fields in objects, as long as they appear only in stack locations. Our code in Nimbus has numerous occasions where having such a support will bring significant optimisations and I plan to write a detailed RFC about it as time permits. The proposal will be somewhat inspired by Herb Sutter's plans for introducing lifetime analysis in C++.

@Araq
Copy link
Member

Araq commented Jul 25, 2019

There is a way to implement this in the compiler while preserving memory-safety. It would be a larger reform that would also allow a more relaxed usage of the var, lent and openarray types as fields in objects, as long as they appear only in stack locations.

Just to ensure we're on the same page: "stack location" is the wrong idea for this, there must be some kind of borrowing going on. I propose to re-use the "location is derived from the first parameter" idea that we use for var T return types, we can expand it via from later but so far we've done well without from.

lib/core/macros.nim Outdated Show resolved Hide resolved
@zah
Copy link
Member

zah commented Jul 25, 2019

I'll try to clarify:

Let's use the term "referenced object" for an object owning memory that can be pointed to by a var/lent/openarray pointer. When I say "stack location", I'm referring mostly to the lifetime aspects. To ensure safety, you need to know that the lifetime of the referenced object is strictly enclosing the lifetime of the pointer. This is ensured when the pointer is a part of a newly created stack variable, because the lexical scopes ensure that this stack variable will be destroyed before the referenced object is destroyed.

The pointed memory itself can live on the heap (as it would be the case when you obtain an openarray view into a sequence). Most of the complications of the feature come from such containers that offer manual control over the referenced memory (i.e. you can call setLen on a sequence before it goes out of scope and this will invalidate all existing pointers pointing to it).

It the RFC, I'll explain how there is an isomorphism between the new rules and the current language features - every code written with the relaxed rules can already be written by extracting the continuation of a proc at a given line to a separate proc where the lent, var and openarray pointers can appear as parameters. Thus, to ensure memory safety, we need to implement the required lifetime analysis anyway.

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better syntax: byRef: foo = expr vs alias(foo, expr)

I prefer the latter. My favourite would be:

alias:
  foo = expr

I guess you've decided to use byRef because you wanted to implement byPtr too, but AFAIK that separation is unnecessary.

lib/core/macros.nim Outdated Show resolved Hide resolved
lib/core/macros.nim Outdated Show resolved Hide resolved
@timotheecour
Copy link
Member Author

@dumjyl

These seem out of place in macros. sugar would be more fitting.

done.

PTAL, all comments addressed, except for @zah's comment (#11824 (comment)); my understanding is this would be a much more involved change, and that if it materializes it can be done in a backward compatible way to support more things inside byRef

@timotheecour timotheecour force-pushed the pr_byRef branch 2 times, most recently from e8b060f to b437242 Compare July 30, 2019 18:34
@timotheecour timotheecour changed the title nim now has a ref syntax for lvalue expressions: byRef: myref=x[1].foo[2] [feature] ref syntax for lvalue expressions: byRef: myref=x[1].foo[2] Aug 2, 2019
@timotheecour
Copy link
Member Author

@dom96 this is PR is still marked as "change requested", I believe I've addressed all comments so far; in general "changes requested" option is overkill compared to just "leave a comment" option, as it tends to stay there even after the comments were addressed

@Araq
Copy link
Member

Araq commented Aug 23, 2019

IMO the only thing left is bikeshedding about the names and whether it shouldn't be a Nimble module first, before we add hardly proven "syntax sugar". I quite like it though and maybe the chances we got it wrong are slim?

@timotheecour
Copy link
Member Author

/cc @Araq ping

  • I've rebased to latest devel
  • added since + changelog
  • moved back to tsugar the test that was moved from distinctBase type trait for distinct types #13031 by @cdome (it had been moved because tsugar was too small, but now i'm adding more to tsugar so makes sense to add it back)
  • using a separate nimble module would just cause too much friction (we'd end up with sugar.byRef vs somepkg.byRef for more harm then benefit); plus it'd prevent its use in compiler sources etc)

@Araq
Copy link
Member

Araq commented Jan 12, 2020

Time moved on since then and I don't like it much anymore. Firstly the name should not contain ref, ref is already a heap pointer in Nim! Secondly, byPtr vs byRef is the wrong idea, the macro enables a safer idiom because the underlying pointer cannot escape and so unsafeAddr should be fine. Why not drop the distinction and use the name alias for it?

@timotheecour
Copy link
Member Author

it's not just about escaping: the point is to use byRef in most cases and only byPtr when needed, for the same reasons as the split addr vs unsafeAddr; not only it helps when auditing code, but it is safer too as byRef prevents you from modifying let variables, unlike byPtr, see example I have in unittest:

block byPtrfBlock:
  type Foo = object
    x: string
  proc fun(a: Foo): auto =
    doAssert not compiles (block: byRef: x=a.x) # good, byRef prevents you from modifying immutable variable `a`
    byPtr: x=a.x # byPtr is more dangerous, allows you to modify immutable `a`
    x[0]='X'
  let foo = Foo(x: "asdf")
  fun(foo)
  doAssert foo.x == "Xsdf"

also, alias is different, reserved for future introduction of symbol alias as was already discussed elsewhere.

@Araq
Copy link
Member

Araq commented Jan 13, 2020

Fair enough but then the unsafeAddr variant doesn't need macro sugar, if it comes up, write the unsafeAddr out.

@timotheecour
Copy link
Member Author

timotheecour commented Jan 13, 2020

the situation is exactly symmetrical with byRef (via addr) vs byPtr (via unsafeAddr); If you just use unsafeAddr you'll need deref on each access:

byPtr: x = foo.bar[2].baz
x+=12
fun(x)
echo x

with byPtr macro you need deref on each access:

let x = foo.bar[2].baz.unsafeAddr
x[]+=12
fun(x[])
echo x[]

@Araq
Copy link
Member

Araq commented Jan 13, 2020

Yes, I know. unsafeAddr comes up rarely though and doesn't deserve sugar. Look, the macro encodes a safe idiom of addr. That's the only reason I like it so much, it adds something on top of addr, improved safety. The variant that uses unsafeAddr does no such thing.

@timotheecour
Copy link
Member Author

timotheecour commented Jan 13, 2020

PTAL; removed byPtr from this PR to unblock this (I hear your arguments and it's a sugar vs bloat tradeoff)

@Araq
Copy link
Member

Araq commented Jan 13, 2020

Well now it needs a good name, byRef is confusing since it has nothing to do with Nim's ref keyword.

@timotheecour

This comment has been minimized.

@timotheecour
Copy link
Member Author

can whoever (anonymously) marked my PTAL comments as spam kindly let me know why they did so?

a PTAL (please take another look) comment is commonly used to indicate that all reviewer comments were addressed (or need reviewer input) and that the PR is ready again for the reviewer to take a look. It also makes it visually clear when looking at a PR whether it's pending on reviewer or PR author.

This practice is common both in tech companies as well as in open source, eg:

@Araq
Copy link
Member

Araq commented Jan 24, 2020

can whoever (anonymously) marked my PTAL comments as spam kindly let me know why they did so?

Be assured it wasn't anybody from the core team. :-)

@timotheecour
Copy link
Member Author

ping @Araq

@Araq
Copy link
Member

Araq commented Jan 30, 2020

The feature is fine now except I think it's ugly. Not your fault, but I want something like alias x = y.

@timotheecour
Copy link
Member Author

timotheecour commented Jan 31, 2020

Not your fault, but I want something like alias x = y.

but that's impossible, the colon is needed; did you see my answer here #11824 (comment) ?

byAddr x = expr is parsed as (byAddr x) = expr and changing that would require a parser change, eg a builtin pragma {.parse_as_var_definition.} which would parse byAddr x = expr instead as byAddr(x = expr)

macro byAddr*(def: untyped): untyped {.parse_as_var_definition.} = ...

But I don't see how something like that could be implemented robustly, as the parser would have to have access to semantic analysis when resolving something like foo x = expr to check whether foo is annotated with {.parse_as_var_definition.}, I don't see how that could work with symbol aliasing/renaming (from sugar import byAddr as byAddr2), or fully qualified sugar.byAddr2 etc)

And I don't think we should introduce a new builtin keyword for that feature, which is just a library thing, because other similar constructs may be needed (eg byUnsafeAddr as mentioned above) could be similarly defined (in nimble package / user code / future stdlib addition).

I think byAddr: x = expr is bearable, I got used to it easily

@Araq
Copy link
Member

Araq commented Jan 31, 2020

Well we can add a keyword to the language. Or maybe we can/should do let x {.byAddr.} = y

@timotheecour
Copy link
Member Author

Well we can add a keyword to the language. Or maybe we can/should do let x {.byAddr.} = y

let x {.byAddr.} = y

that's more ugly, longer to type, and is confusing (given that you can write x+=2 even though it's shown as a let)
given that I expect to use this feature a lot (thanks to its performance advantages over let x = y; but also all other points mentioned in top post), we need a clean syntax.

so I opted for a new keyword:

byaddr x = y

=> see #13342

will close this PR if other PR is greenlighted

@timotheecour timotheecour changed the title [feature] ref syntax for lvalue expressions: byRef: myref=x[1].foo[2] [superseded] ref syntax for lvalue expressions: byRef: myref=x[1].foo[2] Mar 23, 2020
@timotheecour
Copy link
Member Author

superseded by #13508

@timotheecour timotheecour deleted the pr_byRef branch March 23, 2020 22:54
@timotheecour timotheecour restored the pr_byRef branch March 23, 2020 22:54
@timotheecour timotheecour deleted the pr_byRef branch March 23, 2020 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants