Every value should be printable with echo #203

krux02 · 2020-03-22T13:00:40Z

In my own experience echo is used most heavily for debugging purposes. It is part of the hello world program, and then it is continuously used to explore the programming language and long the behavior of the program. It is also used to put values into error messages, for example in raise newException(KeyError, "key not found: " & $key). But some types are just not printable with $. This usually comes as a surprise, especially in generic code, as the majority of types do already have a $ overload.

The solution to the problem is to provide a generic solution of $ for every type that the Nim language allows, so that echo x and "key not found: " & $key will always compile, no matter what the value is.

After all these overloads would be need an addition:

`ref`/`ptr` types

ref types are the natuarl thing to use in Nim, when the code is inteded to be executed on a javascript vm, as this is what the backend naturally supports and doesn't need to be emulated. But the big disadvantage of using a ref type is that it isn't printable with echo. My suggestion is to provide the following implementation:

proc `$`*[T: ref | ptr](arg: T): string =
  if arg == nil:
    "nil"
  else:
    $arg[]

This implementation is able to print nil when required, but otherwise the output won't make it obvious that it prints a pointer value. Sometimes people care about the pointer values and they want to see them. But pointer values are at least 16 characters in hex and therefore very verbose in the output. To my experience people usually really don't care about the pointer value. As long as it points to something valid, it should print just that, not some hexadecimal string.

Dealing with Cycles

The current implementation of $ on objects iterates all printable fields recursively. This is problematic, as with this RFC it will run into infinite recursion if the value forms a pointer circle. An because $ is located in system.nim and no further imports are allowed, a hash map to store previously visited nodes can't be used.

However there is already a nice static analyser built into the compiler to detect if a type is even able to form cycles: canFormAcycle. Pointers to such recursive types could then be printed as just ... like non printable types are represented right now. This isn't very helpful, but still much more helpful that printing all ptr/ref types as "...".

Distinct Types

It is easy to implement $ for distinct types:

proc `$`*[T: distinct](arg: T): string =
  $distintcBase(T)(arg)

The question though remains, should distinctness simply be shaved off here? After all the programmer explicitly asked for a new incompatible type. @Araq mentioned this could be dangerous because of SQL injection. But I don't think this is really a problem. After all a distinct string remains a distinct string and it is not automatically converted to string. It is only converted in echo because it explicitly attaches a $ conversion to every argument. Maybe a distinct string should be quoted and not just printed. Or it should be altered in some other way. But that is still open to discussion. But I do think something should be printed.

Pointer

The type pointer doesn't have any content it can print. The only thing that it can print is the value of the pointer itself. Earlier I argued printing actual pointer values is verbose, but here there is nothing else that can be printed. So it should be done. After all fields that are of type pointer are rare.

Callbacks

Unfortunately I don't have a good solution for proc value. I can check if they are nil and print "nil" if they are nil. But unfortunately I cannot print anything useful in the other case. A cdecl callback can be cast to a pointer and the address can be printed. But that doesn't work for closures. gdb can print even the name of a cdecl callback, but I don't know how to do that in Nim.

I have a PR where this RFC has been implemented. It also shows how code would be simplified in the standard library and the compiler when after all a $ is available for everything.

The text was updated successfully, but these errors were encountered:

krux02 · 2020-03-22T14:45:24Z

@Araq You asked for this RFC here. Now you can complain about it here.
@timotheecour You commented on my PR already. Here you can comment on the RFC. Please try to break it. I would like to know where it falls apart.

0xACE · 2020-03-23T00:34:56Z

Regarding distinct types: I'd suggest this RFC does not implement the stringify procedure. As the manual essentially promises that a distinct type will not implicate conversions to the base type which this RFC would violate.

krux02 · 2020-03-23T10:43:02Z

@0xACE regarding "implicit conversion". This is not an implicit conversion. This is after all an explicit conversion to string with the $ operator. And the $ operator is injected were dollar oparator application is exlicitly requested.

timotheecour · 2020-03-23T11:26:40Z

recursive expansion of ptr/ref by default is a bad idea

eg, printing PContext generates a string of 2.1MB when you insert a echo c debugging statement and compiling compiler/nim.nim
See snippet here: https://gist.github.com/timotheecour/c97476b7cdf02ea48c56a1c4c480c198
Not very useful for debugging (I had to fix some stuff otherwise it wouldn't compile in your PR).
recursive expansion causes other issues as mentioned in refactoring and yak shaving. Nim#13687 (comment) (eg regression 4, regression 8)
addresses are useful to show for ref/ptr/pointer as they indicate identity of instances, so you can tell which objects are the same based on that, even across multiple calls to $. That's not possible if each address is just dereferenced, as I've argued here refactoring and yak shaving. Nim#13687 (comment)

A hash table is the only sane way to deal with ref/ptr aliasing

Static detection of cycles (as you're using in your PR) is the wrong criterion to use, and has both false positives and false negatives in terms of what gets duplicated.

Instead, the correct way is to do what a cycle-aware serializer does, for example depth first search visit of each node, marking the ones already visited to avoid duplicates, using a hash table to mark node addresses. That's precisely what I've implemented in my own pretty printer.

The criterion you're using allows duplicate printing of nodes when you have a DAG, eg: see example2 I gave in nim-lang/Nim#13687 (comment), it causes duplicate printing as follows:

(bar1: (s: "some long string that gets duplicated"), bar2: (s: "some long string that gets duplicated"), x: 12)

And on the other hand, it doesn't print nodes that should be printed (aren't duplicates), eg:

block:
  type Foo = ref object
    id: int
    next: Foo
  let a = Foo(id: 10, next: Foo(id: 11))
  echo a

prints (id: 10, next: ...) where next is not expanded even if there's no address duplicate.

proposal

it's impossible to satisfy every use case with builtin $ so we should have a builtin $ (ie, imported from system) that does the simplest possible thing with no customization possible, and a separate module that can be customized for more advanced / customizable prettyprinting.

proposal part 1: keep `system.$` simple

$ should "work" (even if it means only prints some address) on every type (that's the only part of your proposal I agree with)
only top-level ref/ptr should expand (no recursion), so that:

type Foo = ref object
  id: int
  next: Foo

let a = Foo(id: 10, next: Foo(id: 11))
doAssert $a == 0xdeadbeef->(id: 10, next: 0xdead1234)
doAssert $a[] == (id: 10, next: 0xdead1234)

proc pointers prints address using rawProc
closures prints address using rawProc + rawEnv
distinct prints using ...
all those are subject to usual overloading rules in case of $ overload in scope
note that addresses typically are length 9 digits (eg 0x1020c98d0), not 16 (leading ones being 0), except for stack addresses length 12 digits, (eg 0x7faff7d000a0)

proposal part 2: add std/prettyprint module

Since we're using a separate module, we can have more dependencies, which simplifies implementation.

type Option = ref object
  depth: int # how far to recurse
  showDistinct: bool # when true, if no `$` overload exists for a distinct type, print via undistinct
  fpFormat: string # customizes FP printing eg `"%.02f"`
  # etc... that customizes prettyprinting

pretty[T](result: var string, a: T, options: Options = nil) = ...
template pretty[T](a: T, options: Options = nil): string = pretty(result, a, options) # for convenience since so common

all overloads are outplace form and there's a single inplace form, just like in #191

floats print using ryu (probably too complex for default inclusion in system)
distinct can be shown via undistinct depending on showDistinct
we use a Hash[ByteAddress] to mark already visited addresses and avoid printing duplicates

I've already implemented that in a private project, so I know it's quite doable. It enables customizable prettyprinting, eg to terminal, using colors, or to html with dynamically expandable nodes etc.

0xACE · 2020-03-23T16:27:33Z

@krux02

@0xACE regarding "implicit conversion". This is not an implicit conversion. This is after all an explicit conversion to string with the $ operator. And the $ operator is injected were dollar oparator application is exlicitly requested.

You are describing the problem itself. Here the $ operator is making an assumption about a distinct type to which the user explicitly declared to have absolutely no relations to its basetype, which includes the parent $ operator.

The distinct type is meant to be incompatible with its base type, implementing this RFC for distinct types would be a clear violation of this promise.

I understand that implementing this would help with a "resistance free" programming approach for nim, but it does so at the cost of nims integrity. Vague implementations like these would not lead to a great design in formal systems where soundness is desired to be preserved. It is called distinct not distinct_but_inherit_$.

zah · 2020-03-23T18:55:46Z

Two thoughts to share regarding this proposal:

Perhaps it's a bit unfortunate that we are used to debugging things with echo. echo is considered to have side-effects, which makes it unusable with the func keyword. debugEcho works there, but it's a bit of a mouthful to type. If we had the habit of using another special function with a developer-friendly print out such as the one shown by @timotheecour and a shorter name such as dbg, then this debate will be much less controversial ("of course dbg can print everything, it shows you the memory contents of objects"). echo is about end-user-friendly print outs.
"Every value should be printable" is a case of what I call "default implementation of a generic proc" and unfortunately the language has some unresolved design issues with such procs. The generic sandwitch problem or a missing import makes it very easy for the compiler to pick the default implementation over a more specific and correct overload supplied by the user. Perhaps printing things to the screen is not so mission-critical task and we can tolerate the risk, but we've seen some very nasty and difficult to track bugs stemming from this problem. I've proposed a potential solution that expands the responsibilities of the method keyword as a way to introduce user-supplied type-bound operations that will work outside of the context of type hierarchies and that won't suffer from the lexical scope issues.

Araq · 2020-03-24T09:33:43Z

It is called distinct not distinct_but_inherit_$.

I think this argument is so strong that I cannot imagine a stronger argument that is in favor of "$ for distinct".

proposal part 2: add std/prettyprint module

Yes! That's the solution we should have. Zahary's dbg is probably the same idea.

mratsim · 2020-04-15T17:05:42Z

Please don't print distinct type.

Currently the best way to create a secret cryptographic keys type is to use a distinct type and prevent printing and comparing them.

Assuming you somehow import the type and but don't import the == or the $ overload, the distinct will at least prevents the code from compiling.

i.e. if you allow printing distinct we would need a secret modifier that would behave the same as today to keep providing the same guarantees for crypto libraries.

andreaferretti · 2020-04-15T17:25:09Z

It is not a great guarantee, after all. If you have access to an instance of a key, you can take its address and read the raw memory. Moreover, you are trying to protect against an attacker who is... writing the application?

mratsim · 2020-04-15T19:04:55Z

It's to protect against misuse. Most of the cryptographic problems currently are due to misuse of libraries API, and anything that help against such misuse is a win see https://www.usenix.org/legacy/publications/library/proceedings/sec02/full_papers/gutmann/gutmann_html/index.html

Araq · 2020-04-20T11:23:27Z

std/prettyprint is an excellent idea, please let's have that.
"Every value should have $" is officially rejected.

zah · 2020-04-20T17:10:23Z

pp would be a nice name for the pretty-print function. IMO, it should be considered noSideEffect like debugEcho.

github-actions · 2023-05-25T02:23:31Z

This RFC is stale because it has been open for 1095 days with no activity. Contribute a fix or comment on the issue, or it will be closed in 7 days.

metagn · 2023-06-01T08:32:49Z

Fwiw we have packages for this, like https://github.com/treeform/pretty

krux02 changed the title ~~Every value should be printable with echo~~ [WIP] Every value should be printable with echo Mar 22, 2020

krux02 changed the title ~~[WIP] Every value should be printable with echo~~ Every value should be printable with echo Mar 22, 2020

krux02 mentioned this issue Mar 22, 2020

refactoring and yak shaving. nim-lang/Nim#13687

Closed

timotheecour mentioned this issue Mar 25, 2020

distinctBase overload for values nim-lang/Nim#13746

Merged

timotheecour mentioned this issue Apr 6, 2020

float to string to float roundtrip should be possible for every float value nim-lang/Nim#7717

Closed

timotheecour mentioned this issue Apr 21, 2020

$ works for ref|ptr|pointer for all targets (c,cpp,js,vm) + other bugfixes nim-lang/Nim#14043

Closed

timotheecour mentioned this issue Dec 2, 2020

RFC: std/prettyprint timotheecour/Nim#412

Closed

timotheecour mentioned this issue Mar 11, 2021

add std/smartptrs nim-lang/Nim#17333

Closed

3 tasks

timotheecour mentioned this issue Apr 6, 2021

add compiler/debugutils.nim to help debugging compiler nim-lang/Nim#17652

Merged

timotheecour mentioned this issue Jun 11, 2021

std/prettyprints: pretty print any expression (including with cycles), customizable #385

Open

github-actions bot added the Stale label May 25, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Every value should be printable with echo #203

Every value should be printable with echo #203

krux02 commented Mar 22, 2020 •

edited

Loading

krux02 commented Mar 22, 2020

0xACE commented Mar 23, 2020

krux02 commented Mar 23, 2020 •

edited

Loading

timotheecour commented Mar 23, 2020 •

edited

Loading

0xACE commented Mar 23, 2020

zah commented Mar 23, 2020 •

edited

Loading

Araq commented Mar 24, 2020

mratsim commented Apr 15, 2020 •

edited

Loading

andreaferretti commented Apr 15, 2020

mratsim commented Apr 15, 2020

Araq commented Apr 20, 2020

zah commented Apr 20, 2020

github-actions bot commented May 25, 2023

metagn commented Jun 1, 2023

Every value should be printable with echo #203

Every value should be printable with echo #203

Comments

krux02 commented Mar 22, 2020 • edited Loading

ref/ptr types

Dealing with Cycles

Distinct Types

Pointer

Callbacks

krux02 commented Mar 22, 2020

0xACE commented Mar 23, 2020

krux02 commented Mar 23, 2020 • edited Loading

timotheecour commented Mar 23, 2020 • edited Loading

recursive expansion of ptr/ref by default is a bad idea

A hash table is the only sane way to deal with ref/ptr aliasing

proposal

proposal part 1: keep system.$ simple

proposal part 2: add std/prettyprint module

0xACE commented Mar 23, 2020

zah commented Mar 23, 2020 • edited Loading

Araq commented Mar 24, 2020

mratsim commented Apr 15, 2020 • edited Loading

andreaferretti commented Apr 15, 2020

mratsim commented Apr 15, 2020

Araq commented Apr 20, 2020

zah commented Apr 20, 2020

github-actions bot commented May 25, 2023

metagn commented Jun 1, 2023

krux02 commented Mar 22, 2020 •

edited

Loading

`ref`/`ptr` types

krux02 commented Mar 23, 2020 •

edited

Loading

timotheecour commented Mar 23, 2020 •

edited

Loading

proposal part 1: keep `system.$` simple

zah commented Mar 23, 2020 •

edited

Loading

mratsim commented Apr 15, 2020 •

edited

Loading