Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Every value should be printable with echo #203

Closed
krux02 opened this issue Mar 22, 2020 · 14 comments
Closed

Every value should be printable with echo #203

krux02 opened this issue Mar 22, 2020 · 14 comments
Labels

Comments

@krux02
Copy link
Contributor

krux02 commented Mar 22, 2020

In my own experience echo is used most heavily for debugging purposes. It is part of the hello world program, and then it is continuously used to explore the programming language and long the behavior of the program. It is also used to put values into error messages, for example in raise newException(KeyError, "key not found: " & $key). But some types are just not printable with $. This usually comes as a surprise, especially in generic code, as the majority of types do already have a $ overload.

The solution to the problem is to provide a generic solution of $ for every type that the Nim language allows, so that echo x and "key not found: " & $key will always compile, no matter what the value is.

After all these overloads would be need an addition:

ref/ptr types

ref types are the natuarl thing to use in Nim, when the code is inteded to be executed on a javascript vm, as this is what the backend naturally supports and doesn't need to be emulated. But the big disadvantage of using a ref type is that it isn't printable with echo. My suggestion is to provide the following implementation:

proc `$`*[T: ref | ptr](arg: T): string =
  if arg == nil:
    "nil"
  else:
    $arg[]

This implementation is able to print nil when required, but otherwise the output won't make it obvious that it prints a pointer value. Sometimes people care about the pointer values and they want to see them. But pointer values are at least 16 characters in hex and therefore very verbose in the output. To my experience people usually really don't care about the pointer value. As long as it points to something valid, it should print just that, not some hexadecimal string.

Dealing with Cycles

The current implementation of $ on objects iterates all printable fields recursively. This is problematic, as with this RFC it will run into infinite recursion if the value forms a pointer circle. An because $ is located in system.nim and no further imports are allowed, a hash map to store previously visited nodes can't be used.

However there is already a nice static analyser built into the compiler to detect if a type is even able to form cycles: canFormAcycle. Pointers to such recursive types could then be printed as just ... like non printable types are represented right now. This isn't very helpful, but still much more helpful that printing all ptr/ref types as "...".

Distinct Types

It is easy to implement $ for distinct types:

proc `$`*[T: distinct](arg: T): string =
  $distintcBase(T)(arg)

The question though remains, should distinctness simply be shaved off here? After all the programmer explicitly asked for a new incompatible type. @Araq mentioned this could be dangerous because of SQL injection. But I don't think this is really a problem. After all a distinct string remains a distinct string and it is not automatically converted to string. It is only converted in echo because it explicitly attaches a $ conversion to every argument. Maybe a distinct string should be quoted and not just printed. Or it should be altered in some other way. But that is still open to discussion. But I do think something should be printed.

Pointer

The type pointer doesn't have any content it can print. The only thing that it can print is the value of the pointer itself. Earlier I argued printing actual pointer values is verbose, but here there is nothing else that can be printed. So it should be done. After all fields that are of type pointer are rare.

Callbacks

Unfortunately I don't have a good solution for proc value. I can check if they are nil and print "nil" if they are nil. But unfortunately I cannot print anything useful in the other case. A cdecl callback can be cast to a pointer and the address can be printed. But that doesn't work for closures. gdb can print even the name of a cdecl callback, but I don't know how to do that in Nim.

I have a PR where this RFC has been implemented. It also shows how code would be simplified in the standard library and the compiler when after all a $ is available for everything.

@krux02 krux02 changed the title Every value should be printable with echo [WIP] Every value should be printable with echo Mar 22, 2020
@krux02 krux02 changed the title [WIP] Every value should be printable with echo Every value should be printable with echo Mar 22, 2020
@krux02
Copy link
Contributor Author

krux02 commented Mar 22, 2020

@Araq You asked for this RFC here. Now you can complain about it here.
@timotheecour You commented on my PR already. Here you can comment on the RFC. Please try to break it. I would like to know where it falls apart.

@0xACE
Copy link

0xACE commented Mar 23, 2020

Regarding distinct types: I'd suggest this RFC does not implement the stringify procedure. As the manual essentially promises that a distinct type will not implicate conversions to the base type which this RFC would violate.

@krux02
Copy link
Contributor Author

krux02 commented Mar 23, 2020

@0xACE regarding "implicit conversion". This is not an implicit conversion. This is after all an explicit conversion to string with the $ operator. And the $ operator is injected were dollar oparator application is exlicitly requested.

@timotheecour
Copy link
Member

timotheecour commented Mar 23, 2020

recursive expansion of ptr/ref by default is a bad idea

A hash table is the only sane way to deal with ref/ptr aliasing

Static detection of cycles (as you're using in your PR) is the wrong criterion to use, and has both false positives and false negatives in terms of what gets duplicated.

Instead, the correct way is to do what a cycle-aware serializer does, for example depth first search visit of each node, marking the ones already visited to avoid duplicates, using a hash table to mark node addresses. That's precisely what I've implemented in my own pretty printer.

The criterion you're using allows duplicate printing of nodes when you have a DAG, eg: see example2 I gave in nim-lang/Nim#13687 (comment), it causes duplicate printing as follows:

(bar1: (s: "some long string that gets duplicated"), bar2: (s: "some long string that gets duplicated"), x: 12)

And on the other hand, it doesn't print nodes that should be printed (aren't duplicates), eg:

block:
  type Foo = ref object
    id: int
    next: Foo
  let a = Foo(id: 10, next: Foo(id: 11))
  echo a

prints (id: 10, next: ...) where next is not expanded even if there's no address duplicate.

proposal

it's impossible to satisfy every use case with builtin $ so we should have a builtin $ (ie, imported from system) that does the simplest possible thing with no customization possible, and a separate module that can be customized for more advanced / customizable prettyprinting.

proposal part 1: keep system.$ simple

  • $ should "work" (even if it means only prints some address) on every type (that's the only part of your proposal I agree with)
  • only top-level ref/ptr should expand (no recursion), so that:
type Foo = ref object
  id: int
  next: Foo

let a = Foo(id: 10, next: Foo(id: 11))
doAssert $a == 0xdeadbeef->(id: 10, next: 0xdead1234)
doAssert $a[] == (id: 10, next: 0xdead1234)
  • proc pointers prints address using rawProc
  • closures prints address using rawProc + rawEnv
  • distinct prints using ...
  • all those are subject to usual overloading rules in case of $ overload in scope
    note that addresses typically are length 9 digits (eg 0x1020c98d0), not 16 (leading ones being 0), except for stack addresses length 12 digits, (eg 0x7faff7d000a0)

proposal part 2: add std/prettyprint module

Since we're using a separate module, we can have more dependencies, which simplifies implementation.

type Option = ref object
  depth: int # how far to recurse
  showDistinct: bool # when true, if no `$` overload exists for a distinct type, print via undistinct
  fpFormat: string # customizes FP printing eg `"%.02f"`
  # etc... that customizes prettyprinting

pretty[T](result: var string, a: T, options: Options = nil) = ...
template pretty[T](a: T, options: Options = nil): string = pretty(result, a, options) # for convenience since so common

all overloads are outplace form and there's a single inplace form, just like in #191

  • floats print using ryu (probably too complex for default inclusion in system)
  • distinct can be shown via undistinct depending on showDistinct
  • we use a Hash[ByteAddress] to mark already visited addresses and avoid printing duplicates

I've already implemented that in a private project, so I know it's quite doable. It enables customizable prettyprinting, eg to terminal, using colors, or to html with dynamically expandable nodes etc.

image

image

@0xACE
Copy link

0xACE commented Mar 23, 2020

@krux02

@0xACE regarding "implicit conversion". This is not an implicit conversion. This is after all an explicit conversion to string with the $ operator. And the $ operator is injected were dollar oparator application is exlicitly requested.

You are describing the problem itself. Here the $ operator is making an assumption about a distinct type to which the user explicitly declared to have absolutely no relations to its basetype, which includes the parent $ operator.

The distinct type is meant to be incompatible with its base type, implementing this RFC for distinct types would be a clear violation of this promise.

I understand that implementing this would help with a "resistance free" programming approach for nim, but it does so at the cost of nims integrity. Vague implementations like these would not lead to a great design in formal systems where soundness is desired to be preserved. It is called distinct not distinct_but_inherit_$.

@zah
Copy link
Member

zah commented Mar 23, 2020

Two thoughts to share regarding this proposal:

  1. Perhaps it's a bit unfortunate that we are used to debugging things with echo. echo is considered to have side-effects, which makes it unusable with the func keyword. debugEcho works there, but it's a bit of a mouthful to type. If we had the habit of using another special function with a developer-friendly print out such as the one shown by @timotheecour and a shorter name such as dbg, then this debate will be much less controversial ("of course dbg can print everything, it shows you the memory contents of objects"). echo is about end-user-friendly print outs.

  2. "Every value should be printable" is a case of what I call "default implementation of a generic proc" and unfortunately the language has some unresolved design issues with such procs. The generic sandwitch problem or a missing import makes it very easy for the compiler to pick the default implementation over a more specific and correct overload supplied by the user. Perhaps printing things to the screen is not so mission-critical task and we can tolerate the risk, but we've seen some very nasty and difficult to track bugs stemming from this problem. I've proposed a potential solution that expands the responsibilities of the method keyword as a way to introduce user-supplied type-bound operations that will work outside of the context of type hierarchies and that won't suffer from the lexical scope issues.

@Araq
Copy link
Member

Araq commented Mar 24, 2020

It is called distinct not distinct_but_inherit_$.

I think this argument is so strong that I cannot imagine a stronger argument that is in favor of "$ for distinct".

proposal part 2: add std/prettyprint module

Yes! That's the solution we should have. Zahary's dbg is probably the same idea.

@mratsim
Copy link
Collaborator

mratsim commented Apr 15, 2020

Please don't print distinct type.

Currently the best way to create a secret cryptographic keys type is to use a distinct type and prevent printing and comparing them.

Assuming you somehow import the type and but don't import the == or the $ overload, the distinct will at least prevents the code from compiling.

i.e. if you allow printing distinct we would need a secret modifier that would behave the same as today to keep providing the same guarantees for crypto libraries.

@andreaferretti
Copy link

It is not a great guarantee, after all. If you have access to an instance of a key, you can take its address and read the raw memory. Moreover, you are trying to protect against an attacker who is... writing the application?

@mratsim
Copy link
Collaborator

mratsim commented Apr 15, 2020

It's to protect against misuse. Most of the cryptographic problems currently are due to misuse of libraries API, and anything that help against such misuse is a win see https://www.usenix.org/legacy/publications/library/proceedings/sec02/full_papers/gutmann/gutmann_html/index.html

@Araq
Copy link
Member

Araq commented Apr 20, 2020

  1. std/prettyprint is an excellent idea, please let's have that.
  2. "Every value should have $" is officially rejected.

@zah
Copy link
Member

zah commented Apr 20, 2020

pp would be a nice name for the pretty-print function. IMO, it should be considered noSideEffect like debugEcho.

@github-actions
Copy link

This RFC is stale because it has been open for 1095 days with no activity. Contribute a fix or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 25, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2023
@metagn
Copy link
Contributor

metagn commented Jun 1, 2023

Fwiw we have packages for this, like https://github.com/treeform/pretty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants