Multi-line string literals (blocks of lines) #161
Comments
This is based on standard practices with text file formatting (removal of extra whitespace and adding LF after each line). Adding \r explicitly at the end of each line completes the CR-LF sequence for Internet protocols (not even Windows needs it in text files anymore). Any line within the block may be terminated by a backslash. This is useful for splitting otherwise overly long lines on multiple source code lines without adding LFs to the string, and on the last line to prevent the final newline. |
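One possible reading of the line-joining rules above, sketched in Python because the proposed literal is not valid Nim today. The function name `join_block_lines` and its small escape table are assumptions made for illustration; they are not taken from the proof-of-concept lexer patch. The input is assumed to be block lines whose source indentation has already been removed.

```python
def join_block_lines(lines):
    """Join already-dedented block lines into one string (hypothetical rules)."""
    # Assumed escape table; the real literal would reuse Nim's escape sequences.
    escapes = {"r": "\r", "n": "\n", "t": "\t", "\\": "\\", '"': '"'}
    out = []
    for line in lines:
        line = line.rstrip(" \t")          # trailing whitespace is dropped
        i, continued = 0, False
        while i < len(line):
            ch = line[i]
            if ch == "\\":
                if i + 1 == len(line):
                    continued = True       # lone trailing backslash: no LF is added
                else:
                    nxt = line[i + 1]
                    out.append(escapes.get(nxt, "\\" + nxt))
                    i += 1                 # skip the escaped character
            else:
                out.append(ch)
            i += 1
        if not continued:
            out.append("\n")               # every other line ends with LF
    return "".join(out)


# An explicit \r before the implicit LF yields CR-LF line endings.
print(repr(join_block_lines([r"content-length: 13\r", r"\r", "Hello World!"])))
```

Under these assumed rules, a line ending in `\r` becomes CR-LF, while a line ending in a bare backslash is joined to the next one without any newline.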
This makes the heredoc situation in Nim too complicated IMO. |
The """ hack of Python is problematic precisely because it mixes source code formatting with string contents. Having PyDocs or .unindent "handle" this is far from satisfactory.

import strutils

# Correct output but messed up source code formatting
for i in 1..2:
  stdout.write("""<li>
  Item
</li>
""")

# Incorrect output (Item not indented)
for i in 1..2:
  stdout.write("""
    <li>
      Item
    </li>
  """.unindent)

# Proposed string literal: clean source code that matches output
for i in 1..2:
  stdout.write(
    ":
      <li>
        Item
      </li>
  )

Fixing this in Python would be quite problematic at this time, but Nim, as a new language based on indented blocks, definitely /should/ get it right.
@awr1 Escape sequences and whitespace handling are mentioned for completeness. This proposal requires less of them than the current string literals do. Reading from external files is not really a solution. The need for longer string literals (beyond docstrings) is clear and that's why """ literals exist in the first place; their implementation just sucks. |
Then IMO the behavior of |
A new function could be probably added to |
Generally I like the idea. I never really liked triple string literals as they are messy. Yet I don't like to change the language for this minor annoyance if the workarounds that don't need a language change haven't been fully explored. Scala's solution to this problem is

val speech = """Four score and
               |seven years ago""".stripMargin

Another big problem is, I have no idea how to tell my editor (and GitHub and all the other editors out there) that
I made a quick proof of concept with minimal changes to lexer. Needs some further work even if accepted to language (like separate lexer token type for this literal). |
IMHO the syntax should be:

const foo = '''
  string literal here that
  needs no closing quotes

but it's far too late for this. Yet another way to write string literals is the last thing we need. We would need to patch
@krux02 Most editors seem to ship Nim mode already, and would probably update their handling promptly if the language was changed. Meanwhile, this certainly is a problem because many editors and Github syntax highlighter consider anything that follows to be a string, until the next " appears somewhere else, although even with the current language syntax (with any language out there, really) they should terminate single-quoted string processing at the first newline. Indentation is not so much a problem; one extra tab press at most, because standard auto-indent behaves well with this literal. Library solutions cannot work properly because once the string is formed, information about source code indentation is no longer available. Adding another special character to denote margin isn't really helpful. Also, such solutions cannot avoid the need to escape quote marks within the literal, like the string block does. |
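The point about library solutions can be illustrated with a small sketch. Python's `textwrap.dedent` stands in here for any library-level dedent that only sees the finished string; `strip_margin` is a hypothetical helper that models what a lexer could do because it knows the source margin. The 4-space margin and the sample text are assumptions chosen for the example.

```python
import textwrap


def strip_margin(literal, margin):
    """Remove exactly `margin` leading spaces per line (hypothetical helper)."""
    return "".join(
        line[margin:] if line.strip() else "\n"
        for line in literal.splitlines(keepends=True)
    )


# Intended content: both lines begin with two spaces.
# In the source file the literal happens to sit at a 4-space margin.
as_written = "      indented\n      block\n"

# A library sees only the string, so it removes all common whitespace.
print(repr(textwrap.dedent(as_written)))

# With the margin known, the two content spaces survive.
print(repr(strip_margin(as_written, 4)))
```

The library call cannot tell source margin from intended content indentation, so it has to guess; the margin-aware version does not.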
In any case, fixing this sort of issue is much better to do at Nim 0.21, a language used by a handful of projects, rather than after 1.0. Using ": as the token also does not affect existing software (although I would like to see """ deprecated and eventually removed entirely -- far prior to 1.0 release). First I considered """: or similar, but that would break existing software. Also, ":, if put on its own line, provides visual cue to where the left margin of string content goes (given that a string block must be indented exactly two spaces, which is already the recommended indentation for Nim). |
Can this issue be moved to RFCs? |
Regarding the symbol used to start it, ": directly communicates that it is string and a block but has the disadvantage of being mishandled by existing tools. Something that is not considered to be a start of string would be less invasive, e.g. $: would probably communicate the same thing in Nim context but the content would be seen as code in syntax highlighters, and the colon might trigger smart indentation in some tools (in particular, those based on Python rules). I am definitely open to this sort of suggestion, although I believe that in the long run the support of current tools should not really be a consideration. The benefit of ": is that it instantly triggers any coder to notice that something unconventional is happening, while with $: that might not be as apparent, and the content being a string would be not at all apparent to non-Nim coders. |
YAML already has a very well known and documented construct for this, why not just use that? I think it is awesome that you can use literal JSON in Nim code directly. YAML syntax can be very friendly as the start of a block because it uses

let variable0 = :>
  YAML like literals.
let variable1 = :|
  YAML like literals.
then ❔ I agree that I dont feel a huge need for this. 🤷♀️ |
FWIW, a comment at the end avoids problems with current highlighters without changing anything else (a simple hack - not part of RFC):

await client.send ":
  HTTP/1.1 200 OK\r
  content-type: text/plain\r
  content-length: 13\r
  \r
  Hello World!
#"
🤔 |
@Tronic which highlighters? github doesnt highlight correctly anyway. which is correct representation of current syntax, since |
@SolitudeSF I use this in VSCode. Obviously tools need to be fixed, and that really shouldn't be a big issue. After all, they already manage to handle the mix of different quotation formats & comment parsing, incl. Nim-specific syntax and escape sequences. |
For stuff like this I just use |
i dont see how this can be trivially fixed, since most editors use regex based highlighting which cant have indentation awareness. |
Too bad you can not do the |
@Tronic If editor support can't be provided, I can only reject this feature. What value does it have when virtually no editor will support it, or if it will take years until the editors will have a solution for it? Also I am the one who maintains the emacs integration at this point, it is not like that emacs will magically grow support for this feature. |
I'll admit I was wrong about |
@SolitudeSF Regex cannot match indent?
matches this string block. Use backward lookup or editor's custom handling of captures, if necessary. Every serious editor implements some sort of recursive matching in addition to basic regex to be able to do parenthesis matching, to handle HTML closing tags etc. If nimpretty is a concern, I am sure I can quickly patch that as well. |
If this can be properly highlighted with a |
This sort of approach seems to work (tried in VSCode):
I'll have a proper look later. |
@Tronic AFAICT your regex cannot deal with arbitrary indentation. In Nim indenting with all numbers of spaces is allowed. If there exists a regex that can work with arbitrary indentation then I will support this feature. |
VSCode highlighter updated to support

@Clyybber Surrounding code may be indented by an arbitrary number of spaces. String block contents must be indented by exactly two spaces relative to the leading line, as discussed in this thread. This is to allow indentation to appear within string content, so any indentation on top of those two spaces is included in the string. The highlighter marks string content and block indent with separate classes, so that in principle one could style the two-space margin and make it visible by CSS effects (not that I recommend doing so).
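For the arbitrary-indentation question, here is a minimal sketch of how a backreference can tie the block margin to whatever indentation the opener line has. It uses Python's `re` module; the pattern name `STRING_BLOCK` and the group names are made up for this example, and this is not the pattern used in the VSCode highlighter mentioned above. Whether a given editor's grammar engine supports the same construct is a separate question that this sketch does not answer.

```python
import re

# Opener: any indentation, any code, then ": at the end of the line.
# Body: lines that start with the same indentation plus two spaces,
# or whitespace-only lines. Assumes the source text ends with a newline.
STRING_BLOCK = re.compile(
    r'^(?P<indent>[ ]*)(?P<head>.*)":[ ]*\n'
    r'(?P<body>(?:(?P=indent)[ ]{2}.*\n|[ ]*\n)*)',
    re.MULTILINE,
)

nested = (
    "for i in 1..2:\n"
    "  stdout.write(\n"
    '    ":\n'
    "      <li>\n"
    "        Item\n"
    "      </li>\n"
    "  )\n"
)

match = STRING_BLOCK.search(nested)
print(repr(match.group("indent")))  # indentation of the opener line
print(repr(match.group("body")))    # block lines, margin still attached
```

The sketch is deliberately naive: it would also fire on an ordinary string that happens to be followed by a colon at the end of a line, and it keeps trailing blank lines in the body.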
@Tronic I don't think we should enforce those to be indented by exactly two spaces. |
@Tronic pros of heredoc (as used in D, see https://forum.nim-lang.org/t/471#23415):

let s = q"EOS
This is a multi-line
heredoc string; no need to re-indentEOS"
echo s

produces:
If the argument is "you can always come up with a delimiter that isn't used" then Nim's triple quotes work just as well:

const
  s = """
foobar
UNUSED_DELIM
baz
""".replace("UNUSED_DELIM", "\"\"\"")

Requires no language change and is easier to implement for highlighters as it doesn't involve a regex with backtracking (which is NP complete iirc?)
This is how string literals work in C++:

const char * vogon_poem = R"V0G0N(
O freddled gruntbuggly thy micturations are to me
As plured gabbleblochits on a lurgid bee.
Groop, I implore thee my foonting turlingdromes.
And hooptiously drangle me with crinkly bindlewurdles,
Or I will rend thee in the gobberwarts with my blurlecruncheon, see if I don't.
(by Prostetnic Vogon Jeltz; see p. 56/57)
)V0G0N";

Not only does it allow you to specify arbitrary delimiters that won't clash with the content, it would also allow writing editor extensions that detect such string blocks for syntax highlighting. Then you can have SQL strings, Python strings, etc., all with correct syntax highlighting. Currently Nim has call string literals, for example
yes, that's C++'s version of D's heredoc string I mentioned above in #161 (comment) . Ability to copy paste code without messing with |
@krux02 Theoretically it can overcome the delimiter appearing in content problem. In practice everyone just uses it as another form of """ and complains that

Indented-block literals make a clear separation between source code formatting (indent of the block) and string content (any characters within the block). This way clean source code formatting can be preserved without introducing extra whitespace into the string. For me it is actually really hard to understand how in the 2010s people still design formats with issues that were widely understood and fixed in the 1990s, if not decades earlier. I presume that the argument has always been that "we cannot fix this because of compatibility" and that "it would take years". As I have demonstrated in this thread, fixing it both in the Nim compiler and in popular text editors took only a few hours of work, and frankly I've already spent far more than that here, arguing for it.
Well we need to check that. I'm not convinced that popular text editors can be "fixed". |
Implements highlight for string block literals as discussed in nim-lang/RFCs#161
With the introduction of

import strutils
proc foo =
  let str = dedent """
    Hello
    World!
  """
  stdout.write str
foo()

will print
So, I have to import |
@AmjadHD I would suggest writing another proposal for that. I'm even optimistic that it would be accepted, since it's unlikely to cause too much breakage. |
I would suggest -- instead of, or in addition to Python-style """ literals -- using indented block syntax for multi-line string literals. E.g.
Where str is defined equivalent to
This syntax avoids the indentation problem with string literals that .unindent attempts to address. Also, for clarity, all string content appears within the block, not on the opening or closing lines as is with """.
The literal terminates as soon as the block ends (i.e. a non-empty line indented less is found), avoiding the need for """ at the end. This also avoids the need to escape double quotes that belong to the string.
Whitespace at the end of any line and empty lines at the end would be omitted (and could be added via escape sequences in the rare cases where needed). Whitespace-only lines in the middle would become simply \l (no matter if there are spaces or not). This removes any ambiguity with source code formatting and makes the intention explicit.
This suggestion proposes string block to be indented by exactly two spaces (compared to the line with ": in it). Any further initial spaces would become string content.
This could still be used within parentheses or another expression, provided that the continuation of that expression appears less indented than the string content.
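The rules in this proposal can be modeled with a short sketch. It is written in Python because the literal itself is not valid Nim today; `parse_string_block` and its parameters are hypothetical names, and the code reflects one reading of the description above rather than the behavior of any actual lexer patch. Escape sequences and backslash continuation are left out.

```python
def parse_string_block(lines, opener_indent):
    """Return the string value of a block that follows a line containing `":`.

    `lines` are the raw source lines after the opener; `opener_indent` is the
    number of leading spaces on the opener line. Both names are hypothetical.
    """
    margin = opener_indent + 2              # content sits exactly two spaces deeper
    content = []
    for raw in lines:
        stripped = raw.rstrip()             # trailing whitespace never reaches the string
        if stripped == "":
            content.append("")              # whitespace-only line becomes a bare newline
            continue
        indent = len(stripped) - len(stripped.lstrip(" "))
        if indent < margin:
            break                           # a less-indented non-empty line ends the block
        content.append(stripped[margin:])   # extra indentation stays in the string
    while content and content[-1] == "":
        content.pop()                       # empty lines at the end are omitted
    return "".join(line + "\n" for line in content)


# The <li> example from the thread: opener at 4 spaces, content at 6.
source = ["      <li>", "        Item   ", "      </li>", "", "  )"]
print(repr(parse_string_block(source, 4)))
```

Note that double quotes inside the block need no escaping in this model, since the block ends on indentation rather than on a closing delimiter.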