
syntax: strict string escapes #265

Merged
merged 1 commit into from
Mar 26, 2020


@alandonovan (Contributor) commented Mar 25, 2020

This change causes Starlark, like Go, to reject backslashes that
are not part of an escape sequence. Previously they were treated
literally, so \( would encode a two-character string, and much
code relied on this, especially for regular expressions.

This may break some programs, but the fix is simple:
double each errant backslash.

Python does not yet enforce this behavior, but since 3.6 it
has emitted a deprecation warning for it.

Also, document string escapes.

This is Google issue b/34519173.

Change-Id: I5c9609a4e28d58593e9d6918757bca2cfd838d51
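A minimal Python sketch of the rule described above, for illustration only: it scans the text between the quotes of a non-raw literal, reports backslashes that do not begin an escape sequence, and applies the suggested fix of doubling them. The accepted escape set (`\a \b \f \n \r \t \v \\ \' \"`, backslash-newline, one to three octal digits, `\xhh`, `\uhhhh`, `\Uhhhhhhhh`) is an assumption modeled on Go and Python, and `find_bad_escapes` and `double_errant_backslashes` are hypothetical names; this is not the starlark-go scanner.

```python
import string

# Assumed escape set (modeled on Go/Python), not taken from the starlark-go source.
SIMPLE_ESCAPES = set("abfnrtv\\'\"\n")  # the "\n" entry is a real newline: backslash-newline
OCTAL_DIGITS = "01234567"
HEX_ESCAPE_LENGTHS = {"x": 2, "u": 4, "U": 8}


def find_bad_escapes(body: str) -> list:
    """Return offsets of backslashes in `body` (the text between the quotes of
    a non-raw string literal) that do not begin a recognized escape sequence."""
    bad = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            i += 1
            continue
        nxt = body[i + 1:i + 2]  # "" if the backslash is the last character
        if nxt and nxt in SIMPLE_ESCAPES:
            i += 2
        elif nxt and nxt in OCTAL_DIGITS:
            # Octal escape: consume one to three octal digits.
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in OCTAL_DIGITS:
                j += 1
            i = j
        elif nxt in HEX_ESCAPE_LENGTHS:
            n = HEX_ESCAPE_LENGTHS[nxt]
            digits = body[i + 2:i + 2 + n]
            if len(digits) == n and all(c in string.hexdigits for c in digits):
                i += 2 + n
            else:
                bad.append(i)  # e.g. "\x" without enough hex digits
                i += 2
        else:
            bad.append(i)  # e.g. "\(" or "\d": formerly kept as a literal backslash
            i += 1
    return bad


def double_errant_backslashes(body: str) -> str:
    """Apply the suggested fix: double every backslash that is not an escape."""
    bad = set(find_bad_escapes(body))
    return "".join("\\\\" if i in bad else c for i, c in enumerate(body))


if __name__ == "__main__":
    regex = r"\d+\.\d+"  # literal body as it might appear in older code
    print(find_bad_escapes(regex))           # [0, 3, 5]
    print(double_errant_backslashes(regex))  # \\d+\\.\\d+
```

One design caveat: because `\b` is a valid escape (backspace) in this assumed set, a regular expression written as `"\b"` is not flagged, even though the author probably meant a word boundary; a mechanical fixer cannot recover that intent.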

@alandonovan requested a review from @jayconrod on March 25, 2020 21:01
@alandonovan (Contributor Author) commented:

@laurentlb

@jayconrod (Collaborator) left a comment:

Implementation and spec change look good. Makes sense if Python's doing the same thing.

Please link other issues related to this change in the commit message. I think they are bazelbuild/bazel#8380, bazelbuild/starlark#38, and maybe bazelbuild/buildtools#688.

Review thread on doc/spec.md (outdated):
\n \x0A line feed
\r \x0D carriage return
\t \x09 horizontal tab
\v \X0B vertical tab
@jayconrod (Collaborator):

\x0B

@alandonovan (Contributor Author):

Done.
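For reference, a quick Python check (assuming Starlark agrees with Python for these escapes) that each named escape in the table and its hexadecimal form denote the same single character. The hex escape is introduced by a lowercase `x`; only its digits are case-insensitive, which is why `\X0B` needed fixing.

```python
# Rows of the escape table under review: (escape, hex escape, description).
TABLE = [
    ("\n", "\x0A", "line feed"),
    ("\r", "\x0D", "carriage return"),
    ("\t", "\x09", "horizontal tab"),
    ("\v", "\x0B", "vertical tab"),
]

for escaped, hex_escaped, description in TABLE:
    # Both spellings denote the same one-character string.
    assert escaped == hex_escaped and len(escaped) == 1, description
    print(f"{description}: U+{ord(escaped):04X}")

# Hex digits may be either case; the "x" that introduces them may not.
assert "\x0B" == "\x0b" == "\v"
```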

Review thread on doc/spec.md (outdated):
"a\
b" # "ab"
r"a\
b" # "a\\nb"
@jayconrod (Collaborator):

"a\\\nb", I think.

@alandonovan (Contributor Author):

Done.
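Python treats backslash-newline the same way, so the reviewer's correction can be checked with a short sketch (Python stands in for Starlark here, an assumption): in a regular literal the pair is a line continuation and disappears, while a raw literal keeps both characters.

```python
import ast

# Source text of the two literals from the spec excerpt. Each contains a
# backslash immediately followed by a real newline character.
cooked_src = '"a\\\nb"'   # the characters: " a \ <newline> b "
raw_src = 'r"a\\\nb"'     # the same, with an r prefix

cooked = ast.literal_eval(cooked_src)
raw = ast.literal_eval(raw_src)

# Regular literal: backslash-newline is a line continuation, so both vanish.
assert cooked == "ab"
# Raw literal: the backslash and the newline are both kept.
assert raw == "a\\\nb" and len(raw) == 4
print(repr(cooked), repr(raw))
```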

Review thread on doc/spec.md (outdated):

```python
"a\nb" # "a\nb" = 'a' + '\n' + 'b'
r"a\nb" # "a\\nb" = 'a' + '\' + '\n' + 'b'
```
@jayconrod (Collaborator):

'\\'

@alandonovan (Contributor Author):

Done.
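The decomposition in the raw-string excerpt above is what the reviewer questioned. In Python (assumed to match Starlark for these literals), `r"a\nb"` holds a backslash followed by the letter `n`, not a backslash followed by a newline:

```python
regular = "a\nb"
raw = r"a\nb"

# Regular literal: \n is one character, a line feed.
assert regular == "a" + "\n" + "b" and len(regular) == 3
# Raw literal: the backslash is kept (written '\\') and is followed by the letter n.
assert raw == "a" + "\\" + "n" + "b" and len(raw) == 4
assert raw == "a\\nb"
print(len(regular), len(raw))  # 3 4
```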

@alandonovan force-pushed the backslash branch 2 times, most recently from a5c7a21 to 67db445, on March 26, 2020 13:49
@alandonovan (Contributor Author) left a comment:

Thanks, Jay.


This change causes Starlark, like Go, to reject backslashes that
are not part of an escape sequence. Previously they were treated
literally, so \( would encode a two-character string.

Many programs rely on this, especially for regular expressions and
shell commands, and will be broken by this change, but the fix is simple:
double each errant backslash.

Python does not yet enforce this behavior, but since 3.6 it
has emitted a deprecation warning for it.

Also, document string escapes.

Related issues:
- Google issue b/34519173: "bazel: Forbid undefined escape sequences in strings"
- bazelbuild/starlark#38: Starlark spec: String escapes
- bazelbuild/buildtools#688: Bazel: Fix string escapes
- bazelbuild/bazel#8380: Bazel incompatible_restrict_string_escapes: Restrict string escapes

Change-Id: I5c9609a4e28d58593e9d6918757bca2cfd838d51
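The commit message notes that Python has warned about such escapes since 3.6. A minimal sketch for observing that from Python itself; the warning category is `DeprecationWarning` in 3.6 through 3.11 and `SyntaxWarning` from 3.12, so the helper (a hypothetical name, `escape_warnings`) accepts either.

```python
import warnings


def escape_warnings(source: str) -> list:
    """Compile a Python expression and return the messages of any
    invalid-escape warnings the compiler emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let default filters hide them
        compile(source, "<example>", "eval")
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        and "invalid escape sequence" in str(w.message)
    ]


print(escape_warnings(r'"\("'))    # non-empty: "\(" is not an escape sequence
print(escape_warnings(r'"\\("'))   # []: the errant backslash has been doubled
print(escape_warnings(r'r"\("'))   # []: raw strings keep backslashes as-is
```

Doubling the backslash or switching to a raw string silences the warning, mirroring the fix the commit message recommends for Starlark.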
@alandonovan merged commit 16e44b1 into master on Mar 26, 2020
@alandonovan deleted the backslash branch on March 26, 2020 14:23
laurentlb added a commit to laurentlb/starlark that referenced this pull request Sep 24, 2020
Copied from google/starlark-go#265
I omitted the implementation notes.
laurentlb added a commit to bazelbuild/starlark that referenced this pull request Sep 25, 2020
Spec: document string literals

Copied from google/starlark-go#265
I omitted the implementation notes and removed the note about the hex escape.