
Fix more Unicode bugs #24182

Closed
wants to merge 6 commits into from


fmeum (Collaborator) commented Nov 4, 2024

  • Use Latin-1 in many native file write rules for consistency with the internal encoding.
  • Use Latin-1 for the resolved repository file and the JSON profile.
  • Fix `unused_input_list` handling of non-ASCII characters in file names.
  • Flip the `legacy_utf8` parameter of `repository_ctx.file` to `False` and make it a no-op. With the previous default, any non-ASCII characters would be written out as double-encoded UTF-8, which is not a useful choice.
  • Change `repository_ctx.template` to operate on raw bytes for consistency with `repository_ctx.read` and to fix substitution with non-ASCII keys/values.
  • Move some usages of `UTF_8` closer to their usage site to clarify why they are correct.
  • Fix parsing of dependency files with Unicode character contents (`/showIncludes` and `.d` files).

fmeum force-pushed the 23859-unicode-starlark branch 5 times, most recently from d539093 to b378d1f on November 5, 2024 14:29
fmeum requested a review from tjgq November 5, 2024 15:55
fmeum marked this pull request as ready for review November 5, 2024 15:55
fmeum requested review from a team, lberki, Wyverald and meteorcloudy as code owners November 5, 2024 15:55
fmeum requested review from aranguyen and removed request for a team November 5, 2024 15:55
fmeum force-pushed the 23859-unicode-starlark branch from b378d1f to db728d4 November 5, 2024 15:56
github-actions bot added the team-Performance, team-Configurability, team-ExternalDeps, team-Rules-Java, team-Rules-CPP, team-Local-Exec, team-Remote-Exec and awaiting-review labels Nov 5, 2024
fmeum (Collaborator, Author) commented Nov 5, 2024

@tjgq This is stacked on #24010, but the last commit is new and ready for review.

tjgq (Contributor) commented Nov 7, 2024

@fmeum Mind rebasing before I review?

fmeum force-pushed the 23859-unicode-starlark branch from c2226bd to 8670006 November 14, 2024 18:08
fmeum (Collaborator, Author) commented Nov 14, 2024

@tjgq Rebased onto master.

tjgq added the awaiting-PR-merge label and removed the awaiting-review label Nov 15, 2024
fmeum (Collaborator, Author) commented Nov 15, 2024

@bazel-io fork 8.0.0

iancha1992 (Member)

@fmeum can you please resolve the conflicts? Thanks!

fmeum force-pushed the 23859-unicode-starlark branch from 8670006 to 664f29c November 19, 2024 18:35
fmeum (Collaborator, Author) commented Nov 19, 2024

@iancha1992 Rebased

fmeum requested a review from tjgq November 19, 2024 18:42
github-actions bot removed the awaiting-PR-merge label Nov 19, 2024
fmeum deleted the 23859-unicode-starlark branch November 19, 2024 23:17
fmeum added a commit to fmeum/bazel that referenced this pull request Nov 19, 2024

Closes bazelbuild#24182.

PiperOrigin-RevId: 698111811
Change-Id: Ie43bab9eb5963bf81690dd8985d358f544a711c9
(cherry picked from commit 3fdec93)
github-merge-queue bot pushed a commit that referenced this pull request Nov 20, 2024

Closes #24182.

PiperOrigin-RevId: 698111811
Change-Id: Ie43bab9eb5963bf81690dd8985d358f544a711c9
(cherry picked from commit 3fdec93)

Fixes #24242

iancha1992 (Member)

The changes in this PR have been included in Bazel 8.0.0 RC3. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting `USE_BAZEL_VERSION=8.0.0rc3`. Thanks!

meteorcloudy (Member)

Looks like this was rolled back again in e27ab91.

@tjgq @katre Can you provide some details on the issue?

katre (Member) commented Nov 26, 2024

@tjgq has a better understanding of the cause than I do (I just noticed the failures and initiated the rollback).

tjgq (Contributor) commented Nov 26, 2024

See b/381060195 for some details. I'm fairly confident that the root cause was in Google code, not Bazel code, and I expect the fix to be very localized. So, absent evidence that it breaks anything externally, I'd recommend sticking with the cherry-pick for now.

ramil-bitrise pushed a commit to bitrise-io/bazel that referenced this pull request Dec 18, 2024

Closes bazelbuild#24182.

PiperOrigin-RevId: 698111811
Change-Id: Ie43bab9eb5963bf81690dd8985d358f544a711c9