-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendor bleach and html5lib in pip package #1142
Conversation
91940cb
to
43a0062
Compare
Ping @nfelt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for figuring out a way to get this unblocked!
) | ||
|
||
genrule( | ||
name = "license", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a way to keep the license list closer to the actual repo definitions? I'm concerned that we won't always remember to update this list given that it's kept in a different part of the codebase and nothing would break if we added a new dependency and didn't update this license list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get my approval when adding new dependencies. That way I can make sure we do things like this. There's also other burdens, such as ensuring /** @license ... */
comments appear in HTTP responses for works with notice licenses, which is oftentimes difficult since tooling might remove those comments, or the copyright holders might not have included them in the sources. It's a tough problem.
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & | ||
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does port
have to be a file? Could this just be port=$(perl -ne ...)
and then curl http://localhost:$port
?
Also, I wonder if it'd be worth having TensorBoard write a file like /tmp/tbport.$PID
containing the port number it's using, so that we wouldn't have to grep it out of the STDERR output for test purposes like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's possible to do with $()
syntax. We are in a tmp directory, this doesn't overflow 80 column, and is readable. We'd need to sleep() poll a /tmp/tbport.$PID
file. I prefer the POSIX FIFO since it's faster.
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & | ||
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the ".v10" do? I'm not that familiar with perl and googling it is proving fruitless; a comment would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .
character concatenates strings in Perl. v10
is character literal syntax for line feed.
cd venv$1 | ||
. bin/activate | ||
pip install -qU pip | ||
pip install -qU tf-nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the old smoke test allowed testing against specific tensorflow versions, e.g. you could test against a specific TF release, not necessarily the most recent tf-nightly. It'd be nice to preserve that functionality here if we are deleting the other smoke test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add that functionality back in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM provided that (as discussed) we don't remove the old smoke test until we've migrated the logic.
echo "Calling tensorboard public python APIs" | ||
echo | ||
|
||
# Check that we can now use TensorBoard's public python APIs as installed with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a useful part of the smoke test that has caught at least one problem in my pre-release testing. Could we copy this into the new smoke test logic if we're going to delete this script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add that functionality back in a later PR. This smoke test is contrib code. I have limited familiarity, want it incorporated into actual process, but we have higher priorities ATM.
echo | ||
(cd "${VENV_TMP_DIR}" && python -c "$TEST_API_CALL") | ||
|
||
# Check tensorboard binary path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also something it'd be good to preserve in the new smoke test. This ensures that the tensorboard binary is actually coming from the freshly built pip package. Otherwise, if we fail to build that binary correctly, then the smoke test might still pass by falling back to a system-wide installed tensorboard binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File restored.
pip install -qU ../dist/*py$1*.whl >/dev/null | ||
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't piping stderr to the fifo mean that we're swallowing any errors that TensorBoard logs if it does in fact fail to start up? Or do those somehow get displayed by the perl command below reading the pipe? (I'm not that familiar with perl and it's hard to google, so a comment would be helpful here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. print STDERR
in the perl one-liner is sort of like | tee /dev/stderr
. I could write an essay explaining the depths of this subroutine. It's not unusual from the perspective of these tools. I'd prefer it stay tiny, without comment.
srcs_version = "PY2AND3", | ||
deps = [ | ||
"@org_pythonhosted_six", | ||
srcs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for the explicit file listing replacing the glob? Was there stuff we wanted to exclude in particular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to have things like html5lib testonly files schlepped into TensorBoard's pip release archives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Is there any chance those could be blacklisted (via glob excludes) so we don't need the long whitelist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like whitelists with BUILD files especially when we're redistributing third party content in this manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
) | ||
|
||
genrule( | ||
name = "license", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get my approval when adding new dependencies. That way I can make sure we do things like this. There's also other burdens, such as ensuring /** @license ... */
comments appear in HTTP responses for works with notice licenses, which is oftentimes difficult since tooling might remove those comments, or the copyright holders might not have included them in the sources. It's a tough problem.
cd venv$1 | ||
. bin/activate | ||
pip install -qU pip | ||
pip install -qU tf-nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add that functionality back in a follow-up PR.
pip install -qU ../dist/*py$1*.whl >/dev/null | ||
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. print STDERR
in the perl one-liner is sort of like | tee /dev/stderr
. I could write an essay explaining the depths of this subroutine. It's not unusual from the perspective of these tools. I'd prefer it stay tiny, without comment.
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & | ||
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's possible to do with $()
syntax. We are in a tmp directory, this doesn't overflow 80 column, and is readable. We'd need to sleep() poll a /tmp/tbport.$PID
file. I prefer the POSIX FIFO since it's faster.
mkdir logs | ||
mkfifo pipe | ||
tensorboard --port=0 --logdir=logs 2>pipe & | ||
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .
character concatenates strings in Perl. v10
is character literal syntax for line feed.
echo "Calling tensorboard public python APIs" | ||
echo | ||
|
||
# Check that we can now use TensorBoard's public python APIs as installed with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add that functionality back in a later PR. This smoke test is contrib code. I have limited familiarity, want it incorporated into actual process, but we have higher priorities ATM.
srcs_version = "PY2AND3", | ||
deps = [ | ||
"@org_pythonhosted_six", | ||
srcs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to have things like html5lib testonly files schlepped into TensorBoard's pip release archives.
TensorBoard uses bleach, which depends on html5lib, which made a big breaking change at some point in the past. Since upgrading these dependencies is not possible, we'll shade them under the `tensorboard._vendor` namespace, which is magically created by our build_pip_package script, which was rewritten, to support smoke testing and dash. The new process for building the TensorBoard pip package is: ```sh rm -rf /tmp/tensorboard bazel run //tensorboard/pip_package:build_pip_package pip install -U /tmp/tensorboard/*py2*.pip ``` Fixes tensorflow/tensorflow#16424 Fixes tensorflow#427 Fixes tensorflow#588 Closes tensorflow#607 Could help tensorflow#748
@nfelt We'll want to cherry pick this bug fix.
TensorBoard uses bleach, which depends on html5lib, which made a big breaking
change at some point in the past. Since upgrading these dependencies is not
possible, we'll shade them under the
tensorboard._vendor
namespace, which ismagically created by our build_pip_package script, which was rewritten, to
support smoke testing and dash.
The new process for building the TensorBoard pip package is:
Fixes tensorflow/tensorflow#16424
Fixes #427
Fixes #588
Closes #607
Could help #748