Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor bleach and html5lib in pip package #1142

Merged
merged 1 commit into from
Apr 26, 2018
Merged

Conversation

jart
Copy link
Contributor

@jart jart commented Apr 19, 2018

@nfelt We'll want to cherry pick this bug fix.

TensorBoard uses bleach, which depends on html5lib, which made a big breaking
change at some point in the past. Since upgrading these dependencies is not
possible, we'll shade them under the tensorboard._vendor namespace, which is
magically created by our build_pip_package script, which was rewritten, to
support smoke testing and dash.

The new process for building the TensorBoard pip package is:

rm -rf /tmp/tensorboard
bazel run //tensorboard/pip_package:build_pip_package
pip install -U /tmp/tensorboard/*py2*.pip

Fixes tensorflow/tensorflow#16424
Fixes #427
Fixes #588
Closes #607
Could help #748

@jart jart requested a review from nfelt April 19, 2018 03:09
@jart jart force-pushed the vendoring branch 2 times, most recently from 91940cb to 43a0062 Compare April 19, 2018 05:05
@jart
Copy link
Contributor Author

jart commented Apr 26, 2018

Ping @nfelt

Copy link
Contributor

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring out a way to get this unblocked!

)

genrule(
name = "license",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a way to keep the license list closer to the actual repo definitions? I'm concerned that we won't always remember to update this list given that it's kept in a different part of the codebase and nothing would break if we added a new dependency and didn't update this license list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get my approval when adding new dependencies. That way I can make sure we do things like this. There's also other burdens, such as ensuring /** @license ... */ comments appear in HTTP responses for works with notice licenses, which is oftentimes difficult since tooling might remove those comments, or the copyright holders might not have included them in the sources. It's a tough problem.

mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does port have to be a file? Could this just be port=$(perl -ne ...) and then curl http://localhost:$port?

Also, I wonder if it'd be worth having TensorBoard write a file like /tmp/tbport.$PID containing the port number it's using, so that we wouldn't have to grep it out of the STDERR output for test purposes like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's possible to do with $() syntax. We are in a tmp directory, this doesn't overflow 80 column, and is readable. We'd need to sleep() poll a /tmp/tbport.$PID file. I prefer the POSIX FIFO since it's faster.

mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the ".v10" do? I'm not that familiar with perl and googling it is proving fruitless; a comment would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The . character concatenates strings in Perl. v10 is character literal syntax for line feed.

cd venv$1
. bin/activate
pip install -qU pip
pip install -qU tf-nightly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the old smoke test allowed testing against specific tensorflow versions, e.g. you could test against a specific TF release, not necessarily the most recent tf-nightly. It'd be nice to preserve that functionality here if we are deleting the other smoke test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add that functionality back in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM provided that (as discussed) we don't remove the old smoke test until we've migrated the logic.

echo "Calling tensorboard public python APIs"
echo

# Check that we can now use TensorBoard's public python APIs as installed with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful part of the smoke test that has caught at least one problem in my pre-release testing. Could we copy this into the new smoke test logic if we're going to delete this script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add that functionality back in a later PR. This smoke test is contrib code. I have limited familiarity, want it incorporated into actual process, but we have higher priorities ATM.

echo
(cd "${VENV_TMP_DIR}" && python -c "$TEST_API_CALL")

# Check tensorboard binary path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also something it'd be good to preserve in the new smoke test. This ensures that the tensorboard binary is actually coming from the freshly built pip package. Otherwise, if we fail to build that binary correctly, then the smoke test might still pass by falling back to a system-wide installed tensorboard binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File restored.

pip install -qU ../dist/*py$1*.whl >/dev/null
mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't piping stderr to the fifo mean that we're swallowing any errors that TensorBoard logs if it does in fact fail to start up? Or do those somehow get displayed by the perl command below reading the pipe? (I'm not that familiar with perl and it's hard to google, so a comment would be helpful here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. print STDERR in the perl one-liner is sort of like | tee /dev/stderr. I could write an essay explaining the depths of this subroutine. It's not unusual from the perspective of these tools. I'd prefer it stay tiny, without comment.

srcs_version = "PY2AND3",
deps = [
"@org_pythonhosted_six",
srcs = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for the explicit file listing replacing the glob? Was there stuff we wanted to exclude in particular?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to have things like html5lib testonly files schlepped into TensorBoard's pip release archives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Is there any chance those could be blacklisted (via glob excludes) so we don't need the long whitelist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like whitelists with BUILD files especially when we're redistributing third party content in this manner.

Copy link
Contributor Author

@jart jart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

)

genrule(
name = "license",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get my approval when adding new dependencies. That way I can make sure we do things like this. There's also other burdens, such as ensuring /** @license ... */ comments appear in HTTP responses for works with notice licenses, which is oftentimes difficult since tooling might remove those comments, or the copyright holders might not have included them in the sources. It's a tough problem.

cd venv$1
. bin/activate
pip install -qU pip
pip install -qU tf-nightly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add that functionality back in a follow-up PR.

pip install -qU ../dist/*py$1*.whl >/dev/null
mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. print STDERR in the perl one-liner is sort of like | tee /dev/stderr. I could write an essay explaining the depths of this subroutine. It's not unusual from the perspective of these tools. I'd prefer it stay tiny, without comment.

mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's possible to do with $() syntax. We are in a tmp directory, this doesn't overflow 80 column, and is readable. We'd need to sleep() poll a /tmp/tbport.$PID file. I prefer the POSIX FIFO since it's faster.

mkdir logs
mkfifo pipe
tensorboard --port=0 --logdir=logs 2>pipe &
perl -ne 'print STDERR;/http:.*:(\d+)/ and print $1.v10 and exit 0' <pipe >port
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The . character concatenates strings in Perl. v10 is character literal syntax for line feed.

echo "Calling tensorboard public python APIs"
echo

# Check that we can now use TensorBoard's public python APIs as installed with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add that functionality back in a later PR. This smoke test is contrib code. I have limited familiarity, want it incorporated into actual process, but we have higher priorities ATM.

srcs_version = "PY2AND3",
deps = [
"@org_pythonhosted_six",
srcs = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to have things like html5lib testonly files schlepped into TensorBoard's pip release archives.

TensorBoard uses bleach, which depends on html5lib, which made a big breaking
change at some point in the past. Since upgrading these dependencies is not
possible, we'll shade them under the `tensorboard._vendor` namespace, which is
magically created by our build_pip_package script, which was rewritten, to
support smoke testing and dash.

The new process for building the TensorBoard pip package is:

```sh
rm -rf /tmp/tensorboard
bazel run //tensorboard/pip_package:build_pip_package
pip install -U /tmp/tensorboard/*py2*.pip
```

Fixes tensorflow/tensorflow#16424
Fixes tensorflow#427
Fixes tensorflow#588
Closes tensorflow#607
Could help tensorflow#748
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants