Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TPU tests on master builds #15349

Merged
merged 6 commits into from
Oct 31, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/tpu-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ jobs:
kubectl logs -f $pod_name --container=train > /tmp/full_output.txt
grep '<?xml version="1.0" ?>' /tmp/full_output.txt # sanity check
csplit /tmp/full_output.txt '/<?xml version="1.0" ?>/'
cat xx00 # test logs
mv xx01 coverage.xml
exit $status_code
shell: bash
Expand Down
14 changes: 9 additions & 5 deletions dockers/tpu-tests/tpu_test_cases.jsonnet
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,15 @@ local tputests = base.BaseTest {
echo "--- Fetch the SHA's changes ---"
git clone --single-branch --depth 1 https://github.com/Lightning-AI/lightning.git
cd lightning
git fetch origin --depth 1 pull/{PR_NUMBER}/head:test/{PR_NUMBER}
git -c advice.detachedHead=false checkout {SHA}

echo "--- Install PL ---"
PACKAGE_NAME=pytorch FREEZE_REQUIREMENTS=1 pip install -e .[test]
if [ -n "{PR_NUMBER}" ]; then # if PR number is not empty
# PR triggered it, check it out
git fetch origin --depth 1 pull/{PR_NUMBER}/head:test/{PR_NUMBER}
git -c advice.detachedHead=false checkout {SHA}
fi

echo "--- Install packages ---"
PACKAGE_NAME=lite pip install -e .[dev]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stumbled upon something interesting regarding these two lines when I tried setting up a machine on Colab to run my TPU tests.

When you run PACKAGE_NAME=lite pip install -e .[dev], it will write _PACKAGE_NAME = "lite" in setup.py. This is done by these quite surprising lines of code: https://github.com/Lightning-AI/lightning/blob/master/setup.py#L72-L78

When you run PACKAGE_NAME=pytorch pip install -e .[dev] it installs once again lightning_lite because the PACKAGE_NAME variable is simply ignored.

You can see it in the logs of the run on the master branch:

When running PACKAGE_NAME=lite pip install .[dev] without the -e option, it does not override the code in setup.py, so I could successfully set up the libraries like this.

Mysteriously, the tests that require pytorch_lightning run successfully, so it must be installed somehow. I presumed that it may come pre-installed in the docker image, but I couldn't deduce it from the commands given on the docker hub page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I think it is rightthat the next env. variable is ignored because aside from the edit of setup.py also, MANIFEST.in is updated...

TLDR, I think we can drop -e from all CI 🦦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #15489 and #15490. Also related to #15474

PACKAGE_NAME=pytorch pip install -e .[dev]
pip list

echo $KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS
Expand Down