Test P2P Shuffle in integration tests #597

Merged — 30 commits merged into main on Dec 23, 2022
Conversation

@hendrikmakait (Member) commented on Dec 7, 2022

@hendrikmakait (Member Author):
Skipping the larger data sizes for the join tests due to data transfer to the client, which OOM-kills the pytest workers (#633).

@hendrikmakait marked this pull request as ready for review on December 23, 2022, 10:12.
@hendrikmakait (Member Author):
Ready for review. cc @crusaderky, @ncclementi

    (0.1, "tasks"),
    # shuffling takes a long time with 1 or higher
    (0.1, "p2p"),
    # (1, "p2p"),
@hendrikmakait (Member Author):

#633 should allow us to enable this in a follow-up.

@ncclementi (Contributor):

Shouldn't we mark them then as marks=pytest.mark.skip(reason="Does not finish, fix when #633 is merged")?

If not, open an issue to state that when #633 is merged we need to uncomment this.
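A minimal sketch of the suggested pattern (the parametrize signature and test name here are assumptions inferred from the snippets and error messages elsewhere in this thread):

    import pytest

    @pytest.mark.parametrize(
        "mem_mult, shuffle",
        [
            (0.1, "tasks"),
            (0.1, "p2p"),
            # Keep the heavy case in the matrix but skip it until #633 is fixed;
            # values are passed positionally so the arity matches the two names.
            pytest.param(
                1,
                "p2p",
                marks=pytest.mark.skip(reason="Does not finish, fix when #633 is merged"),
            ),
        ],
    )
    def test_join_big(mem_mult, shuffle):
        ...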

@ncclementi (Contributor) reviewed:

Left some small comments.

My only concern is that we will be running each of the h2o queries 9 times (csv, pq 0.5 GB, pq 5 GB × no_shuffle, tasks, p2p). This will increase the time of the whole CI a lot. For most of the queries and this size of data, shuffle and p2p might make no sense.

Shouldn't we just move to only one biggish pq dataset (thinking of the 50 GB one) and run this maybe only once a week? That would probably imply running it as a separate workflow.


@hendrikmakait (Member Author):
> My only concern is that we will be running each of the h2o queries 9 times (csv, pq 0.5 GB, pq 5 GB × no_shuffle, tasks, p2p). This will increase the time of the whole CI a lot. For most of the queries and this size of data, shuffle and p2p might make no sense.

FWIW, we only run each query 6 times, as we do not test the implicit default, aka "no_shuffle". IIUC, tasks is the default behavior when using a distributed cluster; we just make it explicit now and do not need to test "no_shuffle". While there shouldn't be a difference in performance, we want to avoid possible performance regressions. Regarding frequency/increased CI runtime, I've had a chat with @fjetter about this, and since tasks is the default behavior and p2p is under active development, we feel it would be a good idea to test both of them frequently. Note that until the next release, p2p is only tested for upstream.

> Shouldn't we just move to only one biggish pq dataset (thinking of the 50 GB one) and run this maybe only once a week? That would probably imply running it as a separate workflow.

Testing on a large-scale dataset is yet another thing we should add, but that's out of scope for this PR.
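For context on the tasks/p2p distinction above, a minimal sketch of how a benchmark can make the shuffle backend explicit (the shuffle= keyword reflects the dask.dataframe API of the time; the dataframes and column are illustrative assumptions):

    import pandas as pd
    import dask.dataframe as dd

    left = dd.from_pandas(pd.DataFrame({"id": range(1_000), "x": 1.0}), npartitions=10)
    right = dd.from_pandas(pd.DataFrame({"id": range(1_000), "y": 2.0}), npartitions=10)

    # "tasks" is the default on a distributed cluster; "p2p" opts into the
    # experimental peer-to-peer shuffle under active development.
    joined = dd.merge(left, right, on="id", shuffle="p2p")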

@hendrikmakait (Member Author):
> Shouldn't we mark them then as marks=pytest.mark.skip(reason="Does not finish, fix when #633 is merged")?

Good point, I meant to do that after creating #633.

     pytest.param(
-        1,
-        marks=pytest.mark.skip(reason="Does not finish"),
+        (1, "p2p"), marks=pytest.mark.skip(reason="client OOMs, see coiled-runtime#633")

@hendrikmakait (Member Author):

XREF: #633

-        10,
-        marks=pytest.mark.skip(reason="Does not finish"),
+        (10, "p2p"),
+        marks=pytest.mark.skip(reason="client OOMs, see coiled-runtime#633"),

@hendrikmakait (Member Author):

XREF: #633

@hendrikmakait (Member Author):
Running the H2O benchmarks with p2p took me a total of 13 mins. 7 mins were spent on cluster setup and testing 0.5 GB datasets. With that in mind, I do not see any harm in testing p2p on every run.

Granted, the value of testing on tiny datasets is questionable, but that should be discussed in another issue.

@ncclementi (Contributor):
@hendrikmakait We are having some legit failures in https://github.com/coiled/coiled-runtime/actions/runs/3766959977/jobs/6404088081

________________ ERROR collecting tests/benchmarks/test_join.py ________________
tests/benchmarks/test_join.py::test_join_big: in "parametrize" the number of names (2):
  ['mem_mult', 'shuffle']
must be equal to the number of values (1):
  ((1, 'p2p'),)

The ones in https://github.com/coiled/coiled-runtime/actions/runs/3766959977/jobs/6404088006 are platform related.

@ncclementi (Contributor) commented on Dec 23, 2022:

@hendrikmakait I'm trying different combinations in #634, but haven't been able to make it work.

That PR runs only that test, so it's easier to debug, in case you want to mess with it there. I tried this, but it didn't work:

  (
      pytest.param(1, marks=pytest.mark.skip(reason=reason)),
      pytest.param("p2p", marks=pytest.mark.skip(reason=reason)),
  )

@hendrikmakait (Member Author):
@ncclementi: I think I found the solution (interesting that some variants seem to work but then fail when it comes to running the specific parametrization, or ignoring it).

Sorry for causing this, I should've really tested "that one last beautification" locally.
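The working form isn't shown in the thread, but given the collection error above, a plausible fix is to pass the two values positionally so the arity matches the two names in ['mem_mult', 'shuffle'], rather than wrapping them in a single tuple (the attempt with two separate pytest.param objects fails for the same reason: each param then carries only one value):

    # Fails collection: one value, ((1, "p2p"),), for two names
    pytest.param((1, "p2p"), marks=pytest.mark.skip(reason="client OOMs, see coiled-runtime#633")),

    # Collects correctly: two values for two names
    pytest.param(1, "p2p", marks=pytest.mark.skip(reason="client OOMs, see coiled-runtime#633")),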

@ncclementi (Contributor):
@hendrikmakait no worries, I've been playing with it trying to fix it because I thought you were off for today. I'm testing it over in #634 too, as it should fail faster if it doesn't work.

It's very unintuitive :/

@ncclementi (Contributor):
CI is green. Thanks @hendrikmakait, this goes in!

@ncclementi merged commit ce2ea4f into main on Dec 23, 2022.
@ncclementi deleted the shuffle-fixture branch on Dec 23, 2022.