
[core] Migrate many_nodes_actor_tests to new cloud. #31863

Merged: 9 commits, Jan 25, 2023


@fishbone (Contributor) commented Jan 23, 2023

Why are these changes needed?

This PR makes the test run on the new cloud to prevent regressions.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Yi Cheng <[email protected]>
@fishbone changed the title from "Large scale nightly" to "[core] Migrate many_nodes_actor_tests to new cloud." on Jan 24, 2023
@fishbone marked this pull request as ready for review on January 24, 2023 at 02:13
@fishbone (Contributor Author)

Given there are regressions on both the infra side and the OSS side, I'll adjust the numbers later.

@rkooo567 (Contributor) left a comment

What do you mean by the regression, btw?

@@ -85,7 +87,12 @@ def run_prepare_command(
         Command runners may choose to run this differently than the
         test command.
         """
-        return self.run_command(command, env, timeout)
+        return exponential_backoff_retry(
Contributor

What is this change for?

Contributor Author

For some reason, on the new stack it has to wait a while (1-3 s) before it's ready to be used.
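The change in the hunk above wraps the prepare command in a retry with exponential backoff so that the 1-3 s warm-up no longer fails the run. The helper below is a minimal, self-contained sketch of that pattern, assuming the command signals "not ready yet" by raising an exception. Its name `retry_with_backoff` and its parameters (`retry_exceptions`, `initial_delay_s`, `max_retries`, `sleep`) are hypothetical and are not the signature of the `exponential_backoff_retry` helper used in the PR.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    f: Callable[[], T],
    retry_exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    initial_delay_s: float = 1.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call f(), retrying on retry_exceptions with doubling delays.

    Hypothetical helper for illustration only; not the ray_release API.
    """
    delay_s = initial_delay_s
    attempt = 0
    while True:
        try:
            return f()
        except retry_exceptions:
            attempt += 1
            if attempt > max_retries:
                # Out of retries: re-raise the last error to the caller.
                raise
            sleep(delay_s)
            delay_s *= 2  # exponential backoff: 1 s, 2 s, 4 s, ...


# Usage: simulate a prepare command that only succeeds on the third attempt.
calls = {"n": 0}
delays = []


def flaky_prepare_command() -> int:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("cluster not ready yet")
    return 0


result = retry_with_backoff(flaky_prepare_command, sleep=delays.append)
print(result, delays)  # 0 [1.0, 2.0]
```

With `initial_delay_s=1.0` and three retries, the waits are 1, 2 and 4 s (about 7 s in total), which comfortably covers a 1-3 s warm-up. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.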

@rkooo567 added the @author-action-required label on Jan 24, 2023 (the PR author is responsible for the next step; remove the tag to send it back to the reviewer).
@fishbone (Contributor Author)

> what do you mean by the regression btw?

The infra can't start 2k nodes and hangs, and on the OSS Ray side the GCS somehow disappears. :(

@rkooo567 (Contributor)

Can you run the release test before merging it?

@fishbone (Contributor Author)

@rkooo567 I triggered the test hours ago (https://buildkite.com/ray-project/release-tests-pr/builds/26327#0185e53f-d293-4e2a-a34b-b6aa149e30ab), but it's still pending :(
I'll check back later.

@fishbone (Contributor Author)

passed

@fishbone merged commit d9dd326 into ray-project:master on Jan 25, 2023
Labels: @author-action-required
3 participants