[Hotfix] Increase the timeout of the ProxyActor health check #2082

Merged

kevin85421 commented Apr 16, 2024

Why are these changes needed?

I observed that NumServeEndpoints changes frequently, especially after we started watching Endpoints in #2080. The error message is:

Get \"http://10.244.0.6:8000/-/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The HTTP client's timeout is 20 ms. This PR increases it to 2 seconds, the same value the dashboard HTTP client uses.

I marked this PR as 'Hotfix' because I think 20 ms should be enough for my very simple setup (single Ray node, local Kind cluster, no requests). The instability may therefore be a Ray Serve issue.
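To illustrate the effect of the client-side timeout described above, here is a minimal sketch, not the KubeRay code. The names `probe`, `compareTimeouts`, and the three constants are hypothetical, and the slow handler is a local stand-in for a proxy that answers late. Only `http.Client.Timeout` and the 20 ms and 2 s values reflect the description.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Hypothetical values for illustration: 20 ms and 2 s come from the PR
// description; the 200 ms server delay is an assumption.
const (
	simulatedDelay = 200 * time.Millisecond
	shortTimeout   = 20 * time.Millisecond
	longTimeout    = 2 * time.Second
)

// probe issues a GET with the given client-side timeout and returns any
// transport error, including a timeout.
func probe(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// compareTimeouts starts a local stand-in server whose handler sleeps for
// delay before answering, then probes it with a short and a long timeout.
func compareTimeouts(delay, short, long time.Duration) (shortErr, longErr error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	shortErr = probe(srv.URL, short)
	longErr = probe(srv.URL, long)
	return shortErr, longErr
}

func main() {
	shortErr, longErr := compareTimeouts(simulatedDelay, shortTimeout, longTimeout)
	fmt.Println("short timeout:", shortErr)
	fmt.Println("long timeout: ", longErr)
}
```

With a response that takes longer than the short timeout, the first probe should fail with a deadline error while the second succeeds. This is the kind of flapping the larger timeout is meant to absorb.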

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

kevin85421 marked this pull request as ready for review April 16, 2024 23:50
kevin85421 requested a review from jjyao April 16, 2024 23:50
 		return err
 	}
 	defer resp.Body.Close()

 	body, _ := io.ReadAll(resp.Body)
-	if resp.StatusCode < 200 || resp.StatusCode > 299 {
-		return fmt.Errorf("RayHttpProxyClient CheckHealth fail: %s %s", resp.Status, string(body))
+	if resp.StatusCode != 200 {
Review comment by the PR author (kevin85421, Member):

https://sourcegraph.com/github.com/ray-project/ray@00a0b9f107397ec102356e09cb2c269e9136b589/-/blob/python/ray/serve/_private/proxy.py?L816

In Ray Serve, the health endpoint returns status code 200 if the ProxyActor is healthy. If it is not healthy, the status code should be 503.
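The stricter check can be sketched as follows. This is an illustration under assumptions, not the merged code: `checkHealth` and `checkAgainstStatus` are hypothetical names, and the test server is a stand-in for the proxy. The `/-/healthz` path comes from the error message in the description.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkHealth treats only HTTP 200 as healthy. Any other status, including
// other 2xx codes, is reported as an error along with the response body.
func checkHealth(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("health check failed: %s %s", resp.Status, string(body))
	}
	return nil
}

// checkAgainstStatus runs checkHealth against a local stand-in server that
// always answers with the given status code.
func checkAgainstStatus(status int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, "stand-in body")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return checkHealth(client, srv.URL+"/-/healthz")
}

func main() {
	for _, status := range []int{http.StatusOK, http.StatusAccepted, http.StatusServiceUnavailable} {
		fmt.Printf("status %d -> err: %v\n", status, checkAgainstStatus(status))
	}
}
```

Compared with a 200 to 299 range check, this version also rejects a response such as 202. That is consistent with the comment that a healthy ProxyActor answers with exactly 200.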

kevin85421 merged commit 981c943 into ray-project:master Apr 17, 2024
24 checks passed
kevin85421 assigned kevin85421 and unassigned jjyao Apr 18, 2024