Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Add the function to cleanup Redis backend. #31782

Merged
merged 7 commits into from
Jan 25, 2023

Conversation

fishbone
Copy link
Contributor

@fishbone fishbone commented Jan 19, 2023

Signed-off-by: Yi Cheng [email protected]

Why are these changes needed?

This feature is a helper tool to clean up the redis storage. It's meant to be a solution to cleanup the old data in redis stored by Ray until we got the work of GCS storage backend done.

This feature support redis cluster and non cluster mode and is for redis cleanup only. The feature is built with cython, so no external libraries needed.

Since _raylet.so depends on redis_client implicitly, so size change in the ray pkg.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Yi Cheng <[email protected]>
@YQ-Wang
Copy link
Contributor

YQ-Wang commented Jan 19, 2023

Quick question - Is the namespace in this PR the same as RAY_external_storage_namespace, that uses to add a prefix to the Redis keys? I am asking because some use patterns are using a shared Redis by a prefix.

except redis.exceptions.RedisClusterException:
cli = redis.Redis(host, port, **kwargs)

cli.delete(storage_namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need handle failure and retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question - Is the namespace in this PR the same as RAY_external_storage_namespace, that uses to add a prefix to the Redis keys? I am asking because some use patterns are using a shared Redis by a prefix.

It'll throw exception to the user, I think the user should just either retry or do something else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the function won't raise if there's no data to delete?

@fishbone
Copy link
Contributor Author

Quick question - Is the namespace in this PR the same as RAY_external_storage_namespace, that uses to add a prefix to the Redis keys? I am asking because some use patterns are using a shared Redis by a prefix.

@YQ-Wang Yes. And we've changed the Redis storage format by using hash tagged key in GCS to support cluster redis mode.

@edoakes
Copy link
Contributor

edoakes commented Jan 20, 2023

@iycheng is it possible to make this work without requiring an additional python dependency outside of the standard ray installation? Do we not have any redis client bundled anymore?

@YQ-Wang
Copy link
Contributor

YQ-Wang commented Jan 20, 2023

In what situation will this clean-up logic be triggered for the Ray cluster?

@edoakes
Copy link
Contributor

edoakes commented Jan 20, 2023

In what situation will this clean-up logic be triggered for the Ray cluster?

For now this must be manually called by the user (or external automation that they build). It's intended to be used for clusters using external storage for GCS FT after they've been intentionally shut down.

@fishbone
Copy link
Contributor Author

@iycheng is it possible to make this work without requiring an additional python dependency outside of the standard ray installation? Do we not have any redis client bundled anymore?

I can make a binary tool for this. But, I'm not sure how big it's going to increase the pip package size.

storage_namespace: The namespace of the storage to be deleted.
"""

from ray._raylet import del_key_from_storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the name del_key_from_storage accurate? this should clear all keys in the given namespace, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is delete a key. we use hash key. so delete the key of the hash table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can you add a comment explaining this? It was not obvious to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i'll.

Comment on lines +587 to +592
Args:
host: The host address of the Redis.
port: The port of the Redis.
password: The password of the Redis.
use_ssl: Whether to encrypt the connection.
storage_namespace: The namespace of the storage to be deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make these all default to the same environment variables that the GCS does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so called ray start and this utility with the same env variables "just works"

Copy link
Contributor Author

@fishbone fishbone Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's doable unless we baked everything into OS envs. I'd say maybe app layer can have a wrapper on top of this (host/port can be retrieved from RAY_REDIS_ADDRESS and storage_namespace can be retrieved from RAY_external_storage_namespace). These should be given by the user.

For password it seems not good (you might not need to use this if it's a closed network).
I'm trying to make the utils not deep coupled with the existing ray api since the current API might be updated in the future (we'll make it backward compatible) and as you can see the current API is really bad, for example, redis address and redis password are passed to ray in different style.

Let me know if it's hard to do it in app layer and we'll see what we can do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the current API is definitely pretty bad lol.

Ok, I think it's fine to not pull them from the env vars.

Signed-off-by: Yi Cheng <[email protected]>
Signed-off-by: Yi Cheng <[email protected]>
if not isinstance(storage_namespace, str):
raise ValueError("storage namespace must be a string")

# Right now, GCS store all data into a hash set key by storage_namespace.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment added here.

Signed-off-by: Yi Cheng <[email protected]>
Signed-off-by: Yi Cheng <[email protected]>
@fishbone
Copy link
Contributor Author





ray/_raylet.pyi:6: error: Class ray._raylet.ObjectRef has abstract attributes "__await__"
  | ray/_raylet.pyi:6: note: If it is meant to be abstract, add 'abc.ABCMeta' as an explicit metaclass
  | ray/_raylet.pyi:10: error: Class ray._raylet.ObjectID has abstract attributes "__await__"
  | ray/_raylet.pyi:10: note: If it is meant to be abstract, add 'abc.ABCMeta' as an explicit metaclass
  | ray/_private/gcs_utils.py:597: error: Module "ray._raylet" has no attribute "del_key_from_storage"
 

<br class="Apple-interchange-newline">

Not sure what's going on. trying to skip the check for these lines.

Signed-off-by: Yi Cheng <[email protected]>
@fishbone fishbone merged commit 41e1685 into ray-project:master Jan 25, 2023
@YQ-Wang
Copy link
Contributor

YQ-Wang commented Feb 27, 2023

@iycheng @edoakes Do we have any docs or instructions on how to run this cleanup tool? The function is under _private so not sure whether the user should call this.

@edoakes
Copy link
Contributor

edoakes commented Feb 27, 2023

@YQ-Wang this is currently experimental and just being used for testing, so it shouldn't be relied on as a public API (it may change across versions). It should work and we don't expected major changes, but it's still "use at your own risk."

If you have a requirement for this functionality to be stabilized could you please file a github issue and tag me and @iycheng?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants