-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Add the function to cleanup Redis backend. #31782
Conversation
Signed-off-by: Yi Cheng <[email protected]>
Quick question - Is the namespace in this PR the same as RAY_external_storage_namespace, that uses to add a prefix to the Redis keys? I am asking because some use patterns are using a shared Redis by a prefix. |
python/ray/_private/gcs_utils.py
Outdated
except redis.exceptions.RedisClusterException: | ||
cli = redis.Redis(host, port, **kwargs) | ||
|
||
cli.delete(storage_namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need handle failure and retry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question - Is the namespace in this PR the same as RAY_external_storage_namespace, that uses to add a prefix to the Redis keys? I am asking because some use patterns are using a shared Redis by a prefix.
It'll throw exception to the user, I think the user should just either retry or do something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the function won't raise if there's no data to delete?
@YQ-Wang Yes. And we've changed the Redis storage format by using hash tagged key in GCS to support cluster redis mode. |
@iycheng is it possible to make this work without requiring an additional python dependency outside of the standard |
In what situation will this clean-up logic be triggered for the Ray cluster? |
For now this must be manually called by the user (or external automation that they build). It's intended to be used for clusters using external storage for GCS FT after they've been intentionally shut down. |
I can make a binary tool for this. But, I'm not sure how big it's going to increase the pip package size. |
python/ray/_private/gcs_utils.py
Outdated
storage_namespace: The namespace of the storage to be deleted. | ||
""" | ||
|
||
from ray._raylet import del_key_from_storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the name del_key_from_storage
accurate? this should clear all keys in the given namespace, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is delete a key. we use hash key. so delete the key of the hash table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can you add a comment explaining this? It was not obvious to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, i'll.
Args: | ||
host: The host address of the Redis. | ||
port: The port of the Redis. | ||
password: The password of the Redis. | ||
use_ssl: Whether to encrypt the connection. | ||
storage_namespace: The namespace of the storage to be deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make these all default to the same environment variables that the GCS does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so called ray start
and this utility with the same env variables "just works"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's doable unless we baked everything into OS envs. I'd say maybe app layer can have a wrapper on top of this (host/port can be retrieved from RAY_REDIS_ADDRESS and storage_namespace can be retrieved from RAY_external_storage_namespace). These should be given by the user.
For password it seems not good (you might not need to use this if it's a closed network).
I'm trying to make the utils not deep coupled with the existing ray api since the current API might be updated in the future (we'll make it backward compatible) and as you can see the current API is really bad, for example, redis address and redis password are passed to ray in different style.
Let me know if it's hard to do it in app layer and we'll see what we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the current API is definitely pretty bad lol.
Ok, I think it's fine to not pull them from the env vars.
Signed-off-by: Yi Cheng <[email protected]>
Signed-off-by: Yi Cheng <[email protected]>
if not isinstance(storage_namespace, str): | ||
raise ValueError("storage namespace must be a string") | ||
|
||
# Right now, GCS store all data into a hash set key by storage_namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment added here.
Signed-off-by: Yi Cheng <[email protected]>
Signed-off-by: Yi Cheng <[email protected]>
Not sure what's going on. trying to skip the check for these lines. |
Signed-off-by: Yi Cheng <[email protected]>
@iycheng @edoakes Do we have any docs or instructions on how to run this cleanup tool? The function is under _private so not sure whether the user should call this. |
@YQ-Wang this is currently experimental and just being used for testing, so it shouldn't be relied on as a public API (it may change across versions). It should work and we don't expected major changes, but it's still "use at your own risk." If you have a requirement for this functionality to be stabilized could you please file a github issue and tag me and @iycheng? |
Signed-off-by: Yi Cheng [email protected]
Why are these changes needed?
This feature is a helper tool to clean up the redis storage. It's meant to be a solution to cleanup the old data in redis stored by Ray until we got the work of GCS storage backend done.
This feature support redis cluster and non cluster mode and is for redis cleanup only. The feature is built with cython, so no external libraries needed.
Since
_raylet.so
depends on redis_client implicitly, so size change in the ray pkg.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.