SpytDistributions #35
Conversation
def get_base_cluster_config(global_conf, spark_cluster_version, params, base_discovery_path=None, client=None):
def get_spyt_distributions(client, params=None) -> SpytDistributions:
    if params is None:
        params = {}
I think we should explicitly resolve the yt_root parameter here and pass it to the SpytDistributions constructor, so it will be more readable.
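A minimal sketch of that suggestion (hypothetical; the params shape and the //home/spark default are taken from the diff quoted below):

def get_spyt_distributions(client, params=None) -> SpytDistributions:
    if params is None:
        params = {}
    # Resolve yt_root explicitly before constructing SpytDistributions:
    yt_root = params.get("spyt_distributions", {}).get("yt_root", "//home/spark")
    return SpytDistributions(client=client, yt_root=yt_root)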
@@ -43,6 +43,7 @@ class SparkDefaultArguments(object):
    @staticmethod
    def get_params():
        return {
            "spyt_distributions": { "yt_root": "//home/spark" },
I'd rename this parameter to spyt_distribution or spyt_distribution_root and set it to //home/spark by default, because it is the only sensible value in this structure for now.
I made this parameter nested so it would be possible to support other ways of distributing SPYT, for example "spyt_distributions": {"local_path": "/opt/spyt"}. I don't think this should be a flat string parameter.
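For illustration, the shapes under discussion (the flat form and the local_path variant are hypothetical, not part of this PR):

# Nested form introduced by this PR:
params = {"spyt_distributions": {"yt_root": "//home/spark"}}
# Flat form suggested above (hypothetical):
params = {"spyt_distribution_root": "//home/spark"}
# A future extension the nested form leaves room for (hypothetical):
params = {"spyt_distributions": {"local_path": "/opt/spyt"}}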
But in this pull request you don't support it. I think it should be kept as simple as possible, and if we decide to support other ways of specifying the distribution (by the way, is there a need for this?), we can just refactor it.
I meant that if we want to support it later, we will have to add a new, conflicting parameter entity instead of extending an existing one. That's why I didn't want to make it a flat string.
But what's the likelihood of such a situation? I think it's almost zero.
I believe that in the future we will most likely want to support ways of distributing SPYT other than a YT catalog, for example in Docker images, like Spark does.
@@ -26,6 +18,71 @@
logger = logging.getLogger(__name__)


class SpytDistributions:
I'd rename this class and the get_spyt_distributions method to SpytDistribution, because the plural form is confusing. It actually points to a different root, but it's not multiple distributions.
But it really is plural, not singular, because this yt_root contains multiple SPYT and Spark distributions, and to choose one you have to provide a specific version.
Why do you need to support multiple SPYT and Spark distributions for a single cluster or job? I think there must be specific versions for each YT operation.
The catalog //home/spark is organized in such a way that there are multiple versions in it. If I give a way to change this path to something else, I give a way to customize multiple SPYT distributions, not a single one, even though only one will be used.
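For context, a rough sketch of such a root's layout (illustrative only; subtree names are inferred from conf_base_path, spyt_base_path, and RELEASES_SUBDIR quoted later in this review):

# //home/spark/
#     conf/
#         releases/
#             1.75.0/spark-launch-conf
#             1.76.0/spark-launch-conf
#     spyt/
#         releases/
#             1.75.0/...
#             1.76.0/...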
class SpytDistributions:
    def __init__(self, client: YtClient, yt_root: str):
        self.client = client
        self.yt_root = YPath(yt_root)
I'd rename yt_root to base_path, so it would be SpytDistribution.base_path; I think it makes more sense.
It is called root in the publisher scripts: https://github.com/ytsaurus/ytsaurus-spyt/tree/382ae1656208601175b6f004fd31a39affb17b83/tools/release/publisher. Also, it is not just any root or path, but a YT one.
But I think base_path is more suitable in the context of this class.
In the context of this class, maybe, but in the context of the public interface I believe this should be coherent with the --root parameter of the publisher scripts.
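For reference, a hedged sketch of how the constructor might derive the paths used by the methods quoted below (conf_base_path, spyt_base_path, and global_conf do appear in this review, but the exact join arguments and the global-config location are assumptions):

class SpytDistributions:
    def __init__(self, client: YtClient, yt_root: str):
        self.client = client
        self.yt_root = YPath(yt_root)
        # Assumed layout: configs and SPYT binaries live in separate subtrees.
        self.conf_base_path = self.yt_root.join("conf")
        self.spyt_base_path = self.yt_root.join("spyt")
        # Assumed: the global config is stored under the conf subtree.
        self.global_conf = client.get(str(self.conf_base_path.join("global")))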
@@ -3,19 +3,11 @@
from spyt.dependency_utils import require_yt_client
require_yt_client()

-from yt.wrapper import get, YPath, list as yt_list, exists  # noqa: E402
+from yt.wrapper import get, YPath, list as yt_list, exists, YtClient  # noqa: E402
I think we can get rid of importing and using the get function, because now you use YtClient instead.
Inside SpytDistributions, yes, but I am not sure about the external methods: client can still be None in those.
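A small sketch of the distinction (the helper name is hypothetical):

from yt.wrapper import get, YtClient

# External helper: client may be None, so the module-level get(), which falls
# back to the default client, is still needed here.
def read_conf(path, client=None):
    return get(path, client=client)

# Inside SpytDistributions, self.client is always a YtClient instance, so
# self.client.get(path) can replace the module-level get().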
def latest_ytserver_proxy_path(self, cluster_version):
    if cluster_version:
        return None
    global_conf = self.global_conf
This local variable is redundant; you can use self.global_conf directly on the next line.
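The cleanup would look like this (the method's continuation is not shown in the quoted diff, so the return expression is an assumption):

def latest_ytserver_proxy_path(self, cluster_version):
    if cluster_version:
        return None
    # Use the attribute directly instead of aliasing it into a local variable.
    return self.global_conf.get("ytserver_proxy_path")  # key name assumed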
"Please update your local ytsaurus-spyt".format(spark_cluster_version, SELF_VERSION)) | ||
|
||
def get_available_cluster_versions(self): | ||
subdirs = yt_list(self.conf_base_path.join(RELEASES_SUBDIR), client=self.client) |
subdirs can be obtained by calling self.get_available_spyt_versions(), which would remove the code duplication.
There are two different paths used there, so this method cannot be used.
Oh, I see: conf_base_path and spyt_base_path. You're right.
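Spelled out, the two methods list different subtrees, which is why the deduplication does not apply (get_available_spyt_versions is not quoted in this review, so its body is an assumption):

def get_available_cluster_versions(self):
    subdirs = yt_list(self.conf_base_path.join(RELEASES_SUBDIR), client=self.client)
    return [s for s in subdirs if s != "spark-launch-conf"]  # filtering assumed

def get_available_spyt_versions(self):
    # Lists the SPYT binaries subtree, not the conf subtree.
    return yt_list(self.spyt_base_path.join(RELEASES_SUBDIR), client=self.client)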
def _get_version_conf_path(self, cluster_version):
    return self.conf_base_path.join(self._version_subdir(cluster_version)).join(cluster_version).join("spark-launch-conf")

def _version_subdir(self, version):
This function can be inlined into _get_version_conf_path, because that is the only place where it is used.
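The inlined version might look like this (SNAPSHOTS_SUBDIR and the snapshot check are assumptions; only RELEASES_SUBDIR appears in the quoted code):

def _get_version_conf_path(self, cluster_version):
    subdir = SNAPSHOTS_SUBDIR if "SNAPSHOT" in cluster_version else RELEASES_SUBDIR
    return self.conf_base_path.join(subdir).join(cluster_version).join("spark-launch-conf")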