SpytDistributions #35

faucct · 2024-10-03T08:55:12Z

No description provided.

alextokarew · 2024-10-04T03:30:52Z

spyt-package/src/main/python/spyt/standalone.py

-def get_base_cluster_config(global_conf, spark_cluster_version, params, base_discovery_path=None, client=None):
+def get_spyt_distributions(client, params=None) -> SpytDistributions:
+    if params is None:
+        params = {}


I think we should explicitly resolve the yt_root parameter here and pass it to the SpytDistributions constructor; that would be more readable.
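A minimal sketch of what this suggestion could look like. The SpytDistributions class here is a simplified stand-in (the real one in conf.py wraps yt_root in YPath), and falling back to //home/spark when the parameter is missing is an assumption based on the default shown in the spec.py diff:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Default taken from the spec.py diff in this PR.
DEFAULT_YT_ROOT = "//home/spark"


@dataclass
class SpytDistributions:
    """Simplified stand-in for the class in conf.py (no YPath, no YT calls)."""
    client: Any
    yt_root: str


def get_spyt_distributions(client, params: Optional[dict] = None) -> SpytDistributions:
    if params is None:
        params = {}
    # Resolve yt_root explicitly so the reader sees exactly what reaches the constructor.
    # The fallback to DEFAULT_YT_ROOT is an assumption of this sketch.
    yt_root = params.get("spyt_distributions", {}).get("yt_root", DEFAULT_YT_ROOT)
    return SpytDistributions(client=client, yt_root=yt_root)
```

With this shape, the call site shows where the root comes from, and the constructor only receives already-resolved values.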

alextokarew · 2024-10-04T03:34:23Z

spyt-package/src/main/python/spyt/spec.py

@@ -43,6 +43,7 @@ class SparkDefaultArguments(object):
    @staticmethod
    def get_params():
        return {
+            "spyt_distributions": { "yt_root": "//home/spark" },


I'd rename this parameter to spyt_distribution or spyt_distribution_root and set it to //home/spark by default, because that is the only sensible value in this structure for now.

I made this parameter nested so that other ways of distributing SPYT could be supported later, for example "spyt_distributions": {"local_path": "/opt/spyt"}. I don't think it should be a flat string parameter.

But this pull request doesn't support that. I think it should be kept as simple as possible; if we later decide to support other ways of specifying the distribution (by the way, is there a need for this?), we can just refactor it.

I meant that if we want to support it later, we would have to add a new, conflicting parameter instead of extending an existing one. That's why I didn't want to make it a flat string.

But how likely is that situation? I think the chance is almost none.

I believe that in the future we will most likely want to support ways of distributing SPYT other than a YT catalog, for example in Docker images, as Spark does.
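To make the trade-off in this thread concrete, here is a hedged sketch of how a nested parameter could be extended without introducing a second, conflicting parameter. Only yt_root exists in this pull request; the local_path key is the hypothetical example from the discussion, and describe_distribution_source is a name invented for this illustration:

```python
# Default from the spec.py diff in this PR.
DEFAULT_SPEC = {"yt_root": "//home/spark"}


def describe_distribution_source(params: dict) -> str:
    """Return a short label for where SPYT would be loaded from."""
    spec = params.get("spyt_distributions", DEFAULT_SPEC)
    # "local_path" is hypothetical; it is not supported by this pull request.
    known = [key for key in ("yt_root", "local_path") if key in spec]
    if len(known) != 1:
        raise ValueError(
            "expected exactly one of yt_root or local_path, got: %r" % sorted(spec)
        )
    key = known[0]
    return "{}={}".format(key, spec[key])
```

With a flat string parameter, a new distribution kind would need a separate top-level key plus a rule for which one wins; with the nested form, the conflict check stays inside one structure.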

alextokarew · 2024-10-04T03:38:50Z

spyt-package/src/main/python/spyt/conf.py

@@ -26,6 +18,71 @@
 logger = logging.getLogger(__name__)


+class SpytDistributions:


I'd rename this class to SpytDistribution (and the get_spyt_distributions method accordingly), because the plural form is confusing. It points to a different root, but it's not multiple distributions.

But it really is plural, not singular: this yt_root contains multiple SPYT and Spark distributions, and to choose one you have to provide a specific version.

Why do you need to support multiple SPYT and Spark distributions for a single cluster or job? I think each YT operation must have specific versions.

The //home/spark catalog is organized in such a way that it contains multiple versions. If I provide a way to change this path to something else, I am providing a way to customize multiple SPYT distributions, not a single one, even though only one will be used.

alextokarew · 2024-10-04T03:41:17Z

spyt-package/src/main/python/spyt/conf.py

+class SpytDistributions:
+    def __init__(self, client: YtClient, yt_root: str):
+        self.client = client
+        self.yt_root = YPath(yt_root)


I'd rename yt_root to base_path, so it would be SpytDistribution.base_path; I think that makes more sense.

It is called root in the publisher scripts: https://github.com/ytsaurus/ytsaurus-spyt/tree/382ae1656208601175b6f004fd31a39affb17b83/tools/release/publisher
Also, it is not just any root or path, but a YT one.

But I think base_path is more suitable in the context of this class.

In the context of this class, maybe; but in the context of the public interface, I believe it should be consistent with the --root parameter from the publisher scripts.

alextokarew · 2024-10-04T03:47:44Z

spyt-package/src/main/python/spyt/conf.py

@@ -3,19 +3,11 @@
 from spyt.dependency_utils import require_yt_client
 require_yt_client()

-from yt.wrapper import get, YPath, list as yt_list, exists  # noqa: E402
+from yt.wrapper import get, YPath, list as yt_list, exists, YtClient  # noqa: E402


I think we can stop importing and using the get function, because you now use YtClient instead.

Inside SpytDistributions, yes, but I am not sure about the external functions: client can still be None in those.

alextokarew · 2024-10-04T03:49:35Z

spyt-package/src/main/python/spyt/conf.py

+    def latest_ytserver_proxy_path(self, cluster_version):
+        if cluster_version:
+            return None
+        global_conf = self.global_conf


This local variable is redundant; you can use self.global_conf directly on the next line.

alextokarew · 2024-10-04T03:55:26Z

spyt-package/src/main/python/spyt/conf.py

+                           "Please update your local ytsaurus-spyt".format(spark_cluster_version, SELF_VERSION))
+
+    def get_available_cluster_versions(self):
+        subdirs = yt_list(self.conf_base_path.join(RELEASES_SUBDIR), client=self.client)


subdirs can be obtained by calling self.get_available_spyt_versions(), which would remove the code duplication.

Two different paths are used there, so this method cannot be reused.

Oh, I see: conf_base_path and spyt_base_path. You're right.

alextokarew · 2024-10-04T03:58:26Z

spyt-package/src/main/python/spyt/conf.py

+    def _get_version_conf_path(self, cluster_version):
+        return self.conf_base_path.join(self._version_subdir(cluster_version)).join(cluster_version).join("spark-launch-conf")
+
+    def _version_subdir(self, version):


This function can be inlined into _get_version_conf_path because that is the only place where it is used.
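A rough sketch of the inlined form, written as a free function so it runs without a YT client. The body of _version_subdir is not shown in this diff, so the SNAPSHOT rule below is only a placeholder; the values of RELEASES_SUBDIR and SNAPSHOTS_SUBDIR are assumptions, and posixpath.join stands in for YPath.join:

```python
import posixpath

RELEASES_SUBDIR = "releases"    # assumed value; only the constant's name appears in the diff
SNAPSHOTS_SUBDIR = "snapshots"  # hypothetical constant for this sketch


def get_version_conf_path(conf_base_path: str, cluster_version: str) -> str:
    # Formerly a separate _version_subdir helper; inlined because this is its only caller.
    # Placeholder rule, not the real selection logic from conf.py.
    subdir = SNAPSHOTS_SUBDIR if "SNAPSHOT" in cluster_version else RELEASES_SUBDIR
    return posixpath.join(conf_base_path, subdir, cluster_version, "spark-launch-conf")
```

For example, under these assumptions get_version_conf_path("//home/spark/conf", "2.3.0") yields a path ending in releases/2.3.0/spark-launch-conf. Whether inlining is worth it depends on how long the real subdirectory rule is; a one-line rule reads fine in place.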

faucct added 2 commits October 3, 2024 12:54

SpytDistributions

97ce5b9

expose SpytDistribution in _build_spark_conf

a53cf99

faucct requested a review from alextokarew October 3, 2024 10:04

faucct marked this pull request as ready for review October 3, 2024 10:04

alextokarew requested changes Oct 4, 2024

faucct requested a review from alextokarew October 4, 2024 05:49

groom conf.py

ae83df7

faucct closed this Oct 4, 2024
