Limitation, regarding Spark #121

tianyin · 2022-06-16T17:18:28Z

First, when we discuss limitations of a research project, we should be careful to distinguish fundamental limitations from limitations of our own implementation. Take #120 as an example: not implementing Java code analysis (which, again, would be fun for someone who wants to learn it) is an implementation limitation, not a fundamental one. If we know how to do the analysis in Go, it is straightforward to do the same in Java (which is even simpler [1]).

OTOH, the fundamental limitation is that some of the FP pruning requires static source-code analysis: (1) It requires the availability of source code, which may not hold for proprietary, closed-source operators; however, since our target users are the developers of operators, this is not a concern. (2) It is not as widely applicable as a purely language-agnostic approach, because one needs to implement the same analysis for every language. (3) It is prohibitively difficult to implement precise static analysis, so it inevitably leads to soundness and completeness issues.

Back to the topic: I don't yet understand whether the short-lived operator is a fundamental research limitation or a limitation of the current Acto implementation. If it's the latter, it worries me less (though I understand the engineering challenges are nontrivial). But if it is fundamental, we should find a time to discuss it in depth.

[1] Java is simpler to analyze, because there are more mature tools.

tylergu · 2022-06-16T19:45:57Z

I think it is something fundamental.

When we designed Acto, we assumed the deployed application would reach some stable state. This is why we always wait for the system to converge and then collect the cluster state. However, this assumption breaks in the case of the spark-operator. Each CR in the spark-operator is a workload. When users submit a CR, the operator submits a Spark job and runs it. Some executor pods are spawned and then terminated once they finish running. @kevchentw Can you confirm that spark-operator deletes resources once the job finishes? I just tried it, and it seems some pods are being deleted.

The design of this spark application breaks our assumption that these systems have a stable state and makes it hard for Acto to capture the system state.
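To make the convergence assumption concrete, here is a minimal sketch in Python. It is not Acto's actual code: `wait_for_convergence`, `simulated_cluster`, and the pod names are hypothetical, and the "cluster" is just a scripted list of states. The point it illustrates is that, for an ephemeral job, the state captured after convergence no longer contains the executor pods.

```python
import time
from typing import Callable, List, Optional


def wait_for_convergence(
    get_state: Callable[[], dict],
    stable_polls: int = 3,
    max_polls: int = 50,
    interval_s: float = 0.0,
) -> Optional[dict]:
    """Poll until the same state has repeated `stable_polls` times in a row.

    Returns the converged state, or None if the state keeps changing
    for `max_polls` polls.
    """
    last = None
    unchanged = 0
    for _ in range(max_polls):
        state = get_state()
        if state == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return state
        else:
            # State changed: restart the stability counter.
            unchanged = 0
            last = state
        time.sleep(interval_s)
    return None


def simulated_cluster(timeline: List[dict]) -> Callable[[], dict]:
    """Hypothetical stand-in for a cluster: replays `timeline`, then repeats the last state."""
    it = iter(timeline)
    last: dict = {}

    def get_state() -> dict:
        nonlocal last
        last = next(it, last)
        return last

    return get_state


# A short-lived job: executors appear, run, and are deleted when the job ends.
spark_job = [
    {"pods": ["driver"]},
    {"pods": ["driver", "exec-1"]},
    {"pods": ["driver", "exec-1", "exec-2"]},
    {"pods": []},
]

converged = wait_for_convergence(simulated_cluster(spark_job))
print(converged)  # the executors that did the work are absent from this snapshot
```

A long-running system would converge to a state that still contains its pods; here the only stable state is the one after cleanup.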

In fact, such cases happen with other operators too. Sometimes there is one CRD for the system itself and another CRD for submitting the actual workload. We usually configure Acto to test only the CRD for the system.

In my opinion, we can argue that the workloads of these systems are not in the management plane, so they are out of scope.

tianyin · 2022-06-21T20:47:52Z

> In my opinion, we can argue that the workloads of these systems are not in the management plane, so they are out of scope.

It feels like a weak argument. But indeed it's a low-priority problem. Let's discuss in person and close this issue, leaving it as is for now.

tianyin · 2022-06-22T23:27:42Z

Had a discussion with @tylergu offline.

Acto relies on the observability of states, while the Spark operator is used for running jobs, which are often ephemeral. What's needed is a hook at termination time to take a snapshot of the system state. Currently Acto does not support that. We need to investigate how to do it and what support is needed from the lower level (even if we do not implement it).
The Spark operator does not actually follow the basic level-triggering principle of operators. It's arguable whether it's an ill fit or not.
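One way to picture the termination-time hook is the rough sketch below. It assumes a stream of events shaped like `(type, name, object)`, loosely modeled on the `ADDED` / `MODIFIED` / `DELETED` types of a Kubernetes watch. Acto does not support this today; `snapshot_on_termination` and the event data are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (event type, resource name, resource object); a simplified, assumed shape.
Event = Tuple[str, str, dict]


def snapshot_on_termination(
    events: Iterable[Event],
    on_terminate: Callable[[str, Dict[str, dict]], None],
) -> Dict[str, dict]:
    """Track live resources and call `on_terminate` just before one disappears.

    The hook receives the name of the resource being deleted and a copy of
    the last known state of all live resources, including the one going away.
    Returns the resources still live once the event stream ends.
    """
    live: Dict[str, dict] = {}
    for kind, name, obj in events:
        if kind in ("ADDED", "MODIFIED"):
            live[name] = obj
        elif kind == "DELETED":
            # Snapshot first, then forget the resource.
            on_terminate(name, dict(live))
            live.pop(name, None)
    return live


# Simulated run of a short-lived job with one executor.
events: List[Event] = [
    ("ADDED", "driver", {"phase": "Running"}),
    ("ADDED", "exec-1", {"phase": "Running"}),
    ("MODIFIED", "exec-1", {"phase": "Succeeded"}),
    ("DELETED", "exec-1", {"phase": "Succeeded"}),
]

snapshots: List[Tuple[str, Dict[str, dict]]] = []
remaining = snapshot_on_termination(events, lambda n, s: snapshots.append((n, s)))
print(snapshots)
print(remaining)
```

The snapshot preserves the executor's final state even though the resource is gone by the time the system looks stable. Whether the lower level can deliver such events reliably is exactly the open question above.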

tianyin · 2022-06-22T23:28:53Z

dc9dc5f

tianyin added the Discussion label Jun 16, 2022

tianyin closed this as completed Jun 22, 2022
