Limitation, regarding Spark #121
Comments
I think it is something fundamental. When we designed Acto, we assumed the deployed application would reach some stable state. This is why we always wait for the system to converge and then collect the cluster state. However, this assumption breaks in the case of the spark-operator. Each CR in the spark-operator is a workload. When users submit a CR, the operator submits a Spark job and runs it. Some executor pods are spawned and then terminated once they finish running. @kevchentw Can you confirm that spark-operator deletes resources once the job finishes? I just tried it and it seems some pods are being deleted. The design of this Spark application breaks our assumption that these systems have a stable state, and it makes it hard for Acto to capture the system state. In fact, such cases happen in other operators too. There is sometimes one CRD for the system itself and another CRD for submitting the actual workload. We usually configure Acto to test only the CRD for the system. In my opinion, we can argue that the workloads of these systems are not in the management plane, so they are out of scope.
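To make the broken assumption concrete, here is a minimal sketch of a "wait until the state stops changing, then collect it" loop. It is not Acto's actual code: `wait_for_convergence`, `replay`, and the pod traces are all hypothetical, and the traces are made up for illustration rather than recorded from a real cluster.

```python
from typing import Callable, Hashable, Iterable, Optional


def wait_for_convergence(
    snapshot: Callable[[], Hashable],
    stable_rounds: int = 3,
    max_rounds: int = 20,
) -> Optional[Hashable]:
    """Poll `snapshot` until it returns the same value `stable_rounds` times in a row.

    Returns the stable value, or None if the state never settles within `max_rounds`.
    """
    last = None
    streak = 0
    for _ in range(max_rounds):
        current = snapshot()
        if streak > 0 and current == last:
            streak += 1
        else:
            last, streak = current, 1
        if streak >= stable_rounds:
            return current
    return None


def replay(states: Iterable[Hashable]) -> Callable[[], Hashable]:
    """Hypothetical stand-in for reading cluster state.

    Replays `states` one per call, then keeps returning the last one.
    """
    it = iter(states)
    last = None

    def snapshot() -> Hashable:
        nonlocal last
        last = next(it, last)
        return last

    return snapshot


# Made-up trace of a long-running system: pods come up and then stay.
service_trace = [
    frozenset({"pod-0"}),
    frozenset({"pod-0", "pod-1"}),
    frozenset({"pod-0", "pod-1", "pod-2"}),
]

# Made-up trace of a workload CR: executors appear, finish, and are deleted.
job_trace = [
    frozenset({"driver"}),
    frozenset({"driver", "exec-1", "exec-2"}),
    frozenset({"driver", "exec-1"}),
    frozenset({"driver"}),
]

service_state = wait_for_convergence(replay(service_trace))
job_state = wait_for_convergence(replay(job_trace))

print(sorted(service_state))  # the stable state contains every pod of the system
print(sorted(job_state))      # the stable state no longer contains any executor
```

In this toy model the long-running system converges to a snapshot that reflects everything that was deployed, while the workload converges only after the executors are gone, so the collected state says nothing about what happened while the job was running.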
It feels like a weak argument. But indeed it's a low-priority problem. Let's discuss in person and close this issue, leaving it for now.
Had a discussion with @tylergu offline.
First, when we discuss limitations of a research project, we want to be careful to distinguish fundamental limitations from limitations of our own implementation. Take #120 as an example: not implementing Java code analysis (which, again, would be fun if someone wants to learn it) is an implementation limitation, not a fundamental one -- if we know how to do it in Go, it's straightforward to do the same analysis in Java (which is even simpler [1]).
OTOH, the fundamental limitation is that some of the FP pruning requires static source-code analysis: (1) It requires the availability of source code, which may not hold for proprietary, closed-source operators; however, given that our target users are developers of operators, this is not a concern; (2) It is not as widely applicable as a pure language-agnostic approach, because one needs to implement the same analysis for every language; (3) It's prohibitively difficult to implement precise static analysis, so it inevitably leads to soundness and completeness issues.
Back to the topic: I don't yet understand whether the short-lived operator is a fundamental research limitation or a limitation of the current Acto implementation. If it's the latter, it worries me less. Note that I understand the engineering challenges are nontrivial. But if it is fundamental, we should find a time to discuss it in depth.
[1] Java is simpler to analyze, because there are more mature tools.