@wangchen615 raised a great question about a paper plan. We have occasionally discussed this in the past, but I think it would benefit everyone to write it down systematically so we have a concrete goal and are all on the same page.
NSDI looks like a good target with a 9/20 deadline; if we can't make it, OSDI will be at the end of the year. Either way, I would say we should focus on the work and build all the cool stuff we want.
Let me organize the work based on the evaluation. For a testing tool, the following are the metrics:
Effectiveness: Can the tool find new bugs? How many new bugs does it find? How many of them are important?
Efficiency: Can the tool find bugs fast? How long does it take? Does it use the time efficiently?
Trustworthiness: Does every alarm indicate a bug? If not, how many false alarms does one have to sift through to find a true one?
Usability: How automated is the tool? How easily can someone use it?
Also, I will put MUST-TO-HAVE and NICE-TO-HAVE tags on the technical tasks. As the names suggest, I will argue against a paper submission if any MUST-TO-HAVE is not done.
Effectiveness
Goal: Test 10+ operators and find as many bugs as possible.
Why 10? Because Sieve evaluated 10, so we should match the scale of the state of the art. I hope we can do a bit more (e.g., 12), but that is a nice-to-have.
Overall, we are doing pretty well on the effectiveness metric. So far, we have tested 7 operators and found 30 bugs, and we found bugs in every single operator we tested. @Essoz and @unw9527 are doing an amazing job! I'm confident the effort will continue and the numbers will get even better.
Note that we should be more selective about the operators we evaluate. @pdettori suggested the KNative operator, and we should check whether we can use it in the evaluation.
I also highly suggest building the domain-specific support feature, which can help us find more bugs. But I won't hold it against a paper submission if everything else looks strong.
Efficiency
Goal: Make Acto as fast as it can be.
We've already improved the speed. @Essoz built the multi-cluster parallel run feature (#86), which is a huge system-level improvement (thank you!!!). A minimal sketch of the idea is below.
I also think we should develop the multiple input technique (see #131), which will add more algorithmic novelty.
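To make the parallel-run idea concrete, here is a minimal sketch (not the actual #86 implementation; the function names and kind contexts are made up for illustration): each worker process drives a dedicated cluster, so an embarrassingly parallel test plan split across N clusters gets close to an N-times speedup.

```python
# Minimal sketch of the multi-cluster parallel-run idea (hypothetical names,
# not the actual #86 implementation): split the test plan into chunks and
# run each chunk against a dedicated cluster in its own worker process.
from concurrent.futures import ProcessPoolExecutor
from typing import List


def run_campaign(kube_context: str, test_cases: List[str]) -> List[str]:
    """Run a chunk of test cases against one cluster; return alarm ids.

    Placeholder body -- in reality this would drive the operator under test
    in the cluster identified by `kube_context`.
    """
    return [f"{kube_context}:{tc}" for tc in test_cases]


def parallel_run(contexts: List[str], test_plan: List[str]) -> List[str]:
    # Round-robin the test plan across the available clusters.
    chunks = [test_plan[i::len(contexts)] for i in range(len(contexts))]
    alarms: List[str] = []
    with ProcessPoolExecutor(max_workers=len(contexts)) as pool:
        for result in pool.map(run_campaign, contexts, chunks):
            alarms.extend(result)
    return alarms


if __name__ == "__main__":
    clusters = ["kind-acto-0", "kind-acto-1", "kind-acto-2"]
    plan = [f"mutate-field-{i}" for i in range(12)]
    print(parallel_run(clusters, plan))
```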
Trustworthiness
Goal: Kill all the FPs that can be systematically addressed.
@tylergu has spent the past month on FP reduction to improve this metric. The static analysis is on the way and it looks effective. The three simple cases (no copy, default value, dominator) all look implemented based on today's meeting, though the existing static analysis has only been applied to three operators. The last one (control dependency analysis) is the hardest, but the algorithm is known and has been discussed. A rough sketch of the pruning step is below.
We discussed the other two MUST-TO-HAVEs and figured out solutions (which I think should work and are sufficient). Please address them.
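For the simple cases, the pruning step I have in mind looks roughly like this (hypothetical data shapes, not the real implementation): alarms on fields that the operator never copies out of the CR, or where the mutation only set the field to its default value, are dropped before anyone has to triage them.

```python
# Rough sketch of alarm pruning driven by static-analysis results
# (hypothetical data shapes, not the actual implementation).
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Alarm:
    field_path: str           # e.g. "spec.replicas"
    mutated_to_default: bool  # mutation only set the field to its default


def prune_alarms(alarms: List[Alarm],
                 unused_fields: Set[str]) -> List[Alarm]:
    """Drop alarms the static analysis says cannot be true bugs.

    - "no copy": the operator never reads the field out of the CR, so a
      mutation on it cannot change cluster state.
    - "default value": the mutation only set the field to its default,
      which the operator treats the same as leaving it unset.
    """
    kept = []
    for alarm in alarms:
        if alarm.field_path in unused_fields:
            continue  # no-copy case
        if alarm.mutated_to_default:
            continue  # default-value case
        kept.append(alarm)
    return kept


if __name__ == "__main__":
    alarms = [Alarm("spec.unusedAnnotation", False),
              Alarm("spec.replicas", True),
              Alarm("spec.storage.size", False)]
    print(prune_alarms(alarms, unused_fields={"spec.unusedAnnotation"}))
```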
Usability
Goal: A tool for everyone.
[MUST-TO-HAVE] Results for multi-level usability (with domain knowledge, with code analysis, without code analysis)
The point to make is that Acto can be used at different levels; there is no free lunch, of course: the more information Acto has, the better it works.
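A sketch of what the three levels could look like from the user's side (hypothetical configuration keys, not Acto's actual interface): the tool runs with nothing but the CRD and deployment manifests, runs better if pointed at the operator source for static analysis, and best if domain knowledge such as value constraints or semantic oracles is supplied.

```python
# Hypothetical illustration of the three usability levels -- not Acto's real
# configuration format, just a sketch of the information each level adds.
level_1 = {                  # no code analysis, no domain knowledge
    "crd": "crd.yaml",
    "operator_deploy": "operator.yaml",
}

level_2 = dict(level_1, **{  # + static analysis of the operator source
    "operator_source": "github.com/example/some-operator",
})

level_3 = dict(level_2, **{  # + human-supplied domain knowledge
    "field_constraints": {"spec.replicas": {"min": 1, "max": 7}},
    "semantic_oracles": ["quorum_preserved", "no_data_loss_on_scale_down"],
})

for name, cfg in [("level 1", level_1), ("level 2", level_2), ("level 3", level_3)]:
    print(name, "->", sorted(cfg))
```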