@wangchen615 raised a great question about a paper plan. We have occasionally discussed this in the past, but I think it would benefit everyone to write it down systematically so we have a concrete goal and are all on the same page.
NSDI looks like a good target with a 9/20 deadline; if we can't make it, OSDI will be at the end of the year. Either way, I would say we should focus on the work and build all the cool stuff we want.
Let me organize the work based on the evaluation. For a testing tool, the following are the metrics:
Effectiveness: Can the tool find new bugs? How many new bugs does it find? How many of them are important?
Efficiency: Can the tool find bugs fast? How long does it take? Does it use the time efficiently?
Trustworthiness: Does every alarm indicate a bug? If not, how many false alarms does one have to sift through to find a true one?
Usability: How automated is the tool? How easily can someone use it?
Also, I will put MUST-TO-HAVE and NICE-TO-HAVE tags on the technical tasks. As the names suggest, I will argue against a paper submission if any MUST-TO-HAVE is not done.
Effectiveness
Goal: Test 10+ operators and find as many bugs as possible.
Why 10? Because Sieve evaluated 10, so we should match the scale of the state of the art. I hope we can do a bit more (e.g., 12), but that is a nice-to-have.
Overall, we are doing pretty well on the effectiveness metric. So far, we have tested 7 operators and found 30 bugs, and we found bugs in every single operator we tested. @Essoz and @unw9527 are doing an amazing job! I'm confident the effort will continue and the numbers will get even better.
Note that we should be more selective about the operators we evaluate. @pdettori suggested the KNative operator, and we should check whether we can use it in the evaluation.
I also highly suggest building the domain-specific support feature, which can help us find more bugs. But I won't hold it against a paper submission if everything else looks strong.
Efficiency
Goal: Make Acto as fast as it can be.
We've already improved the speed. @Essoz built the multi-cluster parallel run feature (#86), which is a huge system-level improvement (thank you!!!). A minimal sketch of the idea is below.
I also think we should develop the multiple input technique (see #131), which will add more algorithmic novelty.
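To make the parallel-run idea concrete, here is a minimal sketch (not the actual #86 implementation; the function names and kind contexts are made up for illustration): each worker process drives a dedicated cluster, so an embarrassingly parallel test plan split across N clusters gets close to an N-times speedup.

```python
# Minimal sketch of the multi-cluster parallel-run idea (hypothetical names,
# not the actual #86 implementation): split the test plan into chunks and
# run each chunk against a dedicated cluster in its own worker process.
from concurrent.futures import ProcessPoolExecutor
from typing import List


def run_campaign(kube_context: str, test_cases: List[str]) -> List[str]:
    """Run a chunk of test cases against one cluster; return alarm ids.

    Placeholder body -- in reality this would drive the operator under test
    in the cluster identified by `kube_context`.
    """
    return [f"{kube_context}:{tc}" for tc in test_cases]


def parallel_run(contexts: List[str], test_plan: List[str]) -> List[str]:
    # Round-robin the test plan across the available clusters.
    chunks = [test_plan[i::len(contexts)] for i in range(len(contexts))]
    alarms: List[str] = []
    with ProcessPoolExecutor(max_workers=len(contexts)) as pool:
        for result in pool.map(run_campaign, contexts, chunks):
            alarms.extend(result)
    return alarms


if __name__ == "__main__":
    clusters = ["kind-acto-0", "kind-acto-1", "kind-acto-2"]
    plan = [f"mutate-field-{i}" for i in range(12)]
    print(parallel_run(clusters, plan))
```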
Trustworthiness
Goal: Kill all the FPs that can be systematically addressed.
@tylergu has spent the past month on FP reduction to improve this metric. The static analysis is on the way and it looks effective. The three simple cases (no copy, default value, dominator) all look implemented based on today's meeting, though the existing static analysis has only been applied to three operators. The last one (control dependency analysis) is the hardest, but the algorithm is known and has been discussed. A rough sketch of the pruning step is below.
We discussed the other two MUST-TO-HAVEs and figured out solutions (which I think should work and are sufficient). Please address them.
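For the simple cases, the pruning step I have in mind looks roughly like this (hypothetical data shapes, not the real implementation): alarms on fields that the operator never copies out of the CR, or where the mutation only set the field to its default value, are dropped before anyone has to triage them.

```python
# Rough sketch of alarm pruning driven by static-analysis results
# (hypothetical data shapes, not the actual implementation).
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Alarm:
    field_path: str           # e.g. "spec.replicas"
    mutated_to_default: bool  # mutation only set the field to its default


def prune_alarms(alarms: List[Alarm],
                 unused_fields: Set[str]) -> List[Alarm]:
    """Drop alarms the static analysis says cannot be true bugs.

    - "no copy": the operator never reads the field out of the CR, so a
      mutation on it cannot change cluster state.
    - "default value": the mutation only set the field to its default,
      which the operator treats the same as leaving it unset.
    """
    kept = []
    for alarm in alarms:
        if alarm.field_path in unused_fields:
            continue  # no-copy case
        if alarm.mutated_to_default:
            continue  # default-value case
        kept.append(alarm)
    return kept


if __name__ == "__main__":
    alarms = [Alarm("spec.unusedAnnotation", False),
              Alarm("spec.replicas", True),
              Alarm("spec.storage.size", False)]
    print(prune_alarms(alarms, unused_fields={"spec.unusedAnnotation"}))
```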
Usability
Goal: A tool for everyone.
[MUST-TO-HAVE] Results for multi-level usability (with domain knowledge, with code analysis, without code analysis)
The point to make is that Acto can be used at different levels; there is no free lunch, of course: the more information Acto has, the better it works.
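A sketch of what the three levels could look like from the user's side (hypothetical configuration keys, not Acto's actual interface): the tool runs with nothing but the CRD and deployment manifests, runs better if pointed at the operator source for static analysis, and best if domain knowledge such as value constraints or semantic oracles is supplied.

```python
# Hypothetical illustration of the three usability levels -- not Acto's real
# configuration format, just a sketch of the information each level adds.
level_1 = {                  # no code analysis, no domain knowledge
    "crd": "crd.yaml",
    "operator_deploy": "operator.yaml",
}

level_2 = dict(level_1, **{  # + static analysis of the operator source
    "operator_source": "github.com/example/some-operator",
})

level_3 = dict(level_2, **{  # + human-supplied domain knowledge
    "field_constraints": {"spec.replicas": {"min": 1, "max": 7}},
    "semantic_oracles": ["quorum_preserved", "no_data_loss_on_scale_down"],
})

for name, cfg in [("level 1", level_1), ("level 2", level_2), ("level 3", level_3)]:
    print(name, "->", sorted(cfg))
```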