[mdb-community] A summary of evaluation results for mongodb-community operator #166

taham0 · 2022-08-17T03:51:04Z

Evaluation Results

Acto ran 153 test cases over 77 fields.
In the initial evaluation, it produced about 120 alarms: 108 false alarms and 12 true alarms.
In the most recent evaluation, it produced 61 alarms: 48 false alarms and 3 true alarms.
The remaining 10 alarms were produced without any output from the state oracle.
The reduction in alarms can be largely attributed to the inclusion of secrets in Acto.

False Alarms	48
Category	Subtotal
Ineffective input generation	17
Invalid input	16
Input applied despite operator crash	7
Field doesn't map to a state in application	5
Null != default	2
Inconsistent operator state	1
Grand Total	48
True Alarms	3

True Alarms

The initial evaluation reproduced 4 bugs. In the most recent evaluation, a total of 3 true alarms reproduced 2 of these bugs.

Operator crashes in face of an incomplete TLS configuration mongodb/mongodb-kubernetes-operator#1054
- 2 of the alarms reproduced this bug.
Operator crashes when spec.security.modes list is empty mongodb/mongodb-kubernetes-operator#1055
- 1 of the alarms reproduced this bug.
Mongodb system is down and unable to recover when the featureCompatibilityVersion is not specified and changed to an invalid value mongodb/mongodb-kubernetes-operator#1072
- None of the alarms reproduced this bug.
- I have not yet found the reason why this bug was not reproduced
Changing scramCredentialsSecretName causes resource leak mongodb/mongodb-kubernetes-operator#1074
- None of the alarms reproduced this bug.
- As discussed in Observability #160, this is because changing the scramCredentialsSecretName creates a new secret
- Now that Acto includes secrets, the system delta matches the input delta, so no alarm is raised

False Alarms

The initial evaluation had 108 false alarms. The number of false alarms reduced to 48 in the most recent evaluation.

17 of these alarms occurred because of ineffective input generation
- All of these alarms occurred because a non-nullable field was set to null, so the CR was rejected with a validation error when applied
16 of these alarms occurred because of invalid input
- 9 of these alarms occurred because the process names were invalid.
  The spec.automationConfig contains an array of processes. Each process in the input specifies fields and values that override the current processes in the operator-created automationConfig by merging. Specifically, the operator searches the current processes for the process names specified in the input. Since the input process names are invalid, no match is found, the input is not merged, and the automationConfig remains unchanged.
- 7 of these alarms occurred because an invalid resource reference was provided
  Enabling TLS requires a caCertificateSecret (or a caConfigMap) and a certificateKeySecret. None of these objects exist, so the reference Acto provides to one or more of them is invalid; the operator detects this and logs a warning.
7 of these alarms occurred because a new configuration was applied even though the operator had crashed due to a previous mutation
5 of these alarms occurred because the field did not map to a state in the application
2 of these alarms occurred because a null field was changed to its default value
1 of these alarms occurred because of an inconsistent operator state caused by a previously applied configuration
A previously applied configuration left the operator in an inconsistent state. Since the agent version does not match the goal state, the replicaSet is not ready. Consequently, the operator is unable to proceed to creating or updating the connectionStringSecret.
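The process-name failure mode described above can be illustrated with a small sketch. It assumes a shallow, name-keyed merge of the input processes into the operator-generated process list; `merge_processes` and the process names in the example are hypothetical, and the operator's real merge logic may differ (for example, it may merge nested fields recursively). The point is only that an override whose name matches no existing process is dropped, leaving the automationConfig unchanged.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Tuple

Process = Dict[str, Any]


def merge_processes(
    current: List[Process], overrides: List[Process]
) -> Tuple[List[Process], List[Optional[str]]]:
    """Hypothetical shallow merge of override processes into current ones, keyed by "name".

    Returns the merged list and the names of overrides that matched no process.
    """
    merged = deepcopy(current)  # leave the caller's list untouched
    by_name = {p["name"]: p for p in merged}
    unmatched: List[Optional[str]] = []
    for override in overrides:
        target = by_name.get(override.get("name"))
        if target is None:
            # Invalid process name: there is nothing to merge into, so the override is ignored.
            unmatched.append(override.get("name"))
            continue
        # Copy every overriding field except the name used for matching.
        target.update({k: v for k, v in override.items() if k != "name"})
    return merged, unmatched


current = [{"name": "example-mongodb-0", "disabled": False}]
merged, unmatched = merge_processes(current, [{"name": "bogus-name", "disabled": True}])
print(merged == current, unmatched)
```

An oracle could use the returned list of unmatched names to classify such a mutation as invalid input instead of raising an alarm.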

Evaluation Result 28/08/22

The following changes were made:

Warning-level messages in the operator log are now checked to filter out invalid input
Seed CR changed
Format issue resolved for resource units (mapping 0.2 to 200m)
Failures to apply a configuration are now detected from the CLI output
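One way to avoid resource-unit format mismatches is to compare quantities by value rather than as strings: Kubernetes serializes the quantity 0.2 in its canonical form, 200m, so a plain string comparison between the input and the observed state reports a spurious difference. Below is a minimal sketch; `parse_quantity` and `quantities_equal` are hypothetical helpers, not Acto's actual code, and they only handle plain numbers with the standard SI and binary suffixes.

```python
from decimal import Decimal

# Decimal (SI) and binary suffixes allowed in Kubernetes quantities.
SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
    "T": Decimal("1e12"), "P": Decimal("1e15"), "E": Decimal("1e18"),
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30),
    "Ti": Decimal(2**40), "Pi": Decimal(2**50), "Ei": Decimal(2**60),
}


def parse_quantity(value) -> Decimal:
    """Convert a quantity such as 0.2, "200m" or "1Gi" into an exact Decimal."""
    if isinstance(value, (int, float)):
        return Decimal(str(value))
    text = str(value).strip()
    # Check two-letter binary suffixes (e.g. "Mi") before one-letter SI ones ("M", "m").
    for width in (2, 1):
        suffix = text[-width:]
        if len(text) > width and suffix in SUFFIXES:
            return Decimal(text[:-width]) * SUFFIXES[suffix]
    return Decimal(text)  # plain number without a suffix


def quantities_equal(a, b) -> bool:
    """Compare two quantities by value, so 0.2 equals "200m" and "1Ki" equals 1024."""
    return parse_quantity(a) == parse_quantity(b)
```

Note that the suffix is case-sensitive: "200M" (200 million) and "200m" (0.2) are very different values.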

True Alarms | 5
False Alarms | 13

False Alarm Category	Subtotal
Field doesn't map to a state in application	4
CR applied over crashed operator	4
Input value is the same as previous value / default value (in impact)	2
Inconsistent operator state	2
Invalid input	1

Evaluation Result 31/08/22

True Alarms | 5
False Alarms | 4

False Alarm Category	Subtotal
Field doesn't map to a state in application	2
Input value is the same as previous value / default value (in impact)	1
Ineffective input generation	1
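The false-positive (FP) rates discussed in the comments are FA / (FA + TA) over the alarm counts reported above. The sketch below recomputes them; leaving out the 10 alarms without state-oracle output in the second evaluation is an assumption, chosen because it matches the 48/51 figure used in the discussion.

```python
# (false alarms, true alarms) per evaluation, taken from the results above.
EVALUATIONS = {
    "initial": (108, 12),
    "second": (48, 3),  # excludes the 10 alarms without state-oracle output
    "28/08/22": (13, 5),
    "31/08/22": (4, 5),
}


def fp_rate(false_alarms: int, true_alarms: int) -> float:
    """Share of alarms that are false positives: FA / (FA + TA)."""
    total = false_alarms + true_alarms
    if total == 0:
        raise ValueError("no alarms to rate")
    return false_alarms / total


for name, (fa, ta) in EVALUATIONS.items():
    print(f"{name}: {fp_rate(fa, ta):.1%}")
```

This yields 90.0%, 94.1%, 72.2% and 44.4% for the four evaluations, in the order listed.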


tianyin · 2022-08-17T04:05:39Z

Thanks @taham0 for the detailed explanation! The high FP rate is now the key challenge for Acto. In this case, the FP rate is almost 94% (48/51), which makes it hard to argue that Acto is a usable tool. @tylergu has been working hard on reducing FPs, and they have indeed been reduced by more than 2x. However, that does not seem sufficient at this point. I don't know what the results for the other operators look like. My guess is that their FP numbers are lower, as the FP reduction was designed based on our understanding of the other operators (we likely have overfitting issues).

@tylergu @Essoz can you look into the FP?

tylergu · 2022-08-17T04:14:10Z

Thanks for the write-up. Let’s set up a synchronous meeting to discuss the results.

Some of the false alarms seem to be true alarms, e.g. the 7 alarms caused by the operator crash.

@taham0, did you analyze the results directly after running Acto? Are these the results without static analysis support? I think applying static analysis should eliminate some of the false alarms.

A lot of the false alarms seem to be due to invalid input. If the operator also reports invalid input at the warning level, we should try to capture warning-level log messages as well.
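Capturing warning-level messages could look like the sketch below. It assumes the operator writes one JSON object per log line with `level` and `msg` keys; that format, the level names, and the `warning_messages` helper are assumptions for illustration, not the operator's documented log schema.

```python
import json

# Levels treated as "warning or worse" (assumed names).
WARNING_LEVELS = {"warn", "warning", "error"}


def warning_messages(log_lines):
    """Return the msg field of every structured log line at warning level or above.

    Lines that are not JSON objects are skipped.
    """
    messages = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured line, ignore
        if not isinstance(record, dict):
            continue
        if str(record.get("level", "")).lower() in WARNING_LEVELS:
            messages.append(record.get("msg", ""))
    return messages
```

An oracle could then treat a mutation as invalid input when a new warning appears right after it is applied.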

The 17 false alarms caused by rejected input are due to an Acto bug we discovered recently: Acto should recognize the input as invalid if any error message appears on kubectl’s stderr.
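A minimal sketch of that check, assuming the harness shells out to `kubectl apply`; `apply_cr` and `input_rejected` are hypothetical names. Because kubectl also writes warnings (lines starting with `Warning:`) to stderr, this version counts a mutation as rejected only on a non-zero exit code or a stderr line beginning with `error`, which is slightly narrower than "any message on stderr".

```python
import subprocess


def apply_cr(manifest_path: str) -> subprocess.CompletedProcess:
    """Run `kubectl apply -f` and capture its output (requires kubectl on PATH)."""
    return subprocess.run(
        ["kubectl", "apply", "-f", manifest_path],
        capture_output=True,
        text=True,
        check=False,
    )


def input_rejected(result: subprocess.CompletedProcess) -> bool:
    """Treat the mutation as invalid input if kubectl failed or printed an error line."""
    if result.returncode != 0:
        return True
    stderr = result.stderr or ""
    # Warnings also go to stderr, so only lines starting with "error" count.
    return any(line.lower().startswith("error") for line in stderr.splitlines())
```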

I am more curious why the number of true alarms has decreased so much. Which true alarms can no longer be reproduced?

taham0 · 2022-08-17T04:22:55Z

@tylergu

As I mentioned, true alarms #3 and #4 (mongodb/mongodb-kubernetes-operator#1072 and #1074) were not reproduced.
Yes, I realized we could remove the 17 FAs by fixing the bug; it's great that it has been fixed.
I did apply the static analysis support; let's discuss it in the sync meeting.
Let me clarify that the 7 alarms were not caused directly by the operator crashing (in that case they would be TAs). They occurred because the operator crashed in a previous mutation (due to one of the TAs mentioned above) and Acto then applied a new configuration over the crashed operator.
I am available for a meeting right now, or let me know whatever time is suitable for you
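To keep a crash in one mutation from contaminating the next, a harness could check the operator Pod before applying each new CR. The sketch below inspects standard Pod status fields (`status.containerStatuses[].restartCount` and `state.waiting.reason`) in the JSON returned by `kubectl get pod -o json`; the `operator_crashed` helper and its restart-count baseline are hypothetical, not part of Acto.

```python
def operator_crashed(pod: dict, baseline_restarts: int = 0) -> bool:
    """Return True if the Pod's containers restarted more than `baseline_restarts`
    times in total, or if any container is waiting in CrashLoopBackOff.

    `pod` is a Pod object as parsed from `kubectl get pod <name> -o json`.
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    total_restarts = sum(s.get("restartCount", 0) for s in statuses)
    if total_restarts > baseline_restarts:
        return True
    for status in statuses:
        waiting = status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            return True
    return False
```

Recording the restart count right after the operator is deployed and passing it as `baseline_restarts` lets the check ignore restarts that happened during setup.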

taham0 · 2022-08-17T06:35:15Z

@tianyin @tylergu
Since a large number of alarms were due to the Acto bug and to preceding operator crashes, I think the results can improve a lot. I am currently running the newest Acto and will report the new results once the run completes and after our meeting later today.

tianyin · 2022-08-17T06:36:10Z

That's awesome!! Thank you for all the hard work @taham0 !

taham0 · 2022-08-31T16:02:20Z

The FP rate for mongodb-community-operator has dropped to 44.4% (4 of 9 alarms are FAs) after some improvements, and the latest two evaluation results have been added above. All bugs were reproduced, and a by-product bug was found.

taham0 assigned tylergu and taham0 Aug 17, 2022

tianyin mentioned this issue in False alarm categories and solutions #165 (Closed) Aug 17, 2022

tianyin assigned Essoz Aug 17, 2022

taham0 closed this as completed Aug 31, 2022
