Automatically computing the regex for indeterministic properties #323
Automate the process of figuring out the regexes for the indeterministic properties. The algorithm is slightly complex:
Iterate over the original trials; in principle, their results should be
identical. If they are not, the differences are the indeterministic fields
to be skipped when doing the comparison.
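As a rough illustration of this first step, the sketch below compares two trial results and collects the paths at which they differ. It is a minimal stdlib-only sketch, not the code in this PR: `diff_paths`, `trial_1`, and `trial_2` are hypothetical names, and the trial results are assumed to be nested dicts and lists.

```python
def diff_paths(a, b, path="root"):
    """Return the set of paths at which two nested structures differ.

    Hypothetical helper: paths are rendered like root['secret']['token'].
    """
    if isinstance(a, dict) and isinstance(b, dict):
        paths = set()
        for key in a.keys() | b.keys():
            child = f"{path}[{key!r}]"
            if key not in a or key not in b:
                # The key exists in only one trial, so the whole subtree differs.
                paths.add(child)
            else:
                paths |= diff_paths(a[key], b[key], child)
        return paths
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        paths = set()
        for i, (x, y) in enumerate(zip(a, b)):
            paths |= diff_paths(x, y, f"{path}[{i}]")
        return paths
    # Leaves (or structures of different shape): compare directly.
    return set() if a == b else {path}


# Made-up trial results for illustration only.
trial_1 = {"secret": {"token": "abc"}, "replicas": 3}
trial_2 = {"secret": {"token": "xyz"}, "replicas": 3}
indeterministic = diff_paths(trial_1, trial_2)
```

Here only the token differs between the two trials, so it is the only path reported.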
Naively, we could just collect all the indeterministic fields and return
them. However, there are cases where the field path itself is not
deterministic (e.g. the name of a secret could be randomly generated).
To handle this randomness in names, we compare the system state of the
first two original trials to get the initial regexes for skipping.
If there are no random names, the subsequent trials should not
produce additional indeterministic fields.
Then we keep iterating over the rest of the original results and
check whether there are additional indeterministic fields. If there are, we
collect them and try to figure out the proper regex to skip them.
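The check over the remaining trials could look roughly like the sketch below. It assumes the differing paths of each later trial have already been computed as sets of strings; `collect_unmatched`, `trial_diffs`, and `skip_regexes` are hypothetical names, not identifiers from this PR.

```python
import re


def collect_unmatched(trial_diffs, skip_regexes):
    """Return the differing paths that no current skip regex covers.

    trial_diffs: one set of differing paths per remaining trial (assumed input).
    skip_regexes: the regexes obtained so far, e.g. from the first two trials.
    """
    compiled = [re.compile(pattern) for pattern in skip_regexes]
    unmatched = []
    for paths in trial_diffs:
        for path in sorted(paths):
            covered = any(regex.fullmatch(path) for regex in compiled)
            if not covered and path not in unmatched:
                unmatched.append(path)
    return unmatched


# Made-up data: the initial regex covers the timestamp, but a later trial
# also differs in a randomly named secret.
skip_regexes = [re.escape("root['status']['time']")]
trial_diffs = [
    {"root['status']['time']"},
    {"root['status']['time']", "root['secret']['my-secret-abcde']"},
]
new_paths = collect_unmatched(trial_diffs, skip_regexes)
```

An empty result means the initial regexes already cover every difference; otherwise the returned paths are the ones that still need a regex.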
To come up with a minimal regex for paths with indeterministic names
(e.g. root['secret']['my-secret-vczvds'])
that is not overly general, we take the following approach:
2.1 If we currently don't have a proposed regex, or the current proposed regex does not match the current path, we try to propose a new one. First, we generate a tentative regex by looking ahead at the next item and using difflib to get the substrings the two paths have in common. Then we check this tentative regex against the item after that. If it matches, the tentative regex becomes the proposed regex. If it does not, we give up and skip the current path by its absolute path.
2.2 If we already have a proposed regex and the current path matches it, we simply continue.
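Steps 2.1 and 2.2 can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation in this PR: `tentative_regex` and `propose_skip_regexes` are hypothetical names, the input is assumed to be a sorted list of path strings, and the parts that differ between two paths are replaced with `.*`.

```python
import difflib
import re


def tentative_regex(a, b):
    """Build a regex from the substrings that paths a and b share."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    pos_a = pos_b = 0
    # get_matching_blocks() ends with a zero-length block at (len(a), len(b)),
    # which lets the loop also handle a trailing difference.
    for block in matcher.get_matching_blocks():
        if block.a > pos_a or block.b > pos_b:
            parts.append(".*")  # the paths differ here
        parts.append(re.escape(a[block.a:block.a + block.size]))
        pos_a, pos_b = block.a + block.size, block.b + block.size
    return "".join(parts)


def propose_skip_regexes(paths):
    """Return skip regexes for a sorted list of indeterministic paths."""
    skip = []
    current = None  # the currently proposed regex, if any
    for i, path in enumerate(paths):
        if current is not None and re.fullmatch(current, path):
            continue  # step 2.2: the proposed regex already covers this path
        current = None
        if i + 2 < len(paths):
            # Step 2.1: build a candidate from this path and the next one,
            # then validate it against the path after that.
            candidate = tentative_regex(path, paths[i + 1])
            if re.fullmatch(candidate, paths[i + 2]):
                current = candidate
                skip.append(candidate)
                continue
        # Give up and skip this exact path.
        skip.append(re.escape(path))
    return skip


# Made-up paths for illustration only.
paths = [
    "root['secret']['my-secret-abcde']",
    "root['secret']['my-secret-fghij']",
    "root['secret']['my-secret-klmno']",
    "root['status']['time']",
]
regexes = propose_skip_regexes(paths)
```

In this example the three secret paths collapse into one regex, and the remaining path is kept as an escaped absolute path. The look-ahead check is the only guard against an overly general regex in this sketch; random suffixes that happen to share characters can make the candidate fail the check, in which case the paths fall back to being skipped individually.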