Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string to regex approximation used to strengthen membership constraints #4610

Merged
merged 2 commits into from
Aug 1, 2020

Conversation

veanes
Copy link
Collaborator

@veanes veanes commented Aug 1, 2020

As a preprocessing step of regex membership constraints.
Converts a non-ground sequence into an additional regex and strengthens the original regex constraint into an intersection.
For example (x ++ "a" ++ y) in b* is converted to (x ++ "a" ++ y) in intersect((.* ++ "a" ++ .), b).
This will guarantee termination for some unsat cases that would otherwise not terminate.

Returns true iff e is the epsilon regex.
*/
bool seq_util::re::is_epsilon(expr* r) const {
expr* r1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this s, because the type is a sequence or string, not a regex

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean r1? here r is a regex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expr* s = nullptr;
return is_to_re(r, s) && u.str.is_empty(s);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is my mistake, r1 should be named s

@@ -90,6 +90,8 @@ namespace smt {
expr* s = nullptr, *r = nullptr;
expr* e = ctx.bool_var2expr(lit.var());
VERIFY(str().is_in_re(e, s, r));
expr_ref _r(r, m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added because I am worried about line 123, where we assign r = re().mk_inter(r, s_approx); but r is an expr* not an expr_ref so this does not save the result to make sure it is not deallocated. However I made a mistake -- _r needs to be set again to point to r right after line 123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also to guard against r and s being deallocated in get_overapprox_regex. Basically to ensure ownership over r and s for the duration of this function. However there may be a clearer way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be best to just change r and s to expr_ref, then we don't have strange variables _r and _s to deal with. That may require some things to be changed to satisfy typechecking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e owns r, s. So the reference is on e.

Copy link
Contributor

@cdstanford cdstanford Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, e owns r at the beginning, but we later modify r without saving it. When we make re().mk_inter no one owns it.

unsigned int n = 0;
if (str().is_concat(s)) {
str().get_concat(s, es);
n = es.size() - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

es.empty()?

// make sure to simplify so that epsilons are eliminated in
// concatenations -- e.g. a sequence
// (x ++ "" ++ y ++ "") will be approximated by .*
for (unsigned i = n; i != (unsigned)(-1); i--) {
Copy link
Contributor

@NikolajBjorner NikolajBjorner Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (unsigned i = es.size(); i-- > 0; ) {

}

// (x ++ "" ++ y ++ "") will be approximated by .*
for (unsigned i = n; i != (unsigned)(-1); i--) {
expr_ref elem_i = get_overapprox_regex(es.get(i));
if (i == n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just set s_approx to null and instead of if (i == n) check if (!s_approx)

if (i == n) {
s_approx = elem_i;
}
else if (!re().is_epsilon(elem_i)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you wanted to also avoid concatenating dotstar

Copy link
Contributor

@NikolajBjorner NikolajBjorner Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, you could have

expr_ref result(m), last(m);
for (expr* e : es) {
    s_approx = get_overapprox_regex(e);
    if (!result) 
        result = s_approx;
     else if (last != dotstar || s_approx != dotstar)
        result = str.mk_concat(result, s_approx)
     last = s_approx;
}

@NikolajBjorner NikolajBjorner merged commit 8137143 into Z3Prover:master Aug 1, 2020
@veanes veanes deleted the string-to-regex-approx branch August 10, 2020 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants