-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
string to regex approximation used to strengthen membership constraints #4610
Conversation
src/ast/seq_decl_plugin.cpp
Outdated
Returns true iff e is the epsilon regex. | ||
*/ | ||
bool seq_util::re::is_epsilon(expr* r) const { | ||
expr* r1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this s, because the type is a sequence or string, not a regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean r1? here r is a regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expr* s = nullptr;
return is_to_re(r, s) && u.str.is_empty(s);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is my mistake, r1 should be named s
src/smt/seq_regex.cpp
Outdated
@@ -90,6 +90,8 @@ namespace smt { | |||
expr* s = nullptr, *r = nullptr; | |||
expr* e = ctx.bool_var2expr(lit.var()); | |||
VERIFY(str().is_in_re(e, s, r)); | |||
expr_ref _r(r, m); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added because I am worried about line 123, where we assign r = re().mk_inter(r, s_approx);
but r
is an expr* not an expr_ref so this does not save the result to make sure it is not deallocated. However I made a mistake -- _r
needs to be set again to point to r
right after line 123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also to guard against r and s being deallocated in get_overapprox_regex
. Basically to ensure ownership over r
and s
for the duration of this function. However there may be a clearer way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be best to just change r and s to expr_ref
, then we don't have strange variables _r
and _s
to deal with. That may require some things to be changed to satisfy typechecking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e
owns r, s. So the reference is on e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, e
owns r
at the beginning, but we later modify r
without saving it. When we make re().mk_inter
no one owns it.
src/smt/seq_regex.cpp
Outdated
unsigned int n = 0; | ||
if (str().is_concat(s)) { | ||
str().get_concat(s, es); | ||
n = es.size() - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
es.empty()?
src/smt/seq_regex.cpp
Outdated
// make sure to simplify so that epsilons are eliminated in | ||
// concatenations -- e.g. a sequence | ||
// (x ++ "" ++ y ++ "") will be approximated by .* | ||
for (unsigned i = n; i != (unsigned)(-1); i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (unsigned i = es.size(); i-- > 0; ) {
}
src/smt/seq_regex.cpp
Outdated
// (x ++ "" ++ y ++ "") will be approximated by .* | ||
for (unsigned i = n; i != (unsigned)(-1); i--) { | ||
expr_ref elem_i = get_overapprox_regex(es.get(i)); | ||
if (i == n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just set s_approx to null and instead of if (i == n) check if (!s_approx)
src/smt/seq_regex.cpp
Outdated
if (i == n) { | ||
s_approx = elem_i; | ||
} | ||
else if (!re().is_epsilon(elem_i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you wanted to also avoid concatenating dotstar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, you could have
expr_ref result(m), last(m);
for (expr* e : es) {
s_approx = get_overapprox_regex(e);
if (!result)
result = s_approx;
else if (last != dotstar || s_approx != dotstar)
result = str.mk_concat(result, s_approx)
last = s_approx;
}
As a preprocessing step of regex membership constraints.
Converts a non-ground sequence into an additional regex and strengthens the original regex constraint into an intersection.
For example (x ++ "a" ++ y) in b* is converted to (x ++ "a" ++ y) in intersect((.* ++ "a" ++ .), b).
This will guarantee termination for some unsat cases that would otherwise not terminate.