E-Graphs Are Deterministic Finite Tree Automata (DFTAs)
-
As Jules Jacobs points out in this twitter thread, there are likely a couple of caveats to the minimality assertion in this post. First, for upward merging to produce a minimal DFTA, the e-graph roots (resp. DFTA accepting states) must have a special structure. Having a single root seems sufficient, but there might be a weaker property, such as requiring a path through the e-graph connecting all the roots. Second, unlike DFTA state-merging minimization techniques, upward merging "learns" equivalences. This means upward merging changes the language recognized by the e-graph. If we take the fully merged e-graph to be the "correct" version, then this likely does not pose a significant issue. Rather, it shows that upward merging does not turn an arbitrary DFTA into a minimal DFTA the way DFTA minimization algorithms do. We suspect that, for the family of languages recognizable by a (single-rooted) e-graph (which we have not formally defined!), fully merged e-graphs are minimal DFTAs for their languages.
-
This post inspired a short paper showing a similar result for version space algebras. Everything is DFTAs. https://arxiv.org/abs/2107.12568
-
Very interesting post! Is it not disjointness of states, though, rather than minimality, that would ensure the representation is closed under the congruence operator? If any two states share a term, they must be merged, and then it is up to re-determinisation to push any equivalences upwards.
-
UPDATE! E-graphs are DFTAs, but they are only minimal under certain conditions. Read this short paper for the latest and greatest.
By Josh Pollock and Altan Haan
In this post we'll explain how e-graphs, PL's golden egg, are deterministic finite tree automata (DFTAs) in disguise.
E-Graphs
This section provides a quick background on e-graphs. Feel free to skip this section if you already know what they are!
For a more detailed introduction to e-graphs and the egg library, check out the egg website.
What is an e-graph?
“An e-graph is a data structure to compactly store an equivalence relation (really a congruence relation) over terms.” - Fast and Extensible Equality Saturation with egg
A term is a tree of function symbols. An e-graph is often constructed via a collection of rewrite rules. Each rewrite rule has a left-hand side (LHS) and a right-hand side (RHS). If we match the LHS against the current e-graph, we learn that LHS = RHS and add this equality (and the RHS term, if it doesn't already exist) to the e-graph. We repeat until we want to stop or we hit saturation. Additionally, we may learn that two terms with the same root symbol are congruent: if their children are pairwise in the same equivalence classes, then the terms themselves must be equivalent.
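To make "storing a congruence relation" concrete, here is a minimal, illustrative Python sketch of an e-graph as a union-find over e-class ids plus a hashcons from e-nodes to ids. This is our own toy code, not egg's API; the class and method names (MiniEGraph, add, merge, rebuild) are inventions for this post, and rebuild is a deliberately naive restoration of congruence.

# A minimal, illustrative e-graph sketch (not egg's API): a union-find over
# e-class ids plus a hashcons mapping canonical e-nodes to e-class ids.
class MiniEGraph:
    def __init__(self):
        self.parent = []    # union-find: parent[i] is the parent of e-class i
        self.hashcons = {}  # (symbol, child e-class ids) -> e-class id

    def find(self, a):
        # Follow parent pointers (with path compression) to the canonical id.
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def add(self, sym, children=()):
        # Add the e-node sym(children); reuse its e-class if it already exists.
        node = (sym, tuple(self.find(c) for c in children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def merge(self, a, b):
        # Record that e-classes a and b are equal.
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        # Restore congruence: e-nodes with the same symbol and equal (canonical)
        # children must live in the same e-class. Repeat until nothing changes.
        while True:
            new_hashcons, merged = {}, False
            for (sym, ch), cid in self.hashcons.items():
                key = (sym, tuple(self.find(c) for c in ch))
                cid = self.find(cid)
                if key in new_hashcons and self.find(new_hashcons[key]) != cid:
                    self.merge(new_hashcons[key], cid)
                    merged = True
                new_hashcons[key] = self.find(cid)
            self.hashcons = new_hashcons
            if not merged:
                return

# Example: after asserting a = b, congruence forces f(a) = f(b).
eg = MiniEGraph()
a, b = eg.add("a"), eg.add("b")
fa, fb = eg.add("f", (a,)), eg.add("f", (b,))
eg.merge(a, b)
eg.rebuild()
assert eg.find(fa) == eg.find(fb)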
What is it used for?
Quoting again from Fast and Extensible Equality Saturation with egg:
An e-graph is a useful structure “for building compilers, optimizers, and synthesizers across many domains.”
The egg paper won a distinguished paper award at POPL 2021.
Here is an example e-graph that is built up by several rewrites:
Note: For the purposes of this blog post, the exact details of how rewrite rules work and are applied are not important. We only care about the properties of the final e-graph. The final e-graph has discovered all congruences between terms and sub-terms it represents.
Deterministic Finite Automata (DFAs): A Refresher
Even if you've seen DFAs before, you might still want to read this section. We introduce DFAs in a slightly different way that generalizes more readily to DFTAs.
A deterministic finite automaton (DFA) is a finite-state machine that takes in a word (i.e. a string of symbols pulled from a finite alphabet) and either accepts or rejects that word. It operates on one symbol at a time. A DFA is deterministic because each state and input symbol determine exactly one next state, so the machine progresses through exactly one state at a time.
Here's an example DFA that reads in a string of 0's and 1's and accepts the string iff all the symbols are 1's (i.e. "and"-ing together all the symbols yields true):
Q = {q₀, q₁}
Qf = {q₁}
F = {0, 1}
Δ:
0 -> q₀
1 -> q₁
q₀ 0 -> q₀
q₀ 1 -> q₀
q₁ 0 -> q₀
q₁ 1 -> q₁
Most descriptions of DFAs specify starting states. In our formalism there are no start states, just special transition rules (0 -> q₀ and 1 -> q₁) that don't take previous states. It's probably easiest to understand how this DFA works by looking at an example execution:
The machine first sees a 1 and doesn't have any existing state, so the 1 -> q₁ transition rule fires. The DFA is now in state q₁, which we show by replacing the first symbol with q₁. Next, the DFA is in state q₁ and sees symbol 1, so the rule q₁ 1 -> q₁ fires, and the DFA transitions to state q₁ again. This process continues until the DFA has consumed the entire input word. At the end, the DFA is in state q₀, which is not an accepting state, so the input string is rejected. We can draw a nice picture of the DFA like so:
Each transition rule corresponds to a labelled edge in the graph. The label is the input token. The source (if there is one) is the current state of the DFA. The target is the state the DFA will transition into.
Representing a DFA as a graph is useful for evaluating input words by hand, since you can simulate the execution of a DFA by traversing the edges of the graph.
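As a quick sanity check of this "no start state" formalism, here is a small Python sketch (our own illustration, not from the post) that runs the transition rules above over a non-empty input word. The names first, step, and run_dfa are ours.

# A tiny simulator for the DFA above: the first symbol fires a source-less
# rule, and every later symbol fires a (state, symbol) rule.
first = {"0": "q0", "1": "q1"}             # 0 -> q0, 1 -> q1
step = {
    ("q0", "0"): "q0", ("q0", "1"): "q0",  # once we have seen a 0, we stay in q0
    ("q1", "0"): "q0", ("q1", "1"): "q1",  # all 1's so far keeps us in q1
}
accepting = {"q1"}

def run_dfa(word):
    # Assumes a non-empty word over the alphabet {0, 1}.
    state = first[word[0]]                 # fire the source-less rule
    for symbol in word[1:]:
        state = step[(state, symbol)]
    return state in accepting

assert run_dfa("111") is True    # all 1's: accepted
assert run_dfa("110") is False   # contains a 0: rejected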
DFTAs: Deterministic Finite Tree Automata
DFTAs are like DFAs except instead of recognizing words, DFTAs recognize trees. The core pieces of the definitions remain the same. DFTAs are especially useful in PL, because we often work with ASTs, which are trees.
In the rest of this post we borrow heavily from Tree Automata Techniques and Applications (TATA) for definitions and theorems about DFTAs.
Formally, a DFTA is a collection of the following:
A set of states Q.
A subset of states Qf ⊆ Q that accept.
An alphabet of function symbols F = {f(-, ..., -)} where each f has a fixed (possibly 0) arity.
A set of transition rules {f(q₁, ..., qₙ) -> q} where q₁, ..., qₙ, q ∈ Q and f ∈ F. Since the DFTA is deterministic, the left-hand sides of the transition rules must be unique.
This generalizes the DFA definition. There are multiple ways to simulate a DFA using a DFTA, including having a special nullary symbol # and writing e.g. abc as c(b(a(#))). In the treatment we gave in the previous section, we would introduce each symbol as both a nullary and a unary function and then write e.g. abc as c(b(a)).
Let's see how a DFTA works by generalizing our DFA example significantly. We define the language of boolean expressions involving 0, 1, ~, and &. We can then define transition rules to determine whether the expression evaluates to true or false. The transition rules are essentially just the semantics of the expressions. We borrowed this example from Xinyu Wang's DFTA work.
States:
Q = {q₀, q₁}
Accepting States:
Qf = {q₁}
Alphabet:
F = {0, 1, ~(-), &(-, -)}
Transition Rules Δ:
0 -> q₀
1 -> q₁
~(q₀) -> q₁
~(q₁) -> q₀
&(q₀, q₀) -> q₀
&(q₀, q₁) -> q₀
&(q₁, q₀) -> q₀
&(q₁, q₁) -> q₁
Here's an example execution of the DFTA on the input tree ~(1 & ~0), which we can write without syntactic sugar as ~(&(1, ~(0))).
Unlike a DFA, where we need to track just one state at a time, we begin our DFTA execution by keeping track of a state at every leaf. We first transition 1 to q₁ and 0 to q₀ using the transition rules 1 -> q₁ and 0 -> q₀, respectively. Next we transition ~(q₀) to q₁. So far, other than having multiple simultaneous starting points, this is very similar to executing a DFA. The next step is more interesting. We encounter the binary operation &. To figure out what state to transition to, we need to examine the states of its arguments. Since both of its arguments are in state q₁, we apply the rule &(q₁, q₁) -> q₁ to advance to state q₁. Finally, applying ~(q₁) -> q₀, the DFTA advances to state q₀, which is not an accepting state, and so the input is rejected (i.e. it does not evaluate to true).
As with a DFA, we can make a compact representation of a DFTA. However, instead of having at most one source, we now have one source per function argument! We still have just one target per transition.
We use the same legend as before:
We first add 0, 1, and the two ~ transitions. (Notice this is actually a DFA! It recognizes words starting with 0 or 1 followed by zero or more ~s.) Nothing fancy going on yet.
After this we add the four transitions for &. Each of them has two inputs/sources and one output/target.
Formally, each transition rule is a labelled hyperedge (which is just another way of saying an edge with multiple sources and targets).
Since these diagrams grow very large very quickly, we can take advantage of the fact that each transition rule has exactly one target to make a more compact representation. We make a box for each state of the DFTA. Then we push each label pointing to that state into the box.
Much simpler!
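To see the bottom-up execution in code, here is a short Python sketch (our own illustration) that evaluates the boolean DFTA on a term written as nested tuples. The encoding and the names rules and run_dfta are ours.

# Bottom-up evaluation of the boolean DFTA above. Terms are nested tuples:
# a leaf is ("0",) or ("1",), and ~(&(1, ~(0))) is ("~", ("&", ("1",), ("~", ("0",)))).
rules = {
    ("0",): "q0", ("1",): "q1",
    ("~", "q0"): "q1", ("~", "q1"): "q0",
    ("&", "q0", "q0"): "q0", ("&", "q0", "q1"): "q0",
    ("&", "q1", "q0"): "q0", ("&", "q1", "q1"): "q1",
}
accepting = {"q1"}

def run_dfta(term, rules):
    # Recursively compute the state of each child, then apply the matching rule.
    symbol, children = term[0], term[1:]
    child_states = tuple(run_dfta(c, rules) for c in children)
    return rules[(symbol, *child_states)]

term = ("~", ("&", ("1",), ("~", ("0",))))   # ~(1 & ~0)
assert run_dfta(term, rules) == "q0"         # not accepting: evaluates to false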
Note: This section is inaccurate. See this paper for more details.
Myhill-Nerode Theorem (and Congruence Closure)
The astute reader may have noticed the visualization above looks a lot like an e-graph! We're almost there, but our DFTA formalism is still missing one crucial feature: congruence. In this section we'll see how congruence "pops out" of the Myhill-Nerode theorem for DFTAs.
There is an important theorem associated with DFTAs that is a generalization of a corresponding theorem for DFAs. This theorem gives us a way to describe a canonical minimal DFTA for any recognizable language (that is, for any language we can recognize with some, not necessarily minimal, DFTA). This is nice, because it allows us to optimize our DFTAs and also quickly identify whether or not two DFTAs are equivalent.
Theorem. Every DFTA A = (Q, Qf, F, Δ) recognizing a language L has a minimal DFTA A_min recognizing L (i.e. one with the minimal number of states), and A_min is unique (and complete*) up to relabeling. (See TATA for a precise statement and proof.)
Proof sketch. This theorem is a direct corollary of the Myhill-Nerode theorem for DFTAs (which itself is a generalization of the Myhill-Nerode theorem for DFAs). Here's a proof sketch for Myhill-Nerode.
DFTA states are "forgetful." If two terms end up in the same state, the DFTA cannot distinguish between them since it remembers only their final state and not where they came from. We can run this implication in reverse and describe precisely which terms our DFTA doesn't need to distinguish between while still recognizing L. The minimal DFTA is maximally forgetful. If two terms u and v can be substituted for each other in any context without affecting inclusion or exclusion in L, then the minimal DFTA doesn't need to distinguish between them. In particular we can define the congruence ≡L on T(F), the set of terms using symbols in F, by
u ≡L v := ∀f ∈ F. f(..., u, ...) ∈ L ⟺ f(..., v, ...) ∈ L
Note the surrounding ellipses denoted by ... must be equal on either side of the iff and may be arbitrary terms in T(F).
Essentially, u ≡L v when u and v can be "substituted" for each other in any context without affecting inclusion in L. Now, A_min can be defined as follows:
Q_min are the equivalence classes of ≡L.
Qf_min are the equivalence classes of all terms in L ⊆ T(F).
f([u₁], ..., [uₙ]) -> [f(u₁, ..., uₙ)] for each f ∈ F, where [u] is the equivalence class of u.
We refer the reader to TATA for a minimization procedure that computes A_min using A.
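For readers who would rather see code than chase the reference, here is a hedged Python sketch of a naive Moore-style partition refinement that minimizes a complete DFTA. It is our own illustration of the kind of procedure TATA describes, not the book's algorithm; it assumes the input is complete, uses our own transition encoding, and does not trim unreachable states.

# Naive minimization of a *complete* DFTA by partition refinement.
# transitions: dict mapping (symbol, tuple_of_arg_states) -> target_state.
def minimize_dfta(states, accepting, transitions):
    # Start by separating accepting from non-accepting states.
    block = {q: (q in accepting) for q in states}
    while True:
        signatures = {}
        for q in states:
            sig = set()
            # Two states may stay together only if, plugged into the same position
            # of the same rule (other arguments fixed), they reach targets in the
            # same block.
            for (sym, args), target in transitions.items():
                for i, arg in enumerate(args):
                    if arg == q:
                        sig.add((sym, i, args[:i], args[i + 1:], block[target]))
            signatures[q] = (block[q], frozenset(sig))
        # Re-number blocks by signature; equal signatures share a block.
        new_ids = {s: i for i, s in enumerate(sorted(signatures.values(), key=repr))}
        new_block = {q: new_ids[signatures[q]] for q in states}
        stable = len(set(new_block.values())) == len(set(block.values()))
        block = new_block
        if stable:
            break
    # Collapse each block into a single state and rebuild the transitions.
    min_states = set(block.values())
    min_accepting = {block[q] for q in accepting}
    min_transitions = {
        (sym, tuple(block[a] for a in args)): block[t]
        for (sym, args), t in transitions.items()
    }
    return min_states, min_accepting, min_transitions

Running this on a complete DFTA that contains a duplicate state (say, a copy of q₁ with identical behavior) collapses the two into one block, which is exactly the "forgetting" the proof sketch describes.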
*We say a DFTA is complete when there is precisely one transition for each f ∈ F and input states qᵢ. It is possible to obtain an (incomplete) DFTA which recognizes L that is one state smaller than the one described here. In particular, consider the set of terms
π = {u ∣ ∀f ∈ F. f(..., u, ...) ∉ L} ⊆ T(F)
If π ≠ ∅, then π ∈ Q_min, as it defines an equivalence class of ≡L. Additionally, for any f and q such that f(..., π, ...) -> q, we have that q satisfies the same property as π (as otherwise π would have an accepting context). Hence it follows that q = π, and all transition rules with π on the left-hand side transition to π. Effectively, π is a "garbage state" (and is explicitly constructed when completing incomplete DFTAs). Once some sub-term enters state π, all future transitions return back to π. If we don't care about completeness, we can safely delete this state and all associated transitions. The resulting incomplete DFTA will just "get stuck" without a transition whenever a term would have previously looped on the garbage state.
Note: Garbage states often have many transitions, so they usually aren't worth keeping around.
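To make the completion step concrete, here is a small Python sketch (ours, not TATA's pseudocode) that completes a DFTA by adding an explicit garbage state. The names complete_dfta and GARBAGE are hypothetical, and transitions use the same (symbol, args) -> state encoding as the earlier sketches.

from itertools import product

# Complete a DFTA by adding a garbage state: every missing (symbol, args)
# combination transitions to GARBAGE, and GARBAGE is absorbing.
GARBAGE = "q_garbage"  # hypothetical sink-state name for this sketch

def complete_dfta(states, arities, transitions):
    # arities: dict mapping each symbol to its arity, e.g. {"0": 0, "~": 1, "&": 2}
    all_states = set(states) | {GARBAGE}
    completed = dict(transitions)
    for sym, n in arities.items():
        for args in product(all_states, repeat=n):
            completed.setdefault((sym, args), GARBAGE)
    return all_states, completed

Note how many transitions the garbage state picks up: one for every missing combination of symbol and argument states, which is why it usually isn't worth keeping around.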
E-graphs ⊆ DFTAs
Notice Q_min are e-classes, and each transition f([u₁], ..., [uₙ]) -> [f(u₁, ..., uₙ)] is an e-node with symbol f belonging to e-class [f(u₁, ..., uₙ)] with child e-classes [u₁], ..., [uₙ]. Thus minimal DFTAs and e-graphs are really the same thing!
Note: An e-graph induces a tree language L: the set of all terms that can be extracted from some distinguished set of e-classes. In practice this is typically the root e-class.
To be more concrete, consider the example from before:
We can redraw this in the box notation we gave earlier.
The corresponding DFTA is then given by:
Q = {q_a, q_1, q_2, q_{a*2}}
Qf = {q_a}
F = {a, 1, 2, *(-, -), <<(-, -), /(-, -)}
Δ:
a -> q_a
*(q_a, q_1) -> q_a
/(q_{a*2}, q_2) -> q_a
1 -> q_1
/(q_2, q_2) -> q_1
2 -> q_2
*(q_a, q_2) -> q_{a*2}
<<(q_a, q_1) -> q_{a*2}
Et voilà!
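Reusing the bottom-up evaluator from earlier, here is a Python sketch (again our own illustration, with our own string names for the states) of this DFTA written as a transition table. Reading each rule f(q_c1, ..., q_cn) -> q_c as "e-class q_c contains an e-node f with child e-classes q_c1, ..., q_cn" recovers the e-graph.

# The example DFTA/e-graph: each state is an e-class, each rule is an e-node.
rules = {
    ("a",): "q_a",
    ("1",): "q_1",
    ("2",): "q_2",
    ("*", "q_a", "q_1"): "q_a",      # a * 1 lives in a's e-class
    ("/", "q_a*2", "q_2"): "q_a",    # (a * 2) / 2 lives in a's e-class
    ("/", "q_2", "q_2"): "q_1",      # 2 / 2 lives in 1's e-class
    ("*", "q_a", "q_2"): "q_a*2",
    ("<<", "q_a", "q_1"): "q_a*2",   # a << 1 is in the same e-class as a * 2
}
accepting = {"q_a"}

def run_dfta(term, rules):
    # Same bottom-up evaluator as in the boolean example.
    symbol, children = term[0], term[1:]
    child_states = tuple(run_dfta(c, rules) for c in children)
    return rules[(symbol, *child_states)]

# Extractable terms are exactly the accepted ones: (a * 2) / 2 reaches q_a.
term = ("/", ("*", ("a",), ("2",)), ("2",))
assert run_dfta(term, rules) in accepting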
Discussion
We are aware of a couple papers that have considered the relationship between DFTAs and e-graphs. (Thanks to Remy Wang for finding these!) Gulwani et al. relate tree automata to EDAGs, which is another word for e-graphs. Bachmair et al. relate tree automata to D-rules. In contrast, our work uses minimal DFTAs and the Myhill-Nerode theorem to give a more precise correspondence between tree automata and e-graphs. Our post is also the first to discuss this correspondence using e-graph terminology.
Independently from the e-graph literature, PL researchers have recently found uses for tree automata. For example, tree grammars have been used to write better parsers. Tree automata intersection has been used for PBE synthesis combined with abstraction-refinement. In fact we first realized the connection between tree automata and e-graphs by trying to translate these synthesis papers to e-graphs! You can read more about e-graph intersection here.
The correspondence between e-graphs and DFTAs prompts us to ask questions about how to apply techniques from both bodies of literature. For example, could the synthesis and parsing techniques above be further enhanced using equality saturation? How might inverse tree homomorphisms be used in e-graphs?
Additionally, egg provides a convenient API for experimenting with and implementing tree automata techniques, even when equivalence classes aren't required.
We hope this post gives you another perspective on e-graphs and some promising new directions to explore!
Acknowledgements
Thanks to Max Willsey for reviewing this post! Thanks to Jimmy Koppel for encouraging us to write it.