-
Notifications
You must be signed in to change notification settings - Fork 732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement marker trees using algebraic decision diagrams #5898
Conversation
f86e792
to
2b5c772
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is next-level awesome. Solves a huge problem we've been having with marker expressions and solves it well.
What do you think about making MarkerTree::and
and MarkerTree::or
smart constructors instead of routines that mutate trees in place? Or perhaps that's for another PR, in order to keep the amount of code changed outside of pep508_rs
smaller.
//! (> '3.7') -> os_name: | ||
//! (> 'Linux') -> FALSE | ||
//! (== 'Linux') -> TRUE | ||
//! (< 'Linux') -> FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this isn't (!= 'Linux') -> FALSE
and (== 'Linux') -> TRUE
? Why the inequalities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had to guess, I'd say it's because the inequalities are more general and easier to merge with other expressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. Even if we did merge the two FALSE
children, we would have to represent the != 'Linux'
as (< 'Linux') or (> 'Linux')
with pubgrub ranges. We could merge them to reduce the size of the tree, but for now this is simpler.
edges | ||
} | ||
|
||
/// Merge two [`Edges`], applying the given function to all disjoint, intersecting edges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some trouble grok'ing this routine. The map2
name I think also confused me a bit. Perhaps expand the comment here some?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more general documentation. Is it clearer now?
/// expression. | ||
/// | ||
/// If the marker is `true`, this method will return `None`. | ||
/// If the marker is `false`, the marker is represented as the normalized expression, `python_version < '0'`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a nice API decision. Prevents one from emitting an "invalid" marker, but still makes it easy to print a debug repr.
5b18a9c
to
db25cf1
Compare
Yes, I also want to remove uses of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ADDs are much better than my naive DNF idea. All the maths looks well-implemented.
/// For non-boolean variables, this is more complex. See `apply_ranges` for details. | ||
/// | ||
/// Note that the LHS and RHS must be of the same [`Edges`] variant. | ||
fn apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever used with something other than AND?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but it's possible that we may want to reuse this for other operations in the future (such as XOR
or IFF
).
"python_full_version <= '3.10' and python_version < '3.10'", | ||
"python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10'", | ||
"python_full_version <= '3.10' and python_version > '3.10'", | ||
"python_full_version > '3.10' and python_version > '3.10'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -6471,23 +6471,23 @@ fn no_strip_markers_multiple_markers() -> Result<()> { | |||
----- stdout ----- | |||
# This file was autogenerated by uv via the following command: | |||
# uv pip compile --cache-dir [CACHE_DIR] requirements.in --no-strip-markers --python-platform windows | |||
attrs==23.2.0 ; python_version > '3.11' or sys_platform == 'win32' | |||
attrs==23.2.0 ; sys_platform == 'win32' or python_version > '3.11' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the ordering defined now, is it the ordering of the marker variable flowing down? It looks like in conjunctions and disjunctions the ordering is now different. Not high priority as long as the ordering is deterministic and robust, but might be easy enough to justify changing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the ordering of the markers flowing down yeah. In this case the original expression would have been (python_version <= '3.11' and sys_platform == 'win32') or python_version > '3.11
before simplifying, which is why the order appears reversed despite python_version
being the highest order variable.
I guess we could sort these the same way we used to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, doing a final sort on simplified markers is preferential for users not seeing a change in equivelent output if something about the resolution itself changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resolution is still deterministic, but sorting may result in a more syntactically consistent order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant in a scenario where the input changed, (e.g. the user changed the order, or added requirements that were previously transative dependencies, or new versions of packages changed the requirements in some similiar sense), but the output was equivelent (e.g. same packages, same versions, same markers), but perhaps the marker order could be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should never happen. The decision diagram is canonical for any functionally equivalent marker and the simplification routine is deterministic. All sorting could do is give us a different canonical output for a given marker, but it shouldn't affect whether or not it is canonical.
|
||
// Simplify all marker expressions based on the requires-python bound. | ||
// | ||
// This is necessary to ensure the a `Lock` deserialized from a lockfile compares |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between fresh and from lock? We always know the current requires-python in the resolver (we determine it on fork construction), so marker construction should already have this information.
Also note that requires-python is the bound for the entire lockfile, but individual forks can have stricter bounds, e.g. say overall is >=3.7
, but there's one fork >=3.7,<3.8
and one fork >=3.8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is emulating the call to simplify_python_versions
when constructing the ResolutionGraph
.
I agree that marker construction should always have this information, I would like to avoid simplifying requires-python and extras in a separate step.
@@ -106,7 +106,8 @@ impl<'a, Context: BuildContext> SourceTreeResolver<'a, Context> { | |||
origin: Some(origin.clone()), | |||
marker: requirement | |||
.marker | |||
.and_then(|marker| marker.simplify_extras(extras)), | |||
.map(|marker| marker.simplify_extras(extras)) | |||
.filter(|marker| !marker.is_true()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is_false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think !marker.is_true()
does not imply marker.is_false()
. I do like the succinctness of the method names, but they do suggest a mutual exclusivity that isn't there. Perhaps is_always_true
and is_always_false
are better names? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, totally missed that
Could you also a top level doc comment sentence that we implement all |
8ed4dab
to
8cb5abe
Compare
8cb5abe
to
cccbe8c
Compare
This PR rewrites the `MarkerTree` type to use algebraic decision diagrams (ADD). This has many benefits: - The diagram is canonical for a given marker function. It is impossible to create two functionally equivalent marker trees that don't refer to the same underlying ADD. This also means that any trivially true or unsatisfiable markers are represented by the same constants. - The diagram can handle complex operations (conjunction/disjunction) in polynomial time, as well as constant-time negation. - The diagram can be converted to a simplified DNF form for user-facing output. The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163). Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the `marker` module. I'd suggest reading through the `marker/algebra.rs`, `marker/simplify.rs`, and `marker/tree.rs` files for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed: - [We now use release-only comparisons for `python_full_version`, where we previously only did for `python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522). I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect. - [Meaningless marker expressions are now ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502). This means that a marker such as `'x' == 'x'` will always evaluate to `true` (as if the expression did not exist), whereas we previously treated this as always `false`. It's negation however, remains `false`. - [Unsatisfiable markers are written as `python_version < '0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329). - The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub` crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends on the `pubgrub` crate for the `Range` type, we probably want to move `pubgrub::Range` into a separate crate to break this, but I don't think that should block this PR (cc @konstin). There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver. - We still use `Option<MarkerTree>` throughout uv, which is unnecessary now that `MarkerTree::TRUE` is canonical. - The `MarkerTree` type is now interned globally and can potentially implement `Copy`. However, it's unclear if we want to add more information to marker trees that would make it `!Copy`. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction. - We don't currently combine `python_full_version` and `python_version` markers. - I also have not spent too much time investigating performance and there is probably some low-hanging fruit. Many of the test cases I did run actually saw large performance improvements due to the markers being simplified internally, reducing the stress on the old `normalize` routine, especially for the extremely large markers seen in `transformers` and other projects. Resolves #5660, #5179.
Amazing work.
Does this mean we no longer compute unions, etc., to combine |
We compute unions of |
Makes sense thank you. |
Summary
This PR rewrites the
MarkerTree
type to use algebraic decision diagrams (ADD). This has many benefits:The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163).
Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the
marker
module. I'd suggest reading through themarker/algebra.rs
,marker/simplify.rs
, andmarker/tree.rs
files for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed:python_full_version
, where we previously only did forpython_version
. I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect.'x' == 'x'
will always evaluate totrue
(as if the expression did not exist), whereas we previously treated this as alwaysfalse
. It's negation however, remainsfalse
.python_version < '0'
.PubGrubSpecifier
type has been moved to the newuv-pubgrub
crate, shared bypep508-rs
anduv-resolver
.pep508-rs
also depends on thepubgrub
crate for theRange
type, we probably want to movepubgrub::Range
into a separate crate to break this, but I don't think that should block this PR (cc @konstin).There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver.
Option<MarkerTree>
throughout uv, which is unnecessary now thatMarkerTree::TRUE
is canonical.MarkerTree
type is now interned globally and can potentially implementCopy
. However, it's unclear if we want to add more information to marker trees that would make it!Copy
. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction.python_full_version
andpython_version
markers.normalize
routine, especially for the extremely large markers seen intransformers
and other projects.Resolves #5660, #5179.