Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework how packages describe how they contain the components of an embedded package #744

Merged
merged 20 commits into from
Oct 30, 2024

Conversation

jrray
Copy link
Collaborator

@jrray jrray commented Jun 2, 2023

Addresses the question from #742.

The original spec schema had multiple places where an embedded package could be defined:

pkg: demo/1.0.0
install:
  components:
    - name: comp1
      embedded:
        # inside a component
        - pkg: embedded/2.0.0
  embedded:
    # outside a component
    - pkg: embedded/2.0.0

It becomes confusing if the embedded package also has components. Where do those components get defined? How can the parent package say which of its components contains the embedded packages' components?

A motivating example

The package boost contains a lot of headers only needed during build so it benefits from having those files live only in the "build" component. A second package, boost-python, is also a full build of boost (plus extra python things), so it needs to embed boost. Since the boost-python package is a full build of boost, it also benefits from having the headers only live in the "build" component. It needs to be able to say that the boost-python:build component embeds the boost:build component.

pkg: boost-python/1.76.0
install:
  components:
    - name: build
      files:
        - include/
        - lib/cmake/
        - lib/*.a
        - lib/*.so
      uses:
        - run
      embedded_components:    # <-- new
        - boost:build
    - name: run
      files:
        - "*"
      file_match_mode: Remaining
      embedded_components:    # <-- new
        - boost:run
  embedded:
    - pkg: boost/1.76.0
      install:
        components:
          - name: build
            uses:
              - run
          - name: run

This shows how the boost-python spec can now be written. The embedded boost package is defined once under install.embedded and its components are defined here. The new install.components[].embeded_components field is used to specify for each component of boost-python, which components of boost are present.

Solver logic changes

There can be a situation where a certain combination of requests will cause the solver to fail even though there is nothing "wrong" about those requests.

First, there needs to be a request for a specific component of a real package: real-package:comp1. Assuming real-package:comp1 exists, this request will solve successfully.

Then, some other package needs to exist that embeds real-package, let's call it fake-package. Assume fake-package:comp1 embeds real-package:comp1 and other components exist inside fake-package.

Now, add a request for a specific component of fake-package that isn't comp1: fake-package:comp2. As in, spk env real-package:comp1 fake-package:comp2.

The problem arises when fake-package replaces real-package in the solve, but only fake-package:comp2 is part of the solution, because this does not include (embed) real-package:comp1.

The solver now needs to detect that the current set of requests do not bring in all the necessary components of the parent package in order to have the required components of the embedded package satisfied. It effectively needs to translate the real-package:comp1 request into a request for fake-package:comp1. When this situation is detected, the state will be rolled back to just before fake-package is added to the solution, and a request for fake-package:comp1 is added. This combines with the explicit request for fake-package:comp2, making the merged request into fake-package:{comp1,comp2}. Now when fake-package is added to the solution it will include comp1 and the request for real-package:comp1 will be satisfied.

Comment on lines 1834 to 1835
{"name": "comp1"},
{"name": "comp2"},
Copy link
Collaborator Author

@jrray jrray Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solve fails if both the components aren't named here, because when it tries to find a build of victim it sees this spec and claims real-pkg doesn't have a component named comp2 ("does not define requested components").

@jrray
Copy link
Collaborator Author

jrray commented Jun 2, 2023

Test output here for reference:

-------------- TEST START --------------
State Requests: 
State Resolved: 
State  VarReqs: ""
State  Options: {}
 INITIAL REQUEST fake-pkg:comp1/* (PreReleasePolicy: ExcludeAll, InclusionPolicy: Always, RequiredCompat: Binary)
 INITIAL REQUEST victim:run/* (PreReleasePolicy: ExcludeAll, InclusionPolicy: Always, RequiredCompat: Binary)
State Requests: fake-pkg:comp1/* (RequiredCompat: Binary), victim:run/* (RequiredCompat: Binary)
State Resolved: 
State  VarReqs: ""
State  Options: {}
> RESOLVE fake-pkg/1.0.0/3I42H3S6  (requested by spk's test suite)
. REQUEST real-pkg/=1.0.0/embedded (RequiredCompat: Binary)
. RESOLVE real-pkg/1.0.0/embedded  (requested by embedded in fake-pkg/1.0.0/3I42H3S6)
. ASSIGN {fake-pkg=~1.0.0}
State Requests: fake-pkg:comp1/* (RequiredCompat: Binary), victim:run/* (RequiredCompat: Binary), real-pkg/=1.0.0/embedded (RequiredCompat: Binary)
State Resolved: fake-pkg/1.0.0/3I42H3S6, real-pkg/1.0.0/embedded
State  VarReqs: ""
State  Options: {fake-pkg=~1.0.0}
>> RESOLVE victim/1.0.0/3I42H3S6  (requested by spk's test suite)
.. REQUEST real-pkg:comp2/1.0.0
.. ASSIGN {victim=~1.0.0}
Installed Packages:
  mem/fake-pkg:comp1/=1.0.0/3I42H3S6 highest (required by spk's test suite) {} 
  real-pkg/=1.0.0/embedded           highest (required by embedded in fake-pkg/1.0.0/3I42H3S6, victim/1.0.0/3I42H3S6) {} 
  mem/victim:run/=1.0.0/3I42H3S6     highest (required by spk's test suite) {} 
 Number of Packages: 3
thread 'solver::solver_test::test_solver_components_interaction_with_embeds' panicked at 'assertion failed: `(left == right)`
  left: `{Named("comp1")}`,
 right: `{Named("comp1"), Named("comp2")}`', crates/spk-solve/src/./solver_test.rs:1891:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test solver::solver_test::test_solver_components_interaction_with_embeds ... FAILED

@jrray
Copy link
Collaborator Author

jrray commented Jun 2, 2023

So I think what is missing is in two parts:

  1. There needs to be a way for the embedd-er to name which components of the embedd-ee are embedded within the components of the embedd-er.
  2. From the added info provided by [1], the solver needs to detect that a request for a component hasn't been satisfied and inject the additional components of the embedd_er_ that will satisfy them. Or, this could just be detected as unsatisfiable and require the requests to be changed.

@jrray
Copy link
Collaborator Author

jrray commented Jun 2, 2023

Now I'm thinking the problem runs deeper, and the spec schema has a problem that two incompatible specs can be presented for the same embedded package in different components. Example:

pkg: embedder/1.0.0
install:
  components:
    - name: comp1
      embedded:
        - pkg: embedee/1.0.0
          install:
            components:
              - name: comp1
    - name: comp2
      embedded:
        - pkg: embedee/2.0.0
          install:
            components:
              - name: comp2

The TODOs in the test mention how this causes solver problems, and this goes a step further to show how two different incompatible versions of embedee are described between the two components.

What piece of information that is needed in the install.components[].embedded position is what components of embedded packages are present in that (parent) component. This should refer to an embedded package that has been defined once in the non-component way. Example:

pkg: embedder/1.0.0
install:
  components:
    - name: comp1
      embedded_packages:
        - embedee:comp1
    - name: comp2
      embedded_packages:
        - embedee:{comp2,comp3}
  embedded:
    - pkg: embedee/1.0.0
      install:
        components:
          - name: comp1
          - name: comp2
          - name: comp3

The embedded package embedee is defined once, in full, in the position install.embedded[]. Then, in install.components[].embedded_components there are optionally references to that package to specify which components are present.

Now it is clear what the spec for the embedded stub should be (it only appears once in the parent spec) and it's clearer that this spec needs to define all the components of the embedded package.

@jrray
Copy link
Collaborator Author

jrray commented Jun 3, 2023

I checked in some code while I noodle around with ideas. There definitely isn't good test coverage for this territory because all the tests are passing, but this code is almost certainly still broken.

docs/ref/spec.md Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 3, 2023

Codecov Report

Attention: 67 lines in your changes are missing coverage. Please review.

Comparison is base (e8cf549) 53.55% compared to head (1ec935c) 54.50%.
Report is 135 commits behind head on main.

❗ Current head 1ec935c differs from pull request most recent head 80776d3. Consider uploading reports for the commit 80776d3 to get more accurate results

Files Patch % Lines
...es/spk-schema/crates/ident/src/ident_optversion.rs 16.00% 21 Missing ⚠️
...spk-schema/crates/foundation/src/version/compat.rs 0.00% 12 Missing ⚠️
...ates/spk-schema/src/component_embedded_packages.rs 75.00% 8 Missing ⚠️
crates/spk-solve/crates/graph/src/graph.rs 87.75% 6 Missing ⚠️
...rates/spk-schema/crates/ident/src/parsing/ident.rs 0.00% 4 Missing ⚠️
crates/spk-schema/src/install_spec.rs 88.23% 4 Missing ⚠️
...ates/validation/src/validators/embedded_package.rs 78.94% 4 Missing ⚠️
crates/spk-solve/src/solver.rs 91.48% 4 Missing ⚠️
...lve/crates/validation/src/validators/components.rs 0.00% 2 Missing ⚠️
...ve/crates/validation/src/validators/pkg_request.rs 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #744      +/-   ##
==========================================
+ Coverage   53.55%   54.50%   +0.95%     
==========================================
  Files         258      254       -4     
  Lines       20466    20107     -359     
==========================================
- Hits        10960    10959       -1     
+ Misses       9506     9148     -358     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrray jrray force-pushed the jrray/issue742 branch from 593ccd0 to f11c0e8 Compare June 5, 2023 18:46
@jrray jrray added the agenda item Items to be brought up at the next dev meeting label Jun 5, 2023
@jrray jrray self-assigned this Jun 5, 2023
@jrray jrray force-pushed the jrray/issue742 branch from f11c0e8 to 09766dc Compare June 5, 2023 20:56
@jrray jrray changed the title Draft: Add solver test for components and embeds Rework how packages describe how they contain the components of an embedded package Jun 5, 2023
@jrray jrray linked an issue Jun 5, 2023 that may be closed by this pull request
@rydrman
Copy link
Collaborator

rydrman commented Jun 7, 2023

From the meeting today:

  • generally a reasonable approach
  • consider changing embedded to virtual packages that get embedded into components
    • what do we call packages that embedded other packages? (parent? host?)
  • if you have a virt package defined that is not included in a component, is that an error? is that a lint issue? warning?
    • better to be an error at build time, but not at use/run time
  • relates to / might conflict with Formalize embedded packages as a generic type #564
  • this all might be more for v1, so in this PR the TODO is to flesh out more test cases and capture edge case behaviours

@jrray jrray force-pushed the jrray/issue742 branch 2 times, most recently from 0632e01 to ddf8caa Compare June 9, 2023 22:07
@jrray
Copy link
Collaborator Author

jrray commented Jun 9, 2023

I added support for embedded multiple versions of the same package. It complicated the code by a large factor.

@jrray
Copy link
Collaborator Author

jrray commented Jun 9, 2023

At this stage I think embedded_packages is a better name than embedded_components. It fits better since this is describing what packages are embedded within the component, with the added information of which components of the embedded package are being embedded.

@jrray jrray force-pushed the jrray/issue742 branch 2 times, most recently from 191a250 to b714d99 Compare June 10, 2023 00:00
@jrray jrray changed the base branch from main to embedded-requirements-test June 11, 2023 21:20
Comment on lines +2211 to +2273
{"name": "build", "embedded_packages": ["dep-e1:all"]},
{"name": "run", "embedded_packages": ["dep-e2:all"]},
],
"embedded": [
{"pkg": "dep-e1/1.0.0"},
{"pkg": "dep-e2/1.0.0"},
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting this change since it is the only test that had to be modified due to the changes in this PR. The spirit of the test is unchanged but the spec had to be modified.

@jrray jrray force-pushed the jrray/issue742 branch 2 times, most recently from 37cbd47 to 8494b3e Compare June 11, 2023 22:03
Base automatically changed from embedded-requirements-test to main June 22, 2023 23:54
@jrray
Copy link
Collaborator Author

jrray commented Aug 23, 2024

Another angle to this problem came up today. Although argparse is bundled in the python standard library, it is also sometimes listed as a dependency of packages in pypi and in our environment the use of spk convert pip -- unittest2 ended up creating and depending on a python-argparse package.

I'd say there's a general problem here with respect to packages bundled in the system library that are also in pypi, and there's a category of that problem that argparse falls into where the package is old and not getting updates.

The python package basically needs a way to specify that it contains python-argparse, but the twist is that we don't want it to prevent (in the general sense) allowing a newer version of python-argparse to be brought into the environment. So part of this requires the functionality in this PR to add python-argparse as both a component and embedded, then the expected behavior would be:

  1. make the argparse component included in the environment by default (because it is expected to be there via the standard library); but
  2. allow a request for python-argparse to overrule the default request for the argparse and conflict with an explicit request for the argparse component; and
  3. prevent the external python-argparse package from being a version lower than the version in the argparse component.

I think this might all be possible with this PR and the existing install.requirements functionality.

  1. the argparse component will embed python-argparse but not be included by default
  2. the top-level python package will have an install.requirement for python-argparse of at least the version that is bundled in python

When solving for python, the install.requirement will force the solver into either picking up a newer "real" python-argparse or it will pick up the argparse component via finding it through an embedded stub.

What I'm not sure already works [with this PR] is the solver finding a stub for an embedded package that comes from a component and having that add a request for that component to an existing request for the provider package.

Edit: I have learned a little more about our situation with argparse, and it involves a DCC that includes python but we're not "embedding" python in this case but allowing the DCC python and "real" python to coexist in the environment. To get things to (mostly) work some $PYTHONPATH shenanigans are making python find the site-package's argparse.py before the standard library's copy. I'm not sure if this fact entirely nullifies the issue being raised here.

@rydrman
Copy link
Collaborator

rydrman commented Sep 18, 2024

From the meeting today:

  • default run components embed all install.embedded entries
  • default build components DO NOT, but they do need to use the run component
install:
  components:
  - name: cmpt
    embedded: 
      - name:component
      - name:all # all components
      - name # this is an error, must specify components
  embedded:
  - pkg: name

@jrray jrray force-pushed the jrray/issue742 branch 2 times, most recently from b793bd9 to 9822d73 Compare September 19, 2024 01:20
@jrray
Copy link
Collaborator Author

jrray commented Sep 19, 2024

From the meeting today:

* default `run` components embed all `install.embedded` entries

* default `build` components DO NOT, but they do need to `use` the `run` component

I'm not convinced that the extra complexity of implementing this adds any value over what this PR is already doing. If install.components[].embedded is empty, it is given a "fabricated" value where the components of install.embedded[] are mapped 1:1.

So for example:

pkg: demo/1.0.0
install:
  embedded:
    - pkg: embedded1/1.0.0
    - pkg: embedded2/1.0.0

expands to:

pkg: demo/1.0.0
install:
  components:
    - name: build
      embedded:
        - embedded1:build/1.0.0
        - embedded2:build/1.0.0
    - name: run
      embedded:
        - embedded1:run/1.0.0
        - embedded2:run/1.0.0
  embedded:
    - pkg: embedded1/1.0.0
      install:
        components:
          - name: build
          - name: run
    - pkg: embedded2/1.0.0
      install:
        components:
          - name: build
          - name: run

Changing how the default build and run components are generated feels a bit out of scope for this PR.


/// Parse a package name with optional components and optional version.
///
/// The package name must either be followed by a `/` or the end of input.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems like it's wrong for this function...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not wrong but I changed it a little.

jrray added 19 commits October 29, 2024 13:53
Remove embedded from ComponentSpec.

This new field provides a way to specify which components of the embedded
package are actually present inside the component(s) of the parent package.

It is now expected that an embedded package is defined only once in an
entry in InstallSpec.embedded.

Now that install.components[].embedded is removed, it is no longer possible
to nest embedded packages.

Signed-off-by: J Robert Ray <[email protected]>
This is needed to describe which components are being embedded.

Signed-off-by: J Robert Ray <[email protected]>
Build on the ComponentsMissing type to be able to detect when missing
components are from some package not embedding them (or requests for them
are missing).

    .. TRY victim/1.0.0/3I42H3S6 - fake-pkg embeds real-pkg but does not provide all required components: needed comp2, have comp1

Signed-off-by: J Robert Ray <[email protected]>
If a package asks for some component `dep:comp1` and `dep` is already in
the solution because it comes embedded in some other package `other`, but
`dep:comp1` is embedded in some component of `other:comp2` and no request
for `other:comp2` is present, then in order to satisfy the requirement for
`dep:comp1`, automatically detect this situation, find what component(s) of
`other` provide the missing components, roll back the state to before
`other` was resolved, and inject a request for those components.

Signed-off-by: J Robert Ray <[email protected]>
Demonstrate functionality and correct solution for a request
that mixes components and embedded components.

Signed-off-by: J Robert Ray <[email protected]>
If embedded_components aren't explicitly defined, default to mapping the
components of the embedded package(s) to the components of the host package
with the same name(s).

Signed-off-by: J Robert Ray <[email protected]>
If the embedded_components field was populated by calculating default
values, don't serialize the "fabricated" value.

Signed-off-by: J Robert Ray <[email protected]>
Verify the rest of the expected default behavior.

Signed-off-by: J Robert Ray <[email protected]>
Demonstrate the expected behavior for packages that embed packages that
define components that the host package does not define. It is up to the
package author to manually assign the extra component to one or more host
components.

Signed-off-by: J Robert Ray <[email protected]>
Like VersionIdent but the Version is optional. Although strings without a
version can be parsed into a VersionIdent, the knowledge of if the version
was explicitly specified is lost.

Signed-off-by: J Robert Ray <[email protected]>
This is needed to be able to specify the version number of an embedded
package, in the event that two different versions of the same package are
being embedded.

Signed-off-by: J Robert Ray <[email protected]>
If an embedded package is explicitly defined in embedded_components, but
no components are named, then treat this as if ":all" was given, since
embedding an empty component set is not useful.

Signed-off-by: J Robert Ray <[email protected]>
When validating, and adding requests, only add requests for the embedded
package(s) that are enabled in the active component(s) of the host package.

Signed-off-by: J Robert Ray <[email protected]>
It's now clearer how the original validation that was happening here can
be translated into the new code. Refactor the validation code from
EmbeddedPackageValidator since the logic is identical in these two places.

Signed-off-by: J Robert Ray <[email protected]>
This validation is possible again, using
`packages_matching_embedded_component`.

Signed-off-by: J Robert Ray <[email protected]>
The EmbeddedComponentsList type is now called ComponentEmbeddedPackageList,
as this clarifies it is expected to be nested in a component, and there was
already a EmbeddedPackagesList type.

Signed-off-by: J Robert Ray <[email protected]>
This new test reaches the validation check at the end of
PkgRequirementsValidator::validate_request_against_existing_state which
is checking that a new component requirement is not bringing in a new
embedded package that conflicts with the current state.

Signed-off-by: J Robert Ray <[email protected]>
Changing name to just "embedded" as discussed.

Signed-off-by: J Robert Ray <[email protected]>
As discussed. It is no longer possible to imply :All by leaving the
components set empty.

Signed-off-by: J Robert Ray <[email protected]>
@jrray jrray merged commit 2e722ec into main Oct 30, 2024
5 checks passed
@jrray jrray deleted the jrray/issue742 branch October 30, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agenda item Items to be brought up at the next dev meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to make a package with components that embed components?
2 participants