Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pants: add dependencies to the st2common python_distribution #5925

Merged
merged 8 commits into from
Mar 10, 2023

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Mar 9, 2023

Background

This is another part of introducing pants, as discussed in various TSC meetings.

Related PRs can be found in:

Overview of this PR

The primary focus of this PR is:

  • add dependencies to the st2common python_distribution so that those dependencies get included in the wheel.

Secondarily, I also:

  • other st2common/**/BUILD changes to make those dependencies pull in the expected files.
  • include (and mark executable) the script: st2common/bin/migrations/v3.8/st2-drop-st2exporter-marker-collections (also fix lint issues).

I recommend reading over the following description, and then review each commit + commit message separately.

python_distribution dependencies

This PR adds a bunch of dependencies to the python_distribution for st2common. #5928 and #5929 will add a few other dependencies for the other python_distributions. I chose to do this one separately, because it is the most involved.

When pants calculates what files should be included in a wheel, it considers the dependencies--both explicit and inferred, direct and transitive--of the python_distribution target in the BUILD metadata. When it considers the deps, it figures out which python_distribution owns which files, and then includes the current wheel's "owned" files, and includes a requirement on the wheels that provide any other dependency files. Basically anything that is in the tree of subdirectories is "owned" by the current python_distribution / wheel.

In most cases, the dependencies are fairly simple to "infer" for a python_distribution based on the scripts and entry_points defined for it. Pants can look at all of those scripts and entry_points and infer that the wheel should include those scripts and all of their dependencies. For packages like st2api, that works just fine, because the primary entry point effectively imports all of the code in the wheel. But, for library packages like st2common, we have to give pants more information about what we expect it to include in the wheel.

I looked for ways to minimize how many dependencies we have to maintain here.

  • It would be great if we could--say in a pants-plugin for example--look at all of the other python_distributions and include any files that they need in the st2common python_distribution. But then that would mean pants would have to enumerate all the dependencies of all the files in the repo to find the python_distributions and which files they need. That would not be very cacheable.
  • It would be great to have something like the setuptools.find_packages function we are currently using in setup.py, but that does not exist in pants. Dependencies cannot include globs as that would complicate pants' creation of a cache-key, so we cannot just include "everything".
  • Then we get into including individual sub-directories, at which point I want to list as few as possible.

One benefit of pants' approach to managing what gets included in the python_distribution, is that when nothing imports something--making it dead code--it just naturally falls out of the python_distribution. Also, pants will never include the BUILD files that we have scattered throughout the our repo. I suspect our current setup.py will include them in the packaged wheel.

There are a few reasons to include something in the st2common library (python_distribution/wheel):

  1. Anything that pants infers based on the various scripts / entry_points we already have to list separately,
  2. When designed for use by actions/sensors or other pack content
  3. When designed to be referenced from config (the primary example of this is logging config)
  4. When needed by one of our st2* packages
  5. When required by one or more of the runners

The first point is simple - we get that out-of-the-box with pants because we have to define those scripts/entry_points manually no matter how we want to build the wheel. The other 4 points require thought: What is public API? What is an internal API? Is this code used anywhere? Is this legacy/dead code?

So, I used a few commands to compare:

  • the list of files in the st2common/st2common directory (the candidates that can be included);
    find st2common/st2common -name '*.py' -and -not -name '__init__.py' | sort
    
  • the list of files that pants already knows/infers that should be included in the st2common library;
    ./pants dependencies --transitive st2common:st2common | grep -v -e : -e __init__.py | grep st2common/st2common | sort
    
  • the list of files required by all the other wheels (ie every python_distribution except st2common) in the repo.
    ./pants list --filter-target-type=python_distribution --filter-address-regex=-st2common:st2common :: | xargs ./pants dependencies --transitive | grep st2common/st2common | sort
    

The key pieces of these commands are ./pants dependencies (to list the dependencies of a given file or target). Also note, that I only looked for python files, and I excluded __init__.py (because pants will always automatically handle including those as needed) and anything with : because I wanted to compare a list of files and : indicates a target (which can be multiple files) not a single file.

I tried to leave a comment about every dependency to explain why I included it. In several cases, I included an entire directory to keep things as simple as possible. In others, like with util/, I opted to do file-by-file to ensure dead code automatically drops out of the wheel as soon as possible.

I also did some git+github spelunking to figure out why all the remaining files were unused. I noted that ./st2common/util/gunicorn_workers.py is probably dead code that was added in #2571. And, I left a comment listing code that I've validated is actually dead code. Follow-up PRs could remove them (and maybe we should file good-first-issue issues to let others deal with deleting that).

Pants documentation

@cognifloyd cognifloyd added this to the pants milestone Mar 9, 2023
@cognifloyd cognifloyd self-assigned this Mar 9, 2023
@pull-request-size pull-request-size bot added the size/M PR that changes 30-99 lines. Good size to review. label Mar 9, 2023
@cognifloyd cognifloyd force-pushed the pants-python_distributions-deps branch from 24562fb to 415a2cd Compare March 9, 2023 20:10
@@ -56,7 +54,7 @@ def main():
db_setup()

try:
delete_marker_collections(display_prompt=not cfg.CONF.yes)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint complained about this as the display_prompt kwarg doesn't exist. Oops.

I believe I copy/pasted this from the v3.5 migration script.

@@ -24,8 +24,6 @@ and the collections were not configured to be created automatically.
import sys
import traceback

import mongoengine as me
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake8 noticed that this import was unused. Oops. Dropped.

@cognifloyd cognifloyd requested a review from a team March 10, 2023 08:21
Running `./pants package ::` results in the following error. We included
the v3.5 migration script in `setup.py` and in the `pants_distribution`.
But, that script does not end in `.py`, so the `python_sources` target
did not "own" it. In order to correct this, we explicitly include the
extension-less binaries.

> ValueError: The explicit dependency
> `st2common/bin/migrations/v3.5/st2-migrate-db-dict-field-values` of
> the target at `st2common:st2common` does not provide enough address
> parameters to identify which parametrization of the dependency target
> should be used.
> Target `st2common/bin/migrations/v3.5:v3.5` can be addressed as:
>   * st2common/bin/migrations/v3.5:v3.5
>   * st2common/bin/migrations/v3.5/__init__.py
>   * st2common/bin/migrations/v3.5/st2_migrate_db_dict_field_values.py

In fixing this, I also noticed that we did not include the v3.8
migration script, so I marked that executable and included it as well.
This should probably be deleted. But for now, we just leave a note about why
it is not included in the st2common python_distribution.
Since this code is not imported by one of our scripts or entry points,
we need to add an explicit dependency. This starts with several things
that form part of the official "API" of the st2common library.
@cognifloyd cognifloyd force-pushed the pants-python_distributions-deps branch from a3b01ab to 2fa16d8 Compare March 10, 2023 16:44
@cognifloyd cognifloyd merged commit 1a0d028 into master Mar 10, 2023
@cognifloyd cognifloyd deleted the pants-python_distributions-deps branch March 10, 2023 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pantsbuild size/M PR that changes 30-99 lines. Good size to review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants