
feat: support matrices and stdout for all output types #48

Closed · wants to merge 9 commits
Conversation

trxcllnt
Contributor

  • Support stdout for all output types via the new --stdout flag
  • Allow passing --file_key, --output, or --matrix (or any combination of the three)
  • Prepend the generated-by comment to the header for all file types
  • Support multiple --file_key and --output arguments

Fixes #46

Enables the dependencies.yaml changes in this branch.

Example generating requirements.txt:
$ rapids-dependency-file-generator --output requirements -f py_build_cudf -f py_run_cudf --matrix "cuda=11.8;arch=$(uname -m)" --stdout
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
cmake>=3.23.1,!=3.25.0
cython>=0.29,<0.30
git+https://github.com/python-streamz/streamz.git@master
ninja
numpy>=1.21,<1.24
protoc-wheel
pyarrow==11.0.0.*
rmm-cu11==23.6.*
scikit-build>=0.13.1,<0.17.2
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
--extra-index-url=https://pypi.nvidia.com
cachetools
cubinlinker-cu11
cuda-python>=11.7.1,<12.0
cupy-cuda11x>=12.0.0
fsspec>=0.6.0
numba>=0.56.4,<0.57
numpy>=1.21,<1.24
nvtx>=0.2.1
packaging
pandas>=1.3,<1.6.0dev0
protobuf>=4.21.6,<4.22
ptxcompiler-cu11
pyarrow==11.*
rmm-cu11==23.6.*
typing_extensions
Example generating pyproject.toml:
$ rapids-dependency-file-generator --output pyproject -f py_build_cudf -f py_run_cudf --matrix "cuda=11.8;arch=$(uname -m)" --stdout
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
# Copyright (c) 2021-2023, NVIDIA CORPORATION.

[build-system]
build-backend = "setuptools.build_meta"
requires = [
"cmake>=3.23.1,!=3.25.0",
"cython>=0.29,<0.30",
"ninja",
"numpy>=1.21,<1.24",
"protoc-wheel",
"pyarrow==11.0.0.*",
"rmm-cu11==23.6.*",
"scikit-build>=0.13.1,<0.17.2",
"setuptools",
"wheel",
]

[project]
name = "cudf-cu11"
version = "23.6.0"
description = "cuDF - GPU Dataframe"
readme = { file = "README.md", content-type = "text/markdown" }
authors = [
{ name = "NVIDIA Corporation" },
]
license = { text = "Apache 2.0" }
requires-python = ">=3.9"
dependencies = [
"cachetools",
"cubinlinker-cu11",
"cuda-python>=11.7.1,<12.0",
"cupy-cuda11x>=12.0.0",
"fsspec>=0.6.0",
"numba>=0.56.4,<0.57",
"numpy>=1.21,<1.24",
"nvtx>=0.2.1",
"packaging",
"pandas>=1.3,<1.6.0dev0",
"protobuf>=4.21.6,<4.22",
"ptxcompiler-cu11",
"pyarrow==11.*",
"rmm-cu11==23.6.*",
"typing_extensions",
]
classifiers = [
"Intended Audience :: Developers",
"Topic :: Database",
"Topic :: Scientific/Engineering",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
]

[project.optional-dependencies]
test = [
"fastavro>=0.22.9",
"hypothesis",
"mimesis>=4.1.0",
"msgpack",
"pyorc",
"pytest",
"pytest-benchmark",
"pytest-cases",
"pytest-cov",
"pytest-xdist",
"python-snappy>=0.6.0",
"scipy",
"tokenizers==0.13.1",
"transformers==4.24.0",
"tzdata",
]

[project.urls]
Homepage = "https://github.com/rapidsai/cudf"
Documentation = "https://docs.rapids.ai/api/cudf/stable/"

[tool.setuptools]
license-files = ["LICENSE"]

[tool.isort]
line_length = 79
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
combine_as_imports = true
order_by_type = true
known_dask = [
"dask",
"distributed",
"dask_cuda",
]
known_rapids = [
"rmm",
]
known_first_party = [
"cudf",
]
default_section = "THIRDPARTY"
sections = [
"FUTURE",
"STDLIB",
"THIRDPARTY",
"DASK",
"RAPIDS",
"FIRSTPARTY",
"LOCALFOLDER",
]
skip = [
"thirdparty",
".eggs",
".git",
".hg",
".mypy_cache",
".tox",
".venv",
"_build",
"buck-out",
"build",
"dist",
"__init__.py",
]

@trxcllnt trxcllnt changed the title feat(rapids_dependency_file_generator.py): support stdout for all output types feat: support matrices and stdout for all output types May 19, 2023
@ajschmidt8
Member

I think these changes are considered breaking because of the new stdout flag/behavior.

It will break test scripts like this: https://github.com/rapidsai/cudf/blob/72c067726ccfb6e87033d34ab07b4dc79b5e4a3e/ci/test_python_common.sh#L10-L14

@trxcllnt, can you add a BREAKING CHANGE note to your PR body as mentioned here https://github.com/rapidsai/dependency-file-generator/blob/main/CONTRIBUTING.md? That will make sure the next release increments the major version.

Our CI images pin to the current major version of dfg: https://github.com/rapidsai/ci-imgs/blob/cac1028880574b466ed37a4aec8aaf93d3eab0b2/Dockerfile#L117-L119

But we'll need a way to incrementally roll this out to each repository before we update the version in our CI images so that we don't break CI for everyone.

I thought about this in the past, but never took any action on it due to time constraints.

I think one way we can fix it is to add a new optional input, install_dfg_version, to the relevant shared workflows here: https://github.com/rapidsai/shared-action-workflows/tree/branch-23.06/.github/workflows.

install_dfg_version will be a version specifier for rapids-dependency-file-generator. When it's set, it will install the specified version. When it is not set, it will simply do nothing.

Then each repo can manually opt-in to the new major version. Once all the repos are using the new version, we can update the CI image version accordingly and then go back and clean up all of the optional install_dfg_version arguments in each repository.

Contributor

@vyasr vyasr left a comment

Thanks for the PR! I have a couple of concerns about the current implementation but I think the core work around extending the CLI is a step in the right direction.


# If --clean was passed without arguments, default to cleaning from the root of the
# tree where the config file is.
if args.clean == "":
args.clean = os.path.dirname(os.path.abspath(args.config))

args.file_key = list(sorted(list(set(sorted(args.file_key)))))
Contributor

I'm pretty sure this is equivalent?

Suggested change
args.file_key = list(sorted(list(set(sorted(args.file_key)))))
args.file_key = sorted(set(args.file_key))


Yes, it gets converted to a list:

>>> abc = set((1,2,3,4,5))
>>> abc
{1, 2, 3, 4, 5}
>>> sorted(abc)
[1, 2, 3, 4, 5]

Contributor

Right, and in general there are a lot of extra ops here.
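The equivalence under discussion can be checked directly: `sorted` accepts any iterable (including a set) and already returns a list, so the extra `list`/`sorted` wrappers are no-ops. A minimal check:

```python
# Deduplicating and sorting file keys: the chained version from the diff
# and the suggested simplification produce identical results.
file_keys = ["py_run_cudf", "py_build_cudf", "py_run_cudf"]

original = list(sorted(list(set(sorted(file_keys)))))
simplified = sorted(set(file_keys))

assert original == simplified == ["py_build_cudf", "py_run_cudf"]
```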

@@ -323,7 +315,34 @@ def should_use_specific_entry(matrix_combo, specific_entry_matrix):
)


def make_dependency_files(parsed_config, config_file_path, to_stdout):
def name_with_cuda_suffix(name, cuda_version=None, cuda_suffix="-cu"):
Contributor

I don't think we want this logic in dfg. It feels like too much scope creep. I would rather dependencies.yaml files use matrix entries to handle this in the short term. In the longer term, if we decide we need this functionality I would suggest that we add some sort of generic support for variables and string interpolation into dfg as a more general solution. I'd like to be able to use the same solution for our packages and cupy, for instance, and the two use different naming conventions.
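For context, a function with the signature shown in the diff would presumably map a base package name plus a CUDA version to a suffixed wheel name. The body below is an assumption sketched from the signature alone; only the signature comes from the PR:

```python
def name_with_cuda_suffix(name, cuda_version=None, cuda_suffix="-cu"):
    # Sketch: "cudf" + cuda_version "11.8" -> "cudf-cu11" (major version only).
    # The actual implementation in the PR may differ.
    if cuda_version is None:
        return name
    major = str(cuda_version).split(".")[0]
    return f"{name}{cuda_suffix}{major}"
```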

Contributor Author

We can (and do) put the -cuXX names in the dependencies.yaml matrix entries. This is explicitly about the name key of the pyproject.toml.

Overall pyproject.toml should probably be fully auto-generated instead of read and mutated.

Contributor Author

@vyasr where do you think the logic to define the project.name key in the pyproject.toml should live?

Contributor

@vyasr vyasr Jun 9, 2023

We can (and do) put the -cuXX names in the dependencies.yaml matrix entries. This is explicitly about the name key of the pyproject.toml.

Yup, I realized that later in my review around #48 (comment) 😅

Overall pyproject.toml should probably be fully auto-generated instead of read and mutated.

When you say fully auto-generated, what are you envisioning? I could see using something like a Jinja templated pyproject.toml.in and filling in a suitable set of fields, perhaps. There are large swathes of the file that are necessary for things like 1) running linters, 2) specifying "extra" dependency lists, 3) configuring build backends, and more that have to be encoded somewhere else. None of those should be dependency-file-generator's responsibility, and at least the first one requires the file to already exist somewhere in the repo for normal usage.
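A templated `pyproject.toml.in` could look roughly like the following sketch, using the stdlib `string.Template` as a stand-in for Jinja (the template fields and file name are hypothetical, not from the PR):

```python
from string import Template

# Hypothetical pyproject.toml.in fragment: the generator fills in only the
# fields it owns (name, dependencies); linter and build-backend config
# stays hand-written elsewhere in the file.
template = Template("""\
[project]
name = "$name"
dependencies = [
$dependencies
]
""")

deps = ["cachetools", "rmm-cu11==23.6.*"]
rendered = template.substitute(
    name="cudf-cu11",
    dependencies="\n".join(f'"{d}",' for d in deps),
)
print(rendered)
```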

Contributor

@vyasr vyasr Jun 9, 2023

Regarding the project.name, I honestly do not know. My approach to taking the magic out of the wheel building process so far has been very incremental, fixing one problem at a time. I've been viewing the name as sort of a final frontier, one for which I don't have a good answer for yet unfortunately. I don't think a tool dedicated to dependency management is the right place to put that, though.

I'd love to work with you on resolving that problem. I completely agree that the current approach with apply_wheel_modification.sh is not a very good one.

src/rapids_dependency_file_generator/cli.py
Comment on lines +137 to +138
for file_key, file_config in parsed_config["files"].items():
file_config["matrix"] = matrix
Contributor

Suggested change
for file_key, file_config in parsed_config["files"].items():
file_config["matrix"] = matrix
for file_config in parsed_config["files"].values():
file_config["matrix"] = matrix
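The suggestion drops the unused `file_key` binding; iterating `.values()` is equivalent when only the values are needed. A minimal check with a toy config:

```python
# Toy parsed_config in the shape dfg uses: a "files" mapping of
# file keys to per-file configuration dicts.
parsed_config = {
    "files": {
        "all": {"output": "conda"},
        "py_build_cudf": {"output": "pyproject"},
    }
}
matrix = {"cuda": "11.8"}

# Mutating each file_config works identically whether we iterate
# .items() (discarding the key) or .values() directly.
for file_config in parsed_config["files"].values():
    file_config["matrix"] = matrix

assert all(fc["matrix"] is matrix for fc in parsed_config["files"].values())
```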

@@ -394,7 +410,7 @@ def make_dependency_files(parsed_config, config_file_path, to_stdout):
# exists. In that case we save the fallback_entry result
# and only use it at the end if nothing more
# specific is found.
if not specific_matrices_entry["matrix"]:
if not specific_matrices_entry.get("matrix", None):
Contributor

When is the matrix key optional? For a specific entry it should be required, even with the new CLI functionality, right?
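For reference, the only behavioral difference between the two forms is whether a missing key raises. With `.get`, an absent `matrix` key silently takes the falsy branch instead of failing loudly:

```python
# Hypothetical specific entry that is missing its "matrix" key.
entry = {"packages": ["rmm-cu11"]}

# .get returns None (falsy), so `if not entry.get("matrix", None)` would
# quietly treat this entry as a fallback rather than raising.
assert entry.get("matrix", None) is None

# Direct indexing surfaces the missing required key immediately.
raised = False
try:
    entry["matrix"]
except KeyError:
    raised = True
assert raised
```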

Comment on lines +459 to +466
# Append `-cuXX` to `[package.name]`
results[output_file_path]["project"][
"name"
] = name_with_cuda_suffix(
results[output_file_path]["project"]["name"],
matrix_combo.get("cuda", None),
cuda_suffix,
)
Contributor

When I first saw the function I actually thought the goal was to handle dependency suffixes rather than package name suffixes. I'm even less comfortable putting this logic into this tool. Package renaming to support our wheels-specific workflows is definitely scope creep. I'd love to find a better solution to what we're currently doing in our wheel builds, but I don't think this is it.

"""
)

if isinstance(data, dict):
Contributor

Is this implicitly relying on pyproject outputs generating a dict here while other outputs just contain a long string of text? We should make that condition explicit if so, otherwise it's confusing why tomlkit is getting used in a generic dict path.
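Making that condition explicit could look like the sketch below. A real implementation would serialize the dict with `tomlkit.dumps`; a trivial stand-in is used here, and the function name is hypothetical:

```python
def render_output(data):
    # pyproject.toml outputs arrive as a parsed TOML document (dict-like),
    # while requirements/conda outputs are plain strings. Branch explicitly
    # on that distinction rather than leaving the isinstance check implicit.
    if isinstance(data, dict):
        # Stand-in for tomlkit.dumps(data).
        return "\n".join(f'{key} = "{value}"' for key, value in data.items())
    if isinstance(data, str):
        return data
    raise TypeError(f"unexpected output type: {type(data).__name__}")
```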

f.write(contents)
def write_output(data, output_dir, f):

relpath_to_config_file = os.path.relpath(config_file_path, output_dir)
Contributor

If we hoist this logic into the calling loop then the write_output function could be moved outside instead of defined as a nested function right? I think that would be cleaner if we do decide to keep it.
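The hoisting suggested here would look roughly like this sketch: the helper takes everything it needs as parameters instead of capturing it from the enclosing scope. The names follow the diff, but the body is an assumption:

```python
import os

# Module-level helper: relpath_to_config_file is computed once per output
# in the calling loop, e.g. os.path.relpath(config_file_path, output_dir),
# and passed in explicitly rather than closed over.
def write_output(contents, output_dir, filename, relpath_to_config_file):
    header = f"# To make changes, edit {relpath_to_config_file}\n"
    with open(os.path.join(output_dir, filename), "w") as f:
        f.write(header + contents)
```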

@vyasr
Contributor

vyasr commented Jun 9, 2023

@trxcllnt apologies for the delay in reviewing this PR. I propose that we split this up so that the uncontroversial pieces can get merged quickly, if you're open to that.

  • I think everyone is happy with adding support for stdout to all file types (especially pyproject.toml)
  • My guess is that generalizing the CLI to support only passing a subset of arguments is probably uncontroversial in theory, but a bit more problematic because it's a breaking change. I know you said you and @ajschmidt8 had discussed that piece a bit further, and maybe there's a path forward there that involves refactoring the Python functionality and then using it in a new CLI? That would be another good option.
  • The project name/CUDA suffix piece will need some more debate, and I don't want to hold up you getting the other useful pieces in over that.

@vyasr
Contributor

vyasr commented Jun 30, 2023

@trxcllnt would you like some help finishing this up? Let me know if you want to chat about it or need some extra person-hour help.

@bdice
Contributor

bdice commented Jan 17, 2024

@trxcllnt @vyasr Can we get this PR to a completed state? It keeps biting us that we don't have pyproject matrix support. I can help with a review, if that's what is needed, but it seems like there is some code work to be done still.

@vyasr
Contributor

vyasr commented Jan 17, 2024

I'll defer to @trxcllnt here. He had an idea of how best to rewrite the generator to better support this behavior.

@vyasr
Contributor

vyasr commented Apr 18, 2024

Replaced by #74

Successfully merging this pull request may close these issues.

Support matrix entries for pyproject.toml