
feat: support matrices and stdout for all output types #48

Closed · wants to merge 9 commits
Conversation

trxcllnt
Contributor

  • Support stdout for all output types via the new --stdout flag
  • Allow passing --file_key, --output, or --matrix (or any combination of the three)
  • Prepend the generated-by comment to the header for all file types
  • Support multiple --file_key and --output arguments

Fixes #46

Enables the dependencies.yaml changes in this branch.

Example generating requirements.txt:
$ rapids-dependency-file-generator --output requirements -f py_build_cudf -f py_run_cudf --matrix "cuda=11.8;arch=$(uname -m)" --stdout
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
cmake>=3.23.1,!=3.25.0
cython>=0.29,<0.30
git+https://github.com/python-streamz/streamz.git@master
ninja
numpy>=1.21,<1.24
protoc-wheel
pyarrow==11.0.0.*
rmm-cu11==23.6.*
scikit-build>=0.13.1,<0.17.2
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
--extra-index-url=https://pypi.nvidia.com
cachetools
cubinlinker-cu11
cuda-python>=11.7.1,<12.0
cupy-cuda11x>=12.0.0
fsspec>=0.6.0
numba>=0.56.4,<0.57
numpy>=1.21,<1.24
nvtx>=0.2.1
packaging
pandas>=1.3,<1.6.0dev0
protobuf>=4.21.6,<4.22
ptxcompiler-cu11
pyarrow==11.*
rmm-cu11==23.6.*
typing_extensions
Example generating pyproject.toml:
$ rapids-dependency-file-generator --output pyproject -f py_build_cudf -f py_run_cudf --matrix "cuda=11.8;arch=$(uname -m)" --stdout
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
# Copyright (c) 2021-2023, NVIDIA CORPORATION.

[build-system]
build-backend = "setuptools.build_meta"
requires = [
"cmake>=3.23.1,!=3.25.0",
"cython>=0.29,<0.30",
"ninja",
"numpy>=1.21,<1.24",
"protoc-wheel",
"pyarrow==11.0.0.*",
"rmm-cu11==23.6.*",
"scikit-build>=0.13.1,<0.17.2",
"setuptools",
"wheel",
]

[project]
name = "cudf-cu11"
version = "23.6.0"
description = "cuDF - GPU Dataframe"
readme = { file = "README.md", content-type = "text/markdown" }
authors = [
{ name = "NVIDIA Corporation" },
]
license = { text = "Apache 2.0" }
requires-python = ">=3.9"
dependencies = [
"cachetools",
"cubinlinker-cu11",
"cuda-python>=11.7.1,<12.0",
"cupy-cuda11x>=12.0.0",
"fsspec>=0.6.0",
"numba>=0.56.4,<0.57",
"numpy>=1.21,<1.24",
"nvtx>=0.2.1",
"packaging",
"pandas>=1.3,<1.6.0dev0",
"protobuf>=4.21.6,<4.22",
"ptxcompiler-cu11",
"pyarrow==11.*",
"rmm-cu11==23.6.*",
"typing_extensions",
]
classifiers = [
"Intended Audience :: Developers",
"Topic :: Database",
"Topic :: Scientific/Engineering",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
]

[project.optional-dependencies]
test = [
"fastavro>=0.22.9",
"hypothesis",
"mimesis>=4.1.0",
"msgpack",
"pyorc",
"pytest",
"pytest-benchmark",
"pytest-cases",
"pytest-cov",
"pytest-xdist",
"python-snappy>=0.6.0",
"scipy",
"tokenizers==0.13.1",
"transformers==4.24.0",
"tzdata",
]

[project.urls]
Homepage = "https://github.com/rapidsai/cudf"
Documentation = "https://docs.rapids.ai/api/cudf/stable/"

[tool.setuptools]
license-files = ["LICENSE"]

[tool.isort]
line_length = 79
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
combine_as_imports = true
order_by_type = true
known_dask = [
"dask",
"distributed",
"dask_cuda",
]
known_rapids = [
"rmm",
]
known_first_party = [
"cudf",
]
default_section = "THIRDPARTY"
sections = [
"FUTURE",
"STDLIB",
"THIRDPARTY",
"DASK",
"RAPIDS",
"FIRSTPARTY",
"LOCALFOLDER",
]
skip = [
"thirdparty",
".eggs",
".git",
".hg",
".mypy_cache",
".tox",
".venv",
"_build",
"buck-out",
"build",
"dist",
"__init__.py",
]

@trxcllnt trxcllnt changed the title feat(rapids_dependency_file_generator.py): support stdout for all output types feat: support matrices and stdout for all output types May 19, 2023
@ajschmidt8
Member

I think these changes are considered breaking because of the new stdout flag/behavior.

It will break test scripts like this: https://github.com/rapidsai/cudf/blob/72c067726ccfb6e87033d34ab07b4dc79b5e4a3e/ci/test_python_common.sh#L10-L14

@trxcllnt, can you add a BREAKING CHANGE note to your PR body as mentioned here https://github.com/rapidsai/dependency-file-generator/blob/main/CONTRIBUTING.md? That will make sure the next release increments the major version.

Our CI images pin to the current major version of dfg: https://github.com/rapidsai/ci-imgs/blob/cac1028880574b466ed37a4aec8aaf93d3eab0b2/Dockerfile#L117-L119

But we'll need a way to incrementally roll this out to each repository before we update the version in our CI images so that we don't break CI for everyone.

I thought about this in the past, but never took any action on it due to time constraints.

I think one way we can fix it is to add a new optional input, install_dfg_version, to the relevant shared workflows here: https://github.com/rapidsai/shared-action-workflows/tree/branch-23.06/.github/workflows.

install_dfg_version will be a version specifier for rapids-dependency-file-generator. When it's set, it will install the specified version. When it is not set, it will simply do nothing.

Then each repo can manually opt-in to the new major version. Once all the repos are using the new version, we can update the CI image version accordingly and then go back and clean up all of the optional install_dfg_version arguments in each repository.

Contributor

@vyasr vyasr left a comment

Thanks for the PR! I have a couple of concerns about the current implementation but I think the core work around extending the CLI is a step in the right direction.


# If --clean was passed without arguments, default to cleaning from the root of the
# tree where the config file is.
if args.clean == "":
args.clean = os.path.dirname(os.path.abspath(args.config))

args.file_key = list(sorted(list(set(sorted(args.file_key)))))
Contributor

I'm pretty sure this is equivalent?

Suggested change
args.file_key = list(sorted(list(set(sorted(args.file_key)))))
args.file_key = sorted(set(args.file_key))


Yes, it gets converted to a list:

>>> abc = set((1,2,3,4,5))
>>> abc
{1, 2, 3, 4, 5}
>>> sorted(abc)
[1, 2, 3, 4, 5]

Contributor

Right, and in general there are a lot of extra ops here.
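The equivalence under discussion can be checked directly: `sorted` accepts any iterable (including a set) and already returns a list, so the extra `list`/`sorted` wrappers are no-ops. A minimal check:

```python
# Deduplicating and sorting file keys: the chained version from the diff
# and the suggested simplification produce identical results.
file_keys = ["py_run_cudf", "py_build_cudf", "py_run_cudf"]

original = list(sorted(list(set(sorted(file_keys)))))
simplified = sorted(set(file_keys))

assert original == simplified == ["py_build_cudf", "py_run_cudf"]
```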

@@ -323,7 +315,34 @@ def should_use_specific_entry(matrix_combo, specific_entry_matrix):
)


def make_dependency_files(parsed_config, config_file_path, to_stdout):
def name_with_cuda_suffix(name, cuda_version=None, cuda_suffix="-cu"):
Contributor

I don't think we want this logic in dfg. It feels like too much scope creep. I would rather dependencies.yaml files use matrix entries to handle this in the short term. In the longer term, if we decide we need this functionality I would suggest that we add some sort of generic support for variables and string interpolation into dfg as a more general solution. I'd like to be able to use the same solution for our packages and cupy, for instance, and the two use different naming conventions.
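For context, a function with the signature shown in the diff would presumably map a base package name plus a CUDA version to a suffixed wheel name. The body below is an assumption sketched from the signature alone; only the signature comes from the PR:

```python
def name_with_cuda_suffix(name, cuda_version=None, cuda_suffix="-cu"):
    # Sketch: "cudf" + cuda_version "11.8" -> "cudf-cu11" (major version only).
    # The actual implementation in the PR may differ.
    if cuda_version is None:
        return name
    major = str(cuda_version).split(".")[0]
    return f"{name}{cuda_suffix}{major}"
```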

Contributor Author

We can (and do) put the -cuXX names in the dependencies.yaml matrix entries. This is explicitly about the name key of the pyproject.toml.

Overall pyproject.toml should probably be fully auto-generated instead of read and mutated.

Contributor Author

@vyasr where do you think the logic to define the project.name key in the pyproject.toml should live?

Contributor

@vyasr vyasr Jun 9, 2023

We can (and do) put the -cuXX names in the dependencies.yaml matrix entries. This is explicitly about the name key of the pyproject.toml.

Yup, I realized that later in my review around #48 (comment) 😅

Overall pyproject.toml should probably be fully auto-generated instead of read and mutated.

When you say fully auto-generated, what are you envisioning? I could see using something like a Jinja templated pyproject.toml.in and filling in a suitable set of fields, perhaps. There are large swathes of the file that are necessary for things like 1) running linters, 2) specifying "extra" dependency lists, 3) configuring build backends, and more that have to be encoded somewhere else. None of those should be dependency-file-generator's responsibility, and at least the first one requires the file to already exist somewhere in the repo for normal usage.
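A templated `pyproject.toml.in` could look roughly like the following sketch, using the stdlib `string.Template` as a stand-in for Jinja (the template fields and file name are hypothetical, not from the PR):

```python
from string import Template

# Hypothetical pyproject.toml.in fragment: the generator fills in only the
# fields it owns (name, dependencies); linter and build-backend config
# stays hand-written elsewhere in the file.
template = Template("""\
[project]
name = "$name"
dependencies = [
$dependencies
]
""")

deps = ["cachetools", "rmm-cu11==23.6.*"]
rendered = template.substitute(
    name="cudf-cu11",
    dependencies="\n".join(f'"{d}",' for d in deps),
)
print(rendered)
```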

Contributor

@vyasr vyasr Jun 9, 2023

Regarding the project.name, I honestly do not know. My approach to taking the magic out of the wheel building process so far has been very incremental, fixing one problem at a time. I've been viewing the name as sort of a final frontier, one for which I don't have a good answer for yet unfortunately. I don't think a tool dedicated to dependency management is the right place to put that, though.

I'd love to work with you on resolving that problem. I completely agree that the current approach with apply_wheel_modification.sh is not a very good one.

src/rapids_dependency_file_generator/cli.py
Comment on lines +137 to +138
for file_key, file_config in parsed_config["files"].items():
file_config["matrix"] = matrix
Contributor

Suggested change
for file_key, file_config in parsed_config["files"].items():
file_config["matrix"] = matrix
for file_config in parsed_config["files"].values():
file_config["matrix"] = matrix
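The suggestion drops the unused `file_key` binding; iterating `.values()` is equivalent when only the values are needed. A minimal check with a toy config:

```python
# Toy parsed_config in the shape dfg uses: a "files" mapping of
# file keys to per-file configuration dicts.
parsed_config = {
    "files": {
        "all": {"output": "conda"},
        "py_build_cudf": {"output": "pyproject"},
    }
}
matrix = {"cuda": "11.8"}

# Mutating each file_config works identically whether we iterate
# .items() (discarding the key) or .values() directly.
for file_config in parsed_config["files"].values():
    file_config["matrix"] = matrix

assert all(fc["matrix"] is matrix for fc in parsed_config["files"].values())
```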

@@ -394,7 +410,7 @@ def make_dependency_files(parsed_config, config_file_path, to_stdout):
# exists. In that case we save the fallback_entry result
# and only use it at the end if nothing more
# specific is found.
if not specific_matrices_entry["matrix"]:
if not specific_matrices_entry.get("matrix", None):
Contributor

When is the matrix key optional? For a specific entry it should be required, even with the new CLI functionality, right?
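For reference, the only behavioral difference between the two forms is whether a missing key raises. With `.get`, an absent `matrix` key silently takes the falsy branch instead of failing loudly:

```python
# Hypothetical specific entry that is missing its "matrix" key.
entry = {"packages": ["rmm-cu11"]}

# .get returns None (falsy), so `if not entry.get("matrix", None)` would
# quietly treat this entry as a fallback rather than raising.
assert entry.get("matrix", None) is None

# Direct indexing surfaces the missing required key immediately.
raised = False
try:
    entry["matrix"]
except KeyError:
    raised = True
assert raised
```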

Comment on lines +459 to +466
# Append `-cuXX` to `[package.name]`
results[output_file_path]["project"][
"name"
] = name_with_cuda_suffix(
results[output_file_path]["project"]["name"],
matrix_combo.get("cuda", None),
cuda_suffix,
)
Contributor

When I first saw the function I actually thought the goal was to handle dependency suffixes rather than package name suffixes. I'm even less comfortable putting this logic into this tool. Package renaming to support our wheels-specific workflows is definitely scope creep. I'd love to find a better solution to what we're currently doing in our wheel builds, but I don't think this is it.

"""
)

if isinstance(data, dict):
Contributor

Is this implicitly relying on pyproject outputs generating a dict here while other outputs just contain a long string of text? We should make that condition explicit if so, otherwise it's confusing why tomlkit is getting used in a generic dict path.
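Making that condition explicit could look like the sketch below. A real implementation would serialize the dict with `tomlkit.dumps`; a trivial stand-in is used here, and the function name is hypothetical:

```python
def render_output(data):
    # pyproject.toml outputs arrive as a parsed TOML document (dict-like),
    # while requirements/conda outputs are plain strings. Branch explicitly
    # on that distinction rather than leaving the isinstance check implicit.
    if isinstance(data, dict):
        # Stand-in for tomlkit.dumps(data).
        return "\n".join(f'{key} = "{value}"' for key, value in data.items())
    if isinstance(data, str):
        return data
    raise TypeError(f"unexpected output type: {type(data).__name__}")
```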

f.write(contents)
def write_output(data, output_dir, f):

relpath_to_config_file = os.path.relpath(config_file_path, output_dir)
Contributor

If we hoist this logic into the calling loop then the write_output function could be moved outside instead of defined as a nested function right? I think that would be cleaner if we do decide to keep it.
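The hoisting suggested here would look roughly like this sketch: the helper takes everything it needs as parameters instead of capturing it from the enclosing scope. The names follow the diff, but the body is an assumption:

```python
import os

# Module-level helper: relpath_to_config_file is computed once per output
# in the calling loop, e.g. os.path.relpath(config_file_path, output_dir),
# and passed in explicitly rather than closed over.
def write_output(contents, output_dir, filename, relpath_to_config_file):
    header = f"# To make changes, edit {relpath_to_config_file}\n"
    with open(os.path.join(output_dir, filename), "w") as f:
        f.write(header + contents)
```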

@vyasr
Contributor

vyasr commented Jun 9, 2023

@trxcllnt apologies for the delay in reviewing this PR. I propose that we split this up so that the uncontroversial pieces can get merged quickly, if you're open to that.

  • I think everyone is happy with adding support for stdout to all file types (especially pyproject.toml)
  • My guess is that generalizing the CLI to support only passing a subset of arguments is probably uncontroversial in theory, but a bit more problematic because it's a breaking change. I know you said you and @ajschmidt8 had discussed that piece a bit further, and maybe there's a path forward there that involves refactoring the Python functionality and then using it in a new CLI? That would be another good option.
  • The project name/CUDA suffix piece will need some more debate, and I don't want to hold up you getting the other useful pieces in over that.

@vyasr
Contributor

vyasr commented Jun 30, 2023

@trxcllnt would you like some help finishing this up? Let me know if you want to chat about it or need some extra person-hour help.

@bdice
Contributor

bdice commented Jan 17, 2024

@trxcllnt @vyasr Can we get this PR to a completed state? It keeps biting us that we don't have pyproject matrix support. I can help with a review, if that's what is needed, but it seems like there is some code work to be done still.

@vyasr
Contributor

vyasr commented Jan 17, 2024

I'll defer to @trxcllnt here. He had an idea of how best to rewrite the generator to better support this behavior.

@vyasr
Contributor

vyasr commented Apr 18, 2024

Replaced by #74

Successfully merging this pull request may close these issues.

Support matrix entries for pyproject.toml