-
I'm glad the project is maturing to a point where it is considering a dependency policy. I believe your intentions are good, but the solutions proposed could become very problematic if implemented incorrectly. The most comprehensive write-up I've seen on the topic is this one:

**Problems with option 1 (strict or exact `==` requirements)**

It's described better in the document that I linked, but in summary: do not cap by default. Capping dependencies makes your software incompatible with other libraries that also have strict lower limits on their dependencies, and it limits future fixes. Anyone can add a missing cap, but users cannot fix an over-restrictive cap that causes solver errors. Any user or consumer of your package can add a missing upper bound if there is a breakage, but if you remove this capability (by publishing the constraint inside the package), you are disempowering users. By publishing a package with strictly pinned dependencies, you are effectively forcing your dependency closure onto your users, and dbt has a lot of dependencies, so this is quite user-hostile. So I hope your proposal isn't considering doing this.

**Problems with option 2 (asking users to install from source?)**

This framing is quite user-hostile. It requires a user to know about git and how to check out a particular revision, or it requires users to perform obscure configuration with pip or their dependency manager of choice. It's also against Python packaging best practice, which is to publish prebuilt wheels (with loose version constraints).

**Other items**
This has ended up being a much larger post than I intended. My main message is this: please, please, please don't publish distributions (sdist or wheel) to PyPI with strict dependencies. Let users pin upper bounds as necessary, and document how to do so. Document how users can "pin" their environments with pip-tools or poetry or whatever environment manager they feel they need to use.
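For what it's worth, the user-side fix described above is mechanically simple with pip's constraints files. A rough sketch, with an illustrative package name and cap rather than a real known breakage:

```console
$ # The user, not the package, decides to cap a dependency after a breakage:
$ echo 'agate<1.7' > constraints.txt
$ pip install dbt-core -c constraints.txt
```

Because the cap lives in the user's environment instead of inside a published package, it can be relaxed the moment the underlying breakage is fixed.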
-
Playing devil's advocate here: is there any risk to waiting for the Python community to figure this out and then adopting their proposed solution? There's a lot of chatter around Python packaging, so maybe relief is around the corner?
-
I was asked via Twitter if I'd like to contribute to the discussion. Here are some notes / thoughts on loosening the pinning.

My environment: I run dbt-core and dbt-snowflake in Airflow (with AWS MWAA), and I ran into issues updating my pinned versions. The final issue I ran into was lack of support for hologram; my penultimate issue was with another pinned dependency. Also, for anyone reading this who is also running Airflow 2.5 and struggling to set up dbt-core: after some trial and error I found a set of requirements that worked.

**Reasonable expectations about breaking changes**

Well-maintained packages adhere to semantic versioning; poorly maintained packages may not, but you should avoid using such packages anyway. What this means is that you can reasonably foresee breaking changes to the API. So, for example, it's unclear why a dependency that follows semver needs to be pinned to an exact patch version rather than to a compatible range.
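To make that concrete, here is what semver-aware bounds could look like in a `setup.py`. The package name and versions are illustrative, not dbt's actual pins:

```python
# Hypothetical excerpt: trust semver instead of exact pins.
from setuptools import setup

setup(
    name="example-package",
    install_requires=[
        "Jinja2>=3.1,<4",  # any 3.x release from 3.1 on; only 4.0 may break the API
        "mashumaro~=3.6",  # "compatible release" operator: equivalent to >=3.6,<4.0
    ],
)
```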
**Take advantage of internal packages**

dbt-labs maintains hologram itself. Ultimately, this is the issue that I ran into when updating my environment, since the current public release of dbt pins it tightly. But I guess my question is: why? It makes sense to be wary of changes to Jinja2, because that is a separate project. But dbt-labs should not find it surprising when one of its own packages releases a new version. TLDR: instead of an exact pin on an internal dependency, use an open range and coordinate releases across your own packages.

If dbt-labs is not doing this, then there is not a whole ton of benefit to maintaining separate packages in the first place. I see asynchronous updates across packages as one of the biggest advantages of having an ecosystem fragmented across multiple packages. dbt-labs should take advantage of it.

**CI with unpinned dependencies implicitly tests for common installations**

If you have the capacity to do quick patch deployments (this is a big "if", I know), then breaking changes are not a massive deal for a package like dbt-core in the rare circumstances that they occur. If you ever run into an unforeseen breaking issue with unpinned dependencies, your CI runs frequently enough that you'll catch it pretty easily. Users without frozen dependencies, or who are doing fresh installs, will run into issues for a day, but users who freeze their dependencies won't notice. The most adversely impacted group in this case would be people who freeze down to the patch version of dbt-core but who do not freeze anything else.

But really, all of this should be extraordinarily rare, rare enough that most package developers don't worry about it. I'd say once a year I run into an issue like this. (Off the top of my head, over the past 3 years, I recall having this issue with isort, black, and temporarily with tensorflow, where patch versions completely break and become obsolete as naked installations due to unpinned dependencies.)

(For dbt Cloud deployments, I would say frozen dependencies or a managed PyPI proxy should be used to avoid this issue completely. However, this is separate from the packaging ecosystem.)

**More matrices in GitHub Workflows**

Correct me if I am wrong, but Microsoft foots the bill for GitHub Actions runners for open-source projects. This means that dbt-labs can support, for example, multiple minor versions of Jinja2 with a larger test matrix. Right now, I see the unit testing workflow only tests different Python versions:

```yaml
strategy:
  fail-fast: false
  matrix:
    python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
```

However, you could imagine doing the following to support more versions of Jinja2:
```yaml
strategy:
  fail-fast: false
  matrix:
    python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    jinja2-version: ["3.0.*", "3.1.*"]
```

It may be the case that Jinja2 3.0 does not work with dbt 1.5, in which case this particular matrix would be uninteresting for dbt 1.5. But imagine Jinja2 releases version 3.2, available only on Python 3.8+. You could then safely test that it works on (let's say, for example) dbt-core 1.5.3, and then enforce that both are supported with the matrix strategy:
```yaml
strategy:
  fail-fast: false
  matrix:
    python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    jinja2-version: ["3.1.*", "3.2.*"]
    exclude:
      - python-version: "3.7"
        jinja2-version: "3.2.*"
```

And then, once confirmed to be working, you do a patch bump from dbt-core 1.5.3 to 1.5.4, which would just be a one-line change to the version bounds. A sketch of the missing install step is below.
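For the matrix to actually change what gets tested, the workflow also needs an install step that applies the selected Jinja2 version. A sketch of what that step might look like; the step names and surrounding layout are my guesses, not dbt's real workflow:

```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    with:
      python-version: ${{ matrix.python-version }}
  - name: Install dbt-core, then force the matrix's Jinja2
    run: |
      pip install .
      pip install "jinja2==${{ matrix.jinja2-version }}"
  - name: Run unit tests
    run: python -m pytest tests/unit
```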
**TLDR**

Semver makes breaking changes foreseeable; internal packages don't need to be pinned like external ones; and a wider CI matrix plus quick patch releases can catch and fix the rare breakages that looser pins let through.
-
I'm not sure where to report this, but mashumaro 3.7 was released, and according to the test suite and my few tests, it isn't causing any trouble. Unfortunately, due to the aggressive pinning, an update to the most recent mashumaro breaks dbt, which refuses to start alongside a version it considers incompatible. Should we bump the pin (meh), or better, start accepting ranges that are open on top (>=3.6)? See #7534 as support to bootstrap the discussion further.
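Concretely, the "open on top" option is a one-line change in the dependency metadata. A sketch; the exact current pin may differ from what is shown:

```python
# Hypothetical setup.py excerpt for dbt-core.
from setuptools import setup

setup(
    name="dbt-core",
    install_requires=[
        # Before: "mashumaro==3.6", which refuses mashumaro 3.7 outright.
        "mashumaro>=3.6",  # After: accepts 3.7 and future releases.
    ],
)
```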
-
@Fatal1ty offered some advice here related to lower bounds, exclusions, and pinning. I've lightly edited the quote below to assume that dbt-core depends on a Python package named antigravity:
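In the spirit of that advice, here is a sketch of what a lower bound plus targeted exclusions could look like for the hypothetical `antigravity` dependency; all version numbers are invented for illustration:

```python
# Illustration only: lower bound plus known-bad exclusions, no blanket cap.
from setuptools import setup

setup(
    name="dbt-core",
    install_requires=[
        # Oldest version actually tested, excluding releases with known bugs.
        "antigravity>=1.2,!=1.4.0,!=1.4.1",
    ],
)
```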
-
Managing dependencies in Python — what could be more fun?
Adapted from #4748 (comment)
**Current approach**

We publish `dbt-core` with tight version pins in `setup.py`. The premise: if you install `dbt-core` with a set of versioned packages, then you can expect it to work with those versions of those packages. We keep relatively tight pins on any "critical" dependencies (`Jinja`, `agate`, `mashumaro`, `networkx`), where even subtle changes can have unintended or breaking consequences for end users. As `dbt-core` maintainers, we manage dependency upgrades within the larger process of preparing new dbt-core minor versions. Users try out new dependency versions as part of trying out a new minor version; there's a clear channel for feedback, and a clear next step (downgrade to the previous minor version) if something goes awry.

The downside of this approach is that it's much harder to install `dbt-core` alongside other popular Python packages, e.g. Airflow (which built a whole feature given the frequent complaints of incompatibility). To date, we've accepted that downside as part of a trade-off — it's also less likely that `pip install dbt-core` will, on any given day, stop working because of a weird behind-the-scenes change in a patch release of a critical third-party package.

**Proposed approach**
We should maintain two sets of dependencies:

- a stricter set of exact (`==`) dependency versions that are guaranteed to work
- a looser set, using `!=` exclusions for versions with known bugs/incompatibilities

I'm not proposing a specific implementation for either the former or latter option — there are several options we could take to achieve it (`setup.py`, `pyproject.toml`/`poetry`, prebuilt "snapshots" for specific OSes) — but that set of two options should be our desired result. Whatever approach we take should also be extensible for adapter plugins maintained by third parties. Then, we should encourage maintainers of adapter plugins to follow our approach, by loosening the dependencies required in `setup.py` (or equivalent), while also publishing a stricter set of guaranteed dependencies.

This mirrors the approach taken by Airflow, among other popular Python projects: Loose Pip Constraints & Specific Officially Supported Constraints.
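A minimal sketch of that two-artifact pattern, assuming a hypothetical published constraints file per release series; the file name and every version number below are invented:

```python
# setup.py: the loose set, lower bounds only, plus targeted exclusions.
from setuptools import setup

setup(
    name="dbt-core",
    install_requires=[
        "Jinja2>=3.1",
        "agate>=1.6",
        "mashumaro>=3.6,!=3.7.0",  # hypothetical known-bad release excluded
        "networkx>=2.3",
    ],
)
```

```text
# constraints-1.5.txt: the strict "guaranteed" set, one exact pin per dependency.
Jinja2==3.1.2
agate==1.6.3
mashumaro==3.6
networkx==2.8.8
```

A user who wants the guaranteed environment would run `pip install dbt-core -c constraints-1.5.txt`, which is essentially the workflow Airflow documents with its versioned constraints URLs.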
Then, users can pick between these two options:

- Install with the stricter set of guaranteed versions if you want a known-good, dbt-only environment.
- `pip install dbt-core` if you need to manage dbt-core within a more-complex Python environment, alongside other dependencies. We'd aim to set lower bounds only in `setup.py`. This carries the risk of uncovering incompatibilities with new versions of dependencies before we have a fix for those issues. We'd do our best here; the end user choosing this option would be knowingly opting into that risk.

**Work in progress**