[CT-1845] [Feature] Write a file of installed versions during dbt deps
#6643
Comments
Thanks for the brilliant write-up @dbeatty10!! As you mentioned, while this isn't something our team has capacity to pick up right now, we'd be very open to supporting a motivated community contributor who would want to start by tackling phase 1. I believe it's directionally correct for the larger vision you've outlined, while also yielding some clear & uncontroversial benefits in the meantime. |
Love all of this. There is a 15% chance I'm the very community contributor you're looking for, but it might be a couple of months away so if someone else comes along and wants to get stuck in, feel free! |
My $0.02: It's okay if the user experience for these is a bit less compelling than it is for Hub packages. That's the point of the Hub, we can make nicer things happen by making additional metadata available. The analogy here is to the difference between For each of these cases, as we imagine moving into a phase 2 (
|
hello dbt team 👋 i just started working on some code locally for this ticket today and would be happy to help contribute to this ticket/discussions surrounding it moving forward. @dbeatty10 i wanted to ask a few questions on what you outlined especially within phase 1 and 2.
as part of this ticket, would it also be okay to contribute a new feature on top of this? what i'm proposing is to break out
the
i've already started working on this locally and i don't think it would add much work to the overall effort of this ticket. here's what it may look like within the CLI: i think this would be a helpful addition to the user experience of working with packages (via |
Awesome to hear your interest @justbldwn! Quick thoughts from me on the piece around new CLI commands:
|
One of the most commonly requested enhancements to the dbt Cloud IDE is automating the Feels like having dbt commands do a check of the @dbeatty10 -- definitely keen on pitching in here (likely in conjunction with @joellabes and @justbldwn !) Let me know how to best plug in here! |
thanks for all the info @jtcohen6 ! if it's okay, i'll make a separate PR eventually relating to #5302 for the expansion of dbt deps into different sub-commands. good to know this issue is already outstanding, i hadn't seen it before! as for the scope of this ticket, i just submitted an initial draft PR (#6735) for the ability to collect/write a json file to the |
After chatting with @dbeatty10 about phase 1, I think we should invert our approach to more closely resemble
This approach has a few advantages:
|
hey @aranke, thanks for the feedback on this! i have some thoughts/questions below.
would the idea here be to have
or
with that, could this start to intrude on @jtcohen6 's stance in this comment that
the way i look at this, essentially the
do you mind diving into this point a little bit further? i'm not fully following this, as i believe i'm happy to contribute more to this to keep momentum going, i'm just hoping for a bit more direction on exactly which way you'd like to take this. thanks again! |
I think this is brilliant and would have saved several people hours yesterday. We need to save the commit hash because what I think happened in our case is that a package we were using kept the same version after a change was made, so those who installed the package prior to the change were seeing a different behavior than those who installed it after the change. Making the install deterministic with a type of lock file in the repo would be ideal IMO. |
@justbldwn Thanks for your detailed response and your energy in tackling this issue, it is definitely appreciated. I see the purpose of the lock file to be two-fold:
For example, consider the following
Since
With the introduction of the lock file, we now need to support three behaviors:
It's getting late here, and this comment is already really long, so I'll wrap up for now. There are two things I'll cover in a follow-up post (or @jtcohen6 can weigh in):
Once again, thank you so much for your willingness to help here and look forward to your contributions! |
@aranke thank you again so much for providing all this info and context when you did. it was extremely helpful and definitely allowed me to proceed forward. i'm hoping to submit a PR soon with changes 🎉 . one question i came across when working with deps code is related to the lets say i have 5 packages listed in my

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.4
  - git: "https://github.com/splitgraph/ab2ft_stripe"
    revision: 0.1.0
  - package: dbt-labs/dbt_utils
    version: 0.8.4
  - git: "https://github.com/google/fhir-dbt-analytics"
    revision: 386e8430a4b9735466c90c3313ae67c31dbef58f
  - package: everpeace/dbt_models_metadata
    version: 0.1.0

and i decide that i only want my project's

packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0

should we wipe hope that makes sense - let me know if you have any comments/questions! |
@justbldwn Good catch, yes, we should make sure that the state in For now, this can take the form of wiping and recreating the directory, but in the future we should do the following instead:
Furthermore, I think we should call the file |
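The "wipe and recreate" approach mentioned above can be sketched in a few lines. This is only an illustration under assumptions: `reset_packages_dir` is a hypothetical helper, not a dbt function, and the directory path is passed in by the caller rather than read from any dbt config.

```python
import shutil
from pathlib import Path


def reset_packages_dir(packages_dir: Path) -> None:
    """Delete the install directory and everything in it, then recreate it empty.

    Hypothetical helper for illustration; not part of dbt.
    """
    # ignore_errors=True makes the removal a no-op when the directory does not exist yet.
    shutil.rmtree(packages_dir, ignore_errors=True)
    packages_dir.mkdir(parents=True)
```

Starting from an empty directory means a package that was dropped from the package list cannot linger from an earlier install, at the cost of reinstalling everything each time.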
PR submitted #6735 and marked ready for review! there is a lot of context and info included in the PR summary. not sure if discussion should continue here or there, but happy to help whichever way i can 👍 |
#8408 only achieved half of this. |
Hello @ChenyuLInx @justbldwn @aranke Now the problem is with the method Now that I understand what you're trying to do here and there might be something I can do about the The lock file created doesn't respect the packages.yaml:

packages:
  - package: "calogica/dbt_date"
    version: [ ">=0.5.4", "<0.8.0" ]
  - package: "dbt-labs/dbt_utils"
    version: [ ">=1.0.0", "<2.0.0" ]
  - git: "[email protected]:acme/git-repo.git"
    subdirectory: "dbt-source-pkg-1"
    revision: "dbt-source-pkg-1-v0.1.12"
  - git: "[email protected]:acme/git-repo.git"
    subdirectory: "dbt-source-pkg-2"
    revision: "dbt-source-pkg-2-v0.1.2"

It will instead create the following lock file

❯ cat package-lock.yml
packages:
  - package: calogica/dbt_date
    version: 0.7.2
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - git: [email protected]:acme/git-repo.git/dbt-source-pkg-1
    revision: dbt-source-pkg-1-v0.1.12
  - git: [email protected]:acme/git-repo.git/dbt-source-pkg-2
    revision: dbt-source-pkg-1-v0.1.2
sha1_hash: c42f2<redacted>f258

Resulting in a problem when executing |
@philippeboyd, I am using http not ssh for the git repos. These are still resolving and cloning properly even with the subdirectory on the end. I just needed to add the subdirectory line into the lock file so dbt deps knew to find the |
Hi @ryan-pip, of course if you manually modify the lock file to add the subdirectory at the proper place it will work but it doesn't fix the problem that I mentioned in my previous post.

packages:
  - git: git@github.com:philippeboyd/dbt-monorepo.git
    subdirectory: dbt-utils-main
    revision: v0.1.0
  - git: git@github.com:philippeboyd/dbt-monorepo.git
    subdirectory: dbt-date-main
    revision: v0.1.0
sha1_hash: d3336208deda31c19fe1230443c19f7000d3e40e

Doubt. See my example below. To replicate the issue easily, I created a repository https://github.com/philippeboyd/dbt-monorepo.git with I tried both the following (git and https) and got the same result

packages:
  - git: "git@github.com:philippeboyd/dbt-monorepo.git"
    subdirectory: "dbt-utils-main"
    revision: "v0.1.0"
  - git: "git@github.com:philippeboyd/dbt-monorepo.git"
    subdirectory: "dbt-date-main"
    revision: "v0.1.0"

packages:
  - git: "https://github.com/philippeboyd/dbt-monorepo.git"
    subdirectory: "dbt-utils-main"
    revision: "v0.1.0"
  - git: "https://github.com/philippeboyd/dbt-monorepo.git"
    subdirectory: "dbt-date-main"
    revision: "v0.1.0"

Respective

packages:
  - git: git@github.com:philippeboyd/dbt-monorepo.git/dbt-utils-main
    revision: v0.1.0
  - git: git@github.com:philippeboyd/dbt-monorepo.git/dbt-date-main
    revision: v0.1.0
sha1_hash: d3336208deda31c19fe1230443c19f7000d3e40e

packages:
  - git: https://github.com/philippeboyd/dbt-monorepo.git/dbt-utils-main
    revision: v0.1.0
  - git: https://github.com/philippeboyd/dbt-monorepo.git/dbt-date-main
    revision: v0.1.0
sha1_hash: 140e9796f695ebcf548343394d4aa90c3480f477

Both will result in the respective error

❯ dbt deps
02:58:44 Running with dbt=1.7.0-rc1
02:58:51 Updating lock file in file path: /tmp/tmp.lIjWExOoUt/tester/package-lock.yml
02:58:51 Installing git@github.com:philippeboyd/dbt-monorepo.git/dbt-utils-main
02:58:51 Encountered an error:
Internal Error
Error checking out spec='None' for repo git@github.com:philippeboyd/dbt-monorepo.git/dbt-utils-main
Cloning into 'cbe032b9b9b68984ccc507ae7f57690c'...
fatal: remote error:
philippeboyd/dbt-monorepo.git/dbt-utils-main is not a valid repository name
Visit https://support.github.com/ for help

❯ dbt deps
02:59:22 Running with dbt=1.7.0-rc1
02:59:25 Updating lock file in file path: /tmp/tmp.lIjWExOoUt/tester/package-lock.yml
02:59:25 Installing https://github.com/philippeboyd/dbt-monorepo.git/dbt-utils-main
02:59:25 Encountered an error:
Internal Error
Error checking out spec='None' for repo https://github.com/philippeboyd/dbt-monorepo.git/dbt-utils-main
Cloning into 'fded50563e6648f3ddd7ba6d2bf6fc14'...
remote: Not Found
fatal: repository 'https://github.com/philippeboyd/dbt-monorepo.git/dbt-utils-main/' not found

Now if I install it with prior version

❯ dbt deps
03:07:08 Running with dbt=1.7.0-b2
03:07:15 Installing git@github.com:philippeboyd/dbt-monorepo.git/dbt-utils-main
03:07:16 Installed from revision v0.1.0
03:07:16 and subdirectory dbt-utils-main
03:07:16 Installing git@github.com:philippeboyd/dbt-monorepo.git/dbt-date-main
03:07:17 Installed from revision v0.1.0
03:07:17 and subdirectory dbt-date-main |
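To make the mismatch in this report concrete, here is a small sketch of a lock entry for a git package that carries the subdirectory as its own key instead of appending it to the git URL. It is not dbt's actual code: the helper name `to_lock_entry` and the dict shapes are made up for illustration, and the real lock-file writer may be structured quite differently.

```python
def to_lock_entry(package: dict) -> dict:
    """Copy a git package spec into a lock entry, keeping `subdirectory` separate.

    Hypothetical helper for illustration; not part of dbt.
    """
    entry = {"git": package["git"], "revision": package["revision"]}
    # Only emit the key when the spec has one, so plain git packages are unchanged.
    if package.get("subdirectory"):
        entry["subdirectory"] = package["subdirectory"]
    return entry


# Spec taken from the reproduction repository used in the comment above.
spec = {
    "git": "https://github.com/philippeboyd/dbt-monorepo.git",
    "subdirectory": "dbt-utils-main",
    "revision": "v0.1.0",
}
print(to_lock_entry(spec))
```

With the URL left untouched, the clone step would receive a valid repository address, and the subdirectory could be applied after checkout, which matches the manually edited lock file shown earlier in this comment.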
@philippeboyd Sorry for seeing this late. And thanks a lot for making that public repo and making this easy to reproduce |
Hi all. Before I open a bug as a separate issue, I want to run it past the people collaborating on this issue and see if anyone disagrees with my suggestion?
We'd like it to pass Thoughts? |
Is this your first time submitting a feature request?
Describe the feature
Problem
It's hard to determine the exact package versions that were installed during the dbt deps process.
Multi-part proposal
We should consider a feature with three phases, each subsequent one being optional:
dbt deps, write a file with the version of each of the installed packages
packages.json
.gitignore'd directory (dbt_packages, target, or something else)
pip freeze or pip-tools compile
dbt deps automatically as part of every relevant sub-command
dbt deps such a delightful and pleasurable experience, why not include it within every sub-command?
The advantage provided by the 1st phase is that users (and systems!) would know the exact package versions that were installed during the dbt deps process.
I'm initially keeping these three phases together to preserve their common context, but we can split these out into separate issues at any time.
The 1st phase described above is actionable right now, but by the time we get to the 3rd one, we're definitely in speculative territory.
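As a rough illustration of the 1st phase, the sketch below writes a JSON file listing each installed package and its version. It is not dbt's implementation: the function name `write_installed_versions`, the shape of the `installed` mapping, and the `{"packages": ...}` layout are all assumptions for the example, and `packages.json` is just the candidate file name floated in this proposal.

```python
import json
from pathlib import Path


def write_installed_versions(installed: dict[str, str], target_dir: Path) -> Path:
    """Write a package-name -> version mapping to <target_dir>/packages.json.

    Hypothetical helper for illustration; not part of dbt.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    out_path = target_dir / "packages.json"
    # Sort keys so repeated runs with the same input produce identical files.
    payload = json.dumps({"packages": installed}, indent=2, sort_keys=True)
    out_path.write_text(payload + "\n")
    return out_path
```

A caller would pass whichever directory is chosen in the end (dbt_packages, target, or something else), for example `write_installed_versions({"dbt-labs/dbt_utils": "1.1.1"}, Path("target"))`.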
Describe alternatives you've considered
Currently, all dbt packages have a required version that is specified within dbt_project.yml. The problem is that this field has been vestigial its entire lifespan, and it can deviate (and has!) from its associated git tag version numbers that are used by the dbt Package Hub. Furthermore, the implementation of #6603 will remove that version. So we can't count on it.
In my view, the best things to consider are all the sub-alternatives and design decisions within the 1st phase proposal:
packages.json vs. another name)
dbt_packages vs. target vs. something else)
version field within dbt_project.yaml that is guaranteed not to disagree with any associated git tags...
Who will this benefit?
This could benefit multiple types of consumers:
packages.yml for repeatable builds
Are you interested in contributing this feature?
Can serve in a supporting role to whoever else picks this up
Anything else?
No response