[CT-1767] [Feature] dbt parse should return a manifest #6547
An alternative would be to return just The downside is, we're not giving users access to the ergonomics of our built-in classes/methods. It's really no different from reading in The upside would be, we wouldn't be exposing any of the internals that we might not be comfortable committing to maintaining as a public interface. |
We're already a bit inconsistent as far as the return type of various commands. The "building" commands (
>>> from dbt.cli.main import dbtRunner
>>> dbt = dbtRunner()
>>> results, success = dbt.invoke(['docs', 'generate'])
07:44:09 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 401 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:44:09
07:44:09 Concurrency: 1 threads (target='dev')
07:44:09
07:44:09 Done.
07:44:09 Building catalog
07:44:09 Catalog written to /Users/jerco/dev/scratch/testy/target/catalog.json
>>> type(results)
<class 'dbt.contracts.results.CatalogArtifact'> |
The Out of scope (but worth doing later): creating a base result class, and having all commands return a subclass of that base class. |
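As a rough illustration of the "base result class" idea mentioned above, here is a minimal sketch. Every class and field name in it is hypothetical; none of them exist in dbt-core, and the thread does not settle on any particular shape.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BaseCommandResult:
    """Hypothetical common shape shared by every command's return value."""

    success: bool
    exception: Optional[BaseException] = None


@dataclass
class ParseResult(BaseCommandResult):
    """Hypothetical result of `dbt parse`; `manifest` would hold the Manifest."""

    manifest: Any = None


@dataclass
class DocsGenerateResult(BaseCommandResult):
    """Hypothetical result of `dbt docs generate`; `catalog` would hold the catalog artifact."""

    catalog: Any = None


# Placeholder payload standing in for a real Manifest object.
parse_result = ParseResult(success=True, manifest={"nodes": {}})
```

With a shared base, callers could check `result.success` the same way for every command and only reach for `manifest` or `catalog` when they know which command they invoked.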
@jtcohen6 I'm a little confused by the AC for this ticket. It's easy enough to return the manifest as generated in pre-flight (here's the PR to do so), but...
What would be the point in that, exactly? As pointed out, we're already not at all consistent in what
All great questions, but there's zero AC to guide decision-making here. |
@iknox-fa The comment above was from live notes during a scoping/estimation meeting with @ChenyuLInx @aranke last month, where folks had different opinions. Goals:
To be honest, I'm perfectly fine with just returning a
|
@jtcohen6 |
Re: docs site: I think it will make sense to include as part of dbt-labs/docs.getdbt.com#3118 |
Prerequisite: dbt parse supported as top-level command in new API (#5550)

dbt parse should return the Manifest as its result. This makes it clearer that the output of parsing is the manifest*, and it would also enable programmatic users to do handy things like:

*There are caveats to this statement, of course: the manifest is modified during compile, and technically there is other config resolution (profile, project, packages) that happens during parsing and is not included in the manifest. But it still holds 85% true today, and I'd like to see us getting to a place where it's 95% true.

Current behavior
Currently, dbt parse returns None as its result.

It's possible to call the (undocumented, internal) get_full_manifest method directly and return a manifest. This is what dbt.lib.parse_to_manifest does:

dbt-core/core/dbt/lib.py
Lines 186 to 189 in 94d6d19

The proposal in this issue is to expose that capability within dbt-core's documented, contracted, top-level API.

Motivation
Programmatic, read-only access to project manifests, as part of dbt-core's initial public Python API + "library" capabilities.

This would make it quite easy to write "project quality" rules that run in pre-commit or CI. Because this would be the actual Manifest Python object, it would also make available "helper" methods specific to each node class, which are lost during serialization to flat dictionaries (Jinja model + graph variables) and JSON (manifest.json).

With that great power comes great responsibility: we'd need to be very clear about which classes, attributes, & methods can be considered public versus private. (To date, all dbt-core Python internals are considered private, and any direct use of them is considered undocumented & un-guaranteed functionality.) Is this an API we'd be ready & willing to stand behind? Related: #6391. If we're saying that the real contract is a proto class, and any Python dataclasses used to generate those data structures are implementation details (liable to change) rather than contracted APIs in & of themselves, that's something we'd need to document very clearly.

Alternatives
Reading manifest.json after it's written by dbt parse (see [CT-1759] dbt parse should (over)write manifest.json by default #6534), with the additional cost of writing to / reading from disk, and without the helper methods described above.
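For comparison, the read-from-disk alternative can be sketched as a small "project quality" rule using only the standard library. This is an illustration, not part of the proposal: the function name and the rule itself (every model needs a non-empty description) are made up for the example, and it assumes the usual manifest.json layout, with a top-level "nodes" mapping whose entries carry "resource_type" and "description" keys.

```python
import json
from pathlib import Path


def models_missing_description(manifest_path):
    """Return the sorted unique_ids of models whose description is empty.

    Hypothetical rule for illustration; works on the serialized JSON only,
    so none of the Manifest object's helper methods are available here.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    missing = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # Skip tests, seeds, snapshots, and other non-model nodes.
        if node.get("resource_type") != "model":
            continue
        if not (node.get("description") or "").strip():
            missing.append(unique_id)
    return sorted(missing)
```

A pre-commit hook or CI step could call `models_missing_description("target/manifest.json")` after `dbt parse` and fail when the returned list is non-empty. The same rule written against the in-memory Manifest would skip the disk round trip, which is the trade-off this issue describes.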