[CT-1767] [Feature] `dbt parse` should return a manifest #6547

jtcohen6 · 2023-01-08T17:57:54Z

Prerequisite: dbt parse supported as top-level command in new API (#5550)

dbt parse should return the Manifest as its result. This makes it clearer that the output of parsing is the manifest*, and it would also enable programmatic users to do handy things like:

# just for type hint below
from dbt.contracts.graph.manifest import Manifest

# initialize
from dbt.cli.main import dbtRunner
dbt = dbtRunner()

# 'parse' command
results, success = dbt.invoke(['parse'])

# hand waving this part for now -- we'd want 'results' to have a consistent type for all 'invoke' calls
manifest: Manifest = results...['manifest']

# now I can programmatically inspect the project contents!
# pseudo code: assert every non-ephemeral model has an "owner"
for model in [n for n in manifest.nodes if n.resource_type == 'model']:
    if model.materialized != 'ephemeral':
        assert (model.meta['owner'] is not None)

*There are caveats to this statement, of course—the manifest is modified during compile, and technically there is other config resolution (profile, project, packages) that happens during parsing and is not included in the manifest—but it still holds 85% true today, and I'd like to see us getting to a place where it's 95% true.

Current behavior

Currently, dbt parse returns None as its results.

It's possible to call the (undocumented, internal) get_full_manifest method directly and return a manifest. This is what dbt.lib.parse_to_manifest does:

dbt-core/core/dbt/lib.py

Lines 186 to 189 in 94d6d19

    
           def parse_to_manifest(config): 
        
               from dbt.parser.manifest import ManifestLoader 
        
               return ManifestLoader.get_full_manifest(config)

The proposal in this issue is to expose that capability within dbt-core's documented, contracted, top-level API.

Motivation

Programmatic, read-only access to project manifests, as part of dbt-core's initial public Python API + "library" capabilities.

This would make it quite easy to write "project quality" rules, that run in pre-commit or CI. Because this would be the actual Manifest Python object, it would also makes available "helper" methods specific to each node class, which are lost during serialization to flat dictionaries (Jinja model + graph variables) and JSON (manifest.json).

With that great power, comes great responsibility: we'd need to be very clear about which classes, attributes, & methods can be considered public versus private. (To date, all dbt-core Python internals are considered private, and any direct use of them is considered undocumented & un-guaranteed functionality.) Is this an API we'd be ready & willing to stand behind? Related: #6391. If we're saying that the real contract is a proto class, and any Python dataclasses used to generate those data structures are implementation details (liable to change) rather than contracted APIs in & of themselves, that's something we'd need to document very clearly.

Alternatives

Parsing manifest.json after it's written by dbt parse (see [CT-1759] dbt parse should (over)write manifest.json by default #6534) — with the additional cost of writing to / reading from disk, and without the helper methods described above
Writing these rules in Jinja today. Because it's Jinja, it's gnarlier to implement & harder to test, but it does have the appeal of keeping it all in "dbt code"—which also means it can be installed as a dbt package, and run in dbt Cloud—without requiring additional Python scripting / infrastructure to run. The dbt Labs ProServ team has shown it's possible: https://github.com/dbt-labs/dbt-project-evaluator

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2023-01-22T15:36:09Z

An alternative would be to return just Manifest.flat_graph, which serializes everything to vanilla data structures (lists/dicts), and more closely matches the existing contents of manifest.json and the Jinja {{ graph }} context variable.

The downside is, we're not giving users access to the ergonomics of our built-in classes/methods. It's really no different from reading in manifest.json, just without the extra step of needing to write/read a file.

The upside would be, we wouldn't be exposing any of the internals that we might not be comfortable committing to maintaining as a public interface.

jtcohen6 · 2023-01-31T07:47:35Z

We're already a bit inconsistent as far as the return type of various commands. The "building" commands (run, test, snapshot, seed, build) all return RunExecutionResult (→ run_results.json), but not others:

source freshness returns a FreshnessResult (→ sources.json)
docs generate returns a CatalogArtifact (→ catalog.json)

>>> from dbt.cli.main import dbtRunner
>>> dbt = dbtRunner()
>>> results, success = dbt.invoke(['docs', 'generate'])
07:44:09  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 401 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:44:09
07:44:09  Concurrency: 1 threads (target='dev')
07:44:09
07:44:09  Done.
07:44:09  Building catalog
07:44:09  Catalog written to /Users/jerco/dev/scratch/testy/target/catalog.json
>>> type(results)
<class 'dbt.contracts.results.CatalogArtifact'>

jtcohen6 · 2023-03-13T18:24:21Z

The dbt parse command should return a Manifest to unblock shorter-term needs. Let's wrap the Manifest in a ParseResult, and explicitly say that we're not guaranteeing any of its underlying methods or anything about its structure in the future.

Out of scope (but worth doing later): creating a base result class, and having all commands return a subclass of that base class.

iknox-fa · 2023-04-10T21:44:18Z

@jtcohen6 I'm a little confused by the AC for this ticket. It's easy enough to return the manifest as generated in pre-flight (here's the PR to do so), but...

Let's wrap the Manifest in a ParseResult

ParseResult hasn't been a thing for a while. The best I can tell, it was removed during the partial parsing work about 2 years ago. Do you want to bring it back? Build something else?

creating a base result class, and having all commands return a subclass of that base class.

What would be the point in that, exactly? As pointed out, we're already not at all consistent in what results means across tasks.

we'd need to be very clear about which classes, attributes, & methods can be considered public versus private. (To date, all dbt-core Python internals are considered private, and any direct use of them is considered undocumented & un-guaranteed functionality.) Is this an API we'd be ready & willing to stand behind?

All great questions, but there's zero AC to guide decision-making here.

jtcohen6 · 2023-04-10T21:47:59Z

@iknox-fa The comment above was from live notes during a scoping/estimation meeting with @ChenyuLInx @aranke last month, where folks had different opinions.

Goals:

return a Manifest object
make it easy to save & reuse that object
communicate that the Manifest object is not fully contracted, and that the return signature of parse may change in the future

To be honest, I'm perfectly fine with just returning a Manifest object directly (as you have it in the draft PR currently), and noting in our documentation that:

different commands return different types
updating the type hint for the results here to be Optional[Any] instead of Optional[List] (the wrapper will change from a Tuple to a typed object in Handle internal exceptions in postflight #7242)
we're not contracting the Manifest object any more than we already have — caveat usor!

iknox-fa · 2023-04-10T22:15:49Z

@jtcohen6
Gotcha-- I'll move that PR from draft to open and make one for the docs site for communicating the changes.
✅

jtcohen6 · 2023-04-10T22:16:36Z

Re: docs site: I think it will make sense to include as part of dbt-labs/docs.getdbt.com#3118

jtcohen6 added enhancement New feature or request python_api Issues related to dbtRunner Python entry point Team:Execution labels Jan 8, 2023

github-actions bot changed the title ~~[Feature] dbt parse should return a manifest~~ [CT-1767] [Feature] dbt parse should return a manifest Jan 8, 2023

This was referenced Jan 8, 2023

[CT-926] dbt parse works in click #5550

Closed

[CT-1582] [Feature] Refactor tasks to breakout parsing the manifest into a separate piece #6357

Closed

[CT-1581] [Epic] dbt-core as a library: first steps #6356

Closed

jtcohen6 mentioned this issue Jan 22, 2023

Abstract manifest generation from tasks #6565

Merged

6 tasks

jtcohen6 added the Refinement Maintainer input needed label Feb 28, 2023

ChenyuLInx mentioned this issue Mar 7, 2023

[CT-2236] [Feature] Remove lib.py [as much as possible] #7093

Closed

3 tasks

jtcohen6 removed Team:Language Refinement Maintainer input needed labels Mar 13, 2023

jtcohen6 mentioned this issue Mar 13, 2023

[CT-2301] [Epic] API-ification: outstanding tasks for v1.5 #7162

Closed

leahwicz assigned iknox-fa Mar 27, 2023

This was referenced Apr 4, 2023

[CT-2372] [Feature] Any interest in a dbt manifest command? #7269

Closed

compile inline query doesn't add node #7292

Closed

iknox-fa mentioned this issue Apr 10, 2023

Parse command should return manifest when invoked via dbtRunner #7314

Merged

6 tasks

iknox-fa closed this as completed in #7314 Apr 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1767] [Feature] `dbt parse` should return a manifest #6547

[CT-1767] [Feature] `dbt parse` should return a manifest #6547

jtcohen6 commented Jan 8, 2023 •

edited

Loading

jtcohen6 commented Jan 22, 2023

jtcohen6 commented Jan 31, 2023

jtcohen6 commented Mar 13, 2023 •

edited

Loading

iknox-fa commented Apr 10, 2023 •

edited

Loading

jtcohen6 commented Apr 10, 2023 •

edited

Loading

iknox-fa commented Apr 10, 2023

jtcohen6 commented Apr 10, 2023 •

edited

Loading

[CT-1767] [Feature] dbt parse should return a manifest #6547

[CT-1767] [Feature] dbt parse should return a manifest #6547

Comments

jtcohen6 commented Jan 8, 2023 • edited Loading

Current behavior

Motivation

Alternatives

jtcohen6 commented Jan 22, 2023

jtcohen6 commented Jan 31, 2023

jtcohen6 commented Mar 13, 2023 • edited Loading

iknox-fa commented Apr 10, 2023 • edited Loading

jtcohen6 commented Apr 10, 2023 • edited Loading

iknox-fa commented Apr 10, 2023

jtcohen6 commented Apr 10, 2023 • edited Loading

[CT-1767] [Feature] `dbt parse` should return a manifest #6547

[CT-1767] [Feature] `dbt parse` should return a manifest #6547

jtcohen6 commented Jan 8, 2023 •

edited

Loading

jtcohen6 commented Mar 13, 2023 •

edited

Loading

iknox-fa commented Apr 10, 2023 •

edited

Loading

jtcohen6 commented Apr 10, 2023 •

edited

Loading

jtcohen6 commented Apr 10, 2023 •

edited

Loading