Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1585] [Feature] New top level commands: interactive preview #6359

Closed
Tracked by #7162 ...
ChenyuLInx opened this issue Dec 2, 2022 · 17 comments · Fixed by #7208
Closed
Tracked by #7162 ...

[CT-1585] [Feature] New top level commands: interactive preview #6359

ChenyuLInx opened this issue Dec 2, 2022 · 17 comments · Fixed by #7208
Assignees
Labels
enhancement New feature or request python_api Issues related to dbtRunner Python entry point

Comments

@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Dec 2, 2022

Describe the feature

See #6358 for more background of what interactive means.

This one is for supporting interactive preview function as a native function for dbt cli. Example of how this happens in dbt-rpc and dbt-server(link to actual code)

AC:
See last comment

@ChenyuLInx ChenyuLInx added enhancement New feature or request triage labels Dec 2, 2022
@github-actions github-actions bot changed the title [Feature] [Feature] New top level commands: interactive preview [CT-1585] [Feature] [Feature] New top level commands: interactive preview Dec 2, 2022
@ChenyuLInx ChenyuLInx changed the title [CT-1585] [Feature] [Feature] New top level commands: interactive preview [CT-1585] [Feature] New top level commands: interactive preview Dec 2, 2022
@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Dec 2, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 2, 2022

A lot of community interest in this one! :) #5418, #6087, #6090

Mostly the same comments as in #6358 (comment). I do think a totally new command for this one makes sense. It would work much the same as interactive compile, with the added step of actually executing the compiled query.

$ dbt preview --code <base64encoded>

From the CLI perspective, this should feel much the same as:

-- macros/preview.sql
{% macro preview(code) %}
  {% set result = run_query(code) %}
  {% do result.print_table() %}
{% endmacro %}
$ dbt run-operation preview --args '{"code": "select 1 as id, True as is_odd union all select 2, False"}'
| id | is_odd |
| -- | ------ |
|  1 |   True |
|  2 |  False |

formatting + passing the result

In the existing implementation, execute_result returns an agate.Table, and then we turn it into a simpler ResultTable Python object. My inclination here is that, on the CLI, we'd send the result to stdout, with the format depending on the --log-format:

For programmatic consumers (dbt-server), should we return the ResultTable directly as a Python object? Include a serialized representation in a structured log event? Is there a way we could proto-tize that result object, to maximize interoperability? (Is it time for us to finally roll our own "dataframe" helper for result sets, and replace agate?)

limiting result set

What are the arguments we want to support for this? # of lines?

Hm. I'm not 100% sure that this should be in dbt-core, but it does feel like it could be a place to enact a configurable limit to the result set. I'd be curious to hear what folks like @bossnunta @isaac-taylor @davidharting think.

The default should either be:

  • something reasonable (--limit 500)
  • nothing at all; dbt doesn't try to limit the result set unless passed an explicit configuration

Options on how to implement:

  • Modify SQL! Add limit 500 to the end of queries — or detect if they contain their own limit clause, in which case, don't add a limit clause.
    • Of course, this makes sense for SQL, but not for other languages! We could add as an optional arg to the statement macro, and attempt some conditional logic based on language. Python dataframes do support methods like .take(<size>) or .head(<size>), and we could try passing that into submit_python_job (which would also need a refactor to support "interactive" queries, never mind how slow it would be on some platforms).
  • Add a new argument to adapter.execute: adapter.execute(<code>, fetch=True, limit=500). Then, in each database adapter, we plumb this through to connections.execute. Many cursors support fetchmany(size=<int>) (e.g. psycopg2, snowflake-connector-python), and there's a natural spot for it here.
    • This feels like the surest way to go, and least error-prone
    • But if a user really did want >500 rows, we'd need to either do SQL parsing to detect that their query includes limit 1000 (as above), or provide another means to pass in that information, e.g. in the dbt Cloud IDE.

Let's keep thinking through this part. I think it can be out of scope for the initial first-pass implementation.

@jtcohen6 jtcohen6 added python_api Issues related to dbtRunner Python entry point Team:Execution and removed Refinement Maintainer input needed labels Dec 2, 2022
@isaac-taylor
Copy link

Hm. I'm not 100% sure that this should be in dbt-core, but it does feel like it could be a place to enact a configurable limit to the result set. I'd be curious to hear what folks like @bossnunta @isaac-taylor @davidharting think.

In my head, the question comes down to 'should we modify user code directly?' And if so (the IDE does today), where in the stack should that happen.

My guess is Core is the right place long term. That will make extending interactive preview/compile support to Python (if that's even doable) a bit easier, as we won't have multiple components needing to understand user code. I'm thinking the IDE shouldn't know about anything related to dbt code parsing itself.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 5, 2022

@isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

@isaac-taylor
Copy link

@isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

Seems like just a missing feature from 1.1 that we had in 1.0. Very doable from a technical perspective. It's about UX and prioritization I think Let me bring that one up to Nate to see if that's on his radar.

@bossnunta
Copy link

@isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

I think having a version of "limit [500]" as a UX (and/or centralized admin configuration) is a very good approach here. I don't believe we had that in IDE v1.0 either. I shared notes on this in Slack. (What we do in both 1.0 and 1.1 is the default to 500 rows that is overridden when you have a limit clause).

However, from product perspective, the IDE shouldn't really be doing sql parsing to deal with this in my opinion. Ideally we could do this with dbt-server (or maybe Core given that it's about parsing?)

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 6, 2022

However, from product perspective, the IDE shouldn't really be doing sql parsing to deal with this in my opinion. Ideally we could do this with dbt-server (or maybe Core given that it's about parsing?)

If we have a checkbox, we don't need to do SQL parsing at all, anywhere. Ideal!

It will be on the user to either:

  • put their own number in the "limit" checkbox
  • uncheck the box and add their own custom limit <number> within their query

@davidharting
Copy link

davidharting commented Dec 13, 2022

Just for historical record, we did at one time have a "limit 500" checkbox in cloud IDE v1.0, until we merged https://github.com/dbt-labs/dbt-cloud/pull/2284. Looking at the PR and the issue it closes, it is lost to history precisely why we made this change. I do know that it was unpopular with some folks though and I am more than happy to go back to a checkbox style approach.

In my memory, the biggest reason for the change was the amount of UI real estate that the checkbox took up. It seemed to mismatch how "primary" we thought this functionality should be.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 5, 2023

As in #6358 (comment), we'd also want the ability to "preview" a specific model, within the context of that model, to correctly resolve any instances of (e.g.) {{ this }} or {{ is_incremental() }}:

$ dbt preview --select specific_model

The idea being, we compile specific_model (versus arbitrary Jinja-SQL), and then preview its compiled_code. I'm using --select above for consistency, but it obviously wouldn't make sense for more than one node to be selected for preview. That's a check we can add ("must select only one node"), or we can use a different flag/argument for clarity.


Thinking about "compile" and "preview" in the dbt Cloud IDE: Who should be responsible for determining if the user is compiling/preview a specific model in the project, versus some arbitrary Jinja-SQL in a scratchpad? Should the IDE detect that the file open is an actual saved file in the project? Should it pass along the file name, in addition to file contents, and dbt-core/dbt-server/someone detects that the file name matches a manifest node, and use the code to update the manifest node? This gets trickier if we think about a user compiling/preview unsaved code changes that correspond to a manifest-registered node.

@davidharting
Copy link

Who should be responsible for determining if the user is compiling/preview a specific model in the project, versus some arbitrary Jinja-SQL in a scratchpad?

The way I would naively expect this to work is for dbt itself to support two different mutually exclusive flags for the two use cases. --select for existing models and --query (or something like that) to pass in arbitrary sql or python (base 64 encoded probably?).

Then I would expect it to be the job of clients to determine which case they are in and use the appropriate flag.

Does that seem plausible?

@b-per
Copy link
Contributor

b-per commented Jan 16, 2023

I'd agree with the last comment of 2 different flags/options.

In BigQuery, the--select option could use the free Preview feature (e.g. with bq head) rather than running a query that would scan data and cost money.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 16, 2023

Thanks @b-per! Two comments that came up in our related conversation just now:

  1. In cases where users are just trying to preview an already-existing table, we could try doing a "free" preview of that table, instead of actually submitting select * from {{ table }} as a standalone query. This would require us to understand if the table already exists, if its query logic has been updated since it was last built, ...
  2. BQ makes it possible to estimate the cost of a query before running it. The closest analogue on other databases would be to return an EXPLAIN plan. Could we expose this somehow? dbt preview --dry-run? dbt preview --explain? Maybe this is actually the right "workflow" for explaining queries (= estimating cost/complexity), first imagined in this quite old issue: workflow for explain queries? #401

To be clear, I think both of those should be out of scope for the initial implementation (this issue). But it's good to have these potential (adapter-specific) extensions in the back of our minds, while doing the work to add + expose this foundational capability.

@jtcohen6
Copy link
Contributor

Not something addressed previously in this issue, and definitely out of scope for the first iteration: Previewing transformations written in Python. On some platforms, this would require writing the final df.head(500) to a temp table, and then select * from that temp table.

How to know if it's Python?

  • In the case of dbt preview --select specific_model, we'd be able to inspect that model's language attribute.
  • Should preview --code also accept a --language argument?

@lostmygithubaccount
Copy link
Contributor

@jtcohen6 it's not uncommon for enterprise users to request the ability to disable preview in the IDE. would it be worth having a project-level setting with override flag we pass through to dbt Server and into Cloud? then they could have that as an option and always pass the override into their dbt commands

@jtcohen6
Copy link
Contributor

@lostmygithubaccount To me, dbt_project.yml doesn't feel like the right place to define that, versus an account-level option within dbt Cloud access administration. I'd expect that setting to flow from Cloud app config into Runtime + IDE. I suspect we don't want to show a "Preview" button at all in the IDE if it's disabled, rather than having everything appear the same but then raising a dbt-core error.

@iknox-fa
Copy link
Contributor

iknox-fa commented Feb 6, 2023

Per BLG 2/6/23
@ChenyuLInx is going to check with the IDE team to verify whether or not a run is needed to reflect the current state of the selected model.

@aranke What is the desired behavior for CLI usage wrt above, can you double check it with Doug / Jeremy?

@ChenyuLInx
Copy link
Contributor Author

Run is not needed.

@aranke
Copy link
Member

aranke commented Mar 16, 2023

Acceptance criteria for ticket:

  1. Support both inline and model execution similar to interactive compile.
  2. Support both command-line table printing and JSON output with flag output --json
  3. Limit the number of rows to show with the --limit flag.
    Use 5 by default to copy Pandas head behavior.

Nice to have:

  1. Deprecate the --show option for dbt seed and dbt build in favor of dbt preview cc @jtcohen6

@aranke aranke mentioned this issue Mar 24, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python_api Issues related to dbtRunner Python entry point
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants