[CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes #5646

lostmygithubaccount · 2022-08-11T17:16:28Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Community Slack report here:

https://getdbt.slack.com/archives/C03QUA7DWCW/p1660223963454249

I reproduced easily -- the first run had df = df.limit(100) line uncommented and no chained call on the prior line, the second run is what's shown in the screenshot:

Expected Behavior

Would not expect method chaining to cause errors like this on dbt.ref(...) or dbt.source(...). We should clearly document this behavior and give a better error method if there's some underlying reason why we can't easily fix this, otherwise we should fix it.

Steps To Reproduce

See screenshot above. Basically add a .limit(N) to your reference call, e.g. df = dbt.ref(model_name).limit(100).

Relevant log output

@lostmygithubaccount ➜ /workspaces/codyspace/nyc_taxi_with_python (main ✗) $ dbt run -s int_yellow 
17:10:04  Running with dbt=1.3.0-b1
17:10:04  Found 5 models, 0 tests, 0 snapshots, 1 analysis, 571 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
17:10:04  
17:10:05  Concurrency: 8 threads (target='dev')
17:10:05  
17:10:05  1 of 1 START python table model dbt_cody.int_yellow ............................ [RUN]
17:10:10  1 of 1 ERROR creating python table model dbt_cody.int_yellow ................... [ERROR in 4.58s]
17:10:10  
17:10:10  Finished running 1 table model in 0 hours 0 minutes and 5.57 seconds (5.57s).
17:10:10  
17:10:10  Completed with 1 error and 0 warnings:
17:10:10  
17:10:10  Database Error in model int_yellow (models/intermediate/int_yellow.py)
17:10:10    100357 (P0000): Python Interpreter Error:
17:10:10    Traceback (most recent call last):
17:10:10      File "_udf_code.py", line 83, in main
17:10:10      File "_udf_code.py", line 7, in model
17:10:10      File "_udf_code.py", line 53, in <lambda>
17:10:10      File "_udf_code.py", line 21, in ref
17:10:10    KeyError: 'stg_yellow'
17:10:10     in function INT_YELLOW__DBT_SP with handler main
17:10:10    compiled Code at target/run/nyc_taxi_with_python/models/intermediate/int_yellow.py
17:10:10  
17:10:10  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Environment

- OS: linux
- Python: 3.10
- dbt: 1.3 beta, installed via main branches

Which database adapter are you using with dbt?

snowflake

Additional Context

cc: @ChenyuLInx

No response

The text was updated successfully, but these errors were encountered:

ChenyuLInx · 2022-08-12T03:39:55Z

@lostmygithubaccount I looked into this issue a bit and noticed that this is related to how we traverse the parsed AST.
We can go with either options you purposed, through proper message is slightly easier. But properly fix this is totally doable, we can switch from just traverse the AST tree of python models to also use regex to parse refs and sources. Will estimate this in the upcoming BLG and we can decide which way we want to go.

Note to myself: some example regex for dbt.ref

(dbt\.ref\()[^\)\}]*(\))

jtcohen6 · 2022-08-12T13:16:26Z

Without having tested at all locally — I'd guess that the regex approach risks being significantly slower, and potentially liable to abuse, versus using Python's AST. I'd lean slightly in the direction of just documenting that dbt.ref() + dbt.source() cannot be chained with other method calls. It's an extra line of code for the user, not a big usability issue.

elongl · 2022-08-12T13:52:43Z

As the one who experienced and reported the issue, I personally think that it's preferable to give a technical solution here.
It's probably going to confuse a lot of users and that they will stumble upon it given that dbt.ref() and dbt.source() are among the most used functions that will be used in Python models.

ChenyuLInx · 2022-08-12T18:51:09Z

@jtcohen6 thinking again it should be possible to do all of this using the AST NodeVisitor. When I get to it, if it is really tricky to support syntax like dbt.ref(model_name).limit(100) or fancy_transform(dbt.ref(model_name)), I will make sure we raise a clear error.

lostmygithubaccount · 2022-08-12T19:08:36Z

@elongl definitely agree that'd be ideal, but to @jtcohen6's point we should be conscious of the implications. will defer to @ChenyuLInx to further investigate the options and performance/other technical impacts before we make a call.

@elongl appreciate you reporting this! it is something that's going to trip people up and good to catch early in the beta

iknox-fa · 2022-08-15T16:20:16Z

As part of estimation we're including doing a very basic benchmark of doing a full ast parse (as opposed to the top-level we do currently)

lostmygithubaccount added bug Something isn't working triage Team:Execution and removed triage labels Aug 11, 2022

github-actions bot changed the title ~~[Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes~~ [CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes Aug 11, 2022

lostmygithubaccount added the python_models label Aug 11, 2022

ChenyuLInx mentioned this issue Aug 17, 2022

more complex visit_Call to parse chained command #5677

Merged

6 tasks

ChenyuLInx closed this as completed in #5677 Aug 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes #5646

[CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes #5646

lostmygithubaccount commented Aug 11, 2022 •

edited

Loading

ChenyuLInx commented Aug 12, 2022

jtcohen6 commented Aug 12, 2022

elongl commented Aug 12, 2022

ChenyuLInx commented Aug 12, 2022

lostmygithubaccount commented Aug 12, 2022

iknox-fa commented Aug 15, 2022

[CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes #5646

[CT-1034] [Bug] Unexpected behavior when chaining methods on dbt-ref'ed/sourced dataframes #5646

Comments

lostmygithubaccount commented Aug 11, 2022 • edited Loading

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

ChenyuLInx commented Aug 12, 2022

jtcohen6 commented Aug 12, 2022

elongl commented Aug 12, 2022

ChenyuLInx commented Aug 12, 2022

lostmygithubaccount commented Aug 12, 2022

iknox-fa commented Aug 15, 2022

lostmygithubaccount commented Aug 11, 2022 •

edited

Loading