Lazy load connections #1584
I think this was introduced in e2af871 with this change to the `generate` function in `context/parser.py`:

```python
def generate(model, runtime_config, manifest, source_config):
    # during parsing, we don't have a connection, but we might need one, so we
    # have to acquire it.
    # In the future, it would be nice to lazily open the connection, as in some
    # projects it would be possible to parse without connecting to the db
    with get_adapter(runtime_config).connection_named(model.get('name')):
        return dbt.context.common.generate(
            model, runtime_config, manifest, source_config, Provider()
        )
```

beckjake@, would it be possible to go back to making this lazily open the connection? I can't contact an actual BigQuery client to run the compile command.
(Meant to put the ampersand in front: @beckjake)
Thanks for the report @heisencoder. Can you tell me a little bit more about your setup here? I'm interested in your note that you can't access a BQ client when running `dbt compile`.

Additionally, I'm unable to reproduce this! I've removed `gcloud` from my path and `dbt compile` still works for me.
I'm running `dbt compile` in a hermetic environment that doesn't have `gcloud` installed. I suspect that uninstalling gcloud could lead to an easy reproduction, although I don't know whether this will cause other unrelated failures. Maybe unplugging the network cable and then running the compile command would also reproduce it.

In any case, I think the main issue is due to greedily creating a connection during the compile (in the `generate` function). Jake's comment suggests that this could be approached via a lazy connection, although this was not implemented.
Ok, that makes sense. Fundamentally, this line of reasoning is a little divergent from how we think about model compilation. Right now, the compilation step assumes it can connect to the warehouse. We could indeed lazily load connections, but I imagine that these future plans are necessarily going to involve running SQL against the warehouse in the compile phase. Can you just say a little bit more about why you're running `dbt compile` in an environment without database access?

I hesitate to recommend this, but one approach may be to connect to a locally-running postgres instance in your hermetic environment. The resulting SQL might be postgres-specific, but depending on your needs, that might be acceptable.

Edit: made this make more sense :)
What I'm trying to do with the `compile` command is generate the SQL so that it can be checked in an isolated test environment. I'm happy to have the SQL interpolation happen at run time, so in any case I would want to defer actual network connections to the run phase.
Ok - that totally makes sense. It also helps me understand what #1538 was all about too. Thanks for the additional info!

Maybe the right way to proceed here is to expose dbt's understanding of the graph after the parsing step? This seems like something we could support in a first-class way via the (eventual) Python API. dbt compiles projects in two steps:

1. Parse the project, rendering `ref()`, `source()`, and `config()` calls to build the graph
2. Compile each model's SQL, which is where a database connection may be required

After parsing, dbt will have a complete understanding of the shape of the graph, but will not yet have "correct" SQL for each of the models. Sounds like that wouldn't be a problem given your use case.
In the short term, I either need to revert back to version 0.13.1 or find a way to patch 0.14 to allow for lazily creating database connections. Longer term, I'd be happy if there was either a Python API or a command-line 'parse' option to create the graph without a database connection.

Is the lazy evaluation option still on the table? I took out the database connection in the `generate()` function, but it failed on me later on. Thanks!
I believe @beckjake implemented a new pattern for adapters in which each thread gets its own dedicated connection. I think the diff you're seeing here was a part of that change, though I'm not 100% certain.

Jake, this is a little outside of my wheelhouse: how involved do you think it is to make these adapters lazy again? Is it something that could conceivably happen for 0.14.1? Or are we better off supporting this use case via some other mechanism?
I don't love that we eagerly acquire the connection, and I am into trying to make it lazy-load for 0.14.1. My thoughts were that "acquiring" a connection in the `with` block would mark it as ready to be acquired, and then on exit we'd only close it if it was actually opened at some point (so the handle was non-`None`).
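A minimal sketch of that design, outside of dbt's real adapter code — `LazyConnectionHandle` and `open_fn` are hypothetical names, not dbt's API:

```python
import threading

class LazyConnectionHandle:
    """Hypothetical wrapper: entering the `with` block only marks the
    connection as available; the real handle is opened on first use."""

    def __init__(self, open_fn):
        self._open_fn = open_fn  # callable that opens the real connection
        self._handle = None      # stays None until someone needs it
        self._lock = threading.Lock()

    @property
    def handle(self):
        # Open the underlying connection lazily, on first access.
        with self._lock:
            if self._handle is None:
                self._handle = self._open_fn()
        return self._handle

    def __enter__(self):
        return self  # "acquiring" is free; nothing is opened yet

    def __exit__(self, exc_type, exc, tb):
        # Close only if the handle was actually opened at some point.
        if self._handle is not None:
            self._handle.close()
            self._handle = None
```

Parsing a project that never touches the database would then enter and exit the block without ever calling `open_fn`, so no network access would be needed.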
Thanks Jake! Do you have a sense of when this might be ready? I'm trying to decide whether to revert back to 0.13.1 or look into moving forward with patches against 0.14.
I think you might be best off reverting to 0.13.1 - we're wrapping up the release candidate for 0.14.0 presently, and I don't think this is a change we're going to be able to incorporate this late into the release cycle. Super happy to turn this around in short order after 0.14.0 drops though! Apologies for the inconvenience, but we're definitely looking to implement this change!
Sounds good -- thanks!
@beckjake Is there a tracking feature that I can follow to see progress on creating lazy connections?
@heisencoder I'll use this issue for that
Another thought on this same topic: would it make sense to address this issue by explicitly using a 'fake' version of the database? In the context of BigQuery, this could be accomplished either by creating a new adapter that discards queries, or by swapping the underlying client out for a fake object. Either of these approaches would make it possible to compile without network access. Thoughts?
Hey @heisencoder - dbt requires a connection to the database in order to correctly compile SQL. If you have code like:

```sql
{{ config(materialized='incremental') }}

select * from my_big_table

{% if is_incremental() %}
where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

then the `is_incremental()` check needs a database connection to determine whether the target table already exists.

If you're so inclined, you can create a new "null" adapter for dbt which just implements the adapter interface and discards queries. I actually think that might be a kind of cool thing for dbt to ship with :)

Note: if you call a macro that actually needs query results at compile time, a null adapter can only hand back empty placeholders.
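As a rough illustration of what that null adapter could look like — this is a duck-typed sketch, not dbt's actual adapter interface, and the method names here are assumptions:

```python
class NullAdapter:
    """Hypothetical adapter that satisfies callers syntactically but
    never talks to a real database."""

    def execute(self, sql, fetch=False):
        # Accept any statement, discard it, and report success.
        # When the caller expects rows, hand back an empty result set.
        return ("OK", [] if fetch else None)

    def get_relation(self, database, schema, identifier):
        # Claim that no relation exists, so checks like is_incremental()
        # fall back to their "table not built yet" branch.
        return None
```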
Hi @drewbanin. The main goal of compiling the SQL is to test the SQL in an isolated environment that doesn't have a database connection. This would just be for simple syntax and schema testing - things that could be verified in a unit-test context. It looks like the `is_incremental` macro defaults to returning false if there is no database connection, which is probably fine for our purposes. I'm assuming that if we're not using certain dbt power features, the compiled SQL will exactly match the executed SQL. Are there macros other than `is_incremental` that we need to avoid?

With regard to a "null" adapter, I'll probably need to create one to allow internally upgrading to 0.14. I'll look into putting one together, but I may also need certain BigQuery features, so I'm also considering creating one that extends the existing BigQuery adapter and just replaces the connection with a fake object of some kind. Thanks!
Hi @drewbanin. With regard to implementing a 'null' adapter, I think for my purposes I'll need to extend the existing bigquery adapter, because I need the compiled output to reflect BigQuery-specific semantics.
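One way the fake-connection idea might look, sketched against the small slice of the google-cloud-bigquery `Client` surface that query execution typically touches (`query()` returning a job whose `result()` yields rows) — which methods actually need faking depends on the adapter code paths exercised, and these class names are hypothetical:

```python
class FakeQueryJob:
    """Mimics just enough of a BigQuery query job for compile-time use."""

    def result(self):
        return []  # no rows; callers iterate over an empty result


class FakeBigQueryClient:
    """Stand-in for google.cloud.bigquery.Client that never opens a
    network connection; submitted SQL is kept for test assertions."""

    def __init__(self):
        self.submitted_sql = []

    def query(self, sql, job_config=None):
        self.submitted_sql.append(sql)  # record the SQL instead of running it
        return FakeQueryJob()
```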
I think there could be merit to supporting a "null" adapter in dbt core, but probably not one which implements BigQuery-specific semantics. In general, we're aiming to have the output of `dbt compile` match the SQL that actually runs against the warehouse.

I don't have a firm use-case for a generic null adapter in mind (it certainly would not output correct SQL for any given database) -- maybe just as a debugging / testing sort of utility? So, feel free to do whatever fits your needs here!
Thank you Drew for your thoughts! I'll go ahead and create an internal fake BigQuery adapter then.
This came up again today. We're going to prioritize this for a 0.15.x release - currently looking like 0.15.1. The impetus for this is that acquiring connections greedily can be really slow. This impacts more real-time queries, like ones that can be submitted to a dbt server.
Linked commits:
- lazy-load connections (#1584)
- Revert "lazy-load connections (#1584)"
- lazy-load connections (#1584)
Issue

The `dbt compile` command requires `gcloud` for bigquery profiles. This is a regression from version 0.13.1, which did not require gcloud for running the compile command.

Issue description

Command: `dbt compile`

```
dbt --debug compile --profiles-dir . --vars "{target-path: my_path}";
```

Results

Error summary: dbt_compile_error.txt

System information

The output of `dbt --version`: Running with dbt=0.14.0-a2

The operating system you're running on: Debian-based Linux with kernel version 4.19.37

The python version you're using (probably the output of `python --version`): 2.7

Steps to reproduce

Run `dbt compile` with a project that uses bigquery and with `gcloud` missing from the path.