[Feature] DbtCompileOperator + allow partial_parse.msgpack to be read from S3/GCS/Azure Blob Storage #870
Comments
It seems like you've got this under control, if you want help or have specific questions, let me know what I can do for you!
This would be a game changer. It's similar to Dagster's implementation, where models are split into different data assets but a full dbt run only requires a single dbt parse at the beginning. This would unlock that functionality in Airflow, where we could run each model as a separate Airflow task that reads from the same cached manifest, rather than forcing each task to perform a dbt parse and incur a high CPU cost. Previous solutions involved building the manifest during CI; however, this prevents users from injecting any vars at runtime. I've given some thought to writing an Airflow task that parses the dbt project and passes the manifest to each Cosmos dbt task, but that's tough when you're using the Celery or Kubernetes executor with workers spread across multiple machines.
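The "parse once, read many" idea in the comment above can be sketched in plain Python. Everything here is illustrative, not a Cosmos or Airflow API: `ManifestCache` and `fake_parse` are made-up stand-ins for the expensive `dbt parse` step and the per-model tasks.

```python
class ManifestCache:
    """Illustrative only (not a Cosmos API): parse the dbt project once,
    then let every model-level task read the same cached manifest instead
    of re-parsing it."""

    def __init__(self, parse_fn):
        self._parse_fn = parse_fn  # stand-in for the real `dbt parse`
        self._manifest = None
        self.parse_count = 0  # how many real parses actually ran

    def get(self):
        # Only the first caller pays the parse cost; later callers reuse it.
        if self._manifest is None:
            self._manifest = self._parse_fn()
            self.parse_count += 1
        return self._manifest


def fake_parse():
    # Stand-in for the expensive `dbt parse` step.
    return {"nodes": {"model.a": {}, "model.b": {}}}


cache = ManifestCache(fake_parse)
for model in ("model.a", "model.b"):
    manifest = cache.get()  # each "task" reads; only the first triggers a parse
assert cache.parse_count == 1
```

Note the limitation the comment calls out: an in-process cache like this only helps when all tasks share a machine. With the Celery or Kubernetes executor, the cached manifest would instead have to live somewhere all workers can reach, such as S3.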
I've somewhat caught up on how partial parsing is implemented in 1.4.0, and I'm thinking this really should be two issues:
Will have to think a little more about this. I know there are some use cases for dbt artifacts post-execution, in addition to just using the … There is a bit of overlap in the writing part with what the dbt docs operator already does, which is both a blessing and a curse: a blessing because we already have an API (we don't need to reinvent the wheel), but a curse because we already have an API (we don't want to create two different pathways for interacting with S3/Azure/GCS). Unfortunately, mimicking exactly the pattern used by the dbt docs operators, and having something like …
Hi, @dwreeves. I'm Dosu, and I'm helping the Cosmos team manage their backlog. I'm marking this issue as stale. Issue Summary:
Next Steps:
Thank you for your understanding and contribution!
Our CI/CD uses a different config than what Cosmos does, and the difference in the profile is close to unavoidable, which makes it so partial parsing doesn't actually work.

I think the only way you can reasonably get this to work is to have a `DbtCompileOperator`, and then read the `partial_parse.msgpack` from S3. The output directory should be templated, ideally, so that you can use `ti.xcom_pull()` or otherwise avoid any "clashing" across multiple simultaneous DAG runs. (Then, at the end of each DAG run, the user can run e.g. `S3DeleteObjectsOperator()` to clean things up at their leisure.)