[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

renanleme · 2023-09-22T15:24:52Z

Is this a regression in a recent version of dbt-core?

I believe this is a regression in dbt-core functionality
I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

Afterm upgrading to 1.6.2 from 1.5.1 our build is taking way longer to start the first model
From around 7 to 8 minutes to complete to more than 1 hour without starting.

I'm running dbt build --selector X

python: 3.11.5
dbt_core: 1.5.1 -> 1.6.2
dbt-snowflake: 1.5.1-> 1.6.2

Models:

Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models

Selector example

  - name: XYZ
    definition:
      union:
        - '*'
        - exclude:
            - method: selector
              value: P1
            - method: selector
              value: P2
            - method: selector
              value: P3
            - method: selector
              value: P4
            - method: selector
              value: P5

Expected/Previous Behavior

That the build would take around 8 minutes to start after compiling all the models

Steps To Reproduce

root@b78d02b2ea5d:/usr/app/dbt# dbt build  --selector XYZ
15:10:34  Running with dbt=1.5.1
15:11:43  Found 2840 models, 2277 tests, 31 snapshots, 10 analyses, 909 macros, 1 operation, 78 seed files, 499 sources, 0 exposures, 0 metrics, 0 groups
15:18:21  
15:18:52  Concurrency: 8 threads (target='dev-ssh-password')
15:18:52  
15:18:52  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Takes around 6 to 7 minutes

Same command, different version

dbt build --selector XYZ
14:42:04  Running with dbt=1.6.2
14:42:05  Registered adapter: snowflake=1.6.2
14:42:06  Unable to do partial parsing because of a version mismatch
14:42:35  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models

And it goes for more than 50 minutes without starting with CPU at 100%

Relevant log output

Checking the log file there's nothing being logged after `Found 2840 models...` line besides the CPU usage at 100%

Environment

- OS: Running inside container - `python:3.11.5-slim-bookworm`
`Linux 5.15.49-linuxkit-pr #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023 aarch64 GNU/Linux`
- Python: `3.11.5`
- dbt (working version): `1.5.1`
- dbt (regression version):  `1.6.2` and `1.6.3`

Which database adapter are you using with dbt?

snowflake

Additional Context

I thought that this could be related to this issue: [CT-2960] [Bug] Compile performance hit (DBT 1.4.6 -> 1.6.0) #8360
But I upgraded to 1.6.3 before posting here and the issue continues
I ran dbt compile --selector XYZ on both and both ran successfully

The text was updated successfully, but these errors were encountered:

graciegoheen · 2023-09-22T16:19:10Z

HI!

I am curious about this line of your output:

Unable to do partial parsing because of a version mismatch

If you try doing a dbt build using 1.5.1 with the no-partial-parse flag (dbt --no-partial-parse build), are you seeing the same performance issue?

renanleme · 2023-09-22T16:32:48Z

HI!

I am curious about this line of your output:
Unable to do partial parsing because of a version mismatch
If you try doing a dbt build using 1.5.1 with the no-partial-parse flag (dbt --no-partial-parse build), are you seeing the same performance issue?

Hi, Gracie!

Thanks for checking!

I just ran the with the flag as requested on 1.5.1 and it ran even faster

root@f5e591bc8ed1:/usr/app/dbt# dbt --no-partial-parse  build --selector XYZ
16:29:02  Running with dbt=1.5.1
16:29:36  Found 2840 models, 2277 tests, 31 snapshots, 10 analyses, 909 macros, 1 operation, 78 seed files, 499 sources, 0 exposures, 0 metrics, 0 groups
16:30:01  
16:30:16  Concurrency: 8 threads (target='dev-ssh-password')
16:30:16  
16:30:16  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Meanwhile, I'm testing the same with 1.6.3 just in case... let's see

peterallenwebb · 2023-09-22T17:52:08Z

@renanleme Would you be willing to do the following?

Run the command with 1.6.3
Wait about 10 minutes
Break with Ctrl-C to interrupt
Capture the Python stack trace and reply with it here?

It might give us a hint as to what is going on so that we can reproduce the issue.

renanleme · 2023-09-22T17:57:19Z

Hi, @peterallenwebb

Sure, it was running for more than 1 hour, here is the stack trace

root@2f12c5a4a5b4:/usr/app/dbt# dbt --no-partial-parse  build --selector XYZ
16:34:38  Running with dbt=1.6.3
16:34:39  Registered adapter: snowflake=1.6.2
16:35:07  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
^C17:55:51  Encountered an error:

17:55:51  Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 259, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/main.py", line 208, in build
    results = task.run()
              ^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 449, in run
    self._runtime_initialize()
  File "/venv/lib/python3.11/site-packages/dbt/task/compile.py", line 147, in _runtime_initialize
    super()._runtime_initialize()
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 146, in _runtime_initialize
    self.job_queue = self.get_graph_queue()
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 139, in get_graph_queue
    return selector.get_graph_queue(spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 312, in get_graph_queue
    selected_nodes = self.get_selected(spec)
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 303, in get_selected
    selected_nodes, indirect_only = self.select_nodes(spec)
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 158, in select_nodes
    direct_nodes, indirect_nodes = self.select_nodes_recursively(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 128, in select_nodes_recursively
    direct_nodes, indirect_nodes = self.get_nodes_from_criteria(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 95, in get_nodes_from_criteria
    neighbors = self.collect_specified_neighbors(spec, collected)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 115, in collect_specified_neighbors
    additional.update(self.graph.select_parents(selected, depth))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 72, in select_parents
    ancestors.update(self.ancestors(node, max_depth))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 31, in ancestors
    filtered_graph = self.exclude_edge_type("parent_test")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 45, in exclude_edge_type
    return nx.restricted_view(
           ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/function.py", line 506, in restricted_view
    hide_edges = nxf.hide_diedges(edges)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/filters.py", line 30, in hide_diedges
    edges = {(u, v) for u, v in edges}
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/filters.py", line 30, in <setcomp>
    edges = {(u, v) for u, v in edges}
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 51, in <genexpr>
    if self.graph[a][b].get("edge_type") == edge_type_to_exclude
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

root@2f12c5a4a5b4:/usr/app/dbt# date
Fri Sep 22 17:56:05 UTC 2023

My guess is probably some issue with the recursion when excluding some models as we are using exclusion on our selector

peterallenwebb · 2023-09-22T18:22:14Z

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

renanleme · 2023-09-22T18:27:20Z

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

Sure, I will try that in a bit

renanleme · 2023-09-22T19:05:04Z

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

Hey, @peterallenwebb

Looks like that's the issue.

I reverted this one: f1dddaa

And it worked 🎉

dbt --no-partial-parse  build --selector XYZ --target dev-ssh-password                                                                                                                                                                 
19:01:19  Running with dbt=1.6.3
19:01:19  Registered adapter: snowflake=1.6.2
19:01:46  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
19:02:13  
19:02:31  Concurrency: 8 threads (target='dev-ssh-password')
19:02:31  
19:02:31  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Let me know if I can help debugging more.

peterallenwebb · 2023-09-22T19:44:08Z

@renanleme Thanks for your extremely fast responses! I'll let the team know and we'll get a fix released as soon as we can.

renanleme · 2023-09-22T19:48:49Z

@renanleme Thanks for your extremely fast responses! I'll let the team know and we'll a fix released as soon as we can.

You are welcome!

I was testing and I think this this code below solves the issue here:

    def descendants(self, node: UniqueId, max_depth: Optional[int] = None) -> Set[UniqueId]:
        """Returns all nodes reachable from `node` in `graph`"""
        if not self.graph.has_node(node):
            raise DbtInternalError(f"Node {node} not found in the graph!")
        filtered_graph = self.exclude_edge_type("parent_test")
        return {child for _, child in nx.bfs_edges(filtered_graph, node, depth_limit=max_depth)}

    def exclude_edge_type(self, edge_type_to_exclude):
        return nx.subgraph_view(
            self.graph,
            filter_edge=partial(self.filter_test_edges, edge_type=edge_type_to_exclude)
        )

    def filter_test_edges(self, n1, n2, edge_type):
        return self.graph[n1][n2].get("edge_type", edge_type)

The amount of models are the same so it should be ok (I ran some tests, all passed, including tests/unit/test_graph_selection.py)

dbt --no-partial-parse  build --selector XYZ --target dev-ssh-password                                                                                                                                                        
20:15:43  Running with dbt=1.6.3
20:15:43  Registered adapter: snowflake=1.6.2
20:16:10  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
20:17:18  
20:17:34  Concurrency: 8 threads (target='dev-ssh-password')
20:17:34  
20:17:34  1 of 1260 START sql view model raw.dbt_meta.stg_dbt__exposures ................. [RUN]

Thanks @peterallenwebb !

peterallenwebb · 2023-09-25T21:17:37Z

Update on my progress. I created a quick random graph generator to test the relative performance of the Graph.ancestors function under some hopefully-reasonable assumptions. This is the script:

from dbt.graph.graph import Graph
import time
import networkx
import random

def rand_edge(u, v, prob_test_parent):
    return (u,v,{'edge_type': 'parent_test'}) if random.random() < prob_test_parent else (u, v)

def rand_dag(num_nodes, exp_out_degree, prob_test_parent, seed):
    complete_edge_count = num_nodes * (num_nodes-1) / 2.0
    edge_prob = exp_out_degree * num_nodes / complete_edge_count
    g = networkx.gnp_random_graph(num_nodes, edge_prob, seed, directed=True)
    dag = networkx.DiGraph([rand_edge(u, v, prob_test_parent) for (u,v) in g.edges() if u < v])
    return dag

def time_ancestors(g, test_nodes):
    start_time = time.perf_counter()
    
    for node in test_nodes:
        a = g.ancestors(node)
    
    stop_time = time.perf_counter()
    return stop_time - start_time

seed = 12345

g100 = Graph(rand_dag(100, 10.0, 0.15, seed))
g400 = Graph(rand_dag(400, 10.0, 0.15, seed))
g1600 = Graph(rand_dag(1600, 10.0, 0.15, seed))
g3600 = Graph(rand_dag(3600, 10.0, 0.15, seed))

print("g100: " + str(time_ancestors(g100, [10, 20, 40, 60, 80])))
print("g400: " + str(time_ancestors(g400, [40, 80, 160, 240, 320])))
print("g1600: " + str(time_ancestors(g1600, [160, 320, 640, 960, 1290])))
print("g3600: " + str(time_ancestors(g3600, [300, 600, 1000, 2000, 3200])))

The results on some different branches:

1.5.latest
g100: 0.000161666888743639
g400: 0.00036137504503130913
g1600: 0.0011218339204788208
g3600: 0.0029966658912599087

1.6.latest
g100: 0.00265895901247859
g400: 0.009041625075042248
g1600: 0.0356107079423964
g3600: 0.08249700022861362

main
g100: 0.0025829579681158066
g400: 0.009305458050221205
g1600: 0.03589854063466191
g3600: 0.08135524997487664

patched according to current PR
g100: 0.0015306249260902405
g400: 0.003078209236264229
g1600: 0.010677333921194077
g3600: 0.02506020897999406

Observations

Assuming the random graphs are a reasonable approximation of the real world, we regressed the performance of this function in 1.6.0, and it now takes forty times longer to execute. However, even with the currently proposed fix PR, the performance is still much worse (ten times longer) than in 1.5.latest.

It's possible that on real-world data the PR is a better fix than these tests indicate, but it's hard to know, so I'm working on a more fundamental fix.

peterallenwebb · 2023-09-26T15:17:04Z

reopening until backport is complete

peterallenwebb · 2023-09-26T16:15:50Z

@renanleme Your contribution has been merged and backported. It will appear soon in dbt 1.6.4. Thank you for contributing to dbt-core!

renanleme added bug Something isn't working regression triage labels Sep 22, 2023

github-actions bot changed the title ~~[Regression] build never starts on 1.6.2~~ [CT-3144] [Regression] build never starts on 1.6.2 Sep 22, 2023

dbeatty10 changed the title ~~[CT-3144] [Regression] build never starts on 1.6.2~~ [CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 Sep 22, 2023

graciegoheen added awaiting_response and removed triage labels Sep 22, 2023

github-actions bot added triage and removed awaiting_response labels Sep 22, 2023

renanleme mentioned this issue Sep 22, 2023

CT-3144 Fix test edges type filter on Graph #8696

Merged

5 tasks

dbeatty10 removed the triage label Sep 24, 2023

peterallenwebb self-assigned this Sep 25, 2023

graciegoheen added the Highest Severity critical bug that must be resolved immediately label Sep 25, 2023

peterallenwebb closed this as completed in #8696 Sep 26, 2023

peterallenwebb reopened this Sep 26, 2023

peterallenwebb added the backport 1.6.latest label Sep 26, 2023

peterallenwebb mentioned this issue Sep 26, 2023

CT-3144 Fix test edges type filter on Graph (#8696) #8716

Merged

5 tasks

peterallenwebb closed this as completed Sep 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

renanleme commented Sep 22, 2023 •

edited

Loading

graciegoheen commented Sep 22, 2023

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 22, 2023

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 25, 2023 •

edited

Loading

peterallenwebb commented Sep 26, 2023

peterallenwebb commented Sep 26, 2023

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

Comments

renanleme commented Sep 22, 2023 • edited Loading

Is this a regression in a recent version of dbt-core?

Current Behavior

Expected/Previous Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

graciegoheen commented Sep 22, 2023

renanleme commented Sep 22, 2023 • edited Loading

peterallenwebb commented Sep 22, 2023

renanleme commented Sep 22, 2023 • edited Loading

peterallenwebb commented Sep 22, 2023 • edited Loading

renanleme commented Sep 22, 2023

renanleme commented Sep 22, 2023 • edited Loading

peterallenwebb commented Sep 22, 2023 • edited Loading

renanleme commented Sep 22, 2023 • edited Loading

peterallenwebb commented Sep 25, 2023 • edited Loading

Observations

peterallenwebb commented Sep 26, 2023

peterallenwebb commented Sep 26, 2023

renanleme commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 22, 2023 •

edited

Loading

renanleme commented Sep 22, 2023 •

edited

Loading

peterallenwebb commented Sep 25, 2023 •

edited

Loading