
[CT-3231] [Bug] Warehouse connection and post-run-hook logs get conflated when using dbtRunner #8866

Closed
racheldaniel opened this issue Oct 17, 2023 · 8 comments · Fixed by #9631
Labels: backport 1.7.latest · bug (Something isn't working) · High Severity (bug with significant impact that should be resolved in a reasonable timeframe) · Impact: Exp · python_api (Issues related to dbtRunner Python entry point)

@racheldaniel
Contributor

racheldaniel commented Oct 17, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When using an on-run-end hook in one's project and invoking dbt through dbtRunner, some of the data warehouse connection events (NewConnection, ConnectionUsed, NewConnectionOpening) come through in the logs attributed to the node hooks/{{dbt_project}}-on-run-end-0.sql. These events contain node_started_at and node_finished_at fields, and the node_started_at and node_finished_at times on these events match the times recorded for the hooks/{{dbt_project}}-on-run-end-0.sql node during the previous run.

For example, I did two consecutive runs of the command dbt --log-format=json --log-level=debug run --select stg_customers. I've attached the complete log files from those runs. Below are the node_started_at and node_finished_at times pulled from the logs for the two nodes that this command runs, including the timestamps that are shown for data warehouse events with the on-run-end-0 node name. Note the matching timestamps across the two runs.

Run 1
stg_customers node_started_at: 2023-10-17T15:39:34.591460
on-run-end-0 node_started_at NewConnection: 2023-10-17T15:39:29.441266
on-run-end-0 node_started_at LogHookStartLine: 2023-10-17T15:39:35.209166

stg_customers node_finished_at: 2023-10-17T15:39:34.974192
on-run-end-0 node_finished_at NewConnection: 2023-10-17T15:39:29.504932
on-run-end-0 node_finished_at LogHookEndLine: 2023-10-17T15:39:35.263544

Run 2

stg_customers node_started_at: 2023-10-17T15:52:40.656494
on-run-end-0 node_started_at NewConnection: 2023-10-17T15:39:35.209166
on-run-end-0 node_started_at LogHookEndLine: 2023-10-17T15:52:41.374772

stg_customers node_finished_at: 2023-10-17T15:52:41.134269
on-run-end-0 node_finished_at NewConnection: 2023-10-17T15:39:35.263544
on-run-end-0 node_finished_at LogHookEndLine: 2023-10-17T15:52:41.436292

dbt-run-1.log
dbt-run-2.log
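
As a quick way to surface these matches, here is a rough sketch (assuming the two attachments above are saved locally as dbt-run-1.log and dbt-run-2.log, in JSON-lines format) that prints the hook node's timestamps for each event so the repeated values across runs stand out:

# compare_hook_timestamps.py -- rough sketch; the file names below are assumptions
import json

HOOK_NODE = "jaffle_shop-on-run-end-0"  # adjust to match your project name


def hook_node_events(path):
    """Yield (event name, node_started_at, node_finished_at) for the hook node."""
    with open(path) as log_file:
        for line in log_file:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip any non-JSON lines
            node_info = event.get("data", {}).get("node_info", {})
            if node_info.get("node_name") == HOOK_NODE:
                yield (
                    event["info"]["name"],
                    node_info.get("node_started_at", ""),
                    node_info.get("node_finished_at", ""),
                )


for path in ("dbt-run-1.log", "dbt-run-2.log"):
    print(f"--- {path} ---")
    for name, started, finished in hook_node_events(path):
        print(f"{name}: node_started_at={started} node_finished_at={finished}")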

Expected Behavior

I would not expect to see any node data in a NewConnection event at all, much less node data that seems to reflect a different node from a prior run.

This is what the NewConnection events look like when using dbtRunner to execute:

{"data": {"conn_name": "master", "conn_type": "postgres", "node_info": {"materialized": "table", "meta": {}, "node_finished_at": "2023-10-17T15:52:41.436292", "node_name": "jaffle_shop-on-run-end-0", "node_path": "hooks/jaffle_shop-on-run-end-0.sql", "node_relation": {"alias": "jaffle_shop-on-run-end-0", "database": "dbt", "relation_name": "", "schema": "dbt_rdaniel"}, "node_started_at": "2023-10-17T15:52:41.374772", "node_status": "success", "resource_type": "operation", "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"}}, "info": {"category": "", "code": "E005", "extra": {}, "invocation_id": "62bbb8fa-4d66-43eb-bfef-f728f48b0bb8", "level": "debug", "msg": "Acquiring new postgres connection 'master'", "name": "NewConnection", "pid": 25364, "thread": "MainThread", "ts": "2023-10-17T16:00:18.459616Z"}}
{"data": {"conn_name": "list_dbt", "conn_type": "postgres"}, "info": {"category": "", "code": "E005", "extra": {}, "invocation_id": "62bbb8fa-4d66-43eb-bfef-f728f48b0bb8", "level": "debug", "msg": "Acquiring new postgres connection 'list_dbt'", "name": "NewConnection", "pid": 25364, "thread": "ThreadPoolExecutor-9_0", "ts": "2023-10-17T16:00:18.462447Z"}} 

And here is what the same NewConnection events look like when running dbt normally from the command line:

{"data": {"conn_name": "master", "conn_type": "postgres"}, "info": {"category": "", "code": "E005", "extra": {}, "invocation_id": "5fd50a08-2530-400b-a6c3-7145ce54ed12", "level": "debug", "msg": "Acquiring new postgres connection 'master'", "name": "NewConnection", "pid": 27364, "thread": "MainThread", "ts": "2023-10-17T16:06:11.586544Z"}}
{"data": {"conn_name": "list_dbt", "conn_type": "postgres"}, "info": {"category": "", "code": "E005", "extra": {}, "invocation_id": "5fd50a08-2530-400b-a6c3-7145ce54ed12", "level": "debug", "msg": "Acquiring new postgres connection 'list_dbt'", "name": "NewConnection", "pid": 27364, "thread": "ThreadPoolExecutor-0_0", "ts": "2023-10-17T16:06:11.588122Z"}}

Steps To Reproduce

  1. Set up a dbt project that contains an on-run-end hook. I reproduced by adding the following to my dbt_project.yml:
    on-run-end:
      - "select 1;"
    
  2. In a python repl:
    >>> from dbt.cli.main import dbtRunner
    >>> dbt = dbtRunner()
    >>> dbt.invoke(["--log-format=json", "--log-level=debug", "run", "--select", "stg_customers"]) 
    
  3. Observe that the NewConnection event is associated with the on-run-end node
  4. Repeat step 2
  5. Observe that step 3 holds again, and that the node_started_at and node_finished_at timestamps for this NewConnection event match the timestamps from the previous run's LogHookStartLine and LogHookEndLine events (a sketch for capturing these events in-process follows below)
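
For completeness, the same events can also be captured in-process, without reading logs/dbt.log, by passing a callback to dbtRunner. The snippet below is only a sketch of that approach; since only some events carry node_info, the field access is guarded with getattr:

# sketch: observe connection events in-process via a dbtRunner callback
from dbt.cli.main import dbtRunner


def watch_connections(event):
    """Print node_info attached to warehouse connection events, if present."""
    if event.info.name in ("NewConnection", "ConnectionUsed", "NewConnectionOpening"):
        node_info = getattr(event.data, "node_info", None)  # not every event carries node_info
        if node_info is not None and node_info.node_name:
            print(
                f"{event.info.name}: node={node_info.node_name} "
                f"started={node_info.node_started_at} finished={node_info.node_finished_at}"
            )


dbt = dbtRunner(callbacks=[watch_connections])
args = ["--log-format=json", "--log-level=debug", "run", "--select", "stg_customers"]
dbt.invoke(args)
dbt.invoke(args)  # a second invocation in the same process reproduces the stale node_info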

Relevant log output

No response

Environment

- OS: MacOS Ventura 13.5.2, Ubuntu 20.04
- Python: 3.8.10
- dbt: 1.6.6

Which database adapter are you using with dbt?

postgres

Additional Context

No response

@racheldaniel racheldaniel added bug Something isn't working triage labels Oct 17, 2023
@github-actions github-actions bot changed the title [Bug] Warehouse connection and post-run-hook logs get conflated when using dbtRunner [CT-3231] [Bug] Warehouse connection and post-run-hook logs get conflated when using dbtRunner Oct 17, 2023
@dbeatty10 dbeatty10 self-assigned this Oct 27, 2023
@dbeatty10
Contributor

@racheldaniel Awesome to jam with you earlier today to try to get to the bottom of this!

Python script for further exploration

Here's a Python script that we can tweak to filter out just the JSON objects that we want to examine further:

# runner.py

import json
import time
from dbt.cli.main import dbtRunner


# Specify the project & model name to examine
PROJECT_NAME = "jaffle_shop"
MODEL_NAME = "stg_customers"

# Amount of time to pause between runs (in seconds)
PAUSE_BETWEEN_RUNS = 1  # in seconds

# Event names to filter upon
EVENT_NAMES = ("NewConnection", "LogHookEndLine")
NODE_NAME = f"{PROJECT_NAME}-on-run-end-0"
UNIQUE_ID = f"operation.{PROJECT_NAME}.{NODE_NAME}"

# Specify the path to dbt logs in JSON Lines (JSON-L) format
LOG_PATH = "logs/dbt.log"


def log_hook_events(json_obj):
    """Filter to check for specific event names"""
    return (
        "info" in json_obj
        and "name" in json_obj["info"]
        and json_obj["info"]["name"] in EVENT_NAMES
    )


def node_name(json_obj):
    """Filter to check for a specific node name"""
    return (
        "data" in json_obj
        and "node_info" in json_obj["data"]
        and "node_name" in json_obj["data"]["node_info"]
        and json_obj["data"]["node_info"]["node_name"] == NODE_NAME
    )


def node_unique_id(json_obj):
    """Filter to check for a specific node name"""
    return (
        "data" in json_obj
        and "node_info" in json_obj["data"]
        and "unique_id" in json_obj["data"]["node_info"]
        and json_obj["data"]["node_info"]["unique_id"] == UNIQUE_ID
    )


def filter_jsonl(file_path, json_filter_func=log_hook_events):
    filtered_objects = []

    try:
        with open(file_path, "r") as file:
            # Read the file line by line and parse each JSON object
            for line in file:
                json_obj = json.loads(line)

                # Check whether the callback function evaluates to True for this JSON object
                if json_filter_func(json_obj):
                    filtered_objects.append(json_obj)
    except FileNotFoundError:
        print(f"The file {file_path} does not exist.")
    except json.JSONDecodeError:
        print("A line in the file does not contain valid JSON.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

    return filtered_objects


def filter_json_logs(filter_func=log_hook_events, show=True):
    matches = filter_jsonl(LOG_PATH, filter_func)

    print(f"Found {len(matches)} JSON objects that match your filter")

    if show:
        # Pretty-print each JSON object
        for json_obj in matches:
            print(json.dumps(json_obj, indent=4))

    return matches


def main():
    dbt = dbtRunner()

    cmd = [
        "--quiet",
        "--log-format=json",
        "--log-level=debug",
        "run",
        "--select",
        MODEL_NAME,
    ]

    dbt.invoke(cmd)

    # Use a function to filter JSON logs. Options are:
    # - log_hook_events
    # - node_name
    # - node_unique_id
    filter_json_logs(filter_func=log_hook_events, show=True)


if __name__ == "__main__":
    main()
Commands:
dbt clean
python runner.py
sleep 60
python runner.py
Here's the full output:
(postgres_1.6) $ dbt clean
python runner.py
sleep 60
python runner.py
17:42:37  Running with dbt=1.6.6
17:42:37  Checking target/*
17:42:37  Cleaned target/*
17:42:37  Checking dbt_packages/*
17:42:37  Cleaned dbt_packages/*
17:42:37  Checking logs/*
17:42:37  Cleaned logs/*
17:42:37  Finished cleaning all paths.
Found 3 JSON objects that match your filter
{
    "data": {
        "conn_name": "master",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'master'",
        "name": "NewConnection",
        "pid": 41233,
        "thread": "MainThread",
        "ts": "2023-10-27T17:42:40.145200Z"
    }
}
{
    "data": {
        "conn_name": "list_postgres",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'list_postgres'",
        "name": "NewConnection",
        "pid": 41233,
        "thread": "ThreadPoolExecutor-0_0",
        "ts": "2023-10-27T17:42:40.146457Z"
    }
}
{
    "data": {
        "execution_time": 0.0040040016,
        "index": 1,
        "node_info": {
            "materialized": "view",
            "meta": {},
            "node_finished_at": "2023-10-27T17:42:40.603161",
            "node_name": "jaffle_shop-on-run-end-0",
            "node_path": "hooks/jaffle_shop-on-run-end-0.sql",
            "node_relation": {
                "alias": "jaffle_shop-on-run-end-0",
                "database": "postgres",
                "relation_name": "",
                "schema": "dbt_dbeatty"
            },
            "node_started_at": "2023-10-27T17:42:40.595296",
            "node_status": "success",
            "resource_type": "operation",
            "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"
        },
        "statement": "jaffle_shop.on-run-end.0",
        "status": "SELECT 1",
        "total": 1
    },
    "info": {
        "category": "",
        "code": "Q033",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "info",
        "msg": "1 of 1 OK hook: jaffle_shop.on-run-end.0 ....................................... [\u001b[32mSELECT 1\u001b[0m in 0.00s]",
        "name": "LogHookEndLine",
        "pid": 41233,
        "thread": "MainThread",
        "ts": "2023-10-27T17:42:40.603343Z"
    }
}
Found 6 JSON objects that match your filter
{
    "data": {
        "conn_name": "master",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'master'",
        "name": "NewConnection",
        "pid": 41233,
        "thread": "MainThread",
        "ts": "2023-10-27T17:42:40.145200Z"
    }
}
{
    "data": {
        "conn_name": "list_postgres",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'list_postgres'",
        "name": "NewConnection",
        "pid": 41233,
        "thread": "ThreadPoolExecutor-0_0",
        "ts": "2023-10-27T17:42:40.146457Z"
    }
}
{
    "data": {
        "execution_time": 0.0040040016,
        "index": 1,
        "node_info": {
            "materialized": "view",
            "meta": {},
            "node_finished_at": "2023-10-27T17:42:40.603161",
            "node_name": "jaffle_shop-on-run-end-0",
            "node_path": "hooks/jaffle_shop-on-run-end-0.sql",
            "node_relation": {
                "alias": "jaffle_shop-on-run-end-0",
                "database": "postgres",
                "relation_name": "",
                "schema": "dbt_dbeatty"
            },
            "node_started_at": "2023-10-27T17:42:40.595296",
            "node_status": "success",
            "resource_type": "operation",
            "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"
        },
        "statement": "jaffle_shop.on-run-end.0",
        "status": "SELECT 1",
        "total": 1
    },
    "info": {
        "category": "",
        "code": "Q033",
        "extra": {},
        "invocation_id": "6f4b7e9f-7fcd-403d-a931-d20557c2a3b7",
        "level": "info",
        "msg": "1 of 1 OK hook: jaffle_shop.on-run-end.0 ....................................... [\u001b[32mSELECT 1\u001b[0m in 0.00s]",
        "name": "LogHookEndLine",
        "pid": 41233,
        "thread": "MainThread",
        "ts": "2023-10-27T17:42:40.603343Z"
    }
}
{
    "data": {
        "conn_name": "master",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "93134ad8-520f-49b1-8982-20e04bb9c209",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'master'",
        "name": "NewConnection",
        "pid": 42018,
        "thread": "MainThread",
        "ts": "2023-10-27T17:43:44.788232Z"
    }
}
{
    "data": {
        "conn_name": "list_postgres",
        "conn_type": "postgres"
    },
    "info": {
        "category": "",
        "code": "E005",
        "extra": {},
        "invocation_id": "93134ad8-520f-49b1-8982-20e04bb9c209",
        "level": "debug",
        "msg": "Acquiring new postgres connection 'list_postgres'",
        "name": "NewConnection",
        "pid": 42018,
        "thread": "ThreadPoolExecutor-0_0",
        "ts": "2023-10-27T17:43:44.789530Z"
    }
}
{
    "data": {
        "execution_time": 0.0034229755,
        "index": 1,
        "node_info": {
            "materialized": "view",
            "meta": {},
            "node_finished_at": "2023-10-27T17:43:45.107894",
            "node_name": "jaffle_shop-on-run-end-0",
            "node_path": "hooks/jaffle_shop-on-run-end-0.sql",
            "node_relation": {
                "alias": "jaffle_shop-on-run-end-0",
                "database": "postgres",
                "relation_name": "",
                "schema": "dbt_dbeatty"
            },
            "node_started_at": "2023-10-27T17:43:45.100886",
            "node_status": "success",
            "resource_type": "operation",
            "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"
        },
        "statement": "jaffle_shop.on-run-end.0",
        "status": "SELECT 1",
        "total": 1
    },
    "info": {
        "category": "",
        "code": "Q033",
        "extra": {},
        "invocation_id": "93134ad8-520f-49b1-8982-20e04bb9c209",
        "level": "info",
        "msg": "1 of 1 OK hook: jaffle_shop.on-run-end.0 ....................................... [\u001b[32mSELECT 1\u001b[0m in 0.00s]",
        "name": "LogHookEndLine",
        "pid": 42018,
        "thread": "MainThread",
        "ts": "2023-10-27T17:43:45.108046Z"
    }
}

Making tweaks

The script assumes the following project name and model name, which are configurable:

PROJECT_NAME = "jaffle_shop"
MODEL_NAME = "stg_customers"

If we want to change the events to filter upon, update this section:

EVENT_NAMES = ("NewConnection", "LogHookEndLine")

If we want to filter by node name instead of the "NewConnection" and "LogHookEndLine" events, change this:

filter_json_logs(filter_func=log_hook_events, show=True)

to be this instead:

filter_json_logs(filter_func=node_name, show=True)

@racheldaniel
Contributor Author

racheldaniel commented Oct 30, 2023

Thank you, Doug! Per our conversation Friday, here is some clarifying information and some cleaner data for review:

The log excerpts I pasted into the ticket description were not from the same logs as those I attached, which is likely confusing. Here I've generated logs again, following the same steps described in the ticket.
run_hook_logs_1.json
run_hook_logs_2.json

Here are the problematic lines to note:

In run_hook_logs_2.json (from the second run), the following event can be observed close to the start of the run. Notice that it contains node_info for node_name jaffle_shop-on-run-end-0, even though the event name is NewConnection.

{
  "data": {
      "conn_name": "master",
      "conn_type": "postgres",
      "node_info": {
          "materialized": "table",
          "meta": {},
          "node_finished_at": "2023-10-27T16:26:26.193613",
          "node_name": "jaffle_shop-on-run-end-0",
          "node_path": "hooks/jaffle_shop-on-run-end-0.sql",
          "node_relation": {
              "alias": "jaffle_shop-on-run-end-0",
              "database": "dbt",
              "relation_name": "",
              "schema": "dbt_rdaniel"
          },
          "node_started_at": "2023-10-27T16:26:26.153592",
          "node_status": "success",
          "resource_type": "operation",
          "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"
      }
  },
  "info": {
      "category": "",
      "code": "E005",
      "extra": {},
      "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
      "level": "debug",
      "msg": "Acquiring new postgres connection 'master'",
      "name": "NewConnection",
      "pid": 43359,
      "thread": "MainThread",
      "ts": "2023-10-27T16:31:35.955797Z"
  }
}

This event contains a node_started_at timestamp that, you'll notice, is earlier than the first log line's ts timestamp for this run (2023-10-27T16:31:35.589978Z). So where did that node_started_at timestamp come from? If we look back at the logs from the first run in run_hook_logs_1.json, we can see that it matches the node_started_at from the previous run's LogHookStartLine event:



{
  "data": {
    "index": 1,
    "node_info": {
      "materialized": "table",
      "meta": {},
      "node_finished_at": "",
      "node_name": "jaffle_shop-on-run-end-0",
      "node_path": "hooks/jaffle_shop-on-run-end-0.sql",
      "node_relation": {
        "alias": "jaffle_shop-on-run-end-0",
        "database": "dbt",
        "relation_name": "",
        "schema": "dbt_rdaniel"
      },
      "node_started_at": "2023-10-27T16:26:26.153592",
      "node_status": "started",
      "resource_type": "operation",
      "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0"
    },
    "statement": "jaffle_shop.on-run-end.0",
    "total": 1
  },
  "info": {
    "category": "",
    "code": "Q032",
    "extra": {},
    "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
    "level": "info",
    "msg": "1 of 1 START hook: jaffle_shop.on-run-end.0 .................................... [RUN]",
    "name": "LogHookStartLine",
    "pid": 43359,
    "thread": "MainThread",
    "ts": "2023-10-27T16:26:26.159236Z"
  }
}


I bring this up because it suggests some kind of caching going on in core. The real problem it causes in the IDE is that we calculate total elapsed time from the node_started_at and node_finished_at fields, and because that first NewConnection event comes across as associated with the jaffle_shop-on-run-end-0 node, we use its stale node_started_at time. The result is that the IDE displays bizarre calculated elapsed times:

[Screenshot 2023-10-30 at 9:58:09 AM: elapsed time as calculated and displayed in the IDE]


Here are the code paths in dbt-cloud that pull node_started_at and node_finished_at to build the NodeResult objects, and here is where those timestamps are pulled from those objects in cloud-ui to calculate elapsed time.
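
To sketch the kind of calculation that goes wrong (this is only an illustration of the logic, not our actual cloud-ui code): elapsed time is essentially node_finished_at minus node_started_at, so a stale node_started_at carried over from the previous run inflates the result from a few hundredths of a second to several minutes.

# illustrative only -- not the actual cloud-ui code
from datetime import datetime


def elapsed_seconds(node_started_at: str, node_finished_at: str) -> float:
    started = datetime.fromisoformat(node_started_at)
    finished = datetime.fromisoformat(node_finished_at)
    return (finished - started).total_seconds()


# Using the hook node's own timestamps from run 2 (LogHookEndLine): ~0.04s
print(elapsed_seconds("2023-10-27T16:31:37.532037", "2023-10-27T16:31:37.571084"))

# Using the stale node_started_at from the NewConnection event (carried over
# from run 1) with run 2's node_finished_at: ~311s, i.e. over five minutes
print(elapsed_seconds("2023-10-27T16:26:26.153592", "2023-10-27T16:31:37.571084"))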

Please let me know how I can further assist!

@dbeatty10
Contributor

dbeatty10 commented Oct 31, 2023

I see what you are saying @racheldaniel.

run_hook_logs_1 -- all 7 events

Filtering run_hook_logs_1.json for events related to "node_name": "jaffle_shop-on-run-end-0" gives these 7 events, all of which have "node_started_at": "2023-10-27T16:26:26.153592":

{
    "info": {
        "name": "WritingInjectedSQLForNode",
        "msg": "Writing injected SQL for node \"operation.jaffle_shop.jaffle_shop-on-run-end-0\"",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.156651Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "LogHookStartLine",
        "msg": "1 of 1 START hook: jaffle_shop.on-run-end.0 .................................... [RUN]",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.159236Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.160144Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: /* {\"app\": \"dbt\", \"dbt_version\": \"1.6.6\", \"profile_name\": \"user\", \"target_name\": \"default\", \"connection_name\": \"master\"} */\nselect 1;",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.160914Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: SELECT 1 in 0.0 seconds",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.192099Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "LogHookEndLine",
        "msg": "1 of 1 OK hook: jaffle_shop.on-run-end.0 ....................................... [\u001b[32mSELECT 1\u001b[0m in 0.03s]",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.193827Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionClosed",
        "msg": "On master: Close",
        "invocation_id": "13428cbb-a2c6-4c11-a282-feb52cfcb0e5",
        "ts": "2023-10-27T16:26:26.195333Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}

run_hook_logs_2 -- first 25 events

Doing the same filter on run_hook_logs_2.json for events related to "node_name": "jaffle_shop-on-run-end-0" gives 32(!) events.

The first 25 all have "node_started_at": "2023-10-27T16:26:26.153592", just like run_hook_logs_1:

{
    "info": {
        "name": "NewConnection",
        "msg": "Acquiring new postgres connection 'master'",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:35.955797Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.500301Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: BEGIN",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.501344Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "NewConnectionOpening",
        "msg": "Opening a new connection, currently in state init",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.502637Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: BEGIN in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.689137Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.691572Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: /* {\"app\": \"dbt\", \"dbt_version\": \"1.6.6\", \"profile_name\": \"user\", \"target_name\": \"default\", \"connection_name\": \"master\"} */\nwith relation as (\n        select\n            pg_rewrite.ev_class as class,\n            pg_rewrite.oid as id\n        from pg_rewrite\n    ),\n    class as (\n        select\n            oid as id,\n            relname as name,\n            relnamespace as schema,\n            relkind as kind\n        from pg_class\n    ),\n    dependency as (\n        select distinct\n            pg_depend.objid as id,\n            pg_depend.refobjid as ref\n        from pg_depend\n    ),\n    schema as (\n        select\n            pg_namespace.oid as id,\n            pg_namespace.nspname as name\n        from pg_namespace\n        where nspname != 'information_schema' and nspname not like 'pg\\_%'\n    ),\n    referenced as (\n        select\n            relation.id AS id,\n            referenced_class.name ,\n            referenced_class.schema ,\n            referenced_class.kind\n        from relation\n        join class as referenced_class on relation.class=referenced_class.id\n        where referenced_class.kind in ('r', 'v', 'm')\n    ),\n    relationships as (\n        select\n            referenced.name as referenced_name,\n            referenced.schema as referenced_schema_id,\n            dependent_class.name as dependent_name,\n            dependent_class.schema as dependent_schema_id,\n            referenced.kind as kind\n        from referenced\n        join dependency on referenced.id=dependency.id\n        join class as dependent_class on dependency.ref=dependent_class.id\n        where\n            (referenced.name != dependent_class.name or\n             referenced.schema != dependent_class.schema)\n    )\n\n    select\n        referenced_schema.name as referenced_schema,\n        relationships.referenced_name as referenced_name,\n        dependent_schema.name as dependent_schema,\n        relationships.dependent_name as dependent_name\n    from relationships\n    join schema as dependent_schema on relationships.dependent_schema_id=dependent_schema.id\n    join schema as referenced_schema on relationships.referenced_schema_id=referenced_schema.id\n    group by referenced_schema, referenced_name, dependent_schema, dependent_name\n    order by referenced_schema, referenced_name, dependent_schema, dependent_name;",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.694010Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: SELECT 17 in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.750521Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "Rollback",
        "msg": "On master: ROLLBACK",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.759279Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.795450Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: BEGIN",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.796619Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: BEGIN in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.858170Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLCommit",
        "msg": "On master: COMMIT",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.859935Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.861057Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: COMMIT",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.862185Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: COMMIT in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.894646Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionClosed",
        "msg": "On master: Close",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:36.896428Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.314347Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: BEGIN",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.315668Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "NewConnectionOpening",
        "msg": "Opening a new connection, currently in state closed",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.316718Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: BEGIN in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.497314Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLCommit",
        "msg": "On master: COMMIT",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.498355Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.499139Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: COMMIT",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.499877Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: COMMIT in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.529682Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:26:26.153592",
            "node_finished_at": "2023-10-27T16:26:26.193613",
            "node_status": "success"
        }
    }
}

run_hook_logs_2 -- last 7 events

The final 7 have "node_started_at": "2023-10-27T16:31:37.532037", which is what we want/expect.

{
    "info": {
        "name": "WritingInjectedSQLForNode",
        "msg": "Writing injected SQL for node \"operation.jaffle_shop.jaffle_shop-on-run-end-0\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.535097Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "LogHookStartLine",
        "msg": "1 of 1 START hook: jaffle_shop.on-run-end.0 .................................... [RUN]",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.537988Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "ConnectionUsed",
        "msg": "Using postgres connection \"master\"",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.539327Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "SQLQuery",
        "msg": "On master: /* {\"app\": \"dbt\", \"dbt_version\": \"1.6.6\", \"profile_name\": \"user\", \"target_name\": \"default\", \"connection_name\": \"master\"} */\nselect 1;",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.540237Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "SQLQueryStatus",
        "msg": "SQL status: SELECT 1 in 0.0 seconds",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.569072Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "",
            "node_status": "started"
        }
    }
}
{
    "info": {
        "name": "LogHookEndLine",
        "msg": "1 of 1 OK hook: jaffle_shop.on-run-end.0 ....................................... [\u001b[32mSELECT 1\u001b[0m in 0.03s]",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.571483Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "2023-10-27T16:31:37.571084",
            "node_status": "success"
        }
    }
}
{
    "info": {
        "name": "ConnectionClosed",
        "msg": "On master: Close",
        "invocation_id": "6236acc0-e604-4827-a297-d797873695df",
        "ts": "2023-10-27T16:31:37.575315Z"
    },
    "data": {
        "node_info": {
            "node_name": "jaffle_shop-on-run-end-0",
            "node_started_at": "2023-10-27T16:31:37.532037",
            "node_finished_at": "2023-10-27T16:31:37.571084",
            "node_status": "success"
        }
    }
}

I don't know the reason for the "extra" 25 events in run_hook_logs_2, and I haven't been able to reproduce it locally.

I think we'll want to create a version of runner.py that reliably produces similar results in order to figure out what exactly is going on and verify that it is fixed.

Could you help try to do something like this that reproduces what you are seeing?

@dbeatty10 dbeatty10 removed the triage label Oct 31, 2023
@racheldaniel
Contributor Author

@dbeatty10 Sure, let me see if I can reproduce via script

@racheldaniel
Contributor Author

racheldaniel commented Oct 31, 2023

@dbeatty10 I tweaked your script so that it now does two runs, produces two output files for comparison, and creates an additional file that just has the node names, events and timestamps from those two runs. I'm doing this with postgres, and the only thing I've changed about the standard jaffle_shop project is that I added the following to my dbt_project.yml:

on-run-end:
  - "select 1;"

I'm sure not all of the changes I made were necessary; for example, I added some cleanup of old log files just to make absolutely sure file writes had nothing to do with the duplicate logs. Let me know if this doesn't work for you for some reason!

# test_runner.py

import json
import os
import time
from dbt.cli.main import dbtRunner


# Specify the project & model name to examine
PROJECT_NAME = "jaffle_shop"
MODEL_NAME = "stg_orders"

# Amount of time to pause between runs (in seconds)
PAUSE_BETWEEN_RUNS = 10  # in seconds

# Event names to filter upon
EVENT_NAMES = ("NewConnection", "LogHookEndLine")
NODE_NAME = f"{PROJECT_NAME}-on-run-end-0"
UNIQUE_ID = f"operation.{PROJECT_NAME}.{NODE_NAME}"

# Specify the path to dbt logs in JSON Lines (JSON-L) format
LOG_PATH = "logs/dbt.log"


def log_hook_events(json_obj):
    """Filter to check for specific event names"""
    return (
        "info" in json_obj
        and "name" in json_obj["info"]
        and json_obj["info"]["name"] in EVENT_NAMES
    )


def node_name(json_obj):
    """Filter to check for a specific node name"""
    return (
        "data" in json_obj
        and "node_info" in json_obj["data"]
        and "node_name" in json_obj["data"]["node_info"]
        and json_obj["data"]["node_info"]["node_name"] == NODE_NAME
    )


def node_unique_id(json_obj):
    """Filter to check for a specific node name"""
    return (
        "data" in json_obj
        and "node_info" in json_obj["data"]
        and "unique_id" in json_obj["data"]["node_info"]
        and json_obj["data"]["node_info"]["unique_id"] == UNIQUE_ID
    )


def filter_jsonl(file_path, json_filter_func=log_hook_events):
    filtered_objects = []

    try:
        with open(file_path, "r") as file:
            # Read the file line by line and parse each JSON object
            for line in file:
                json_obj = json.loads(line)

                # Check whether the callback function evaluates to True for this JSON object
                if json_filter_func(json_obj):
                    filtered_objects.append(json_obj)
    except FileNotFoundError:
        print(f"The file {file_path} does not exist.")
    except json.JSONDecodeError:
        print("A line in the file does not contain valid JSON.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

    return filtered_objects


def filter_json_logs(filter_func=log_hook_events, show=True, output_file=None):
    matches = filter_jsonl(LOG_PATH, filter_func)

    print(f"Found {len(matches)} JSON objects that match your filter")
    if output_file:
        with open(output_file, "w") as file:
            json.dump(matches, file, indent=4)
    if show:
        # Pretty-print each JSON object
        for json_obj in matches:
            print(json.dumps(json_obj, indent=4))
    
    return matches


def compare_node_timestamps(file_path_1, file_path_2):
    comparison_file = f"logs/{PROJECT_NAME}-{MODEL_NAME}-comparison.json"
    run_nodes = []

    def __make_json_obj(json_obj):
        return {
          'node_name': json_obj['data']['node_info']['node_name'],
          'unique_id': json_obj['data']['node_info']['unique_id'],
          'node_started_at': json_obj['data']['node_info']['node_started_at'],
          'node_finished_at': json_obj['data']['node_info']['node_finished_at'],
          'event_name': json_obj['info']['name']
        }

    def process_file(file_path):
        try:
            with open(file_path, "r") as file:
                content = json.load(file)
                for obj in content:
                    new_obj = __make_json_obj(obj)
                    run_nodes.append(new_obj)
        except Exception as e:
            print(f"An unexpected error occurred: {e}")

    process_file(file_path_1)
    run_nodes.append("___________________________END_RUN_1___________________________")
    process_file(file_path_2)

    with open(comparison_file, "w") as file:
        json.dump(run_nodes, file, indent=4)

    return run_nodes
        

def main(output_file=None):
    dbt = dbtRunner()

    cmd = [
        "--log-format=json",
        "--log-level=debug",
        "run",
        "--select",
        MODEL_NAME,
    ]

    dbt.invoke(cmd)

    # Use a function to filter JSON logs. Options are:
    # - log_hook_events
    # - node_name
    # - node_unique_id
    filter_json_logs(filter_func=node_name, show=True, output_file=output_file)

def run_and_cleanup(output_file):
    if os.path.exists(LOG_PATH):
        os.remove(LOG_PATH)
    if os.path.exists(output_file):
        os.remove(output_file)
    main(output_file)


if __name__ == "__main__":
    output_file_1 = f"logs/{PROJECT_NAME}-{MODEL_NAME}-filtered-1.json"
    run_and_cleanup(output_file_1)

    time.sleep(PAUSE_BETWEEN_RUNS)

    output_file_2 = f"logs/{PROJECT_NAME}-{MODEL_NAME}-filtered-2.json"
    run_and_cleanup(output_file_2)

    compare_node_timestamps(output_file_1, output_file_2)

@dbeatty10
Contributor

This works perfectly @racheldaniel 🤩

Here are the steps I used locally to run this end-to-end:

Reprex

Clone the Jaffle Shop repo:

git clone https://github.com/dbt-labs/jaffle_shop.git
cd jaffle_shop

Copy-paste the test_runner.py script from the comment above into the current working directory.

Modify dbt_project.yml to use my local profile name associated with dbt-postgres.

Then make sure the connection works:

dbt debug

Add the on-run-end hook to dbt_project.yml:

on-run-end:
  - "select 1;"

Run the script:

python test_runner.py

Then examine the output within logs/jaffle_shop-stg_orders-comparison.json.

In my local run, I can see the following, which demonstrates the issue you are describing:

"node_started_at": "2023-10-31T16:13:21.526377" shows up for the first time:

    {
        "node_name": "jaffle_shop-on-run-end-0",
        "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0",
        "node_started_at": "2023-10-31T16:13:21.526377",
        "node_finished_at": "",
        "event_name": "WritingInjectedSQLForNode"
    },
    "___________________________END_RUN_1___________________________",

"node_started_at": "2023-10-31T16:13:21.526377" is re-used for the final time:

    {
        "node_name": "jaffle_shop-on-run-end-0",
        "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0",
        "node_started_at": "2023-10-31T16:13:21.526377",
        "node_finished_at": "2023-10-31T16:13:21.538143",
        "event_name": "SQLQueryStatus"
    },

"node_started_at" is finally updated for the first time:

    {
        "node_name": "jaffle_shop-on-run-end-0",
        "unique_id": "operation.jaffle_shop.jaffle_shop-on-run-end-0",
        "node_started_at": "2023-10-31T16:13:31.989195",
        "node_finished_at": "",
        "event_name": "WritingInjectedSQLForNode"
    },

@dbeatty10 dbeatty10 removed their assignment Oct 31, 2023
@graciegoheen graciegoheen added Impact: Exp backport 1.7.latest High Severity bug with significant impact that should be resolved in a reasonable timeframe labels Feb 7, 2024
@martynydbt martynydbt added this to the v1.8 milestone Feb 8, 2024
@dbeatty10 dbeatty10 added the python_api Issues related to dbtRunner Python entry point label Feb 9, 2024
@gshank
Contributor

gshank commented Feb 12, 2024

The node_info is set and cleared by a context manager, so it's possible that the hooks run outside of that. It's also possible that the event_status info is getting saved and restored where it shouldn't be.
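
Roughly the shape of the pattern I mean (a sketch only, not the actual dbt-core implementation): a context manager sets node_info for the duration of a node's execution and clears it on exit, so any events fired outside that block, or fired after stale state is restored, get stamped with the previous node's info.

# sketch of the general pattern -- not the actual dbt-core implementation
from contextlib import contextmanager
import threading

_state = threading.local()


@contextmanager
def node_context(node_info):
    """Attach node_info to any events fired while this node is executing."""
    _state.node_info = node_info
    try:
        yield
    finally:
        _state.node_info = None  # cleared when the node finishes


def fire_event(name, msg):
    # Events pick up whatever node_info happens to be set at emit time; if stale
    # info is restored (or never cleared), it gets attached here incorrectly.
    print(name, msg, "node_info:", getattr(_state, "node_info", None))


with node_context({"node_name": "jaffle_shop-on-run-end-0"}):
    fire_event("LogHookStartLine", "1 of 1 START hook")

fire_event("NewConnection", "Acquiring new connection")  # node_info is None here, as expected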

@gshank
Contributor

gshank commented Feb 15, 2024

It looks like LogHookEndLine shouldn't have node_info, because it's for an entire run, not for a node in a call_runner. The adapter's various Connection events are sometimes associated with a node and sometimes not, so I'm guessing they are sometimes picking up node_info that they shouldn't. Still looking into that.
