
[Bug]: Jobs should have a NULL execution_finish if job is currently running #7239

Open

RobAtticus opened this issue Sep 6, 2024 · 1 comment

Labels: bgw (The background worker subsystem, including the scheduler), bug, internal-team-ask

RobAtticus (Member) commented:
What type of bug is this?

Incorrect result

What subsystems and features are affected?

Background worker, Policy

What happened?

While a job is running, the execution_finish field of _timescaledb_internal.bgw_job_stat_history should be set to NULL (and likely succeeded as well) until the job actually finishes or crashes. Instead, succeeded is set to FALSE (which could be argued as correct, though it is misleading) and execution_finish is set to roughly the same time as execution_start (modulo a few microseconds).
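
As a rough diagnostic, the footprint of this bug can be spotted directly: failed history rows whose finish time is only microseconds after the start time. This is a sketch, not an official check, and the 1 ms threshold is an arbitrary assumption:

```sql
-- Diagnostic sketch (not a TimescaleDB API): rows written while the job
-- was still running show a finish time microseconds after the start time.
-- The 1 ms threshold is an arbitrary assumption.
SELECT id, job_id, pid, execution_start, execution_finish
FROM _timescaledb_internal.bgw_job_stat_history
WHERE succeeded = FALSE
  AND execution_finish - execution_start < INTERVAL '1 millisecond';
```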

This produces a confusing result in timescaledb_information.job_errors, where err_message reads "job crash detected, see server logs", which is not true.
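
Until this is fixed, a possible workaround (a sketch, assuming the recorded pid still belongs to the running job's backend and has not been recycled) is to filter out job_errors rows whose pid is still active:

```sql
-- Workaround sketch: hide "errors" that belong to a backend that is
-- still running. Assumes job_errors.pid has not been reused by the OS.
SELECT e.*
FROM timescaledb_information.job_errors AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_stat_activity AS a
    WHERE a.pid = e.pid
);
```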

TimescaleDB version affected

2.16+

PostgreSQL version used

15

What operating system did you use?

Cloud

What installation method did you use?

Not applicable

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

-- check that job is running/active
tsdb=> SELECT query_start, state, application_name, query FROM pg_stat_activity WHERE application_name LIKE '%[1150]%';
          query_start          | state  |     application_name      |                      query
-------------------------------+--------+---------------------------+--------------------------------------------------
 2024-09-06 01:04:05.230171+00 | active | Compression Policy [1150] | CALL _timescaledb_functions.policy_compression()
(1 row)

-- check job_errors
tsdb=> SELECT * FROM timescaledb_information.job_errors WHERE job_id = 1150;
 job_id |      proc_schema       |     proc_name      | pid  |          start_time           |          finish_time          | sqlerrcode |             err_message
--------+------------------------+--------------------+------+-------------------------------+-------------------------------+------------+-------------------------------------
   1150 | _timescaledb_functions | policy_compression | 2984 | 2024-09-06 01:04:05.225839+00 | 2024-09-06 01:04:05.225858+00 |            | job crash detected, see server logs
(1 row)

-- check internal entry
tsdb=> SELECT * FROM _timescaledb_internal.bgw_job_stat_history WHERE succeeded = FALSE AND job_id = 1150;
   id    | job_id | pid  |        execution_start        |       execution_finish        | succeeded |                                                                                                                                                        data
---------+--------+------+-------------------------------+-------------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1618245 |   1150 | 2984 | 2024-09-06 01:04:05.225839+00 | 2024-09-06 01:04:05.225858+00 | f         | {"job": {"owner": "tsdbadmin", "proc_name": "policy_compression", "scheduled": true, "max_retries": -1, "max_runtime": "00:00:00", "proc_schema": "_timescaledb_functions", "retry_period": "01:00:00", "hypertable_id": 1, "initial_start": "01:00:00", "fixed_schedule": true, "schedule_interval": "00:05:00"}}
(1 row)

How can we reproduce the bug?

1. Wait for a job to start as a background worker. The job must take a non-trivial amount of time to run (see the sketch below for one way to force this).
2. While it is running, check `timescaledb_information.job_errors` and `_timescaledb_internal.bgw_job_stat_history`.
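
A minimal way to force a long-running job (a sketch; `test_long_job` is a hypothetical procedure, not part of TimescaleDB):

```sql
-- Hypothetical procedure that sleeps so the job stays running long
-- enough to observe the intermediate bgw_job_stat_history row.
CREATE OR REPLACE PROCEDURE test_long_job(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_sleep(120);  -- simulate non-trivial work
END;
$$;

-- Register it with the job scheduler via TimescaleDB's add_job API.
SELECT add_job('test_long_job', '5 minutes');
```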
RobAtticus (Member, Author) commented:

@fabriziomello assigned you since you did this feature originally, but feel free to re-assign

RobAtticus added the internal-team-ask and bgw (The background worker subsystem, including the scheduler) labels on Sep 6, 2024