
[Bug]: Jobs should have a NULL execution_finish if job is currently running #7239

Open

RobAtticus opened this issue Sep 6, 2024 · 1 comment

Labels: bgw (The background worker subsystem, including the scheduler), bug, internal-team-ask

RobAtticus (Member) commented:
What type of bug is this?

Incorrect result

What subsystems and features are affected?

Background worker, Policy

What happened?

While a job is running, the execution_finish field of _timescaledb_internal.bgw_job_stat_history should be set to NULL (and likely succeeded as well) until the job actually finishes or crashes. Instead, succeeded is set to FALSE (which could be argued as correct, though it is misleading) and execution_finish is set to roughly the same time as execution_start (modulo a few microseconds).
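
As a rough diagnostic, the footprint of this bug can be spotted directly: failed history rows whose finish time is only microseconds after the start time. This is a sketch, not an official check, and the 1 ms threshold is an arbitrary assumption:

```sql
-- Diagnostic sketch (not a TimescaleDB API): rows written while the job
-- was still running show a finish time microseconds after the start time.
-- The 1 ms threshold is an arbitrary assumption.
SELECT id, job_id, pid, execution_start, execution_finish
FROM _timescaledb_internal.bgw_job_stat_history
WHERE succeeded = FALSE
  AND execution_finish - execution_start < INTERVAL '1 millisecond';
```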

This produces a confusing result in timescaledb_information.job_errors, where err_message reads "job crash detected, see server logs", which is not true.
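
Until this is fixed, a possible workaround (a sketch, assuming the recorded pid still belongs to the running job's backend and has not been recycled) is to filter out job_errors rows whose pid is still active:

```sql
-- Workaround sketch: hide "errors" that belong to a backend that is
-- still running. Assumes job_errors.pid has not been reused by the OS.
SELECT e.*
FROM timescaledb_information.job_errors AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_stat_activity AS a
    WHERE a.pid = e.pid
);
```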

TimescaleDB version affected

2.16+

PostgreSQL version used

15

What operating system did you use?

Cloud

What installation method did you use?

Not applicable

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

-- check that job is running/active
tsdb=> SELECT query_start, state, application_name, query FROM pg_stat_activity WHERE application_name LIKE '%[1150]%';
          query_start          | state  |     application_name      |                      query
-------------------------------+--------+---------------------------+--------------------------------------------------
 2024-09-06 01:04:05.230171+00 | active | Compression Policy [1150] | CALL _timescaledb_functions.policy_compression()
(1 row)

-- check job_errors
tsdb=> SELECT * FROM timescaledb_information.job_errors WHERE job_id = 1150;
 job_id |      proc_schema       |     proc_name      | pid  |          start_time           |          finish_time          | sqlerrcode |             err_message
--------+------------------------+--------------------+------+-------------------------------+-------------------------------+------------+-------------------------------------
   1150 | _timescaledb_functions | policy_compression | 2984 | 2024-09-06 01:04:05.225839+00 | 2024-09-06 01:04:05.225858+00 |            | job crash detected, see server logs
(1 row)

-- check internal entry
tsdb=> SELECT * FROM _timescaledb_internal.bgw_job_stat_history WHERE succeeded = FALSE AND job_id = 1150;
   id    | job_id | pid  |        execution_start        |       execution_finish        | succeeded |                                                                                                                                                        data
---------+--------+------+-------------------------------+-------------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1618245 |   1150 | 2984 | 2024-09-06 01:04:05.225839+00 | 2024-09-06 01:04:05.225858+00 | f         | {"job": {"owner": "tsdbadmin", "proc_name": "policy_compression", "scheduled": true, "max_retries": -1, "max_runtime": "00:00:00", "proc_schema": "_timescaledb_functions", "retry_period": "01:00:00", "hypertable_id": 1, "initial_start": "01:00:00", "fixed_schedule": true, "schedule_interval": "00:05:00"}}
(1 row)

How can we reproduce the bug?

1. Wait for a job to start as a background worker. The job must take a non-trivial amount of time to run (see the sketch below for one way to force this).
2. While it is running, check `timescaledb_information.job_errors` and `_timescaledb_internal.bgw_job_stat_history`.
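
A minimal way to force a long-running job (a sketch; `test_long_job` is a hypothetical procedure, not part of TimescaleDB):

```sql
-- Hypothetical procedure that sleeps so the job stays running long
-- enough to observe the intermediate bgw_job_stat_history row.
CREATE OR REPLACE PROCEDURE test_long_job(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_sleep(120);  -- simulate non-trivial work
END;
$$;

-- Register it with the job scheduler via TimescaleDB's add_job API.
SELECT add_job('test_long_job', '5 minutes');
```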
RobAtticus (Member, Author) commented:

@fabriziomello assigned you since you did this feature originally, but feel free to re-assign

RobAtticus added the internal-team-ask and bgw (The background worker subsystem, including the scheduler) labels on Sep 6, 2024