PostgreSQL error in trigger function #19

Closed
mreithub opened this issue May 13, 2016 · 1 comment
mreithub commented May 13, 2016

Apparently when there's high load, the trigger function isn't as bulletproof as expected.
I'm getting (very rarely, but still) the following error on a table that receives ~100-200 updates every 30 seconds (pretty much all at the same time):

SQL [
    UPDATE ws_client_state SET ts=now(), ip=?::inet, state=?
    WHERE host_id=?
];

ERROR: range lower bound must be less than or equal to range upper bound
Where: SQL statement "
    UPDATE recall.ws_client_state_log
    SET _log_time = tstzrange(LOWER(_log_time), now())
    WHERE host_id = $1.host_id AND upper_inf(_log_time)
    AND LOWER(_log_time) != now()
" PL/pgSQL function recall._trigfn() line 38 at EXECUTE

The table in question looks like this (although I think that should be irrelevant for this issue):

               Table "public.ws_client_state"
 Column  |           Type           |       Modifiers        
---------+--------------------------+------------------------
 host_id | integer                  | not null
 ts      | timestamp with time zone | not null default now()
 ip      | inet                     | not null
 state   | character(1)             | not null
Indexes:
    "pk_ws_client_state" PRIMARY KEY, btree (host_id)
Foreign-key constraints:
    "ws_client_state_host_id_fkey" FOREIGN KEY (host_id) REFERENCES host(host_id)
Triggers:
    trig_recall AFTER INSERT OR DELETE OR UPDATE ON ws_client_state FOR EACH ROW EXECUTE PROCEDURE recall._trigfn()
Inherits: recall.ws_client_state_tpl

The log table's at ~570k entries right now and the error's been triggered maybe 30 times so far.
As I said, changes tend to occur at roughly the same time every 30 seconds, and it's possible that the same record in the data table is updated by two different sources at roughly the same time (so maybe it's a transaction isolation issue, although AFAIR trigger functions should be pretty much atomic).

Anyway, that's definitely something that needs to be sorted out before a v1.0 release (and before pushing the extension to PGXN for that matter...).

PS: I know I'm storing the timestamp twice for each log entry (because of the ts column in the data table), but the overhead's pretty tiny.

@mreithub mreithub added the bug label May 13, 2016
@mreithub mreithub added this to the 1.0 milestone May 13, 2016

mreithub commented May 15, 2016

I've narrowed the problem down now. A pretty minimal setup to reproduce it would be the following:

CREATE TABLE foo (
  id integer NOT NULL PRIMARY KEY,
  val1 integer,
  val2 integer
);
SELECT recall.enable('foo', '1 month');

INSERT INTO foo (id) VALUES (123);

Having two transactions change different columns of the same record out of order results in the trigger function attempting to create a log entry with a negative time range (which obviously fails):

  • t1: begin;
  • t1: select now();
  • t2: begin;
  • t2: select now(); -- t2.now() > t1.now(), because its transaction was started later
  • t2: update foo set val1=12 WHERE id=123; -- runs immediately, creates a log entry with lower(_log_time) = t2.now()
  • t1: update foo set val2=23 WHERE id=123; -- blocks until t2 is committed
  • t2: commit; -- this causes t1's update to run and fail: the trigger function tries to set the end time of the current log entry for that key to t1.now(), but that entry's start time is t2.now(), so we'd end up setting the entry's _log_time to a negative range - which PostgreSQL refuses to do.
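For context: now() is pinned to the transaction's start time (it's an alias for transaction_timestamp()), which is why t2.now() can be later than t1.now() even though t2 commits first. A quick demonstration:

BEGIN;
SELECT now(), clock_timestamp();  -- both show (roughly) the transaction start
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() is unchanged, clock_timestamp() moved on
COMMIT;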

contents of the data table:

  • after t2's update: id=123, val1=12, val2=NULL
  • after t1's update got through: id=123, val1=12, val2=23 -- but since the trigger fails, this never sees the light of day

what pg_recall tries to write to the log table:

  • id=123, val1=NULL, val2=NULL, _log_time=[insert_ts, t2.now())
  • id=123, val1=12, val2=NULL, _log_time=[t2.now(), t1.now()) -- that one fails because t2.now() > t1.now(); PostgreSQL's range types forbid that (and rightfully so)
  • id=123, val1=12, val2=23, _log_time=[t1.now(),) -- this log line never even gets written because the failing statement before it already aborted t1's transaction
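The failing step can be reproduced in isolation; the error comes from the range constructor itself, which rejects any range whose lower bound exceeds its upper bound:

SELECT tstzrange(now(), now() - interval '1 second');
-- ERROR:  range lower bound must be less than or equal to range upper bound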

The problem (or one of them) is that t1's trigger function sees (and tries to update) data that wasn't there when the transaction started (because there was a lock on that record).

Possible fix 1:

If the update in the trigger function fails, use statement_timestamp() instead of now() (a sketch follows the list below):

  • id=123, val1=NULL, val2=NULL, _log_time=[insert_ts, t2.now())
  • id=123, val1=12, val2=NULL, _log_time=[t2.now(), realNow())
  • id=123, val1=12, val2=23, _log_time=[realNow(),)

(works the same for more than two transactions)

  • advantage: log entries remain in their real-world order
  • disadvantage: it's no longer possible to group all changes that happened in one transaction simply by timestamp
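A hypothetical sketch of fix 1, using the foo example from above and assuming the log table is named recall.foo_log (this is not the actual pg_recall code):

UPDATE recall.foo_log
SET _log_time = tstzrange(
        lower(_log_time),
        -- fall back to statement_timestamp() when now() doesn't come after
        -- the entry's start (i.e. when the plain now() variant would fail)
        CASE WHEN lower(_log_time) < now() THEN now()
             ELSE statement_timestamp() END)
WHERE id = 123 AND upper_inf(_log_time);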

Possible fix 2:

If the update in the trigger function fails, reorder the existing log entries (a sketch follows the list below):

  • id=123, val1=NULL, val2=NULL, _log_time=[insert_ts, t1.now())
  • id=123, val1=12, val2=23, _log_time=[t1.now(), t2.now()) -- that one changed order
  • id=123, val1=12, val2=NULL, _log_time=[t2.now(),)
  • advantage: no timestamps are changed (so it's still possible to find all changes that happened in the context of a specific transaction)
  • disadvantage: the log entries are no longer in their logical order (and looking at the log table one would assume that the current state has a NULL value for val2).
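A hypothetical sketch of what that reordering could look like (again assuming the log table is recall.foo_log; not actual pg_recall code): shrink the already-closed predecessor entry back to t1.now() and slot t1's data into the gap, leaving the open entry untouched:

-- pull the predecessor's upper bound back from t2.now() to t1.now()
WITH open_entry AS (
    SELECT lower(_log_time) AS t2_start
    FROM recall.foo_log
    WHERE id = 123 AND upper_inf(_log_time)
)
UPDATE recall.foo_log
SET _log_time = tstzrange(lower(_log_time), now())
FROM open_entry
WHERE id = 123 AND upper(_log_time) = open_entry.t2_start;

-- insert t1's version of the record into the gap [t1.now(), t2.now())
INSERT INTO recall.foo_log (id, val1, val2, _log_time)
SELECT 123, 12, 23, tstzrange(now(), lower(_log_time))
FROM recall.foo_log
WHERE id = 123 AND upper_inf(_log_time);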

The logical order issue is a really strong disadvantage that (in my opinion) disqualifies this approach: with it, it'd be possible to call recall.at('tblName', now()) and get different results from what's in the data table.

Possible fix 3:

Log only the final change (i.e. in case of that error, put the new data values into the current log entry instead of setting its end time and creating a new one; a sketch follows the list below):

  • advantage: relatively simple, the end result is still reflected
  • disadvantage: log entries get skipped; it might look like a change happened in one transaction when it actually happened in another
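A hypothetical sketch of fix 3 (same assumed names as above): on that error, overwrite the data columns of the still-open entry instead of closing it and inserting a new one:

UPDATE recall.foo_log
SET val1 = 12, val2 = 23   -- t1's final row values
WHERE id = 123 AND upper_inf(_log_time);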

My favorite right now is fix 1, but before deciding on a fix I'll take a few more days to investigate a little further.

@mreithub mreithub modified the milestones: 1.0, 0.9.6 May 21, 2016