Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for pre-update hook #702

Closed
wants to merge 3 commits into from
Closed

Conversation

jlongster
Copy link

I'm experimenting with adding support for the preupdate hook. This hook is fired before any INSERT, UPDATE, or DELETE operations are committed to the database, and gives you access to the old values and the new values. I don't know if the current maintainers are interested in having this, but regardless I'd like some feedback on implementing it. If you all do not want to merge this I can keep this patch locally.

This allows you to do this:

const sqlite3 = require('./sqlite3');
const db = new sqlite3.Database(":memory:");

function onUpdate(type, db, table, oldValues, newValues) {
  if(type === "insert") {
    console.log("inserted:", newValues);
  }
  else if(type === "update") {
    console.log("updated:", oldValues, " to ", newValues);
  }
}

db.addListener("preupdate", onUpdate);

db.run('CREATE TABLE foo (id INTEGER PRIMARY KEY,  name TEXT NOT NULL UNIQUE);', function() {
  db.run('INSERT INTO foo (name) VALUES ("James")', function() {
    db.run('UPDATE foo SET name="Sarah"', function() {
      db.removeListener("preupdate", onUpdate)
      db.close();
    });
  });
});

The output is:

inserted: [ 1, 'James' ]
updated: [ 1, 'James' ]  to  [ 1, 'Sarah' ]

The first big change is I need to update to sqlite 3.14. The preupdate hook does not exist in the version current bundled into node-sqlite.

Then, I needed to compile it with the SQLITE_ENABLE_PREUPDATE_HOOK option.

The rest of the code mostly follows the conventions of the profile and trace events. However, the preupdate hook is a bit more work. Because it gives you access to changed rows, I need to construct two Row values that I can emit with the event. A few problems with this:

  1. Row and Values is not available in database.cc as they are defined in statement.h. I needed to move them into database.h but this is probably not desirable. A better approach would be to create a new header file that both can include.
  2. The preupdate hook does not give you access to the column names. Only index and value. So I need to only create an array, not an object, so I can't reuse RowToJS from statement.cc. This also means any of the Values::Field objects need a "placeholder" name that I don't actually use. Again, there's probably a better way to do this.

The user is expected to query the column names on app startup, as there would surely be a big perf penalty by querying them on every preupdate event.

I'm looking for general feedback; is this the right approach? Any obvious memory leaks, etc?

@coveralls
Copy link

Coverage Status

Coverage decreased (-6.05%) to 79.745% when pulling c7ec3fb on jlongster:master into ef7c284 on mapbox:master.

@kroggen
Copy link

kroggen commented Sep 3, 2016

Nice!

Maybe getting the column names is not so slow because the fist database page (where schemas are stored) may be cached in memory. I use this code below to get the columns names. Note that it does not use sqlite3_step, just sqlite3_prepare is sufficient:

sqlite3_stmt * get_table_columns(sqlite3 *db, const char *table) {
   sqlite3_stmt *stmt=0;
   char *sql=0;

   sql = sqlite3_mprintf("SELECT * FROM %s WHERE 1=0", table);
   if (sql == 0) goto loc_exit;

   if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
      sqlite3_finalize(stmt);
      stmt=0;
   }

loc_exit:   
   if (sql) sqlite3_free(sql);
   return stmt;

}

It returns the prepared statement with the columns names. Then we can use sqlite3_column_name(stmt, ncol).

@jlongster
Copy link
Author

Yeah, after a bunch of research I found someone else suggesting that as well. Still, I didn't know how much work prepare does.

Maybe getting the column names is not so slow because the fist database page (where schemas are stored) may be cached in memory.

Does sqlite3 manually do this? Keep some sort of cache? Or do you mean we will most likely not hit the disk as it will already be cached by the OS?

@jlongster
Copy link
Author

Looks like emitting an object with column names is significantly slower:

# Emitting an array without column names:
jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  0.49s user 0.05s system 145% cpu 0.367 total

jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  0.47s user 0.05s system 143% cpu 0.359 total

jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  0.48s user 0.05s system 143% cpu 0.368 total

# Emitting an object with column names:
jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  1.21s user 0.21s system 150% cpu 0.938 total

jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  1.19s user 0.20s system 151% cpu 0.924 total

jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  1.19s user 0.20s system 152% cpu 0.913 total

jlong-16106:~/projects/node-sqlite3(master)% time node lib/index.js
node lib/index.js  1.21s user 0.21s system 149% cpu 0.942 total

Goes from an average of ~.36s to ~.92. My test case is simple:

function onUpdate(type, db, table, oldValues, newValues) {
}

db.addListener("preupdate", onUpdate);

db.run('UPDATE transactions SET description="foo" WHERE amount < 40000', function(err) {
});

This is run against a database containing thousands of rows, and the query affecting 21568 of them. My first guess was that maybe creating JS objects are slower than arrays, but that shouldn't be true so there isn't much extra JS allocation. Must be the process of preparing a statement and getting the column names.

@kroggen
Copy link

kroggen commented Sep 6, 2016

This is an interesting result.

You can just retrieve the column names without creating the javascript object, just to check the speed.

And yes, the SQLite has a page cache but I guess it is used for modified pages to be kept on memory before a transaction is commited. I am not sure if the same page cache is used when running a second SQL query. I can test it.

@coveralls
Copy link

Coverage Status

Coverage decreased (-6.2%) to 79.609% when pulling 51a6348 on jlongster:master into ef7c284 on mapbox:master.

@kroggen
Copy link

kroggen commented Sep 6, 2016

Maybe it's because the hook function is being called for each modified row.

What is your goal with this code? Synchronization?

@jlongster
Copy link
Author

Maybe it's because the hook function is being called for each modified row.

Yes, which is why I didn't set it up that way in the first place. It's an advanced feature so users are expected to manage the column names themselves.

What is your goal with this code? Synchronization?

I'm developing a specialized app that needs to know when things change. Hard to explain without going into the details of the app. I'm expecting to maintain a fork for now and if there's interest I can help merge this in, but as said earlier I wouldn't be surprised if the maintainers didn't want this right now.

@jlongster
Copy link
Author

So there are 2 bad things still going on with this PR:

  • The preupdate events may fire after the callback to db.run is called (most of them fire before, but if firing a lot some come through later). It's unclear to me what is desirable. Should we enforce an order? Doing so may block the process for a while, if you're always waiting for all preupdate events to fire before finishing. Maybe the order is nondeterministic. Personally, I don't know enough about libuv to know how to enforce the order. This is all based on the async.h implementation which uses libuv signals.
  • Much more serious, the app doesn't seem to exit sometimes. I've banged my head against the wall for a while on this. I have no idea why. It must be something with the async.h watchers not getting cleared out or something. You can repro this by checking out git clone -b hang-repro https://github.com/jlongster/node-sqlite3.git and running node lib/index.js (I even committed the local db for testing). Run it several times and eventually the app won't exit, it just hangs.

I tried compiling a version of sqlite without mutexes just in case it was a race condition, but that didn't help. Tried a bunch of other stuff too. I think it only happens when it fires the hook a lot of times (here, about ~20000). Any guidance for how to debug this would be great.

@jlongster
Copy link
Author

Figured it out! These lines are problematic: https://github.com/mapbox/node-sqlite3/blob/master/src/async.h#L65-L69

If sending thousands of events, node just never shuts down.

indutny added a commit to indutny/node-sqlite3 that referenced this pull request Sep 6, 2016
`uv_ref()` can't be called from the non-event-loop thread. Furthermore,
it is not necessary in `async.h`, because open `uv_async_t` is already
holding the event loop from exiting.

See: TryGhost#702
@jlongster
Copy link
Author

This is the latest code that works pretty well for me. Got some help with the race condition and it is solved in #705 (thanks!).

The last issue is that the preupdate hook isn't guaranteed to be done firing before the run callback is called. Updates that trigger the hook may still come through after the run callback if thousands of them are coming through. This happens because the hook dispatches messages via libuv and as far as I can tell the run callback is called immediately when it's done, so there may still be preupdate messages about to be fired.

I think this is a problem, but not big enough to block me. I'll have to fix it at some point, so if anyone has pointers, that'd be great. Otherwise I'll fix it later.

@coveralls
Copy link

Coverage Status

Coverage decreased (-6.3%) to 79.474% when pulling ed16f6b on jlongster:master into 7322ea7 on mapbox:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-6.4%) to 79.439% when pulling 4a40a72 on jlongster:master into 7322ea7 on mapbox:master.

@mafischer
Copy link

Found this thread looking for this feature, any reason why this never got merged?

@daniellockyer
Copy link
Member

Hey @jlongster, would you be able to rebase the PR? 🙂

@jlongster
Copy link
Author

I use https://github.com/WiseLibs/better-sqlite3 now and I don't need this anymore so I don't have time to continue to work on this, sorry... (6 years later!)

@daniellockyer
Copy link
Member

No worries 🙂 I'm going to close this PR for now, but I may look into implementing this in the near future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants