
PostgresFDW-based Topsie

After evaluating the extensive (and almost feature-complete) work provided in postgres_fdw, I have a clear understanding of what it does (and doesn't) provide as a platform for a well-written "real-time inserts" extension.

Current Feature Set

postgres_fdw's current workflow is complete and correct. In particular (a rough sketch of the remote command sequence follows this list):

  • Sessions are wrapped in a topmost remote transaction with at least REPEATABLE READ semantics. This is done transparently to the end user
  • SAVEPOINTs are created remotely any time a new connection-using command is issued
  • (Sub-)transaction callbacks are implemented to correctly roll back remote (sub-)transactions during local failures
  • Query conditions are partitioned based on whether they can be safely executed remotely. If not, they are evaluated locally on returned results
  • RETURNING clauses are correctly implemented
  • Prepared statements and cursors are used for all remote operations
  • The (sub-)transaction callbacks clean up all prepared statements and cursors on error
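
To make the list concrete, here is roughly the remote command sequence for a simple query against a foreign table. The cursor and savepoint names are generated internally, and the table and columns here (remote_tab, id, payload) are purely illustrative:

    -- local session: BEGIN; SELECT payload FROM foreign_tab WHERE id = 42;
    START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    DECLARE c1 CURSOR FOR
        SELECT payload FROM public.remote_tab WHERE (id = 42);
    FETCH 100 FROM c1;        -- repeated until the cursor is exhausted
    CLOSE c1;
    SAVEPOINT s2;             -- opened when a local subtransaction issues a remote command
    RELEASE SAVEPOINT s2;     -- or ROLLBACK TO SAVEPOINT s2 on local failure
    COMMIT TRANSACTION;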

Limitations

Still, a few areas leave room for improvement:

  • ANALYZE currently fetches the entire remote table back to the local machine rather than advancing the cursor (e.g. with MOVE) past rows that won't be sampled
  • UPDATE and DELETE behave somewhat peculiarly: they participate in a ForeignScan much like SELECT, but also take locks (FOR UPDATE) on the rows they will touch. postgres_fdw quietly adds ctid to the remote target list and uses it to issue the subsequent UPDATE or DELETE commands.

The rationale behind the second (apparently inefficient) behavior is support for qualifications that cannot be evaluated remotely: by scanning remotely but filtering locally to arrive at a set of ctids, postgres_fdw can support arbitrary queries.

A comment in the code points out that if no qualifications (or join conditions) must be executed locally, then issuing an UPDATE/DELETE directly during the foreign scan phase (and doing nothing during the modify phase) would be preferable. This is a change we'll want to make.
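
As a sketch of the current behavior (table and column names illustrative), a local UPDATE against a foreign table turns into remote traffic along these lines:

    -- local: UPDATE foreign_tab SET payload = 'x' WHERE id = 42;
    -- scan phase: lock the candidate rows and fetch their ctids
    DECLARE c1 CURSOR FOR
        SELECT payload, ctid FROM public.remote_tab WHERE (id = 42) FOR UPDATE;
    FETCH 100 FROM c1;
    -- modify phase: one parameterized command per row, addressed by ctid
    UPDATE public.remote_tab SET payload = $2 WHERE ctid = $1;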

Work

Here's what I'm doing…

Configuration Schema

A configuration schema is needed for nodes, shards, placements, etc. Some of this can live in the OPTIONS of the foreign table or server, but most should live in tables. This is complete.
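
A minimal sketch of the sort of metadata involved; every name and column below is hypothetical rather than the actual schema:

    CREATE SCHEMA topsie;

    CREATE TABLE topsie.node (
        node_id   serial PRIMARY KEY,
        node_name text NOT NULL,
        node_port int  NOT NULL DEFAULT 5432
    );

    CREATE TABLE topsie.shard (
        shard_id    serial PRIMARY KEY,
        relation_id regclass NOT NULL,  -- the distributed (foreign) table
        min_value   int NOT NULL,       -- inclusive lower bound of the hash range
        max_value   int NOT NULL        -- inclusive upper bound of the hash range
    );

    CREATE TABLE topsie.placement (
        shard_id int NOT NULL REFERENCES topsie.shard (shard_id),
        node_id  int NOT NULL REFERENCES topsie.node (node_id),
        PRIMARY KEY (shard_id, node_id)
    );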

Adapter Views

The above schema needs to be compatible with Citus' own representation of nodes, placements, etc., so I've written a set of VIEWs that can adapt the Citus metadata to what the extension expects. This is also complete, but may need some modification during testing with Citus.
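
For illustration only: assuming Citus keeps its shard metadata in catalogs such as pg_dist_shard (the exact table and column names here are an assumption), an adapter view could stand in for the hypothetical topsie.shard table sketched above:

    -- presents Citus' shard metadata in the layout the extension expects
    CREATE VIEW topsie.shard AS
        SELECT shardid                AS shard_id,
               logicalrelid::regclass AS relation_id,
               shardminvalue::int     AS min_value,
               shardmaxvalue::int     AS max_value
          FROM pg_dist_shard;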

Hash Function

Obviously we need a good hash function to shard our data. I've written one, similar to hash_array, that produces output identical to Citus' own hash function. hash_array provides a natural way to calculate multi-column partition hashes, so we'll want its structure eventually.

Routing Algorithms

On top of the configuration tables, structures and algorithms are needed to look up where a row lives, where it should live, where a shard's replicas live, and so on. These are underway and slated for review this week.
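
The extension will do this in C, but the core lookup is easiest to show as a query over the hypothetical tables above (topsie_hash stands in for the hash function from the previous section; 42 stands in for a partition column value):

    -- which placements hold the row whose partition column hashes from 42?
    SELECT n.node_name, n.node_port
      FROM topsie.shard s
      JOIN topsie.placement p ON p.shard_id = s.shard_id
      JOIN topsie.node n      ON n.node_id  = p.node_id
     WHERE s.relation_id = 'events'::regclass
       AND topsie_hash(42) BETWEEN s.min_value AND s.max_value;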

INSERT Support

The postgres_fdw functions need modification to check the partition column and send rows to the correct machines (including duplicates to replicas). This is pretty straightforward once the above structures and algorithms exist: instead of the current assumption that a foreign table points to exactly one remote machine, we add loops to send rows to many machines. Some function signatures will change, but the core logic remains identical.
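
Sketch of the resulting remote traffic for one local INSERT, assuming (as in Citus) that shards live remotely as suffix-named tables; all names are illustrative:

    -- local: INSERT INTO events (customer_id, action) VALUES (42, 'click');
    -- the partition column hashes into shard 103, which has two placements,
    -- so the same prepared INSERT is executed once per placement:
    INSERT INTO public.events_103 (customer_id, action) VALUES ($1, $2);  -- on node-a
    INSERT INTO public.events_103 (customer_id, action) VALUES ($1, $2);  -- on node-b (replica)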

SELECT Support

This is also pretty straightforward: instead of opening one cursor on a remote machine, we open one cursor per shard needed for a given query. That might be a pruned subset of shards (if pruning is implemented) or all of them. We'll iterate over each cursor until all remote tables have been exhausted.

So there's a bit of bookkeeping to do (e.g. removing connections from our set of queried machines as each hits end of stream), but the logic again remains mostly the same.
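
Sketched remote traffic for a query that touches two shards (names illustrative again):

    -- one cursor per shard instead of one per foreign table
    DECLARE c1 CURSOR FOR SELECT customer_id, action FROM public.events_101;  -- on node-a
    DECLARE c2 CURSOR FOR SELECT customer_id, action FROM public.events_102;  -- on node-b
    FETCH 100 FROM c1;   -- repeated until end of stream, then c1's connection leaves the set
    FETCH 100 FROM c2;   -- likewise for c2
    CLOSE c1;
    CLOSE c2;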

UPDATE/DELETE Support

This is a bit trickier. Currently postgres_fdw scans remote tables with a SELECT FOR UPDATE and requests their ctids. Modifications are then made directly using those ctids. This is described in the Limitations section above.

Phase One

We can track the ctids returned by each remote node (we'll even be scanning replicas) and then issue UPDATE or DELETE commands against them, taking care to use each ctid only against the node it came from. This fallback (suboptimal as it is) will still be necessary to support queries with qualifications or join conditions that cannot be evaluated remotely.
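
Sketch of that bookkeeping (names illustrative): every placement of the shard is scanned, and each ctid is only ever sent back to the node that reported it:

    -- scan phase, issued on each placement of shard 103
    -- (any local-only qualifications are applied to the fetched rows)
    SELECT customer_id, action, ctid FROM public.events_103 FOR UPDATE;
    -- modify phase: suppose node-a reported ctids (0,4) and (0,9) while
    -- node-b reported (0,2) and (0,7); each node only sees its own ctids
    UPDATE public.events_103 SET action = $2 WHERE ctid = $1;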

Phase Two

If a given query has no "local qualifications", we can skip the scan/modify route entirely and issue an UPDATE or DELETE directly against the remote nodes during the scan phase; the modify phase then becomes a no-op. We'll need to augment deparse.c to deparse UPDATE and DELETE nodes (I think), but otherwise this will act just like a foreign scan, with an UPDATE or DELETE issued in place of the SELECT.
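
In that case the remote traffic collapses to one statement per affected placement, issued during the scan phase (names illustrative):

    -- the modify phase becomes a no-op; the statement itself does the work
    UPDATE public.events_103 SET action = 'seen' WHERE (customer_id = 42);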

Correctness

Right now UPDATE and DELETE both support RETURNING clauses. We'll want to be sure we return RETURNING rows from only a single replica rather than from every replica. Rather than attempting to de-duplicate result rows, we'll simply determine (a priori) which replica will provide the RETURNING output for a given query (since all replicas should theoretically produce identical output).
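
A sketch for a shard with two placements (names illustrative): one replica is designated to produce the RETURNING rows, and the others simply apply the change:

    UPDATE public.events_103 SET action = 'seen'
     WHERE (customer_id = 42)
     RETURNING customer_id, action;    -- on node-a, the designated replica
    UPDATE public.events_103 SET action = 'seen'
     WHERE (customer_id = 42);         -- on node-b: applied, nothing returned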