Attack Plan
After evaluating the extensive (and almost feature-complete) work provided in `postgres_fdw`, I have a clear understanding of what it does (and doesn't) provide in terms of a platform for a well-written "real-time inserts" extension.

`postgres_fdw`'s current workflow is complete and correct. In particular:
- Sessions are wrapped in a topmost remote transaction with at least `REPEATABLE READ` semantics. This is done transparently to the end user (see the sketch after this list)
- `SAVEPOINT`s are created remotely any time a new connection-using command is issued
- (Sub-)transaction callbacks are implemented to correctly roll back remote (sub-)transactions during local failures
- Query conditions are partitioned based on whether they can be safely executed remotely. If not, they are evaluated locally on returned results
- `RETURNING` clauses are correctly implemented
- Prepared statements and cursors are used for all remote operations
- The (sub-)transaction callbacks clean up all prepared statements and cursors on error
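To make the transaction handling concrete, here is roughly the statement sequence driven on the remote side for a local transaction that uses a subtransaction. The statement text is illustrative rather than an exact trace of `postgres_fdw`'s output:

```sql
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- opened on first use of the connection

SAVEPOINT s1;              -- created before the next connection-using command
-- ... remote work on behalf of the local subtransaction ...
ROLLBACK TO SAVEPOINT s1;  -- driven by the subtransaction callback on local failure
RELEASE SAVEPOINT s1;

COMMIT TRANSACTION;        -- when the local transaction commits
```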
Limitations
Some areas exist for improvement:
- `ANALYZE` currently sends the entire table back to the local machine rather than using commands to advance the cursor when needing to skip sampled rows (see the sketch after this list)
- `UPDATE` and `DELETE` behave kind of peculiarly: they participate in a `ForeignScan` similarly to `SELECT`, but also grab locks (`FOR UPDATE`) on rows they will touch. `postgres_fdw` secretly adds `ctid` to the remote target list and uses it to issue subsequent `UPDATE` or `DELETE` commands.
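One way to avoid shipping rows that will only be skipped is to advance the remote cursor with `MOVE` instead of fetching. A minimal sketch (the cursor name `c1` and the table name are hypothetical):

```sql
DECLARE c1 CURSOR FOR SELECT * FROM measurements;  -- remote sampling cursor
FETCH 1 FROM c1;          -- fetch a row chosen for the sample
MOVE FORWARD 999 FROM c1; -- skip the next 999 rows without transferring them
FETCH 1 FROM c1;          -- fetch the next sampled row
```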
The rationale behind the second (apparently inefficient) limitation is related to the support for qualifications that cannot be evaluated remotely: by scanning and filtering locally for a set of `ctid`s, `postgres_fdw` supports all queries.
A comment in the code points out that if no qualifications (or join conditions) must be executed locally, then issuing an `UPDATE`/`DELETE` directly during the foreign scan phase (and doing nothing during the modify phase) would be preferable. This is a change we'll want to make.
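To illustrate the difference, here is an approximate sketch of the remote statements for `UPDATE tab SET x = 0 WHERE y > 10` under the current scan-then-modify approach versus the direct pushdown that comment suggests (table, column names, and ctid values are made up):

```sql
-- Current approach: scan with row locks, then modify by ctid
SELECT ctid, x, y FROM tab WHERE y > 10 FOR UPDATE;
UPDATE tab SET x = 0 WHERE ctid = '(0,1)';   -- one command per qualifying row
UPDATE tab SET x = 0 WHERE ctid = '(0,4)';

-- Preferred when every qualification can run remotely: push the whole statement down
UPDATE tab SET x = 0 WHERE y > 10;           -- the modify phase becomes a no-op
```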
Here's what I'm doing…
A configuration schema is needed for nodes, shards, placements, etc. Some of this can live in the `OPTIONS` of the foreign table or server, but most should live in tables. This is complete.
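As a rough illustration of the kind of metadata involved (all table and column names here are hypothetical, not the extension's actual schema; shards are assumed to be identified by hash-value ranges, as in Citus):

```sql
-- Shards of a distributed (foreign) table, each covering a range of hash values
CREATE TABLE shard (
    shard_id    bigint PRIMARY KEY,
    relation_id regclass NOT NULL,  -- the distributed table
    min_value   int NOT NULL,       -- inclusive lower bound of the hash range
    max_value   int NOT NULL        -- inclusive upper bound of the hash range
);

-- Placements: which worker node holds a replica of which shard
CREATE TABLE placement (
    shard_id  bigint NOT NULL REFERENCES shard (shard_id),
    node_name text   NOT NULL,
    node_port int    NOT NULL DEFAULT 5432,
    PRIMARY KEY (shard_id, node_name, node_port)
);
```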
The above schema needs to be compatible with Citus' own representation of nodes, placements, etc., so I've written a set of `VIEW`s that can adapt the Citus metadata to what the extension expects. This is also complete, but may need some modification during testing with Citus.
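Such adapter views would present the same shape as the hypothetical tables sketched above, so the extension's lookups work unchanged against Citus metadata. A sketch along these lines (Citus catalog and column names are from memory and may differ by version):

```sql
-- Expose Citus' shard and placement catalogs in the layout the extension expects
CREATE VIEW shard AS
SELECT shardid            AS shard_id,
       logicalrelid       AS relation_id,
       shardminvalue::int AS min_value,
       shardmaxvalue::int AS max_value
FROM   pg_dist_shard;

CREATE VIEW placement AS
SELECT shardid  AS shard_id,
       nodename AS node_name,
       nodeport AS node_port
FROM   pg_dist_shard_placement;
```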
Obviously we need a good hash function to shard our data. I've written one similar to `hash_array` that provides identical output to Citus' own hash function. `hash_array` provides a natural way to calculate multi-column partition hashes, so we'll want its structure eventually.
For each of the configuration table rows, structures and algorithms will be needed to support looking up where a row lives, where it should live, where a shard's replicas live, etc. These are underway for review this week.
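In SQL terms, using the hypothetical metadata layout sketched earlier and a placeholder `partition_hash()` standing in for the Citus-compatible hash function, the row-to-shard and shard-to-replica lookups amount to something like:

```sql
-- Where does a row with partition-column value 42 live?
SELECT p.node_name, p.node_port
FROM   shard s
JOIN   placement p USING (shard_id)
WHERE  s.relation_id = 'events'::regclass
AND    partition_hash(42) BETWEEN s.min_value AND s.max_value;
```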
The `postgres_fdw` functions need modification to check the partition column and send rows to the correct machines (including duplicates to replicas). This is pretty straightforward once the above structures and algorithms exist: instead of the current assumption that a foreign table points to exactly one remote machine, we add loops to send rows to many machines. So some function signatures will change and whatnot, but the core logic remains identical.
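For example, a single local insert would fan out to every placement of the covering shard, roughly like this (the shard table name, worker names, and values are made up; each remote shard is assumed to be its own table):

```sql
-- Local statement against the distributed foreign table
INSERT INTO events (id, payload) VALUES (42, 'hello');

-- Issued remotely against each placement of the shard that covers partition_hash(42)
INSERT INTO events_10 (id, payload) VALUES (42, 'hello');  -- on worker-1
INSERT INTO events_10 (id, payload) VALUES (42, 'hello');  -- on worker-2 (replica)
```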
Scanning is also pretty straightforward: instead of opening one cursor on a remote machine, we open one cursor per shard needed for a given query. This might be a fixed number of shards (if pruning is implemented) or all shards. We'll iterate over these cursors until all remote tables have been exhausted.
So there's a bit of bookkeeping that needs to happen (i.e. removing connections from our set of queried machines as we hit end of stream), but the logic again remains mostly the same.
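A sketch of the per-shard cursors for a query that touches two shards (cursor and shard table names are made up; the batched `FETCH` mirrors how `postgres_fdw` already consumes its single remote cursor):

```sql
-- One cursor per shard placement chosen for the query
DECLARE c1 CURSOR FOR SELECT id, payload FROM events_10 WHERE payload LIKE 'h%';
DECLARE c2 CURSOR FOR SELECT id, payload FROM events_11 WHERE payload LIKE 'h%';

-- Fetch in batches from each cursor until it returns no rows, then drop
-- that connection from the set still being scanned
FETCH 100 FROM c1;
FETCH 100 FROM c2;
```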
`UPDATE` and `DELETE` are a bit trickier. Currently `postgres_fdw` scans remote tables with a `SELECT FOR UPDATE` and requests their `ctid`s. Modifications are then subsequently made directly using those `ctid`s. This is described in the Limitations section above.
We can keep track of the `ctid`s returned by each remote node (we'll even be scanning replicas) and then issue `UPDATE` or `DELETE` commands against those `ctid`s (we'll keep track of which `ctid`s came from which node and then only use those `ctid`s with that node). This fallback (as suboptimal as it is) will still be necessary to support queries with qualifications or join conditions that cannot be evaluated remotely.
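A sketch of this fallback for an update whose qualification must run locally (table names, worker names, and ctid values are made up; the key point is that each node's `ctid`s are only ever used against that same node, since `ctid`s differ between replicas):

```sql
-- Scan phase, issued against every placement of the shard (with row locks)
SELECT ctid, id, payload FROM events_10 FOR UPDATE;       -- on worker-1
SELECT ctid, id, payload FROM events_10 FOR UPDATE;       -- on worker-2 (replica)

-- Modify phase, after filtering locally: each node is addressed only with
-- the ctids it returned
UPDATE events_10 SET payload = 'x' WHERE ctid = '(0,3)';  -- on worker-1
UPDATE events_10 SET payload = 'x' WHERE ctid = '(7,1)';  -- on worker-2 (replica)
```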
If a given query has no "local qualifications", we can just skip the scan/modify route and issue an `UPDATE` or `DELETE` directly against remote nodes during the scan phase. The modify phase would then be a no-op. We'll need to augment `deparse.c` to deparse `UPDATE` and `DELETE` nodes (I think), but other than that this will act just like a foreign scan but with an `UPDATE` or `DELETE` clause instead.
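In the sharded setting, the pushed-down statement would simply be deparsed once and sent to every placement of each shard the query can touch, e.g. (names made up):

```sql
-- Deparsed once and sent during the scan phase; the modify phase does nothing
UPDATE events_10 SET payload = 'x' WHERE id = 42;  -- on worker-1
UPDATE events_10 SET payload = 'x' WHERE id = 42;  -- on worker-2 (replica)
```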
Right now `UPDATE` and `DELETE` both support `RETURNING` clauses. We'll want to be sure to only get `RETURNING` rows from a single replica rather than returning rows from all replicas. Rather than attempting to de-duplicate result rows, we'll just determine (a priori) which replica will provide the `RETURNING` for a given query (since all replicas should theoretically have identical output).
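One way this could look on the wire, assuming the first placement is designated as the `RETURNING` source (the choice of replica and the statement shapes are illustrative):

```sql
-- Designated replica: ask for the modified rows back
UPDATE events_10 SET payload = 'x' WHERE id = 42 RETURNING id, payload;  -- on worker-1

-- Remaining replicas: apply the same change, but omit (or discard) RETURNING rows
UPDATE events_10 SET payload = 'x' WHERE id = 42;                        -- on worker-2
```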