Replies: 3 comments 11 replies
-
This looks great @chubei.
|
Beta Was this translation helpful? Give feedback.
3 replies
-
Followings are my thoughts:
|
Beta Was this translation helpful? Give feedback.
1 reply
-
|
Beta Was this translation helpful? Give feedback.
7 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
What makes a connector
Validate Connection
During migration and before starting ingestion,
validate_connection
is called. Connector should test the connection and return theResult
. Connector may perform extra validation steps such as whether a needed feature of the external database is available.List Schemas
Lists all schemas available from a connector.
Validate/Get Schemas
In
validate_schemas
method, the connector should fetch the actual source schema from database and then check:In
get_schemas
method, the connector should map the external database schema to Dozer schema.Replication changes tracking types
Replication changes tracking type is defined based on schema and primary key/unique key availability.
Start
Once started, a connector should begin to output messages to the
ingestor
that was passed instart
, until there is no more data to output.start
should never return unless there's an error.The output message is
((u64, u64), IngestionMessage)
. The(u64, u64)
pair is the identifier of a message, which should be unique and monotically increasing for every message it sent. There're more contracts that must be respected between different runs of a connector regarding this message identifier, which is discussed in the Checkpointing section.IngestionMessage
is an enum, the variants areOperationEvent
, which contains aInsert
,Delete
orUpdate
operation, andSnapshottingDone
, which is used to indicate that snapshotting is done. See Checkpointing section for information on snapshotting.For
Delete
andUpdate
operations, the connector needs to output the old record before update or deletion. OnlyPK connectors can fill the old record withNull
s except for the fields that are part of the primary key (see Schema section).last_checkpoint
parameter is related to checkpointing and is discussed in Checkpointing section.Checkpointing
Dozer use the checkpointing specification of a connector to decide what to do when it's stopped and restarted. Dozer considers a connector to be fully defined by its configuration. As long as the connector configuration doesn't change from last run, Dozer treats the connector as unchanged and expects certain checkpointing guarantees from the connector.
A connector must report its checkpointing capacity as one of the following
Full Checkpointing
A Full Checkpointing connector outputs operations from the external database in two phases: Snapshotting and Streaming.
In the Snapshotting phase, the connector creates a snapshot of the external database and outputs
Insert
operations. The snapshot phase may contain zero operations. When the snapshot finishes, the connector sends aSnapshottingDone
message, whose identifier is the snapshot identifier.In the Streaming phase maintains a unique mapping from message identifers to operations (the snapshot identifier should also be tracked).
Dozer may choose to start the connector from the snapshot identifier, from any message identifier in the Streaming phase, or start a new snapshot, indicated by the
last_checkpoint
parameter ofstart
. Dozer will never ask the connector to start from a message identifier in the Snapshotting phase.When asked to start a new snapshot, the connector should begin the snapshotting phase all over again. The connector may abandon any previous message identifiers. Dozer will not try to use them.
When started from the snapshot identifier, the connector should output all operations as if the Streaming phase is started fresh. The connector may abandon any message identifiers that're greater than the snapshot identifier and reuse them.
When started from a message identifier in the Streaming phase, the connector should output operations from corresponding operation, exclusively. The connector may abandon any message identifiers that're greater than said identifier and reuse them.
If a connector is successfully started from the specified message identifier, Dozer guarantees that the API endpoint is always consistent with the external database (up to the streaming delay).
If starting from the specified message identifier cannot be done, Dozer will ask the connector to start a new snapshot. In this case, Dozer API endpoint stays at its last state until the new snapshotting is done.
Full Checkpointing connector example
For example, a Full Checkpointing connector can output following messages in Snapshotting phase:
And following messages in Streaming phase:
When started from
(1, 0)
, the connector should output the following (whole Streaming phase):When started from
(2, 1)
, the connector should output the following:None Checkpointing
A None Checkpointing connector doesn't keep track of the message identifiers. Between restarts, it only needs to guarantee that the message identifiers are monotically increasing. This kind of connector should only send
Insert
operations. With None Checkpointing connector, Dozer makes no guarantee of consistency between API endpoint and external database. However, operations that're ACKed by Dozer are never lost. See Dozer ACK section.Dozer ACK
Dozer periodically sends ACKs to connector to acknowledge the successful processing of an operation sent from the connector. The ACK frequency can be configured, minimum being 1, meaning that every operation will be ACKed. Do note that high ACK frequency implies lower processing throughput.
Also note that ACK of an operation doens't mean it immediately shows up in API queries. There's the streaming delay between the two events.
Code
!
is experimental. We'll use()
in actual code.ReplicationChangesTrackingType
(or part of it) will become part ofSchema
in future.Beta Was this translation helpful? Give feedback.
All reactions