streamingccl: add ingestion job framework #58373

Merged 1 commit on Jan 7, 2021

@adityamaru adityamaru commented Dec 30, 2020

This change introduces a new StreamIngestionJob. It does little more than
lay out the general outline of the job, which is very similar to other
bulk jobs such as changefeed and backup.

More precisely:

  • Introduces StreamIngestionDetails job details proto
  • Hooks up the dependency to a mock stream client
  • Introduces a StreamIngestionProcessorSpec
  • Sets up a simple DistSQL flow which round-robin assigns the partitions
    to the processors.
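
As a rough illustration of the round-robin assignment described in the last bullet, here is a minimal Go sketch. It is not the PR's DistSQL planning code: the `PartitionAddress` type and the `assignRoundRobin` helper are hypothetical names chosen for this example, and processors are identified only by their index.

```go
package main

import "fmt"

// PartitionAddress is a hypothetical stand-in for the address of one stream
// partition; it is not taken from the PR's code.
type PartitionAddress string

// assignRoundRobin distributes partitions across numProcessors processors,
// giving partition i to processor i % numProcessors. It returns nil when
// there are no processors to assign to.
func assignRoundRobin(partitions []PartitionAddress, numProcessors int) [][]PartitionAddress {
	if numProcessors <= 0 {
		return nil
	}
	assignments := make([][]PartitionAddress, numProcessors)
	for i, p := range partitions {
		idx := i % numProcessors
		assignments[idx] = append(assignments[idx], p)
	}
	return assignments
}

func main() {
	partitions := []PartitionAddress{"p0", "p1", "p2", "p3", "p4"}
	for i, assigned := range assignRoundRobin(partitions, 2) {
		fmt.Printf("processor %d: %v\n", i, assigned)
	}
}
```

Round-robin keeps the per-processor partition counts within one of each other, which is probably enough for a first framework PR; it does not account for partition size or node locality.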

Most notable TODOs in job land which will be addressed in follow up PRs:

  • StreamIngestionPlanHook to create this job. It will involve figuring out
    SQL syntax.
  • Introducing a ts watermark in both the job and processors. This
    watermark will represent the lowest resolved ts that all processors
    have ingested up to. The semantics on job start and resumption still
    need to be ironed out.
  • Introducing a StreamIngestionFrontier processor which will slurp the
    results from the StreamIngestionProcessors, and use them to keep track
    of the minimum resolved ts across all processors.
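
The watermark and frontier TODOs above both come down to tracking a minimum over per-processor resolved timestamps. The following Go sketch shows that computation only. It assumes a simplified integer `Timestamp` and a map keyed by a processor index, both hypothetical, and it assumes that no watermark exists until every processor has reported; the PR leaves the actual semantics open.

```go
package main

import "fmt"

// Timestamp is a simplified, hypothetical stand-in for a resolved timestamp.
type Timestamp int64

// minResolved returns the lowest resolved timestamp reported across all
// processors. The second return value is false when fewer than
// numProcessors processors have reported, in which case no watermark can be
// established yet (an assumption made for this sketch).
func minResolved(resolved map[int]Timestamp, numProcessors int) (Timestamp, bool) {
	if numProcessors <= 0 || len(resolved) < numProcessors {
		return 0, false
	}
	var lowest Timestamp
	first := true
	for _, ts := range resolved {
		if first || ts < lowest {
			lowest = ts
			first = false
		}
	}
	return lowest, true
}

func main() {
	// Resolved timestamps reported by three processors, keyed by index.
	resolved := map[int]Timestamp{0: 120, 1: 95, 2: 110}
	if ts, ok := minResolved(resolved, 3); ok {
		fmt.Println("watermark:", ts)
	}
}
```

Taking the minimum means the watermark only advances once the slowest processor advances, so everything at or below it has been ingested by all processors.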

Fixes: #57399

Release note: None

@adityamaru adityamaru requested review from dt and pbardea December 30, 2020 15:40
@adityamaru
Contributor Author

Rough WIP to ensure that the PR doesn't become hard to review. I have left some TODOs in the code and would be happy to receive any comments on them.

Contributor

@pbardea pbardea left a comment

This looks good to me. I left a few small casting nits that rely on changes to the stream client PR; I'll ping here when that is updated for a rebase.

message StreamIngestionDetails {
  // StreamAddress is the location of the stream which the ingestion job will
  // read from.
  string stream_address = 1;
Contributor

casttype to a streamclient.StreamAddress?

Contributor Author

done.

@@ -134,6 +134,10 @@ message ReadImportDataSpec {
// NEXTID: 16
}

message StreamIngestionDataSpec {
  map<int32,string> partition_address = 1;
Contributor

I think we can also add [(gogoproto.castvalue) = "github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient.PartitionAddress"]; on this line to get this map to be typed a bit stricter. I also think we can castkey to a PartitionID type that I can add to the streamclient package.
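
To make the suggestion concrete, this is roughly the Go shape that key and value casts on the map would be expected to produce. It is a hand-written sketch, not generated code: `PartitionID` is the type proposed in the comment and does not exist yet, and the struct below only imitates what the generated `StreamIngestionDataSpec` might look like.

```go
package main

import "fmt"

// PartitionID and PartitionAddress are assumed named types; PartitionID is
// only proposed in the review comment.
type PartitionID int32
type PartitionAddress string

// StreamIngestionDataSpec imitates the generated struct with the map key and
// value cast to named types instead of plain int32 and string.
type StreamIngestionDataSpec struct {
	PartitionAddress map[PartitionID]PartitionAddress
}

// lookup takes a PartitionID, so a caller cannot pass an unrelated int32
// variable without an explicit conversion.
func lookup(spec StreamIngestionDataSpec, id PartitionID) (PartitionAddress, bool) {
	addr, ok := spec.PartitionAddress[id]
	return addr, ok
}

func main() {
	spec := StreamIngestionDataSpec{
		PartitionAddress: map[PartitionID]PartitionAddress{0: "addr-a", 1: "addr-b"},
	}
	if addr, ok := lookup(spec, 1); ok {
		fmt.Println(addr)
	}
}
```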

Contributor Author

changed it to a repeated field as that is what the ingestion processor PR changed it to.

@adityamaru adityamaru changed the title [WIP] streamingccl: add ingestion job framework streamingccl: add ingestion job framework Jan 7, 2021
Contributor

@pbardea pbardea left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, and @pbardea)


pkg/ccl/streamingccl/streamclient/mock_stream_client.go, line 19 at r4 (raw file):

// NewMockStreamClient returns a new mock stream client.
func NewMockStreamClient() *MockStreamClient {

What do you think of renaming this to something like just client and making it the real stream client that just isn't yet implemented? (I also don't think it needs to be exported?)
Then this can just return the interface.
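
A sketch of what that could look like, assuming a small hypothetical `Client` interface with a single `Partitions` method (the real interface lives in the stream client PR and may differ): the concrete type is unexported and not yet implemented, and the constructor returns the interface.

```go
package main

import (
	"errors"
	"fmt"
)

// StreamAddress and PartitionAddress are assumed named string types.
type StreamAddress string
type PartitionAddress string

// Client is a hypothetical interface for this sketch; the method set is an
// assumption, not the PR's actual API.
type Client interface {
	Partitions(addr StreamAddress) ([]PartitionAddress, error)
}

// errUnimplemented is returned by every method until the client is built out.
var errUnimplemented = errors.New("stream client not yet implemented")

// client is the unexported concrete type; callers only see Client.
type client struct{}

// Partitions is a placeholder that reports the client is not implemented.
func (c *client) Partitions(addr StreamAddress) ([]PartitionAddress, error) {
	return nil, errUnimplemented
}

// NewStreamClient returns the interface rather than the concrete type, so the
// implementation can be swapped later without touching callers.
func NewStreamClient() Client {
	return &client{}
}

func main() {
	c := NewStreamClient()
	if _, err := c.Partitions("some-stream"); err != nil {
		fmt.Println("error:", err)
	}
}
```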

@adityamaru
Contributor Author

TFTR!

bors r=pbardea

@craig
Contributor

craig bot commented Jan 7, 2021

Build failed:

@adityamaru
Contributor Author

looks like a flake in acceptance/gossip/peerings.

bors r=pbardea

@craig
Contributor

craig bot commented Jan 7, 2021

Build succeeded:

@craig craig bot merged commit d8b5cb0 into cockroachdb:master Jan 7, 2021
Successfully merging this pull request may close these issues.

streaming: create stream ingestion job
3 participants