streamingccl: attempt to plan on alternate nodes if the original address is unavailable #84009

Closed
samiskin opened this issue Jul 7, 2022 · 1 comment · Fixed by #84445
Assignees
Labels
A-tenant-streaming Including cluster streaming C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Milestone

Comments

Contributor

samiskin commented Jul 7, 2022

If we're retrying and have an old topology, but the original stream address we used for planning is down, we should attempt to connect to alternate nodes so that we can still plan and make progress.

Jira issue: CRDB-17403
Epic CRDB-10147

@samiskin samiskin added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 7, 2022
@livlobo livlobo added the A-tenant-streaming Including cluster streaming label Jul 7, 2022

blathers-crl bot commented Jul 7, 2022

cc @cockroachdb/tenant-streaming

@shermanCRL shermanCRL added this to the 22.2 milestone Jul 7, 2022
craig bot pushed a commit that referenced this issue Jul 29, 2022
84445: streamingccl: use any topology address in job/frontier client connection r=samiskin a=samiskin

Resolves #84009 

Previously the ingestion job relied on the provided StreamAddress to obtain
a plan. That is unavoidable when the stream is first created, but once a
topology has been returned there are many more potential addresses to
connect to.

This change makes the ingestion job and the frontier attempt to iterate through
all stream addresses in the topology when attempting to connect to a client.

The topology was also added to the ingestion job progress in order to make this
work, which also has the added observability benefit of the user being able to
accomplish tasks such as checking which nodes have spans which are lagging in
the checkpointed frontier.
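
As a rough illustration of that observability benefit, the sketch below finds the nodes that own at least one lagging span. Everything here is assumed for the example: the `Partition` type, the string span keys, and the integer timestamps are simplified, hypothetical stand-ins and do not reflect the actual job progress schema.

```go
package main

import "sort"

// Partition is a hypothetical, simplified topology entry: one source node and
// the spans it serves.
type Partition struct {
	NodeID int
	Spans  []string
}

// laggingNodes returns the sorted IDs of nodes that own at least one span
// whose checkpointed timestamp is below threshold. Spans missing from the
// frontier are treated as timestamp 0, so they always count as lagging for
// a positive threshold.
func laggingNodes(topology []Partition, frontier map[string]int64, threshold int64) []int {
	seen := make(map[int]bool)
	var lagging []int
	for _, p := range topology {
		if seen[p.NodeID] {
			continue
		}
		for _, span := range p.Spans {
			if frontier[span] < threshold {
				seen[p.NodeID] = true
				lagging = append(lagging, p.NodeID)
				break
			}
		}
	}
	sort.Ints(lagging)
	return lagging
}
```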

Release note (bug fix): ingestion job is now able to fall back to alternate nodes in
its topology if the original streamaddress is unavailable.

Co-authored-by: Shiranka Miskin <[email protected]>
@craig craig bot closed this as completed in 774f779 Jul 29, 2022

3 participants