roachprod: guard calls to SetupSSH #96794

Merged
merged 1 commit into from
Feb 9, 2023

Conversation

herkolategan
Collaborator

@herkolategan herkolategan commented Feb 8, 2023

This change ensures that calls to SetupSSH do not run concurrently across
processes or threads. Overlapping calls are not safe and can lead to invalid SSH
configurations. This scenario arises when multiple clusters are created
simultaneously, whether from a single process or from several.

Resolves: #90092

Release note: None

@herkolategan herkolategan requested a review from a team as a code owner February 8, 2023 13:11
@herkolategan herkolategan requested review from srosenberg and smg260 and removed request for a team February 8, 2023 13:11
@cockroach-teamcity
Member

This change is Reviewable

@herkolategan herkolategan force-pushed the hbl/roachdprod-ssh-gate branch from 560cb60 to 74b5270 Compare February 8, 2023 13:22
@herkolategan herkolategan changed the title roachprod: guard calls to gcloud config-ssh roachprod: guard calls to gcloud compute config-ssh Feb 8, 2023
Contributor

@smg260 smg260 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/roachprod/vm/gce/gcloud.go line 654 at r1 (raw file):

// ConfigSSH is part of the vm.Provider interface
func (p *Provider) ConfigSSH(zones []string) error {
	configSSHMu.Lock()

Would this make more sense at the roachprod level? Only having locking for GCP will still result in potential write contention to the config file from other providers.

Conceptually it makes sense to have only one process modifying ssh config.

Also if there is ever a case where we have multiple roachtests running from the same machine (I don't think we can), this would not work.
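
To illustrate the concern above: a package-level mutex such as `configSSHMu` serializes goroutines inside one process, but it does nothing once a second process touches the same SSH config. A minimal sketch, assuming a hypothetical `guardedSetupSSH` wrapper and `maxOverlap` helper (neither is roachprod's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// sshConfigMu is a hypothetical package-level mutex. It only protects
// callers that live in the same process.
var sshConfigMu sync.Mutex

// guardedSetupSSH runs setup while holding sshConfigMu, so at most one
// goroutine in this process executes setup at a time.
func guardedSetupSSH(setup func() error) error {
	sshConfigMu.Lock()
	defer sshConfigMu.Unlock()
	return setup()
}

// maxOverlap starts n goroutines that all call guardedSetupSSH and returns
// the largest number of callers observed inside the guarded section at once.
func maxOverlap(n int) int {
	var (
		wg      sync.WaitGroup
		active  int
		maxSeen int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = guardedSetupSSH(func() error {
				// active and maxSeen are only touched while the mutex is held.
				active++
				if active > maxSeen {
					maxSeen = active
				}
				active--
				return nil
			})
		}()
	}
	wg.Wait()
	return maxSeen
}

func main() {
	fmt.Println("max concurrent callers:", maxOverlap(8))
}
```

Two separate roachprod processes would each have their own copy of the mutex, which is why this guard alone cannot cover the multi-process case.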

Collaborator Author

@herkolategan herkolategan left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


pkg/roachprod/vm/gce/gcloud.go line 654 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

Would this make more sense at the roachprod level? Only having locking for GCP will still result in potential write contention to the config file from other providers.

Conceptually it makes sense to have only one process modifying ssh config.

Also if there is ever a case where we have multiple roachtests running from the same machine (I don't think we can), this would not work.

I was dubious about doing it at the roachprod level, as each provider might have a different way of providing access. AWS seems to send a key rather than modifying the shared local resource. But then again, there might be little harm in a blanket lock one level up? It's true that if you run multiple roachprod instances the issue would persist, so I'll look into a system-wide lock (maybe a file-based lock).

Collaborator Author

@herkolategan herkolategan left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


pkg/roachprod/vm/gce/gcloud.go line 654 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

I was dubious about doing it at the roachprod level, as each provider might have a different way of providing access. AWS seems to send a key rather than modifying the shared local resource. But then again, there might be little harm in a blanket lock one level up? It's true that if you run multiple roachprod instances the issue would persist, so I'll look into a system-wide lock (maybe a file-based lock).

Right, so Sync, which ultimately calls SetupSSH, already takes a file lock for the sync part. I'll reuse that logic and add one for the call to SetupSSH.
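
For readers unfamiliar with the file-lock approach, here is a rough sketch of the general technique on Unix-like systems using `syscall.Flock`. The helper names `withFileLock` and `secondLockerBlocked`, and the lock-file name, are assumptions made for illustration; this is not the locking code that Sync or SetupSSH actually uses.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// withFileLock (hypothetical helper) opens lockPath, creating it if needed,
// takes an exclusive advisory lock on it, runs fn, and then releases the
// lock. Any other process or goroutine that opens the same path and calls
// Flock with LOCK_EX waits until the lock is released.
func withFileLock(lockPath string, fn func() error) error {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return fmt.Errorf("opening lock file: %w", err)
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return fmt.Errorf("acquiring lock: %w", err)
	}
	defer func() { _ = syscall.Flock(int(f.Fd()), syscall.LOCK_UN) }()
	return fn()
}

// secondLockerBlocked reports whether a second, non-blocking lock attempt on
// a separately opened descriptor is refused while the first lock is held.
func secondLockerBlocked() (bool, error) {
	dir, err := os.MkdirTemp("", "ssh-lock-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	lockPath := filepath.Join(dir, "setup-ssh.lock") // illustrative name only

	blocked := false
	err = withFileLock(lockPath, func() error {
		f, err := os.OpenFile(lockPath, os.O_RDWR, 0o600)
		if err != nil {
			return err
		}
		defer f.Close()
		err = syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
		if err == syscall.EWOULDBLOCK {
			blocked = true // the lock is held elsewhere, as intended
			return nil
		}
		if err == nil {
			// Unexpectedly acquired the lock; release it again.
			_ = syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
		}
		return err
	})
	return blocked, err
}

func main() {
	blocked, err := secondLockerBlocked()
	fmt.Println("second locker blocked:", blocked, "err:", err)
}
```

Because `flock` locks belong to the open file description rather than to the process, the same guard covers both goroutines in one process (each opening the file itself) and entirely separate processes run by the same user.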

Contributor

@smg260 smg260 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/roachprod/vm/gce/gcloud.go line 654 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Right, so Sync, which ultimately calls SetupSSH, already takes a file lock for the sync part. I'll reuse that logic and add one for the call to SetupSSH.

I think the issue this PR addresses only arises because roachtest can attempt to create multiple clusters simultaneously (via multiple roachprod calls).

Collaborator Author

@herkolategan herkolategan left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


pkg/roachprod/vm/gce/gcloud.go line 654 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

I think the issue this PR addresses only arises because roachtest can attempt to create multiple clusters simultaneously (via multiple roachprod calls).

True, but I agree with your point that multiple processes would still be at risk. In fact, CleanSSH was already covered by the system-wide (or rather, user-wide) lock file. The same just had to be done for SetupSSH.

@herkolategan herkolategan force-pushed the hbl/roachdprod-ssh-gate branch from 74b5270 to 87510fd Compare February 8, 2023 16:08
@herkolategan herkolategan changed the title roachprod: guard calls to gcloud compute config-ssh roachprod: guard calls to SetupSSH Feb 8, 2023
Contributor

@renatolabs renatolabs left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)

Contributor

@smg260 smg260 left a comment

Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @srosenberg)

@herkolategan
Collaborator Author

bors r=renatolabs,smg260

@craig
Contributor

craig bot commented Feb 9, 2023

Build failed:

@herkolategan
Collaborator Author

bors retry

@craig
Contributor

craig bot commented Feb 9, 2023

Build succeeded:

Successfully merging this pull request may close these issues.

roachprod: remove concurrency in SetupSSH