Roachtest azure nightly #111704

Merged
merged 5 commits into from
Oct 5, 2023

Contributor

@smg260 smg260 commented Oct 4, 2023

This is a series of commits that enables roachtests to run on Azure in TeamCity.

  1. Add the relevant TeamCity invoke script.
  2. Update authentication to look in the CLI (for developers) or in the environment (for TeamCity).
  3. Look for a default subscription in the environment, falling back to the existing "pick first" implementation.
  4. Add a security rule that allows the roachtest host machine to connect to a VM via Kafka admin.
  5. Update the Azure default location to one with more quota, and run `apt-get update` before installing Go for a cdc test (which failed on Azure).

A follow-up PR will enable an initial set of roachtests to run.

Epic: CC-25185
Release note: none

@cockroach-teamcity
Member

This change is Reviewable

@smg260 smg260 force-pushed the roachtest_azure_nightly branch 3 times, most recently from 24cdbbf to 715fee8 Compare October 4, 2023 13:04
@smg260 smg260 marked this pull request as ready for review October 4, 2023 13:47
@smg260 smg260 requested review from a team as code owners October 4, 2023 13:47
@smg260 smg260 requested review from herkolategan and srosenberg and removed request for a team October 4, 2023 13:47
Collaborator

@herkolategan herkolategan left a comment

Nice! I was looking for a TC run and found one :) https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyAzureBazel

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


build/teamcity/util/roachtest_util.sh line 82 at r2 (raw file):

  azure)
    if [ -z "${FILTER}" ]; then
      # Soon to go away with Radu's tag changes.

Super Nit: A TODO tag could make this more visible, with a PR/issue ref. But sounds like it's really soon so might not matter to keep track.


pkg/roachprod/vm/azure/auth.go line 55 at r2 (raw file):

		p.mu.Unlock()
	} else {
		err = errors.Wrap(err, "could got get Azure auth token")

Nit: not part of this PR, but this error message has a typo ("could got get").

Collaborator

@herkolategan herkolategan left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


pkg/roachprod/vm/azure/azure.go line 56 at r2 (raw file):

	const cliErr = "please install the Azure CLI utilities " +
		"(https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)"
	const authErr = "please use `az login` to login to Azure"

Nit: This error could be confusing if the env vars are set but wrong, I think?

@smg260 smg260 force-pushed the roachtest_azure_nightly branch from 715fee8 to 17478f9 Compare October 4, 2023 16:54
Contributor Author

@smg260 smg260 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


build/teamcity/util/roachtest_util.sh line 82 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Super Nit: A TODO tag could make this more visible, with a PR/issue ref. But sounds like it's really soon so might not matter to keep track.

It's consistent with the other branches for AWS and GCE, so it's just a comment to be taken at face value. It will go away at the same time as the others.


pkg/roachprod/vm/azure/azure.go line 56 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Nit: This error could be confusing if the env vars are set but wrong, I think?

Thanks. Made a more generic message.

Miral Gadani added 5 commits October 4, 2023 13:21
Epic: CC-25185
Release note: none
This PR ensures that the Azure SDK is granted access in
one of two ways.

1. If the required environment variables are present,
`NewAuthorizerFromEnvironment` is used.
2. Otherwise, `az`, the CLI tool, must be installed and
authenticated, and `NewAuthorizerFromCLI` is used.

The former is used by TeamCity; the latter is likely
to be used by developers.
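
The selection between the two paths can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `chooseAuthMethod` helper and the three variable names it checks are assumptions (the commit message does not list which variables are required), and real code would call `NewAuthorizerFromEnvironment` or `NewAuthorizerFromCLI` where this sketch only returns a label.

```go
package main

import (
	"fmt"
	"os"
)

// authMethod labels which authorizer would be constructed.
type authMethod string

const (
	authFromEnvironment authMethod = "environment" // TeamCity path
	authFromCLI         authMethod = "cli"         // developer path (`az login`)
)

// requiredEnvVars is an ASSUMED list of variables for service-principal
// auth; the PR does not spell out which variables it checks.
var requiredEnvVars = []string{
	"AZURE_TENANT_ID",
	"AZURE_CLIENT_ID",
	"AZURE_CLIENT_SECRET",
}

// chooseAuthMethod (hypothetical helper) picks environment auth only when
// every required variable is set and non-empty; otherwise it falls back to
// the CLI. The lookup function is injected so the logic can be tested.
func chooseAuthMethod(lookup func(string) (string, bool)) authMethod {
	for _, name := range requiredEnvVars {
		if v, ok := lookup(name); !ok || v == "" {
			return authFromCLI
		}
	}
	return authFromEnvironment
}

func main() {
	fmt.Println(chooseAuthMethod(os.LookupEnv))
}
```
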

Previously, the subscription ID associated with all
the SDK requests defaulted to the first one returned when
listing all of them. This did not allow us to
query against a particular one.

This PR introduces code to check for an environment
variable named `AZURE_SUBSCRIPTION_ID` which, if
present, is used. Otherwise, we fall back to selecting
the first one.

Additionally, we now memoize only the simple string
representation of the subscription ID instead of the
struct.
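
A rough sketch of the lookup order and memoization described above, under stated assumptions: the `subscriptionResolver` type and its injected `lookupEnv` and `listIDs` functions are hypothetical stand-ins so the example runs without Azure access; only the `AZURE_SUBSCRIPTION_ID` name, the "first one" fallback, and the error text come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// subscriptionResolver (hypothetical) memoizes the chosen subscription ID
// as a plain string rather than a full subscription struct.
type subscriptionResolver struct {
	mu     sync.Mutex
	cached string
	// Injected dependencies; stand-ins for the environment and the SDK's
	// subscription listing call.
	lookupEnv func(string) (string, bool)
	listIDs   func() ([]string, error)
}

// subscriptionID returns the memoized ID if there is one, then tries
// AZURE_SUBSCRIPTION_ID, and finally falls back to the first listed ID.
func (r *subscriptionResolver) subscriptionID() (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cached != "" {
		return r.cached, nil
	}
	if id, ok := r.lookupEnv("AZURE_SUBSCRIPTION_ID"); ok && id != "" {
		r.cached = id
		return id, nil
	}
	// Fallback to retrieving the first subscription.
	ids, err := r.listIDs()
	if err != nil {
		return "", err
	}
	if len(ids) == 0 {
		return "", errors.New("did not find Azure subscription")
	}
	r.cached = ids[0]
	return r.cached, nil
}

func main() {
	r := &subscriptionResolver{
		lookupEnv: os.LookupEnv,
		// Placeholder listing; a real implementation would query Azure.
		listIDs: func() ([]string, error) { return []string{"placeholder-id"}, nil },
	}
	id, err := r.subscriptionID()
	fmt.Println(id, err)
}
```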

Epic: CC-25185
Release note: none
Epic: CC-25185
Release note: none
roachtest: apt update before go install

Update default location to `eastus`, since there
is more quota there right now.

Also, run `apt-get update` before installing Go (cdc).

Epic: CC-25185
Release note: none
@smg260 smg260 force-pushed the roachtest_azure_nightly branch from 17478f9 to e7dd3ec Compare October 4, 2023 17:21
if len(page.Values()) == 0 {
err = errors.New("did not find Azure subscription")
return sub, err
// Fallback to retrieving the first subscription
Collaborator

Do we know what order the subscriptions come back in from sc.List(ctx) below? Can we guarantee that the first subscription will be one in which a roachprod cluster should be allowed to run?

I'm asking these questions because previously there were few subscriptions in Azure and few that people had access to. Now, there are many more subscriptions and people will generally have access to multiple. Also, soon the revenue and product divisions will be running their roachprod clusters in different subscriptions from engineering.

How about an approach like the following instead?

  • If the user specifies an exact subscription to use, use that subscription. If that subscription can't be found, error out (to ensure the cluster doesn't get created in the wrong subscription).
  • If the user doesn't set a subscription, try a standard set of ad hoc subscriptions and use the first one that is accessible to the user. If none exist or are accessible to the user, error out. The initial list of subscriptions would be these, in this order: e2e-adhoc, Microsoft Azure Sponsorship.
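
As a rough illustration of the two-step selection proposed in the bullets above (a sketch only, not code from this PR): `pickSubscription`, the `accessible` map, and matching by display name are all hypothetical; only the two subscription names and their order come from the comment.

```go
package main

import (
	"fmt"
)

// defaultSubscriptions is the ordered fallback list named in the proposal.
var defaultSubscriptions = []string{"e2e-adhoc", "Microsoft Azure Sponsorship"}

// pickSubscription (hypothetical) takes the user's explicit choice ("" when
// unset) and the set of subscription names the user can reach.
func pickSubscription(requested string, accessible map[string]bool) (string, error) {
	if requested != "" {
		// An explicit choice is never silently replaced, so a cluster
		// cannot end up in the wrong subscription.
		if accessible[requested] {
			return requested, nil
		}
		return "", fmt.Errorf("subscription %q not found", requested)
	}
	// No explicit choice: take the first accessible default, in order.
	for _, name := range defaultSubscriptions {
		if accessible[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("none of the default subscriptions %v are accessible", defaultSubscriptions)
}

func main() {
	accessible := map[string]bool{"Microsoft Azure Sponsorship": true}
	name, err := pickSubscription("", accessible)
	fmt.Println(name, err)
}
```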

Contributor Author

@smg260 smg260 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jlinder and @srosenberg)


pkg/roachprod/vm/azure/azure.go line 1517 at r4 (raw file):

Previously, jlinder (James H. Linder) wrote…

Do we know what order the subscriptions come back in from sc.List(ctx) below? Can we guarantee that the first subscription will be one in which a roachprod cluster should be allowed to run?

I'm asking these questions because previously there were few subscriptions in Azure and few that people had access to. Now, there are many more subscriptions and people will generally have access to multiple. Also, soon the revenue and product divisions will be running their roachprod clusters in different subscriptions from engineering.

How about an approach like the following instead?

  • If the user specifies an exact subscription to use, use that subscription. If that subscription can't be found, error out (to ensure the cluster doesn't get created in the wrong subscription).
  • If the user doesn't set a subscription, try a standard set of ad hoc subscriptions and use the first one that is accessible to the user. If none exist or are accessible to the user, error out. The initial list of subscriptions would be these, in this order: e2e-adhoc, Microsoft Azure Sponsorship.

Can't guarantee order, as they could be different per person, nor that what is returned is usable by roachprod.

I don't like returning the first one blindly either, but it is just preserving the existing behaviour. I'd be ok to remove that.

The first bullet is covered because the code does look for a subscription in the environment, and if it's invalid, API calls error out.

The second bullet can be implemented, but:

  • it ties roachprod to CRL's infrastructure, unless we store the list in config somewhere
  • just because we can authenticate against a subscription does not mean we will have the required permissions to do anything there, so "accessible to user" is probably not enough
  • minor, but ideally we'd use the subscription ID over the display name

If the concern here is mainly for users of roachprod (i.e. not TeamCity), I could add a higher-priority fallback that invokes az in an attempt to get the default subscription ID, only if CLI auth is used? This can be set when installing the CLI and would be a reasonable fallback.
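
One way the suggested `az` fallback might look, as a hedged sketch rather than anything implemented in this PR: `defaultSubscriptionFromCLI` and the `runner` type are hypothetical, and the sketch assumes the Azure CLI's `az account show --query id --output tsv` invocation prints the default subscription ID. The caller would only reach this path when CLI auth is in use.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// runner abstracts command execution so the sketch can be exercised
// without the Azure CLI installed.
type runner func(name string, args ...string) ([]byte, error)

// defaultSubscriptionFromCLI (hypothetical) asks az for the ID of the
// currently selected default subscription.
func defaultSubscriptionFromCLI(run runner) (string, error) {
	out, err := run("az", "account", "show", "--query", "id", "--output", "tsv")
	if err != nil {
		return "", fmt.Errorf("querying az for the default subscription: %w", err)
	}
	id := strings.TrimSpace(string(out))
	if id == "" {
		return "", errors.New("az returned no default subscription")
	}
	return id, nil
}

// execRunner runs the command for real and returns its standard output.
func execRunner(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).Output()
}

func main() {
	id, err := defaultSubscriptionFromCLI(execRunner)
	fmt.Println(id, err)
}
```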

@healthy-pod
Contributor

I talked to James and we created this issue (because we can do this later and it shouldn't block the PR).

Collaborator

@jlinder jlinder left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @smg260 and @srosenberg)


pkg/roachprod/vm/azure/azure.go line 1517 at r4 (raw file):

Previously, smg260 (Miral Gadani) wrote…

Can't guarantee order, as they could be different per person, nor that what is returned is usable by roachprod.

I don't like returning the first one blindly either, but it is just preserving the existing behaviour. I'd be ok to remove that.

The first bullet is covered because the code does look for a subscription in the environment, and if it's invalid, API calls error out.

The second bullet can be implemented, but:

  • it ties roachprod to CRL's infrastructure, unless we store the list in config somewhere
  • just because we can authenticate against a subscription does not mean we will have the required permissions to do anything there, so "accessible to user" is probably not enough
  • minor, but ideally we'd use the subscription ID over the display name

If the concern here is mainly for users of roachprod (i.e. not TeamCity), I could add a higher-priority fallback that invokes az in an attempt to get the default subscription ID, only if CLI auth is used? This can be set when installing the CLI and would be a reasonable fallback.

👍 re: the first bullet being covered.

On the second bullet:

Agreed with Ahmad that this doesn't need to be fixed in this PR. I added a few more thoughts to the issue he linked.

@smg260
Contributor Author

smg260 commented Oct 4, 2023

TFTR!

bors r=healthy-pod,herkolategan

@craig
Contributor

craig bot commented Oct 5, 2023

Build succeeded:

@craig craig bot merged commit 2cd2d8c into cockroachdb:master Oct 5, 2023
3 checks passed
5 participants