roachprod: allow easy upgrading #97311

Closed
renatolabs opened this issue Feb 17, 2023 · 2 comments · Fixed by #103307
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-testeng TestEng Team

Comments

@renatolabs
Contributor

renatolabs commented Feb 17, 2023

A lot of people at Cockroach Labs use roachprod to create and manage clusters. However, there's currently no convenient way to upgrade the roachprod binary other than pulling the latest changes from the cockroach repo and building it locally. This is not ideal for a few reasons: it's easy to forget; it's distracting and can take quite some time; and it requires users to have all build dependencies installed even if they just want to use the binary.

We want to allow everyone to easily upgrade roachprod so that they can get the latest bug fixes and improvements. Some ideas (discussed in an internal thread):

  • Manage releases for roachprod
  • Make it available through homebrew
  • Build nightly binaries available somewhere (GCS) + a convenient way to download the latest binary.

Update: we already have CI builds of roachprod, so we should probably use that. All we need is a convenient way to download the new binaries and replace the old ones.

Jira issue: CRDB-24628
Epic: CRDB-10428

@renatolabs renatolabs added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-testeng TestEng Team labels Feb 17, 2023
@blathers-crl

blathers-crl bot commented Feb 17, 2023

cc @cockroachdb/test-eng

@srosenberg
Member

We could use [1], which has all the boilerplate to orchestrate a self-update. However, it uses GitHub releases to pull the latest binary. We could just publish them (from CI) to the archived repo [2] and apply a ring buffer, since we don't need to keep every nightly there; say, the last 7 nightly builds?

[1] https://github.com/rhysd/go-github-selfupdate/
[2] https://github.com/cockroachdb/roachprod
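
The ring-buffer idea above could be sketched roughly as follows. This is only an illustration of the retention rule (keep the newest N builds, drop the rest); the `build` type, its fields, and the `prune` helper are hypothetical and do not correspond to any GitHub or roachprod API.

```go
package main

import (
	"fmt"
	"sort"
)

// build is a hypothetical record of one published nightly binary.
// The tag naming scheme is an assumption made for this example.
type build struct {
	Tag string
	Seq int // monotonically increasing build number
}

// prune returns the newest `keep` builds (by Seq, newest first) and the
// remaining older builds that a retention job would delete.
func prune(builds []build, keep int) (kept, dropped []build) {
	if keep < 0 {
		keep = 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]build(nil), builds...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Seq > sorted[j].Seq })
	if len(sorted) <= keep {
		return sorted, nil
	}
	return sorted[:keep], sorted[keep:]
}

func main() {
	var builds []build
	for i := 1; i <= 10; i++ {
		builds = append(builds, build{Tag: fmt.Sprintf("nightly-%02d", i), Seq: i})
	}
	kept, dropped := prune(builds, 7)
	fmt.Println(len(kept), len(dropped)) // prints: 7 3
}
```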

renatolabs added a commit to renatolabs/cockroach that referenced this issue Mar 15, 2023
In cockroachdb#98076, we started validating hostnames before running any commands
to avoid situations where a stale cache could lead to unintended
interference with other clusters due to public IP reuse. The check
relies on the VM's `hostname` matching the expected cluster name in
the cache. GCP and Azure clusters set the hostname to the instance
name by default, but that is not the case for AWS; the aforementioned
PR explicitly sets the hostname when the instance is created.

However, in the case of long running AWS clusters (created before host
validation was introduced) or clusters that are created with an
outdated version of `roachprod`, the hostname will still be the
default AWS hostname, and any interaction with that cluster will fail
if using a recent `roachprod` version. To remedy this situation, this
commit includes:

* better error reporting. When we attempt to run a command on an AWS
cluster and host validation fails, we display a message to the user
explaining that their hostnames may need fixing.

* if the user confirms that the cluster still exists (by running
`roachprod list`), they are able to automatically fix the hostnames to
the expected value by running a new `fix-long-running-aws-hostnames`
command. This is a temporary workaround that should be removed once we
no longer have clusters that would be affected by this issue.

This commit will be reverted once we no longer have clusters created
with the default hostnames; this will be easier to achieve once we
have an easy way for everyone to upgrade their `roachprod` (see cockroachdb#97311).

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Mar 16, 2023
98682: roachprod: provide workaround for long-running AWS clusters r=srosenberg a=renatolabs

In #98076, we started validating hostnames before running any commands to avoid situations where a stale cache could lead to unintended interference with other clusters due to public IP reuse. The check relies on the VM's `hostname` matching the expected cluster name in the cache. GCP and Azure clusters set the hostname to the instance name by default, but that is not the case for AWS; the aforementioned PR explicitly sets the hostname when the instance is created.

However, in the case of long running AWS clusters (created before host validation was introduced) or clusters that are created with an outdated version of `roachprod`, the hostname will still be the default AWS hostname, and any interaction with that cluster will fail if using a recent `roachprod` version. To remedy this situation, this commit includes:

* better error reporting. When we attempt to run a command on an AWS cluster and host validation fails, we display a message to the user explaining that their hostnames may need fixing.

* if the user confirms that the cluster still exists (by running `roachprod list`), they are able to automatically fix the hostnames to the expected value by running a new `fix-long-running-aws-hostnames` command. This is a temporary workaround that should be removed once we no longer have clusters that would be affected by this issue.

This commit will be reverted once we no longer have clusters created with the default hostnames; this will be easier to achieve once we have an easy way for everyone to upgrade their `roachprod` (see #97311).

Epic: none

Release note: None

98717: tree: fix tuple encoding performance regression r=mgartner a=mgartner

This commit fixes a performance regression in pgwire encoding of tuples
introduced in #95009.

Informs #98306

Epic: None

Release note: None

Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
craig bot pushed a commit that referenced this issue Jun 1, 2023
103307: roachprod: add command to self update the roachprod binary r=srosenberg,rail,herkolategan a=smg260

This PR contains 2 ways to get the latest `roachprod`.

1. `update` command as part of roachprod itself:

   `roachprod update` checks for and downloads the latest binary for the current platform, and optionally updates the running roachprod.

   This uses the [TeamCity REST API](https://www.jetbrains.com/help/teamcity/rest/teamcity-rest-api-documentation.html) with guest authentication to find the latest successful build (on `master`) from which to download the binary.

   When proceeding with an update, the existing binary is renamed with a `.bak` suffix so that it may be reverted, either manually or via `roachprod update --revert`. Permissions are copied from the existing binary.

2. `scripts/roachprod-get-latest.sh` shell script:

   Downloads the latest roachprod binary from TeamCity to the specified directory (by default, the current one). It has basic checks for `curl` and asks for confirmation before overwriting any existing roachprod.

The builds used by both the binary and the script are [here](https://teamcity.cockroachdb.com/project.html?projectId=Cockroach_Ci_Builds&branch_Cockroach_Ci_Builds=%3Cdefault%3E).


Note:
- The linter prevents direct use of `http.Get`, so that meant creating proto files for the TeamCity REST API responses.
- The existing `httputil` has no accommodations for unmarshalling a subset of fields in a JSON response, hence the addition of an `IgnoreUnknownFields` option.
- This should work as long as our builds remain public.
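
To illustrate what "unmarshalling a subset of fields" means, here is a minimal sketch using only Go's standard `encoding/json`, not the `httputil` package or the proto-based decoding from the PR. The `buildSummary` type, its field names, and the sample payload are hypothetical and are not TeamCity's actual response schema; the `ignoreUnknownFields` parameter merely mirrors the idea behind the option mentioned above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// buildSummary lists only the fields the caller cares about.
// The field names are hypothetical, chosen for this example.
type buildSummary struct {
	ID     int    `json:"id"`
	Status string `json:"status"`
}

// decode parses payload into a buildSummary. When ignoreUnknownFields is
// false, any field not declared on buildSummary is treated as an error.
func decode(payload []byte, ignoreUnknownFields bool) (buildSummary, error) {
	dec := json.NewDecoder(bytes.NewReader(payload))
	if !ignoreUnknownFields {
		dec.DisallowUnknownFields()
	}
	var out buildSummary
	err := dec.Decode(&out)
	return out, err
}

func main() {
	// The response carries an extra field that buildSummary does not declare.
	payload := []byte(`{"id": 42, "status": "SUCCESS", "webUrl": "https://example.invalid/build/42"}`)

	b, err := decode(payload, true)
	fmt.Println(b.ID, b.Status, err) // prints: 42 SUCCESS <nil>

	_, err = decode(payload, false)
	fmt.Println(err != nil) // prints: true
}
```

A strict decoder fails as soon as the server adds a field, so a lenient mode lets a client declare only the handful of fields it needs from a large response.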

Epic: none
Fixes: #97311

Release note: None

Co-authored-by: Miral Gadani <[email protected]>
@craig craig bot closed this as completed in 4edb01b Jun 1, 2023