roachprod: allow easy upgrading #97311
cc @cockroachdb/test-eng
We could use [1], which has all the boilerplate to orchestrate a self-update. However, it uses GitHub releases to pull the latest binary. We could instead publish the binaries (from CI) to the archived repo [2] and apply a ring buffer, since we don't need to keep every nightly there; say, the last 7 nightly builds?

[1] https://github.com/rhysd/go-github-selfupdate/
In cockroachdb#98076, we started validating hostnames before running any commands to avoid situations where a stale cache could lead to unintended interference with other clusters due to public IP reuse. The check relies on the VM's `hostname` matching the expected cluster name in the cache. GCP and Azure clusters set the hostname to the instance name by default, but that is not the case for AWS; the aforementioned PR explicitly sets the hostname when the instance is created.

However, in the case of long-running AWS clusters (created before host validation was introduced) or clusters that are created with an outdated version of `roachprod`, the hostname will still be the default AWS hostname, and any interaction with that cluster will fail if using a recent `roachprod` version.

To remedy this situation, this commit includes:

* better error reporting. When we attempt to run a command on an AWS cluster and host validation fails, we display a message to the user explaining that their hostnames may need fixing.
* if the user confirms that the cluster still exists (by running `roachprod list`), they are able to automatically fix the hostnames to the expected value by running a new `fix-long-running-aws-hostnames` command. This is a temporary workaround that should be removed once we no longer have clusters that would be affected by this issue.

This commit will be reverted once we no longer have clusters created with the default hostnames; this will be easier to achieve once we have an easy way for everyone to upgrade their `roachprod` (see cockroachdb#97311).

Epic: none

Release note: None
98682: roachprod: provide workaround for long-running AWS clusters r=srosenberg a=renatolabs

In #98076, we started validating hostnames before running any commands to avoid situations where a stale cache could lead to unintended interference with other clusters due to public IP reuse. The check relies on the VM's `hostname` matching the expected cluster name in the cache. GCP and Azure clusters set the hostname to the instance name by default, but that is not the case for AWS; the aforementioned PR explicitly sets the hostname when the instance is created.

However, in the case of long-running AWS clusters (created before host validation was introduced) or clusters that are created with an outdated version of `roachprod`, the hostname will still be the default AWS hostname, and any interaction with that cluster will fail if using a recent `roachprod` version.

To remedy this situation, this commit includes:

* better error reporting. When we attempt to run a command on an AWS cluster and host validation fails, we display a message to the user explaining that their hostnames may need fixing.
* if the user confirms that the cluster still exists (by running `roachprod list`), they are able to automatically fix the hostnames to the expected value by running a new `fix-long-running-aws-hostnames` command. This is a temporary workaround that should be removed once we no longer have clusters that would be affected by this issue.

This commit will be reverted once we no longer have clusters created with the default hostnames; this will be easier to achieve once we have an easy way for everyone to upgrade their `roachprod` (see #97311).

Epic: none

Release note: None

98717: tree: fix tuple encoding performance regression r=mgartner a=mgartner

This commit fixes a performance regression in pgwire encoding of tuples introduced in #95009.

Informs #98306

Epic: None

Release note: None

Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
103307: roachprod: add command to self update the roachprod binary r=srosenberg,rail,herkolategan a=smg260

This PR contains 2 ways to get the latest `roachprod`.

1. `update` command as part of roachprod itself

   `roachprod update` will check and download the latest binary for the current platform, and optionally update the running roachprod.
   This uses the [TeamCity REST API](https://www.jetbrains.com/help/teamcity/rest/teamcity-rest-api-documentation.html) with guest authentication to find the latest successful build (on `master`) from which to download the binary.
   When proceeding with an update, the existing binary is renamed with a `.bak` prefix so that it may be reverted, either manually or via `roachprod update --revert`. Permissions are copied from the existing binary.

2. `scripts/roachprod-get-latest.sh` shell script

   Downloads the latest roachprod binary from TeamCity to the specified (or default current) directory. Has basic checks for `curl` and confirming whether to overwrite any existing roachprod.

The builds used by both the binary and the script are [here](https://teamcity.cockroachdb.com/project.html?projectId=Cockroach_Ci_Builds&branch_Cockroach_Ci_Builds=%3Cdefault%3E)

Note:
- The linter prevents direct use of http.Get, so that meant creating proto files for the TeamCity REST API responses.
- The existing `httputil` has no accommodations for unmarshalling a subset of fields in a JSON response, hence the addition of an `IgnoreUnknownFields` option.
- This should work as long as our builds remain public

Epic: none
Fixes: #97311
Release note: None

Co-authored-by: Miral Gadani <[email protected]>
A lot of people at Cockroach Labs use `roachprod` to create and manage clusters. However, there's currently no convenient way to upgrade the `roachprod` binary other than pulling the latest changes from the cockroach repo and building it locally. This is not ideal for a few reasons: it's easy to forget; it's distracting and can take quite some time; it requires users to have all dependencies installed even if they just want to use the binary.

We want to allow everyone to easily upgrade `roachprod` so that they can get the latest bug fixes and improvements. Some ideas (discussed in an internal thread):

* `roachprod`
* homebrew

Update: we already have CI builds of `roachprod`, so we should probably use that. All we need is a convenient way to download the new binaries and replace the old ones.

Jira issue: CRDB-24628
Epic: CRDB-10428