Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
77329: kvserver: self delegated snapshots r=amygao9 a=amygao9

This commit adds a new rpc stream for sending raft message requests between
replicas which allows for delegating snapshots. Currently this patch implements
the leaseholder delegating to itself, but in future patches the leaseholder
will be able to delegate snapshot sending to follower replicas. A new raft
message request type of `DelegatedSnapshotRequest` includes a header of
nessesary fields from a `SnapshotRequest` and the replica descriptor of the new
sender replica. This allows the leaseholder to fill in snapshot metadata before
delegating to the new sender store to generate the snapshot and transmit it to
the recipient.

Related to #42491
Release note: None

79968: backupccl: AWS AssumeRole support for BulkIO operations r=DarrylWong a=DarrylWong

Before, the only way for users to authenticate with AWS
was by implicit authentication or specified with an Access
Key and Secret.

Now, users can authenticate by passing in an AssumeRole ARN with
the approriate permissions. This change encompasses AWS KMS
encryption and backup/restore to a S3 bucket.

Release note (enterprise): Users can now authenticate to AWS by passing
in the argument AWS_ROLE_ARN=<role-ARN>

80274: metric: Add new alerting rules from docs r=tbg a=rimadeodhar

Add alerting rules documented in our customer facing docs
to the new alerting and aggregation rules endpoint. This
endpoint is intended to be used as a guideline for configuring
alerts and aggregation rules for our customers. The alerts
outlined in the doc were added before this endpoint was created.
Now that we have a more structured way to communicate ways to
alert and aggregate our metrics, we should transition these alerts
to this endpoint and update our documentation to reference the
endpoint.

This PR also fixes a small bug which was causing the details
of the tripped circuit breakers rule to be flipped with the
unavailable ranges rule.

Release note (api change): Update api/v2/rules endpoint to include
alerts defined in our customer facing docs.

Release note (bug fix): Fix the bug which was causing the unavailable
ranges rule details to be flipped with the tripped circuit breakers.


Resolves #78051

80341: sql: rewrite sql stats compaction job to avoid scanning mvcc garbage r=maryliag,mgartner a=Azhng

Related to #79548

Previously, SQL Stats cleanup job's performance could severely degrade 
and cannot keep up with the write load, since its `DELETE` statements
often needed to scan over large range of MVCC garbage.
This commit addresses this issue by further constraining the scans of
the `DELETE` statements to reduce how much MVCC garbage it scans over.

Release note: None

80502: clusterversion: mint 22.1 cluster version r=irfansharif,jlinder a=celiala

This commit will mint a 22.1 cluster version. This will be backported onto both release-22.1/release-22.1.0.

Partial work for #80555

### Minting: what is it and when to do it?
Minting a cluster version indicates the end of one release and the beginning of the next. Minting closes the door on any future version gates or migrations being backported to 21.1; i.e. once we mint the version, we cannot add new migrations/version gates under it without requiring a wipe of any clusters that ran a build including the mint. Because of this, we should only merge this onto the 22.1 release branch once we’re sure we won’t need any more cluster versions. In terms of timing, the release team plans for this to be the last commit backported onto 22.1 immediately before choosing the SHA for an rc.1.


Release justification: Non-production code change.
Release note: None

80654: authors: Add Joe Sankey to AUTHORS r=rafiss a=joesankey

Release note: None

Co-authored-by: Amy Gao <[email protected]>
Co-authored-by: Darryl <[email protected]>
Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Azhng <[email protected]>
Co-authored-by: Celia La <[email protected]>
Co-authored-by: joesankey <[email protected]>
  • Loading branch information
7 people committed Apr 28, 2022
7 parents 7092ff4 + 5018508 + c584b95 + af221d4 + b01d45e + 01a5722 + 878b697 commit f06da92
Show file tree
Hide file tree
Showing 42 changed files with 1,438 additions and 189 deletions.
1 change: 1 addition & 0 deletions AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,7 @@ Jimmy Larsson <[email protected]>
Jincheng Li <[email protected]>
Jingguo Yao <[email protected]>
Joe Harlow <[email protected]>
Joe Sankey <[email protected]>
Joel Kenny <[email protected]>
Joey Pereira <[email protected]> <[email protected]> <[email protected]>
joezxy <[email protected]> <[email protected]>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,5 +7,5 @@ dir="$(dirname $(dirname $(dirname $(dirname "${0}"))))"
source "$dir/teamcity-support.sh" # For $root
source "$dir/teamcity-bazel-support.sh" # For run_bazel

BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e LITERAL_ARTIFACTS_DIR=$root/artifacts -e AWS_ACCESS_KEY_ID -e AWS_KMS_KEY_ARN_A -e AWS_KMS_KEY_ARN_B -e AWS_KMS_REGION_A -e AWS_KMS_REGION_B -e AWS_SECRET_ACCESS_KEY -e BUILD_TAG -e BUILD_VCS_NUMBER -e CLOUD -e COCKROACH_DEV_LICENSE -e TESTS -e COUNT -e GITHUB_API_TOKEN -e GITHUB_ORG -e GITHUB_REPO -e GOOGLE_EPHEMERAL_CREDENTIALS -e SLACK_TOKEN -e TC_BUILD_BRANCH -e TC_BUILD_ID -e TC_SERVER_URL" \
BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e LITERAL_ARTIFACTS_DIR=$root/artifacts -e AWS_ACCESS_KEY_ID -e AWS_ACCESS_KEY_ID_ASSUME_ROLE -e AWS_KMS_KEY_ARN_A -e AWS_KMS_KEY_ARN_B -e AWS_KMS_REGION_A -e AWS_KMS_REGION_B -e -e AWS_ROLE_ARN AWS_SECRET_ACCESS_KEY -e AWS_SECRET_ACCESS_KEY_ASSUME_ROLE -e BUILD_TAG -e BUILD_VCS_NUMBER -e CLOUD -e COCKROACH_DEV_LICENSE -e TESTS -e COUNT -e GITHUB_API_TOKEN -e GITHUB_ORG -e GITHUB_REPO -e GOOGLE_EPHEMERAL_CREDENTIALS -e SLACK_TOKEN -e TC_BUILD_BRANCH -e TC_BUILD_ID -e TC_SERVER_URL" \
run_bazel build/teamcity/cockroach/nightlies/roachtest_nightly_impl.sh
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
Original file line number Diff line number Diff line change
Expand Up @@ -194,4 +194,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
version version 21.2-112 set the active cluster version in the format '<major>.<minor>'
version version 22.1 set the active cluster version in the format '<major>.<minor>'
3 changes: 2 additions & 1 deletion docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
<tr><td><code>kv.replica_circuit_breaker.slow_replication_threshold</code></td><td>duration</td><td><code>1m0s</code></td><td>duration after which slow proposals trip the per-Replica circuit breaker (zero duration disables breakers)</td></tr>
<tr><td><code>kv.replica_stats.addsst_request_size_factor</code></td><td>integer</td><td><code>50000</code></td><td>the divisor that is applied to addsstable request sizes, then recorded in a leaseholders QPS; 0 means all requests are treated as cost 1</td></tr>
<tr><td><code>kv.replication_reports.interval</code></td><td>duration</td><td><code>1m0s</code></td><td>the frequency for generating the replication_constraint_stats, replication_stats_report and replication_critical_localities reports (set to 0 to disable)</td></tr>
<tr><td><code>kv.snapshot_delegation.enabled</code></td><td>boolean</td><td><code>false</code></td><td>set to true to allow snapshots from follower replicas</td></tr>
<tr><td><code>kv.snapshot_rebalance.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for rebalance and upreplication snapshots</td></tr>
<tr><td><code>kv.snapshot_recovery.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for recovery snapshots</td></tr>
<tr><td><code>kv.transaction.max_intents_bytes</code></td><td>integer</td><td><code>4194304</code></td><td>maximum number of bytes used to track locks in transactions</td></tr>
Expand Down Expand Up @@ -210,6 +211,6 @@
<tr><td><code>trace.opentelemetry.collector</code></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.</td></tr>
<tr><td><code>trace.span_registry.enabled</code></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://<ui>/#/debug/tracez</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>21.2-112</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>22.1</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
1 change: 1 addition & 0 deletions pkg/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ ALL_TESTS = [
"//pkg/server/heapprofiler:heapprofiler_test",
"//pkg/server/pgurl:pgurl_test",
"//pkg/server/serverpb:serverpb_test",
"//pkg/server/serverrules:serverrules_test",
"//pkg/server/settingswatcher:settingswatcher_test",
"//pkg/server/status:status_test",
"//pkg/server/systemconfigwatcher:systemconfigwatcher_test",
Expand Down
4 changes: 2 additions & 2 deletions pkg/ccl/logictestccl/testdata/logic_test/crdb_internal_tenant
Original file line number Diff line number Diff line change
Expand Up @@ -357,7 +357,7 @@ select crdb_internal.get_vmodule()
query T
select regexp_replace(crdb_internal.node_executable_version()::string, '(-\d+)?$', '');
----
21.2
22.1

query ITTT colnames
select node_id, component, field, regexp_replace(regexp_replace(value, '^\d+$', '<port>'), e':\\d+', ':<port>') as value from crdb_internal.node_runtime_info
Expand Down Expand Up @@ -450,7 +450,7 @@ select * from crdb_internal.gossip_alerts
query T
select regexp_replace(crdb_internal.node_executable_version()::string, '(-\d+)?$', '');
----
21.2
22.1

user root

Expand Down
1 change: 1 addition & 0 deletions pkg/cloud/amazon/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ go_library(
"@com_github_aws_aws_sdk_go//aws",
"@com_github_aws_aws_sdk_go//aws/awserr",
"@com_github_aws_aws_sdk_go//aws/credentials",
"@com_github_aws_aws_sdk_go//aws/credentials/stscreds",
"@com_github_aws_aws_sdk_go//aws/session",
"@com_github_aws_aws_sdk_go//service/kms",
"@com_github_aws_aws_sdk_go//service/s3",
Expand Down
33 changes: 33 additions & 0 deletions pkg/cloud/amazon/aws_kms.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import (

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/credentials"
"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/kms"
"github.com/cockroachdb/cockroach/pkg/cloud"
Expand All @@ -43,6 +44,7 @@ type kmsURIParams struct {
endpoint string
region string
auth string
roleArn string
}

func resolveKMSURIParams(kmsURI url.URL) kmsURIParams {
Expand All @@ -53,6 +55,7 @@ func resolveKMSURIParams(kmsURI url.URL) kmsURIParams {
endpoint: kmsURI.Query().Get(AWSEndpointParam),
region: kmsURI.Query().Get(KMSRegionParam),
auth: kmsURI.Query().Get(cloud.AuthParam),
roleArn: kmsURI.Query().Get(AWSRoleArnParam),
}

// AWS secrets often contain + characters, which must be escaped when
Expand Down Expand Up @@ -132,6 +135,31 @@ func MakeAWSKMS(uri string, env cloud.KMSEnv) (cloud.KMS, error) {
"implicit credentials disallowed for s3 due to --external-io-implicit-credentials flag")
}
opts.SharedConfigState = session.SharedConfigEnable
case cloud.AuthParamAssume:
if kmsURIParams.roleArn == "" {
return nil, errors.Errorf(
"%s is set to '%s', but %s must be set",
cloud.AuthParam,
cloud.AuthParamAssume,
AWSRoleArnParam,
)
}
// User can specify the account that is assuming the role, or if left
// unspecified, it will be retrieved from the default credentials
// chain.
if (kmsURIParams.accessKey == "") != (kmsURIParams.secret == "") {
return nil, errors.Errorf(
"%s is set to '%s', but %s and %s must both be set for a specified user or neither for implicit",
cloud.AuthParam,
cloud.AuthParamAssume,
AWSAccessKeyParam,
AWSSecretParam,
)
} else if kmsURIParams.accessKey != "" {
// Account that is doing the AssumeRole is specified by the user,
// so pass in the access key and secret when creating the session.
opts.Config.MergeIn(awsConfig)
}
default:
return nil, errors.Errorf("unsupported value %s for %s", kmsURIParams.auth, cloud.AuthParam)
}
Expand All @@ -140,6 +168,11 @@ func MakeAWSKMS(uri string, env cloud.KMSEnv) (cloud.KMS, error) {
if err != nil {
return nil, errors.Wrap(err, "new aws session")
}

if kmsURIParams.auth == cloud.AuthParamAssume {
sess.Config.Credentials = stscreds.NewCredentials(sess, kmsURIParams.roleArn)
}

if region == "" {
// TODO(adityamaru): Maybe use the KeyID to get the region, similar to how
// we infer the region from the bucket for s3_storage.
Expand Down
74 changes: 72 additions & 2 deletions pkg/cloud/amazon/aws_kms_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,6 @@ func init() {

func TestEncryptDecryptAWS(t *testing.T) {
defer leaktest.AfterTest(t)()

// If environment credentials are not present, we want to
// skip all AWS KMS tests, including auth-implicit, even though
// it is not used in auth-implicit.
Expand All @@ -54,7 +53,6 @@ func TestEncryptDecryptAWS(t *testing.T) {
}
q.Add(param, v)
}

// Get AWS KMS region from env variable.
kmsRegion := os.Getenv("AWS_KMS_REGION_A")
if kmsRegion == "" {
Expand Down Expand Up @@ -125,6 +123,78 @@ func TestEncryptDecryptAWS(t *testing.T) {
}
}

func TestEncryptDecryptAWSAssumeRole(t *testing.T) {
defer leaktest.AfterTest(t)()
// If environment credentials are not present, we want to
// skip all AWS KMS tests, including auth-implicit, even though
// it is not used in auth-implicit.
_, err := credentials.NewEnvCredentials().Get()
if err != nil {
skip.IgnoreLint(t, "Test only works with AWS credentials")
}

q := make(url.Values)
expect := map[string]string{
"AWS_ACCESS_KEY_ID": AWSAccessKeyParam,
"AWS_SECRET_ACCESS_KEY": AWSSecretParam,
"AWS_ROLE_ARN": AWSRoleArnParam,
}
for env, param := range expect {
v := os.Getenv(env)
if v == "" {
skip.IgnoreLintf(t, "%s env var must be set", env)
}
q.Add(param, v)
}
// Get AWS KMS region from env variable.
kmsRegion := os.Getenv("AWS_KMS_REGION_A")
if kmsRegion == "" {
skip.IgnoreLint(t, "AWS_KMS_REGION_A env var must be set")
}
q.Add(KMSRegionParam, kmsRegion)
q.Set(cloud.AuthParam, cloud.AuthParamAssume)

for _, id := range []string{"AWS_KMS_KEY_ARN_A", "AWS_KEY_ID", "AWS_KEY_ALIAS"} {
// Get AWS Key identifier from env variable.
keyID := os.Getenv(id)
if keyID == "" {
skip.IgnoreLint(t, fmt.Sprintf("%s env var must be set", id))
}

t.Run(fmt.Sprintf("auth-implicit-%s", id), func(t *testing.T) {
// You can create an IAM that can access AWS KMS
// in the AWS console, then set it up locally.
// https://docs.aws.com/cli/latest/userguide/cli-configure-role.html
// We only run this test if default role exists.
credentialsProvider := credentials.SharedCredentialsProvider{}
_, err := credentialsProvider.Retrieve()
if err != nil {
skip.IgnoreLint(t, err)
}

// Create params for implicit user.
params := make(url.Values)
params.Add(cloud.AuthParam, cloud.AuthParamAssume)
params.Add(AWSRoleArnParam, q.Get(AWSRoleArnParam))
params.Add(KMSRegionParam, kmsRegion)

uri := fmt.Sprintf("aws:///%s?%s", keyID, params.Encode())
cloud.KMSEncryptDecrypt(t, uri, cloud.TestKMSEnv{
Settings: cluster.NoSettings,
ExternalIOConfig: &base.ExternalIODirConfig{},
})
})

t.Run(fmt.Sprintf("specified-%s", id), func(t *testing.T) {
uri := fmt.Sprintf("aws:///%s?%s", keyID, q.Encode())
cloud.KMSEncryptDecrypt(t, uri, cloud.TestKMSEnv{
Settings: cluster.NoSettings,
ExternalIOConfig: &base.ExternalIODirConfig{},
})
})
}
}

func TestPutAWSKMSEndpoint(t *testing.T) {
defer leaktest.AfterTest(t)()

Expand Down
68 changes: 57 additions & 11 deletions pkg/cloud/amazon/s3_storage.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/credentials"
"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
"github.com/aws/aws-sdk-go/service/s3/s3manager"
Expand Down Expand Up @@ -68,6 +69,10 @@ const (

// KMSRegionParam is the query parameter for the 'region' in every KMS URI.
KMSRegionParam = "REGION"

// AWSRoleArnParam is the query parameter for the 'role ARN' in an AWS
// URI, use if credentials are being verified with Assume Role.
AWSRoleArnParam = "AWS_ROLE_ARN"
)

type s3Storage struct {
Expand Down Expand Up @@ -100,7 +105,7 @@ var reuseSession = settings.RegisterBoolSetting(
// requests).
type s3ClientConfig struct {
// copied from ExternalStorage_S3.
endpoint, region, bucket, accessKey, secret, tempToken, auth string
endpoint, region, bucket, accessKey, secret, tempToken, auth, roleArn string
// log.V(2) decides session init params so include it in key.
verbose bool
}
Expand All @@ -115,6 +120,7 @@ func clientConfig(conf *roachpb.ExternalStorage_S3) s3ClientConfig {
tempToken: conf.TempToken,
auth: conf.Auth,
verbose: log.V(2),
roleArn: conf.RoleArn,
}
}

Expand Down Expand Up @@ -151,6 +157,7 @@ func S3URI(bucket, path string, conf *roachpb.ExternalStorage_S3) string {
setIf(AWSServerSideEncryptionMode, conf.ServerEncMode)
setIf(AWSServerSideEncryptionKMSID, conf.ServerKMSID)
setIf(S3StorageClassParam, conf.StorageClass)
setIf(AWSRoleArnParam, conf.RoleArn)

s3URL := url.URL{
Scheme: "s3",
Expand All @@ -177,6 +184,7 @@ func parseS3URL(_ cloud.ExternalStorageURIContext, uri *url.URL) (roachpb.Extern
ServerEncMode: uri.Query().Get(AWSServerSideEncryptionMode),
ServerKMSID: uri.Query().Get(AWSServerSideEncryptionKMSID),
StorageClass: uri.Query().Get(S3StorageClassParam),
RoleArn: uri.Query().Get(AWSRoleArnParam),
/* NB: additions here should also update s3QueryParams() serializer */
}
conf.S3Config.Prefix = strings.TrimLeft(conf.S3Config.Prefix, "/")
Expand Down Expand Up @@ -231,6 +239,29 @@ func MakeS3Storage(
return nil, errors.New(
"implicit credentials disallowed for s3 due to --external-io-implicit-credentials flag")
}
case cloud.AuthParamAssume:
{
if conf.RoleArn == "" {
return nil, errors.Errorf(
"%s is set to '%s', but %s must be set",
cloud.AuthParam,
cloud.AuthParamAssume,
AWSRoleArnParam,
)
}
// User can specify the account that is assuming the role, or if left
// unspecified, it will be retrieved from the default credentials
// chain.
if (conf.AccessKey == "") != (conf.Secret == "") {
return nil, errors.Errorf(
"%s is set to '%s', but %s and %s must both be set for a specified user or neither for implicit",
cloud.AuthParam,
cloud.AuthParamAssume,
AWSAccessKeyParam,
AWSSecretParam,
)
}
}
default:
return nil, errors.Errorf("unsupported value %s for %s", conf.Auth, cloud.AuthParam)
}
Expand Down Expand Up @@ -319,13 +350,6 @@ func newClient(
opts.Config.HTTPClient = client
}

switch conf.auth {
case "", cloud.AuthParamSpecified:
opts.Config.WithCredentials(credentials.NewStaticCredentials(conf.accessKey, conf.secret, conf.tempToken))
case cloud.AuthParamImplicit:
opts.SharedConfigState = session.SharedConfigEnable
}

// TODO(yevgeniy): Revisit retry logic. Retrying 10 times seems arbitrary.
opts.Config.MaxRetries = aws.Int(10)

Expand All @@ -335,9 +359,31 @@ func newClient(
opts.Config.LogLevel = aws.LogLevel(aws.LogDebugWithRequestRetries | aws.LogDebugWithRequestErrors)
}

sess, err := session.NewSessionWithOptions(opts)
if err != nil {
return s3Client{}, "", errors.Wrap(err, "new aws session")
var sess *session.Session
var err error

switch conf.auth {
case "", cloud.AuthParamSpecified:
sess, err = session.NewSessionWithOptions(opts)
if err != nil {
return s3Client{}, "", errors.Wrap(err, "new aws session")
}
sess.Config.Credentials = credentials.NewStaticCredentials(conf.accessKey, conf.secret, conf.tempToken)
case cloud.AuthParamImplicit:
opts.SharedConfigState = session.SharedConfigEnable
sess, err = session.NewSessionWithOptions(opts)
if err != nil {
return s3Client{}, "", errors.Wrap(err, "new aws session")
}
case cloud.AuthParamAssume:
if conf.accessKey != "" {
opts.Config.WithCredentials(credentials.NewStaticCredentials(conf.accessKey, conf.secret, conf.tempToken))
}
sess, err = session.NewSessionWithOptions(opts)
if err != nil {
return s3Client{}, "", errors.Wrap(err, "new aws session")
}
sess.Config.Credentials = stscreds.NewCredentials(sess, conf.roleArn)
}

region := conf.region
Expand Down
Loading

0 comments on commit f06da92

Please sign in to comment.