-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli,storage: add emergency ballast #66893
Conversation
278105f
to
5b4e386
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this up Jackson! This looks good to me (I did have some questions and UX comments inline), but I didn't do a nitty-gritty line-by-line review. I don't think Josh will do that either; mind getting a reviewer from storage
as well for that?
Reviewed 7 of 7 files at r1, 2 of 2 files at r2, 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @joshimhoff)
pkg/base/store_spec.go, line 329 at r3 (raw file):
case "ballast-size": var minBytesAllowed int64 var minPercent float64 = 1
aren't 0.1% etc also allowed?
pkg/cli/start.go, line 341 at r3 (raw file):
return err } else if isDiskFull { err := errors.Errorf("store %s: out of disk space", spec.Path)
nit: you could collect all of the out-of-disk specs across stores, in case multiple are out of space. You could also hint people at the existence of the ballast file or, in the future, at docs on the topic.
pkg/storage/ballast.go, line 36 at r3 (raw file):
desiredSizeBytes := BallastSizeBytes(spec, diskUsage) ballastPath := base.EmergencyBallastFile(fs.PathJoin, spec.Path)
are these all stable under different ways of starting the same node? For example, I might start it with --store=./data
one day, and with --store=/foo/bar/data
the next.
We should also note on EmergencyBallastFile
that the constants within cannot be changed, or existing ballast files will start getting included in disk usage & new ballast files will be created. More things that would be useful:
- give devs a way to opt-out of ballast files via an env var. I don't want a 1GB file to be created every time I
./cockroach start-single- node
. - similarly, need to make sure that
cockroach demo
doesn't ever do ballast files (I think that's always in-mem, so probably already true) - it might be worth putting a
.txt
file next toEmergencyBallastFile
that has some information about ballast and how to disable it if undesirable.
pkg/storage/ballast_test.go, line 218 at r3 (raw file):
path, cleanup := setup(t, 4096) defer cleanup() resized, err := maybeEstablishBallast(vfs.Default, path, 4096, vfs.DiskUsage{
From your experience, are disk capacities static? For example, if a disk reports a capacity of 10TB + rand(0,100) bytes
, will that lead to the ballast file being recreated ~every time (which is probably mildly expensive, or very expensive when the file actually has to be written wholesale)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thanks for putting me on the PR. It's really nice to work with y'all directly on solving various problems that have an operational angle / affect oncall life in a big way. That is, DEVOPS (in the deep platonic sense (no buzzwords allowed)).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)
pkg/storage/ballast.go, line 25 at r3 (raw file):
// IsDiskFull examines the store indicated by spec, determining whether the // store's underlying disk is out of disk space. A disk is considered to be // full if available capacity is less than half of the store's ballast size.
Is one of the goals of this PR that after a single crash, all subsequent crashes will have the out of disk exit code, as CRDB will determine that the disk is full by calling this function? What is our confidence that in an out of disk scenario, we will exit with the new code repeatedly? In asking this Q, I am thinking about the possibly of an attempt to write to some log file failing before this check is made, as has happened in CC land (as we use datadir disk for logging currently). Can you say why our confidence is what it is for my learning also? I can imagine deleting the ballast automatically on startup when needed, but continuing to crash as the available disk is ~equal to the ballast, but perhaps that is not necessary as we have the confidence we need already, just for reasons I don't follow yet.
Also, what about the first crash? This will happen in the <=21.1 way, that is, somewhere a write to disk will fail leading to a crash, and we can't say much about the exit code, as it depends where that happens? If that is right, do we generally expect a 100% full disk, that is, no available capacity, rather than a disk that has around 1/2 ballast worth of capacity? In a way this relates to my first Q.
0a8dc02
to
95a9a3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind getting a reviewer from storage as well for that?
For sure. I think maybe we should see whether the exit code approach is workable within CC alerting, but then I'll tag storage folks for review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @joshimhoff, and @tbg)
pkg/base/store_spec.go, line 329 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
aren't 0.1% etc also allowed?
Good call, lifted the minimum requirement altogether.
pkg/storage/ballast.go, line 25 at r3 (raw file):
Is one of the goals of this PR that after a single crash, all subsequent crashes will have the out of disk exit code, as CRDB will determine that the disk is full by calling this function? What is our confidence that in an out of disk scenario, we will exit with the new code repeatedly? In asking this Q, I am thinking about the possibly of an attempt to write to some log file failing before this check is made, as has happened in CC land (as we use datadir disk for logging currently). Can you say why our confidence is what it is for my learning also?
Yeah, we want all subsequent exits to have the disk full exit code. I'm pretty confident that we will always exit with the new code, so long as some other process isn't causing disk usage to fluctuate rapidly. The call to this function happens very early in the node start path, even before we initialize logging. Before we call IsDiskFull
, we parse command flags, maybe re-execute the process in the background if requested, set a umask and set up signal handlers. That's it.
I can imagine deleting the ballast automatically on startup when needed, but continuing to crash as the available disk is ~equal to the ballast, but perhaps that is not necessary as we have the confidence we need already, just for reasons I don't follow yet.
In a non-CC setting, we want the user to be able to start up with the newly created ballast headroom and try to recover the node/cluster. Without a separate file or flag to indicate that intent, we need the user to delete the ballast so that the increased available capacity can be the signal.
Also, what about the first crash? This will happen in the <=21.1 way, that is, somewhere a write to disk will fail leading to a crash, and we can't say much about the exit code, as it depends where that happens?
I expect the vast majority of first crashes to exit with the Disk Full
error code, because of a wrapper around the virtual file system used by Pebble and most filesystem writes within CockroachDB. There are definitely some codepaths that directly use the standard library os
package and an ENOSPC
in those codepaths potentially could fatal the process with a different exit code. As far as I know, these codepaths are just things like:
- TLS certificate rotation
- writing to external/nodelocal storage (eg, when writing backups to the node's local disk for collecting later; not relevant to CC)
- writing CPU and heap profiles, goroutine dumps, debug.zips
- unfortunately, resizing the ballast itself, for now
If that is right, do we generally expect a 100% full disk, that is, no available capacity, rather than a disk that has around 1/2 ballast worth of capacity? In a way this relates to my first Q.
We generally expect an available capacity that is just shy of 0. There's some amount of variation in the low-level block allocation requests made by the filesystem, but as far as I know, not a lot. I think there may be more variation on copy-on-write filesystems like btrfs or zfs, but we don't need to worry about that in CC.
pkg/storage/ballast.go, line 36 at r3 (raw file):
are these all stable under different ways of starting the same node? For example, I might start it with --store=./data one day, and with --store=/foo/bar/data the next.
Yeah, we convert to the store path to an absolute path very early when we construct the StoreSpec
. We don't have a guarantee that the store directory will exist though, so I updated this to account for that.
- give devs a way to opt-out of ballast files via an env var. I don't want a 1GB file to be created every time I ./cockroach start-single- node.
Good point. I'll follow up with this in a bit.
- similarly, need to make sure that cockroach demo doesn't ever do ballast files (I think that's always in-mem, so probably already true)
I'll double check this.
- it might be worth putting a .txt file next to EmergencyBallastFile that has some information about ballast and how to disable it if undesirable.
Good call.
pkg/storage/ballast_test.go, line 218 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
From your experience, are disk capacities static? For example, if a disk reports a capacity of
10TB + rand(0,100) bytes
, will that lead to the ballast file being recreated ~every time (which is probably mildly expensive, or very expensive when the file actually has to be written wholesale)?
This is a good question — I do expect it to be static, but it's worth verifying that experimentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @tbg)
pkg/storage/ballast.go, line 25 at r3 (raw file):
Ack! Thanks for explaining. I might suggest that some of above thinking should be in comments in the code, perhaps in start.go
? I also wonder if we should / can system test the consistent exit code when out of disk, so we don't lose the property down the line?
I will check this out today / tomorrow in the AM; I think it will work; I just want to be paranoid about it.
For sure. I think maybe we should see whether the exit code approach is workable within CC alerting
12fc9b2
to
015c171
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff and @tbg)
pkg/cli/start.go, line 341 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit: you could collect all of the out-of-disk specs across stores, in case multiple are out of space. You could also hint people at the existence of the ballast file or, in the future, at docs on the topic.
good call, done.
pkg/storage/ballast.go, line 25 at r3 (raw file):
added some comments!
I will check this out today / tomorrow in the AM; I think it will work; I just want to be paranoid about it.
awesome, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. just a few nits
Reviewed 7 of 7 files at r1, 2 of 2 files at r2, 3 of 10 files at r3, 2 of 6 files at r4, 6 of 6 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @joshimhoff, and @tbg)
pkg/base/store_spec.go, line 82 at r5 (raw file):
// SizeSpec if it is correctly parsed. func NewSizeSpec( field, value string, bytesRange *intInterval, percentRange *floatInterval,
maybe make the field
argument a redact.SafeString
pkg/base/store_spec.go, line 96 at r5 (raw file):
size.Percent *= percentFactor if err != nil { return SizeSpec{}, fmt.Errorf("could not parse %s size (%s) %s", field, value, err)
then use errors.Newf
here and below
pkg/base/store_spec.go, line 154 at r5 (raw file):
// pflag's value interface. func (ss *SizeSpec) Set(value string) error { spec, err := NewSizeSpec("", value, nil, nil)
change that empty string to "specified"
so that the error messages, if any, check out with the flag UX.
pkg/base/store_spec.go, line 194 at r5 (raw file):
// String returns a fully parsable version of the store spec. func (ss StoreSpec) String() string {
All this would benefit from being turned into a SafeFormat
.
pkg/cli/start.go, line 1037 at r5 (raw file):
} ballastPathsStr := strings.Join(ballastPaths, "\n") err = errors.WithDetailf(err, `Deleting or truncating the ballast file(s) at
nit: might be a hint more than a detail
pkg/storage/ballast.go, line 25 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
added some comments!
I will check this out today / tomorrow in the AM; I think it will work; I just want to be paranoid about it.
awesome, thank you!
Can you also explain the above in the release note thanks. Release notes are the communication channel by which we send information to docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff, @knz, and @tbg)
pkg/base/store_spec.go, line 82 at r5 (raw file):
Previously, knz (kena) wrote…
maybe make the
field
argument aredact.SafeString
Done.
pkg/base/store_spec.go, line 96 at r5 (raw file):
Previously, knz (kena) wrote…
then use
errors.Newf
here and below
Done.
pkg/base/store_spec.go, line 154 at r5 (raw file):
Previously, knz (kena) wrote…
change that empty string to
"specified"
so that the error messages, if any, check out with the flag UX.
Done.
pkg/base/store_spec.go, line 194 at r5 (raw file):
Previously, knz (kena) wrote…
All this would benefit from being turned into a
SafeFormat
.
Added a TODO
pkg/cli/start.go, line 1037 at r5 (raw file):
Previously, knz (kena) wrote…
nit: might be a hint more than a detail
Done.
pkg/storage/ballast.go, line 25 at r3 (raw file):
Previously, knz (kena) wrote…
Can you also explain the above in the release note thanks. Release notes are the communication channel by which we send information to docs.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r3, 1 of 6 files at r4, 2 of 3 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick, @itsbilal, @jbowens, @joshimhoff, @knz, and @tbg)
pkg/storage/ballast.go, line 28 at r7 (raw file):
// may want to include `COCKROACH_AUTO_BALLAST=false` in their environment to // prevent the automatic creation of large ballast files on their local // filesystem.
Have we already taught our on-prem users to create a ballast file using https://www.cockroachlabs.com/docs/stable/cockroach-debug-ballast.html?
If yes, I assume that the default here would create another one since the existing one could have an arbitrary path that we don't know about.
I can see why we would want to default to true, and possibly all we need is sufficient release documentation -- what I worry about is their being 2 ballast files and no one noticing.
pkg/storage/ballast.go, line 64 at r7 (raw file):
} // If the ballast is larger than desired, truncate it now in case the
what if we need to increase it in size? Is there a reason why that would only happen via the Capacity
periodic logic and not here?
pkg/storage/ballast.go, line 89 at r7 (raw file):
// BallastSizeBytes returns the desired size of the emergency ballast, // calculated from the provided store spec and disk usage. If the store spec // contains an explicit ballast size (either in bytes or as a perecentage of
percentage
pkg/storage/ballast.go, line 141 at r7 (raw file):
// If the user configured a really large ballast, we might not ever // have >= 4x the required amount available. Also allow extending the // ballast if we will have 10 GiB available after the extension.
What is the motivation for this 10GiB logic? Should we log an error instead of potentially taking up too much space? I can see why that would not be satisfactory since no one may notice. A code comment could help here.
pkg/storage/pebble.go, line 592 at r7 (raw file):
// store's capacity is queried periodically. Only try if the store is not // an in-memory store. vfs.MemFS will error if you retrieve disk usage. if cfg.Dir != "" /* if not in-memory */ {
Is this a reliable way of detecting "not in-memory"?
pkg/storage/pebble.go, line 1027 at r7 (raw file):
fsuTotal := int64(du.TotalBytes) fsuAvail := int64(du.AvailBytes)
Can you add a code comment about Capacity
being called periodically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick, @itsbilal, @joshimhoff, @knz, @sumeerbhola, and @tbg)
pkg/storage/ballast.go, line 28 at r7 (raw file):
Previously, sumeerbhola wrote…
Have we already taught our on-prem users to create a ballast file using https://www.cockroachlabs.com/docs/stable/cockroach-debug-ballast.html?
If yes, I assume that the default here would create another one since the existing one could have an arbitrary path that we don't know about.
I can see why we would want to default to true, and possibly all we need is sufficient release documentation -- what I worry about is their being 2 ballast files and no one noticing.
It is featured somewhat prominently on the production checklist docs. I pinged customer success to see if anyone has a sense of how widespread manually-created ballasts are. This is definitely something that needs to be well-documented in the release docs. I'm not sure what else we can do outside that.
pkg/storage/ballast.go, line 64 at r7 (raw file):
Previously, sumeerbhola wrote…
what if we need to increase it in size? Is there a reason why that would only happen via the
Capacity
periodic logic and not here?
There's just the complication that the data directory might not exist yet. I think it is preferable to establish the ballast here too if available capacity allows. I updated maybeEstablishBallast
, adding a MkdirAll
of the data directory before attempting to create/grow the ballast.
pkg/storage/ballast.go, line 89 at r7 (raw file):
Previously, sumeerbhola wrote…
percentage
Done.
pkg/storage/ballast.go, line 141 at r7 (raw file):
Previously, sumeerbhola wrote…
What is the motivation for this 10GiB logic? Should we log an error instead of potentially taking up too much space? I can see why that would not be satisfactory since no one may notice. A code comment could help here.
The 10 GiB logic exists for users that want the safety of a large ballast. Larger ballast sizes (eg, 5%, 10%) are not unreasonably large, but it's possible that available capacity will never exceed 4x the ballast sizes (eg, 20%, 40%) after recovery. That's especially true on larger clusters where you can use a higher percentage of each individual disk's capacity because the cluster as a whole still has plenty of capacity for rehoming replicas if a node is lost. Added a little more of this explanation as a code comment.
pkg/storage/pebble.go, line 592 at r7 (raw file):
Previously, sumeerbhola wrote…
Is this a reliable way of detecting "not in-memory"?
Yeah, it's the check Capacity
makes, and the check (*Pebble).InMem()
makes.
pkg/storage/pebble.go, line 1027 at r7 (raw file):
Previously, sumeerbhola wrote…
Can you add a code comment about
Capacity
being called periodically.
Done.
28d8af4
to
9ae7f2b
Compare
881da2d
to
cfa19be
Compare
8149787
to
7746b7b
Compare
951addb
to
783ed0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r3, 1 of 6 files at r5, 9 of 14 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bananabrick, @itsbilal, @jbowens, @joshimhoff, @knz, @sumeerbhola, and @tbg)
pkg/cli/start.go, line 1040 at r8 (raw file):
continue } ballastPaths = append(ballastPaths, base.EmergencyBallastFile(vfs.Default.PathJoin, spec.Path))
how about confirming that this file actually exists and classifying into two categories of errors: can or cannot be rectified by deleting the ballast file. That may be more helpful to an operator who is trying to deal with an emergency situation.
pkg/storage/ballast.go, line 58 at r8 (raw file):
// Try to resize the ballast now, if necessary. This is necessary to // truncate the ballast if it a new, lower ballast size was provided, and
... if a new ...
6e679be
to
de924da
Compare
Add an automatically created, on-by-default emergency ballast file. This new ballast defaults to the minimum of 1% total disk capacity or 1GiB. The size of the ballast may be configured via the `--store` flag with a `ballast-size` field, accepting the same value formats as the `size` field. The ballast is automatically created when either available disk space is at least four times the ballast size or when available disk space after creating the ballast is at least 10 GiB. Creation of the ballast happens either when the engine is opened or during the periodic Capacity calculations driven by the `kvserver.Store`. During node start, if available disk space is less than or equal to half the ballast size, exit immediately with a new Disk Full (10) exit code. See cockroachdb#66493. Release note (ops change): Add an automatically created, on by default emergency ballast file. This new ballast defaults to the minimum of 1% total disk capacity or 1GiB. The size of the ballast may be configured via the `--store` flag with a `ballast-size` field, accepting the same value formats as the `size` field. Also, add a new Disk Full (10) exit code that indicates that the node exited because disk space on at least one store is exhausted. On node start, if any store has less than half the ballast's size bytes available, the node immediately exits with the Disk Full (10) exit code. The operator may manually remove the configured ballast (assuming they haven't already) to allow the node to start, and they can take action to remedy the disk space exhaustion. The ballast will automatically be recreated when available disk space is 4x the ballast size, or at least 10 GiB is available after the ballast is created.
de924da
to
54282ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bananabrick, @itsbilal, @joshimhoff, @knz, @sumeerbhola, and @tbg)
pkg/cli/start.go, line 1040 at r8 (raw file):
Previously, sumeerbhola wrote…
how about confirming that this file actually exists and classifying into two categories of errors: can or cannot be rectified by deleting the ballast file. That may be more helpful to an operator who is trying to deal with an emergency situation.
Done.
pkg/storage/ballast.go, line 58 at r8 (raw file):
Previously, sumeerbhola wrote…
... if a new ...
Done.
Build succeeded: |
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
90176: cli/start: unify code between `cockroach start` and `cockroach mt start-sql` r=andreimatei a=knz Fixes #89974. Fixes #90831. Fixes #90524. This PR merges the server initialization code between `cockroach start` and `cockroach mt start-sql`. In doing so, it brings `cockroach mt start-sql` closer to what we expect from proper CockroachDB server processes: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from #4754) - it adds a tracing span for the startup code. (from #8712!!) - it properly supports `--listening-url-file`. (from #15468) - it properly does sanitization of `--external-io-dir`. (from #19725) - it sets the proper log severity level for gRPC. (from #20308) - it reports the command-line and env config to logs. (from #21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from #42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from #44043) - it recovers support for `DISABLE_STARTING_BACKGROUND_JOBS`. (from #44786) - it sets `GOMAXPROCS` from current cgroup limits. (from #57390) - it stops the server early if the storage disk is full. (from #66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from #73876) See the individual commit for details. 90660: sql: add contention_events to cluster_execution_insights r=j82w a=j82w The original contention column will remain to make query operations faster. The events are being put into a json column because it's possible there could be multiple blocking events for a single statement. The json column avoids the complexity of adding another table and keeping it in sync with the insights table. The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id. closes: #88561 Release note (sql change): Adds contention_events to cluster_execution_insights. This is used to see which transaction is blocking the specific statement. 90719: opgen: added a bool field in struct opgen.transition r=Xiang-Gu a=Xiang-Gu This PR adds a bool field in struct opgen.transition that indicates whether it results from a `equiv(xx)` transition spec in the opgen file. It will be useful for a test where we need to find the inital status on a adding/dropping path. Without such a change, it can be problematic if we have a `equiv(xx)` spec as the first transition. E.g. ``` ToAbsent( PUBLIC, equiv(VALIDATED), to(WRITE_ONLY), to(ABSENT), ) ``` Without this change, the inital status will confusingly be `VALIDATED`, and the next status will be `PUBLIC`. With this change, the initial status will be `PUBLIC`, and the next status will be `WRITE_ONLY`. We also added some comments when we make transitions from the specs. Epic: None Release note: None 90865: sql: use bare name string of new pk to compare with pk name when altering primary key r=chengxiong-ruan a=chengxiong-ruan Fixes #90836 Release note (sql change): previously, the `DROP CONSTRAINT, ADD CONSTRAINT` in one trick to have a new primary key without moving old primary key to be a secondary index didn't work if the primary key name is a reserved SQL keyword. A `constraint already exists` error was returned. This patch fixed the bug, the trick now also works with primary key named as reserved keywords. Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: j82w <[email protected]> Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]>
Add an automatically created, on-by-default emergency ballast file. This
new ballast defaults to the minimum of 1% total disk capacity or 1GiB.
The size of the ballast may be configured via the
--store
flag with aballast-size
field, accepting the same value formats as thesize
field.
The ballast is automatically created when either available disk space is
at least four times the ballast size, or when available disk space after
creating the ballast is at least 10 GiB. Creation of the ballast happens
either when the engine is opened or during the periodic Capacity
calculations driven by the
kvserver.Store
.During node start, if available disk space is less than or equal to half
the ballast size, exit immediately with a new Disk Full (10) exit code.
See #66493.
Release note (ops change): Add an automatically created, on by default
emergency ballast file. This new ballast defaults to the minimum of 1%
total disk capacity or 1GiB. The size of the ballast may be configured
via the
--store
flag with aballast-size
field, accepting the samevalue formats as the
size
field. Also, add a new Disk Full (10) exitcode that indicates that the node exited because disk space on at least
one store is exhausted. On node start, if any store has less than half
the ballast's size bytes available, the node immediately exits with the
Disk Full (10) exit code. The operator may manually remove the
configured ballast (assuming they haven't already) to allow the node to
start, and they can take action to remedy the disk space exhaustion. The
ballast will automatically be recreated when available disk space is 4x
the ballast size, or at least 10 GiB is available after the ballast is
created.