Helm: Revert Scheduler storage quota size to 1Gi (#8107)
In v1.14.3, the storage quota size for the Scheduler volume was
increased from `1Gi` to `16Gi` because users were encountering fatal
disk-exhaustion errors on the Scheduler under normal usage.
Because the volume size request field is protected from updates, Dapr
version upgrades to v1.14.3 failed without manual intervention.

Reverts the Scheduler storage quota size back to `1Gi`, and adds
warnings that the volume size may need to be increased for production
deployments.

Signed-off-by: joshvanl <[email protected]>
JoshVanL authored Sep 16, 2024
1 parent cde4cd2 commit e02ca9c
Showing 4 changed files with 51 additions and 2 deletions.
2 changes: 1 addition & 1 deletion charts/dapr/charts/dapr_scheduler/values.yaml
@@ -31,7 +31,7 @@ cluster:
etcdDataDirPath: /var/run/data/dapr-scheduler
etcdDataDirWinPath: C:\\dapr-scheduler
storageClassName: ""
- storageSize: 16Gi
+ storageSize: 1Gi
inMemoryStorage: false

etcdSpaceQuota: 16Gi
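For production installs that need the larger sizing from the start, the two values can be supplied in an override file at install time. This is a sketch, not taken from the chart docs: the file name is hypothetical, and the key paths mirror the `--set` flags that the chart's NOTES.txt warning suggests (`dapr_scheduler.cluster.storageSize` and `dapr_scheduler.etcdSpaceQuota`).

```yaml
# values-production.yaml (hypothetical override file for `helm install -f`).
# Key paths follow the --set flags shown in the chart's NOTES.txt warning.
dapr_scheduler:
  etcdSpaceQuota: 16Gi
  cluster:
    storageSize: 16Gi
```

Setting these before the first install avoids the forbidden StatefulSet update entirely, since the volume request is never changed after creation.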
9 changes: 9 additions & 0 deletions charts/dapr/templates/NOTES.txt
@@ -7,3 +7,12 @@ https://github.com/dapr/quickstarts

For more information on running Dapr, visit:
https://dapr.io

{{- if eq .Values.dapr_scheduler.cluster.storageSize "1Gi" }}
> Warning: The default storage size for the Scheduler is 1Gi, which may not be sufficient for production deployments.
> You may want to consider reinstalling Dapr with a larger Scheduler storage of at least 16Gi.
>
> --set dapr_scheduler.cluster.storageSize=16Gi --set dapr_scheduler.etcdSpaceQuota=16Gi
>
> For more information, see https://docs.dapr.io/operations/hosting/kubernetes/kubernetes-persisting-scheduler
{{- end }}
10 changes: 9 additions & 1 deletion cmd/scheduler/options/options.go
@@ -28,6 +28,8 @@ import (
"github.com/dapr/kit/logger"
)

var log = logger.NewLogger("dapr.scheduler")

type Options struct {
Port int
HealthzPort int
@@ -65,6 +67,8 @@ type Options struct {
func New(origArgs []string) (*Options, error) {
var opts Options

defaultEtcdStorageQuota := int64(16 * 1024 * 1024 * 1024)

// Create a flag set
fs := pflag.NewFlagSet("scheduler", pflag.ExitOnError)
fs.SortFlags = true
@@ -86,7 +90,7 @@ func New(origArgs []string) (*Options, error) {
fs.StringVar(&opts.EtcdDataDir, "etcd-data-dir", "./data", "Directory to store scheduler etcd data")
fs.StringSliceVar(&opts.EtcdClientPorts, "etcd-client-ports", []string{"dapr-scheduler-server-0=2379"}, "Ports for etcd client communication")
fs.StringSliceVar(&opts.EtcdClientHTTPPorts, "etcd-client-http-ports", nil, "Ports for etcd client http communication")
- fs.StringVar(&opts.etcdSpaceQuota, "etcd-space-quota", resource.NewQuantity(16*1024*1024*1024, resource.BinarySI).String(), "Space quota for etcd")
+ fs.StringVar(&opts.etcdSpaceQuota, "etcd-space-quota", resource.NewQuantity(defaultEtcdStorageQuota, resource.BinarySI).String(), "Space quota for etcd")
fs.StringVar(&opts.EtcdCompactionMode, "etcd-compaction-mode", "periodic", "Compaction mode for etcd. Can be 'periodic' or 'revision'")
fs.StringVar(&opts.EtcdCompactionRetention, "etcd-compaction-retention", "10m", "Compaction retention for etcd. Can express time or number of revisions, depending on the value of 'etcd-compaction-mode'")
fs.Uint64Var(&opts.EtcdSnapshotCount, "etcd-snapshot-count", 10000, "Number of committed transactions to trigger a snapshot to disk.")
@@ -126,5 +130,9 @@ }
}
opts.EtcdSpaceQuota, _ = etcdSpaceQuota.AsInt64()

if etcdSpaceQuota.Value() < defaultEtcdStorageQuota {
log.Warnf("--etcd-space-quota of %s may be too low for production use. Consider increasing the value to 16Gi or larger.", etcdSpaceQuota.String())
}

return &opts, nil
}
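The guard added above can be sketched in isolation. This is a simplified, self-contained illustration, not the actual code: the real implementation parses the flag into a `k8s.io/apimachinery` `resource.Quantity` and logs via `dapr/kit/logger`, while this sketch compares plain byte counts and returns the warning string.

```go
package main

import "fmt"

// Default etcd space quota, matching the 16Gi default in options.go.
const defaultEtcdStorageQuota = int64(16 * 1024 * 1024 * 1024)

// quotaWarning returns a warning message when the configured quota is below
// the 16Gi default, and an empty string otherwise. Simplified sketch: the
// real code operates on a resource.Quantity and writes to the Dapr logger.
func quotaWarning(quotaBytes int64) string {
	if quotaBytes < defaultEtcdStorageQuota {
		return fmt.Sprintf(
			"--etcd-space-quota of %d bytes may be too low for production use. Consider increasing the value to 16Gi or larger.",
			quotaBytes)
	}
	return ""
}

func main() {
	fmt.Println(quotaWarning(1 * 1024 * 1024 * 1024)) // 1Gi: below default, warns
	fmt.Println(quotaWarning(defaultEtcdStorageQuota)) // 16Gi: no warning
}
```

Note the warning is advisory only: a sub-16Gi quota still parses and is accepted, so existing `1Gi` deployments keep working after the revert.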
32 changes: 32 additions & 0 deletions docs/release_notes/v1.14.4.md
@@ -0,0 +1,32 @@
# Dapr 1.14.4

This update includes bug fixes:

- [Fixes being able to upgrade Dapr v1.14.x to v1.14.3 without manual intervention](#fixes-being-able-to-upgrade-dapr-v114x-to-v1143-without-manual-intervention)

## Fixes being able to upgrade Dapr v1.14.x to v1.14.3 without manual intervention

### Problem

When upgrading from Dapr versions `v1.14.0`, `v1.14.1`, or `v1.14.2` to `v1.14.3`, the upgrade would fail with the following error.

```
Error: UPGRADE FAILED: cannot patch "dapr-scheduler-server" with kind StatefulSet: StatefulSet.apps "dapr-scheduler-server" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
```

The workaround is for the user to delete the Scheduler StatefulSet before re-running the upgrade with the new storage size.
The StorageClass must support volume expansion for this to work.
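The workaround above can be sketched as a shell sequence. This is a hedged illustration, not taken from the Dapr docs: the release name (`dapr`), namespace (`dapr-system`), and chart reference are assumptions for a typical install and may differ in your environment.

```shell
# Assumes Dapr was installed as Helm release "dapr" in namespace "dapr-system".
# Delete only the StatefulSet object, orphaning its pods and keeping PVCs,
# so the upgrade can recreate it with the new volume request.
kubectl delete statefulset dapr-scheduler-server \
  --namespace dapr-system --cascade=orphan

# Re-run the upgrade, requesting the larger storage size and quota.
helm upgrade dapr dapr/dapr --namespace dapr-system \
  --set dapr_scheduler.cluster.storageSize=16Gi \
  --set dapr_scheduler.etcdSpaceQuota=16Gi
```

The `--cascade=orphan` flag is what makes this safe to run on a live cluster: the Scheduler pods and their data volumes survive the StatefulSet deletion.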

### Impact

Users were unable to upgrade Dapr to `v1.14.3` without manual intervention, breaking automated upgrades.

### Root cause

In Dapr `v1.14.3`, the `dapr-scheduler-server` StatefulSet changed the default persistent volume request size from `1Gi` to `16Gi`.
Kubernetes forbids updating this request field on an existing StatefulSet, resulting in the error above.

### Solution

This field has been reverted to the previous `1Gi` request size.
Users who wish to increase the volume size must follow the manual steps [described in the documentation](https://docs.dapr.io/operations/hosting/kubernetes/kubernetes-persisting-scheduler).
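For clusters that keep the default and later need more space, one option the linked documentation covers is expanding the existing volume in place rather than reinstalling. A hedged sketch, with assumptions: the StorageClass must set `allowVolumeExpansion: true`, and the PVC name below follows the usual `<volume>-<statefulset>-<ordinal>` convention, so verify the actual name with `kubectl get pvc` first.

```shell
# Hypothetical in-place expansion of the Scheduler's existing volume.
# Confirm the real PVC name in your cluster before patching.
kubectl patch pvc dapr-scheduler-data-dir-dapr-scheduler-server-0 \
  --namespace dapr-system \
  --patch '{"spec":{"resources":{"requests":{"storage":"16Gi"}}}}'
```

This changes only the PersistentVolumeClaim, so it sidesteps the forbidden StatefulSet field entirely; whether the filesystem grows online depends on the storage provisioner.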
