Previous change logs can be found at CHANGELOG-3.3.
See the code changes and the v3.4 upgrade guide for any breaking changes. Before upgrading from any previous release, make sure to read the change logs below and the v3.4 upgrade guide.
- Add Raft learner.
- Rewrite client balancer with new gRPC balancer interface.
- Add backoff on watch retries on transient errors.
- Add jitter to watch progress notify to prevent spikes in `etcd_network_client_grpc_sent_bytes_total`.
- Improve read index wait timeout warning log, which indicates that local node might have slow network.
- Improve slow request apply warning log.
  - e.g. `read-only range request "key:\"/a\" range_end:\"/b\" " with result "range_response_count:3 size:96" took too long (97.966µs) to execute`.
  - Redact request value field.
  - Provide response size.
- Improve "became inactive" warning log, which indicates message send to a peer failed.
- Improve TLS setup error logging to help debug TLS-enabled cluster configuring issues.
- Improve long-running concurrent read transactions under light write workloads.
  - Previously, periodic commit on pending writes blocks incoming read transactions, even if there is no pending write.
  - Now, periodic commit operation does not block concurrent read transactions, which improves long-running read transaction performance.
- Improve Raft Read Index timeout warning messages.
- Adjust election timeout on server restart to reduce disruptive rejoining servers.
  - Previously, etcd fast-forwards election ticks on server start, with only one tick left for leader election. This is to speed up start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapses before leader contacts the restarted node.
  - Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus more time for leader to prevent disruptive restart.
- Add Raft Pre-Vote feature to reduce disruptive rejoining servers.
  - For instance, a flaky (or rejoining) member may drop in and out, and start a campaign. This member ends up with a higher term and ignores all incoming messages with a lower term. In this case, a new leader eventually needs to be elected, which disrupts cluster availability. Raft implements a Pre-Vote phase to prevent this kind of disruption. If enabled, Raft runs an additional election phase to check whether the pre-candidate can get enough votes to win an election.
- Adjust periodic compaction retention window.
  - e.g. `etcd --auto-compaction-mode=revision --auto-compaction-retention=1000` automatically `Compact` on `"latest revision" - 1000` every 5-minute (when latest revision is 30000, compact on revision 29000).
  - e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h` automatically `Compact` with 24-hour retention window for every 2.4-hour. Now, `Compact` happens for every 1-hour.
  - e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` automatically `Compact` with 30-minute retention window for every 3-minute. Now, `Compact` happens for every 30-minute.
  - Periodic compactor keeps recording latest revisions for every compaction period when given period is less than 1-hour, or for every 1-hour when given compaction period is greater than 1-hour (e.g. 1-hour when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`).
  - For every compaction period or 1-hour, compactor uses the last revision that was fetched before compaction period, to discard historical data.
  - The retention window of compaction period moves for every given compaction period or hour.
  - For instance, when hourly writes are 100 and `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`, `v3.2.x`, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 2400, 2640, and 2880 for every 2.4-hour, while `v3.3.3` or later compacts revision 2400, 2500, 2600 for every 1-hour.
  - Furthermore, when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` and writes per minute are about 1000, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 30000, 33000, and 36000, for every 3-minute, while `v3.3.3` or later compacts revision 30000, 60000, and 90000, for every 30-minute.
- Improve lease expire/revoke operation performance, address lease scalability issue.
- Make Lease `Lookup` non-blocking with concurrent `Grant`/`Revoke`.
- Make etcd server return `raft.ErrProposalDropped` on internal Raft proposal drop in v3 applier and v2 applier.
  - e.g. a node is removed from cluster, or `raftpb.MsgProp` arrives at current leader while there is an ongoing leadership transfer.
- Add `snapshot` package for easier snapshot workflow (see `godoc.org/github.com/etcd/clientv3/snapshot` for more).
- Improve functional tester coverage: proxy layer to run network fault tests in CI, TLS is enabled both for server and client, liveness mode, shuffle test sequence, membership reconfiguration failure cases, disastrous quorum loss and snapshot recover from a seed member, embedded etcd.
- Improve index compaction blocking by using a copy on write clone to avoid holding the lock for the traversal of the entire index.
- Update JWT methods to allow for use of any supported signature method/algorithm.
- Add Lease checkpointing to persist remaining TTLs to the consensus log periodically so that long lived leases progress toward expiry in the presence of leader elections and server restarts.
- Add gRPC interceptor for debugging logs; enable `etcd --debug` flag to see per-request debug information.
- Add consistency check in snapshot status. If consistency check on snapshot file fails, `snapshot status` returns `"snapshot file integrity check failed..."` error.
- Add `Verify` function to perform corruption check on WAL contents.
- Improve heartbeat send failure logging.
- Require Go 1.11+.
- Use Go module for dependency management.
- Move `"github.com/coreos/etcd"` to `"github.com/etcd-io/etcd"`.
  - Change import path to `"go.etcd.io/etcd"`.
  - e.g. `import "go.etcd.io/etcd/raft"`.
  - Updated module path to comply with Go module specification.
    - e.g. `import "go.etcd.io/etcd/mvcc/backend"` is now `import "go.etcd.io/etcd/v3/mvcc/backend"`.
- Make `ETCDCTL_API=3 etcdctl` default.
  - Now, `etcdctl set foo bar` must be `ETCDCTL_API=2 etcdctl set foo bar`.
  - Now, `ETCDCTL_API=3 etcdctl put foo bar` could be just `etcdctl put foo bar`.
- Remove `etcd --ca-file` flag, instead use `etcd --trusted-ca-file` (`etcd --ca-file` flag has been marked deprecated since v2.1).
- Remove `etcd --peer-ca-file` flag, instead use `etcd --peer-trusted-ca-file` (`etcd --peer-ca-file` flag has been marked deprecated since v2.1).
- Remove `pkg/transport.TLSInfo.CAFile` field, instead use `pkg/transport.TLSInfo.TrustedCAFile` (`CAFile` field has been marked deprecated since v2.1).
- Deprecate `latest` release container tag.
  - `docker pull gcr.io/etcd-development/etcd:latest` would not be up-to-date.
- Deprecate minor version release container tags.
  - `docker pull gcr.io/etcd-development/etcd:v3.3` would still work.
  - `docker pull gcr.io/etcd-development/etcd:v3.4` would not work.
  - Use `docker pull gcr.io/etcd-development/etcd:v3.4.x` instead, with the exact patch version.
- Drop ACIs from official release.
  - AppC was officially suspended, as of late 2016.
  - `acbuild` is not maintained anymore.
  - `*.aci` files are not available from `v3.4` release.
- Exit on empty hosts in advertise URLs.
  - Address the issue that advertise client URLs accept empty hosts.
  - e.g. exit with error on `--advertise-client-urls=http://:2379`.
  - e.g. exit with error on `--initial-advertise-peer-urls=http://:2380`.
- Exit on shadowed environment variables.
  - Address the issue that shadowed environment variables did not raise an error.
  - e.g. exit with error on `ETCD_NAME=abc etcd --name=def`.
  - e.g. exit with error on `ETCD_INITIAL_CLUSTER_TOKEN=abc etcd --initial-cluster-token=def`.
  - e.g. exit with error on `ETCDCTL_ENDPOINTS=abc.com ETCDCTL_API=3 etcdctl endpoint health --endpoints=def.com`.
- Change `etcdserverpb.AuthRoleRevokePermissionRequest/key,range_end` fields type from `string` to `bytes`.
- Rename `etcd_debugging_mvcc_db_total_size_in_bytes` Prometheus metric to `etcd_mvcc_db_total_size_in_bytes`.
- Rename `etcdserver.ServerConfig.SnapCount` field to `etcdserver.ServerConfig.SnapshotCount`, to be consistent with the flag name `etcd --snapshot-count`.
- Rename `embed.Config.SnapCount` field to `embed.Config.SnapshotCount`, to be consistent with the flag name `etcd --snapshot-count`.
- Change `embed.Config.CorsInfo` in `*cors.CORSInfo` type to `embed.Config.CORS` in `map[string]struct{}` type.
- Remove `embed.Config.SetupLogging`.
  - Now logger is set up automatically based on `embed.Config.Logger`, `embed.Config.LogOutputs`, `embed.Config.Debug` fields.
- Rename `etcd --log-output` to `etcd --log-outputs` to support multiple log outputs.
  - `etcd --log-output` will be deprecated in v3.5.
- Rename `embed.Config.LogOutput` to `embed.Config.LogOutputs` to support multiple log outputs.
- Change `embed.Config.LogOutputs` type from `string` to `[]string` to support multiple log outputs.
  - Now that `etcd --log-outputs` accepts multiple writers, etcd configuration YAML file `log-outputs` field must be changed to `[]string` type.
  - Previously, `etcd --config-file etcd.config.yaml` can have `log-outputs: default` field, now must be `log-outputs: [default]`.
- Change v3 `etcdctl snapshot` exit codes with `snapshot` package.
  - Exit on error with exit code 1 (no more exit code 5 or 6 on `snapshot save/restore` commands).
- Migrate dependency management tool from `glide` to `golang/dep`.
  - <= 3.3 puts `vendor` directory under `cmd/vendor` directory to prevent conflicting transitive dependencies.
  - 3.4 moves `cmd/vendor` directory to `vendor` at repository root.
  - Remove recursive symlinks in `cmd` directory.
  - Now `go get/install/build` on `etcd` packages (e.g. `clientv3`, `tools/benchmark`) enforce builds with etcd `vendor` directory.
- Replace gRPC gateway endpoint `/v3beta` with `/v3`.
  - Deprecated `/v3alpha`.
  - To deprecate `/v3beta` in v3.5.
  - In v3.4, `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` still works as a fallback to `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'`, but `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` won't work in v3.5. Use `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- Change `wal` package function signatures to support structured logger and logging to file in server-side.
  - Previously, `Open(dirpath string, snap walpb.Snapshot) (*WAL, error)`, now `Open(lg *zap.Logger, dirpath string, snap walpb.Snapshot) (*WAL, error)`.
  - Previously, `OpenForRead(dirpath string, snap walpb.Snapshot) (*WAL, error)`, now `OpenForRead(lg *zap.Logger, dirpath string, snap walpb.Snapshot) (*WAL, error)`.
  - Previously, `Repair(dirpath string) bool`, now `Repair(lg *zap.Logger, dirpath string) bool`.
  - Previously, `Create(dirpath string, metadata []byte) (*WAL, error)`, now `Create(lg *zap.Logger, dirpath string, metadata []byte) (*WAL, error)`.
- Remove `pkg/cors` package.
- Change `etcd --experimental-enable-v2v3` flag to `etcd --enable-v2v3`; v2 storage emulation is now stable.
- Move internal packages to `etcdserver`.
  - `"github.com/coreos/etcd/alarm"` to `"go.etcd.io/etcd/etcdserver/api/v3alarm"`.
  - `"github.com/coreos/etcd/compactor"` to `"go.etcd.io/etcd/etcdserver/api/v3compactor"`.
  - `"github.com/coreos/etcd/discovery"` to `"go.etcd.io/etcd/etcdserver/api/v2discovery"`.
  - `"github.com/coreos/etcd/etcdserver/auth"` to `"go.etcd.io/etcd/etcdserver/api/v2auth"`.
  - `"github.com/coreos/etcd/etcdserver/membership"` to `"go.etcd.io/etcd/etcdserver/api/membership"`.
  - `"github.com/coreos/etcd/etcdserver/stats"` to `"go.etcd.io/etcd/etcdserver/api/v2stats"`.
  - `"github.com/coreos/etcd/error"` to `"go.etcd.io/etcd/etcdserver/api/v2error"`.
  - `"github.com/coreos/etcd/rafthttp"` to `"go.etcd.io/etcd/etcdserver/api/rafthttp"`.
  - `"github.com/coreos/etcd/snap"` to `"go.etcd.io/etcd/etcdserver/api/snap"`.
  - `"github.com/coreos/etcd/store"` to `"go.etcd.io/etcd/etcdserver/api/v2store"`.
- Change snapshot file permissions: On Linux, the snapshot file changes from readable by all (mode 0644) to readable by the user only (mode 0600).
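The "exit on empty hosts in advertise URLs" rule listed above can be approximated with the standard `net/url` package. This is a minimal sketch, not etcd's validation code: `checkAdvertiseURLs` is a hypothetical helper that takes the comma-separated flag value and reports the first URL whose host part is empty.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkAdvertiseURLs is a hypothetical helper (not part of etcd) that rejects
// any URL in a comma-separated list whose host is empty, e.g. "http://:2379".
func checkAdvertiseURLs(flagValue string) error {
	for _, raw := range strings.Split(flagValue, ",") {
		raw = strings.TrimSpace(raw)
		u, err := url.Parse(raw)
		if err != nil {
			return fmt.Errorf("invalid URL %q: %v", raw, err)
		}
		// Hostname strips the port, so "http://:2379" yields "".
		if u.Hostname() == "" {
			return fmt.Errorf("URL %q has an empty host", raw)
		}
	}
	return nil
}

func main() {
	for _, v := range []string{
		"http://:2379",
		"http://10.0.0.1:2380,http://:2380",
		"http://localhost:2379",
	} {
		fmt.Printf("%q -> %v\n", v, checkAdvertiseURLs(v))
	}
}
```

A caller that wants the v3.4 behavior would treat a non-nil error as fatal and exit before starting the server.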
- Upgrade `github.com/coreos/bbolt` from `v1.3.1-coreos.6` to `go.etcd.io/bbolt` `v1.3.1-etcd.7`.
- Upgrade `google.golang.org/grpc` from `v1.7.5` to `v1.13.0`.
- Upgrade `github.com/golang/protobuf` from `golang/protobuf@1e59b77b5` to `v1.1.0`.
- Migrate `github.com/ugorji/go/codec` to `github.com/json-iterator/go`, to regenerate v2 `client` (See #10667 for more).
- Migrate `github.com/ghodss/yaml` to `sigs.k8s.io/yaml` (See #10687 for more).
- Upgrade `golang.org/x/crypto` from `crypto@9419663f5` to `crypto@8ac0e0d97`.
- Upgrade `golang.org/x/net` from `net@66aacef3d` to `net@db08ff08e`.
- Upgrade `golang.org/x/sys` from `sys@ebfc5b463` to `sys@56ede360e`.
- Upgrade `golang.org/x/text` from `text@b19bf474d` to `text@f21a4dfb5`.
- Upgrade `golang.org/x/time` from `time@c06e80d93` to `time@fbb02b229`.
- Upgrade `gopkg.in/yaml.v2` from `yaml@cd8b52f82` to `yaml@5420a8b67`.
- Upgrade `github.com/dgrijalva/jwt-go` from `v3.0.0` to `v3.2.0`.
- Upgrade `github.com/soheilhy/cmux` from `v0.1.3` to `v0.1.4`.
- Upgrade `github.com/google/btree` from `google/btree@925471ac9` to `google/btree@e89373fe6`.
- Upgrade `github.com/spf13/cobra` from `spf13/cobra@1c44ec8d3` to `v0.0.3`.
- Upgrade `github.com/spf13/pflag` from `v1.0.0` to `spf13/pflag@1ce0cc6db`.
- Upgrade `github.com/coreos/go-systemd` from `v15` to `v17`.
- Upgrade `github.com/prometheus/client_golang` from `prometheus/client_golang@5cec1d042` to `v0.8.0`.
- Upgrade `github.com/grpc-ecosystem/go-grpc-prometheus` from `grpc-ecosystem/go-grpc-prometheus@0dafe0d49` to `v1.2.0`.
- Upgrade `github.com/grpc-ecosystem/grpc-gateway` from `v1.3.1` to `v1.4.1`.
See List of metrics for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Add `etcd_snap_db_fsync_duration_seconds_count` Prometheus metric.
- Add `etcd_snap_db_save_total_duration_seconds_bucket` Prometheus metric.
- Add `etcd_network_snapshot_send_success` Prometheus metric.
- Add `etcd_network_snapshot_send_failures` Prometheus metric.
- Add `etcd_network_snapshot_send_total_duration_seconds` Prometheus metric.
- Add `etcd_network_snapshot_receive_success` Prometheus metric.
- Add `etcd_network_snapshot_receive_failures` Prometheus metric.
- Add `etcd_network_snapshot_receive_total_duration_seconds` Prometheus metric.
- Add `etcd_network_active_peers` Prometheus metric.
  - Let's say `"7339c4e5e833c029"` server `/metrics` returns `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 1`. This indicates that the local node `"7339c4e5e833c029"` currently has two active remote peers `"729934363faa4a24"` and `"b548c2511513015"` in a 3-node cluster. If the node `"b548c2511513015"` is down, the local node `"7339c4e5e833c029"` will show `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 0`.
- Add `etcd_network_disconnected_peers_total` Prometheus metric.
  - If a remote peer `"b548c2511513015"` is down, the local node `"7339c4e5e833c029"` server `/metrics` would return `etcd_network_disconnected_peers_total{Local="7339c4e5e833c029",Remote="b548c2511513015"} 1`, while active peer metrics will show `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 0`.
- Add `etcd_network_server_stream_failures_total` Prometheus metric.
  - e.g. `etcd_network_server_stream_failures_total{API="lease-keepalive",Type="receive"} 1`
  - e.g. `etcd_network_server_stream_failures_total{API="watch",Type="receive"} 1`
- Improve `etcd_network_peer_round_trip_time_seconds` Prometheus metric to track leader heartbeats.
  - Previously, it only samples the TCP connection for snapshot messages.
- Increase `etcd_network_peer_round_trip_time_seconds` Prometheus metric histogram upper-bound.
  - Previously, highest bucket only collects requests taking 0.8192 seconds or more.
  - Now, highest buckets collect 0.8192 seconds, 1.6384 seconds, and 3.2768 seconds or more.
- Add `etcd_server_is_leader` Prometheus metric.
- Add `etcd_server_id` Prometheus metric.
- Add `etcd_cluster_version` Prometheus metric.
- Add `etcd_server_version` Prometheus metric.
  - To replace Kubernetes `etcd-version-monitor`.
- Add `etcd_server_go_version` Prometheus metric.
- Add `etcd_server_health_success` Prometheus metric.
- Add `etcd_server_health_failures` Prometheus metric.
- Add `etcd_server_read_indexes_failed_total` Prometheus metric.
- Add `etcd_server_heartbeat_send_failures_total` Prometheus metric.
- Add `etcd_server_slow_apply_total` Prometheus metric.
- Add `etcd_server_slow_read_indexes_total` Prometheus metric.
- Add `etcd_server_quota_backend_bytes` Prometheus metric.
  - Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
  - `etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
  - `etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
  - `etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
  - `etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Add `etcd_mvcc_db_total_size_in_bytes` Prometheus metric.
  - Renamed from `etcd_debugging_mvcc_db_total_size_in_bytes`.
- Add `etcd_mvcc_db_total_size_in_use_in_bytes` Prometheus metric.
  - Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
  - `etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
  - `etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
  - `etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
  - `etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Add `etcd_snap_fsync_duration_seconds` Prometheus metric.
- Add `etcd_disk_backend_defrag_duration_seconds` Prometheus metric.
- Add `etcd_mvcc_hash_duration_seconds` Prometheus metric.
- Add `etcd_mvcc_hash_rev_duration_seconds` Prometheus metric.
- Add `etcd_debugging_disk_backend_commit_rebalance_duration_seconds` Prometheus metric.
- Add `etcd_debugging_disk_backend_commit_spill_duration_seconds` Prometheus metric.
- Add `etcd_debugging_disk_backend_commit_write_duration_seconds` Prometheus metric.
- Add `etcd_debugging_lease_granted_total` Prometheus metric.
- Add `etcd_debugging_lease_revoked_total` Prometheus metric.
- Add `etcd_debugging_lease_renewed_total` Prometheus metric.
- Add `etcd_debugging_lease_ttl_total` Prometheus metric.
- Increase `etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds` Prometheus metric histogram upper-bound.
  - Previously, highest bucket only collects requests taking 1.024 seconds or more.
  - Now, highest buckets collect 1.024 seconds, 2.048 seconds, and 4.096 seconds or more.
- Fix missing `etcd_network_peer_sent_failures_total` Prometheus metric count.
- Fix `etcd_debugging_server_lease_expired_total` Prometheus metric.
- Fix race conditions in v2 server stat collecting.
- Change gRPC proxy to expose etcd server endpoint `/metrics`.
  - The metrics that were exposed via the proxy were not etcd server members but instead the proxy itself.
- Fix bug where `db_compaction_total_duration_milliseconds` metric incorrectly measured duration as 0.
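The relationship between the quota and DB size gauges described above is plain arithmetic, shown here as a short Go sketch using the sample values from the list. The `dbUsage` type and its method names are hypothetical and exist only for illustration; the values would normally be scraped from the server `/metrics` endpoint.

```go
package main

import "fmt"

// dbUsage holds the three gauge values from the example above, in bytes.
// The type and field names are hypothetical, chosen for this sketch.
type dbUsage struct {
	quota     int64 // etcd_server_quota_backend_bytes
	allocated int64 // etcd_mvcc_db_total_size_in_bytes
	inUse     int64 // etcd_mvcc_db_total_size_in_use_in_bytes
}

// reclaimable returns the bytes a defragment operation could free on disk:
// physically allocated size minus logically in-use size.
func (u dbUsage) reclaimable() int64 {
	return u.allocated - u.inUse
}

// reclaimableRatio returns the reclaimable share of the allocated size.
func (u dbUsage) reclaimableRatio() float64 {
	if u.allocated == 0 {
		return 0
	}
	return float64(u.reclaimable()) / float64(u.allocated)
}

func main() {
	u := dbUsage{quota: 2147483648, allocated: 20480, inUse: 16384}
	fmt.Printf("reclaimable bytes: %d\n", u.reclaimable())
	fmt.Printf("reclaimable ratio: %.2f\n", u.reclaimableRatio())
}
```

For the sample values, 20480 - 16384 leaves 4096 bytes that defragmentation could return to the file system.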
See security doc for more details.
- Support TLS cipher suite whitelisting.
  - To block weak cipher suites.
  - TLS handshake fails when client hello is requested with invalid cipher suites.
  - Add `etcd --cipher-suites` flag.
  - If empty, Go auto-populates the list.
- Add `etcd --host-whitelist` flag, `etcdserver.Config.HostWhitelist`, and `embed.Config.HostWhitelist`, to prevent "DNS Rebinding" attack.
  - Any website can simply create an authorized DNS name, and direct DNS to `"localhost"` (or any other address). Then, all HTTP endpoints of etcd server listening on `"localhost"` become accessible, thus vulnerable to DNS rebinding attacks (CVE-2018-5702).
  - Client origin enforce policy works as follows:
    - If client connection is secure via HTTPS, allow any hostnames.
    - If client connection is not secure and `"HostWhitelist"` is not empty, only allow HTTP requests whose Host field is listed in whitelist.
  - By default, `"HostWhitelist"` is `"*"`, which means insecure server allows all client HTTP requests.
  - Note that the client origin policy is enforced whether authentication is enabled or not, for tighter controls.
  - When specifying hostnames, loopback addresses are not added automatically. To allow loopback interfaces, add them to whitelist manually (e.g. `"localhost"`, `"127.0.0.1"`, etc.).
  - e.g. `etcd --host-whitelist example.com`, then the server will reject all HTTP requests whose Host field is not `example.com` (also rejects requests to `"localhost"`).
- Support `etcd --cors` in v3 HTTP requests (gRPC gateway).
- Support `ttl` field for `etcd` Authentication JWT token.
  - e.g. `etcd --auth-token jwt,pub-key=<pub key path>,priv-key=<priv key path>,sign-method=<sign method>,ttl=5m`.
- Allow empty token provider in `etcdserver.ServerConfig.AuthToken`.
- Fix TLS reload when certificate SAN field only includes IP addresses but no domain names.
  - In Go, server calls `(*tls.Config).GetCertificate` for TLS reload if and only if server's `(*tls.Config).Certificates` field is not empty, or `(*tls.ClientHelloInfo).ServerName` is not empty with a valid SNI from the client. Previously, etcd always populates `(*tls.Config).Certificates` on the initial client TLS handshake, as non-empty. Thus, client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger `(*tls.Config).GetCertificate` to reload TLS assets.
  - However, a certificate whose SAN field does not include any domain names but only IP addresses would request `*tls.ClientHelloInfo` with an empty `ServerName` field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
  - Now, `(*tls.Config).Certificates` is created empty on initial TLS client handshake, first to trigger `(*tls.Config).GetCertificate`, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
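The client origin policy for `HostWhitelist` described earlier in this section can be expressed as a small decision function. The following is a sketch of the listed rules only, not etcd's HTTP handler: `allowHost` is a hypothetical name, the whitelist is modeled as a `map[string]struct{}`, and treating an empty whitelist as "no restriction" is an assumption read from the wording "is not empty". Hosts are compared as exact strings, so any port handling is left out.

```go
package main

import "fmt"

// allowHost is a hypothetical helper sketching the policy listed above:
//   - secure (HTTPS) connections are allowed for any hostname;
//   - insecure connections are allowed when the whitelist is empty
//     (assumption) or contains "*";
//   - otherwise the Host field must be listed in the whitelist.
func allowHost(secure bool, whitelist map[string]struct{}, host string) bool {
	if secure {
		return true
	}
	if len(whitelist) == 0 {
		return true
	}
	if _, ok := whitelist["*"]; ok {
		return true
	}
	_, ok := whitelist[host]
	return ok
}

func main() {
	wl := map[string]struct{}{"example.com": {}}
	fmt.Println(allowHost(false, wl, "example.com")) // listed host over HTTP
	fmt.Println(allowHost(false, wl, "localhost"))   // loopback is not added automatically
	fmt.Println(allowHost(true, wl, "localhost"))    // HTTPS allows any hostname
}
```

As in the `etcd --host-whitelist example.com` example, a plain HTTP request for `localhost` is rejected unless `localhost` is added to the whitelist explicitly.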
- Add `rpctypes.ErrLeaderChanged`.
  - Now linearizable requests with read index would fail fast when there is a leadership change, instead of waiting until context timeout.
- Add `etcd --initial-election-tick-advance` flag to configure initial election tick fast-forward.
  - By default, `etcd --initial-election-tick-advance=true`, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
  - This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10-second. If true, local node does not need to wait up to 10-second. Instead, it forwards its election ticks to 8-second, and has only 2-second left before leader election.
  - Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
  - However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within left election ticks, disruptive election has to happen thus affecting cluster availabilities.
  - Now, this can be disabled by setting `etcd --initial-election-tick-advance=false`.
  - Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring `etcd --initial-election-tick-advance` at the cost of slow initial bootstrap.
  - If single-node, it advances ticks regardless.
  - Address disruptive rejoining follower node.
- Add `etcd --pre-vote` flag to enable to run an additional Raft election phase.
  - For instance, a flaky (or rejoining) member may drop in and out, and start a campaign. This member ends up with a higher term and ignores all incoming messages with a lower term. In this case, a new leader eventually needs to be elected, which disrupts cluster availability. Raft implements a Pre-Vote phase to prevent this kind of disruption. If enabled, Raft runs an additional election phase to check whether the pre-candidate can get enough votes to win an election.
  - `etcd --pre-vote=false` by default.
  - v3.5 will enable `etcd --pre-vote=true` by default.
- `etcd --initial-corrupt-check` flag is now stable (`etcd --experimental-initial-corrupt-check` has been deprecated).
  - `etcd --initial-corrupt-check=true` by default, to check cluster database hashes before serving client/peer traffic.
- `etcd --corrupt-check-time` flag is now stable (`etcd --experimental-corrupt-check-time` has been deprecated).
  - `etcd --corrupt-check-time=12h` by default, to check cluster database hashes for every 12-hour.
- `etcd --enable-v2v3` flag is now stable.
  - `etcd --experimental-enable-v2v3` has been deprecated.
  - Added more v2v3 integration tests.
  - `etcd --enable-v2=true --enable-v2v3=''` by default, to enable v2 API server that is backed by v2 store.
  - `etcd --enable-v2=true --enable-v2v3=/aaa` to enable v2 API server that is backed by v3 storage.
  - `etcd --enable-v2=false --enable-v2v3=''` to disable v2 API server.
  - `etcd --enable-v2=false --enable-v2v3=/aaa` to disable v2 API server. TODO: error?
  - Automatically create parent directory if it does not exist (fix issue #9609).
  - v4.0 will configure `etcd --enable-v2=true --enable-v2v3=/aaa` to enable v2 API server that is backed by v3 storage.
- Add `etcd --discovery-srv-name` flag to support custom DNS SRV name with discovery.
  - If not given, etcd queries `_etcd-server-ssl._tcp.[YOUR_HOST]` and `_etcd-server._tcp.[YOUR_HOST]`.
  - If `etcd --discovery-srv-name="foo"`, then query `_etcd-server-ssl-foo._tcp.[YOUR_HOST]` and `_etcd-server-foo._tcp.[YOUR_HOST]`.
  - Useful for operating multiple etcd clusters under the same domain.
- Support TLS cipher suite whitelisting.
  - To block weak cipher suites.
  - TLS handshake fails when client hello is requested with invalid cipher suites.
  - Add `etcd --cipher-suites` flag.
  - If empty, Go auto-populates the list.
- Support `etcd --cors` in v3 HTTP requests (gRPC gateway).
- Rename `etcd --log-output` to `etcd --log-outputs` to support multiple log outputs.
  - `etcd --log-output` will be deprecated in v3.5.
- Add `etcd --logger` flag to support structured logger and multiple log outputs in server-side.
  - `etcd --logger=capnslog` will be deprecated in v3.5.
  - Main motivation is to promote automated etcd monitoring, rather than looking back at server logs when it starts breaking. Future development will make etcd log as little as possible, and make etcd easier to monitor with metrics and alerts.
  - `etcd --logger=capnslog --log-outputs=default` is the default setting and same as previous etcd server logging format.
  - `etcd --log-outputs=default` is not supported when `etcd --logger=zap`.
    - Instead, use `etcd --logger=zap --log-outputs=stderr`.
    - Or, use `etcd --logger=zap --log-outputs=systemd/journal` to send logs to the local systemd journal.
    - Previously, if etcd parent process ID (PPID) is 1 (e.g. run with systemd), `etcd --logger=capnslog --log-outputs=default` redirects server logs to local systemd journal. And if write to journald fails, it writes to `os.Stderr` as a fallback.
    - However, even with PPID 1, it can fail to dial systemd journal (e.g. run embedded etcd with Docker container). Then, every single log write will fail and fall back to `os.Stderr`, which is inefficient.
    - To avoid this problem, systemd journal logging must be configured manually.
  - `etcd --logger=zap --log-outputs=stderr` will log server operations in JSON-encoded format and write logs to `os.Stderr`. Use this to override journald log redirects.
  - `etcd --logger=zap --log-outputs=stdout` will log server operations in JSON-encoded format and write logs to `os.Stdout`. Use this to override journald log redirects.
  - `etcd --logger=zap --log-outputs=a.log` will log server operations in JSON-encoded format and write logs to the specified file `a.log`.
  - `etcd --logger=zap --log-outputs=a.log,b.log,c.log,stdout` writes server logs to multiple files `a.log`, `b.log` and `c.log` at the same time and outputs to `os.Stdout`, in JSON-encoded format.
  - `etcd --logger=zap --log-outputs=/dev/null` will discard all server logs.
- Add `etcd --backend-batch-limit` flag.
- Add `etcd --backend-batch-interval` flag.
- Fix `mvcc` "unsynced" watcher restore operation.
  - "unsynced" watcher is watcher that needs to be in sync with events that have happened.
  - That is, "unsynced" watcher is the slow watcher that was requested on old revision.
  - "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
  - Which possibly causes missing events from "unsynced" watchers.
  - A node gets network partitioned with a watcher on a future revision, and falls behind receiving a leader snapshot after partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced since sync watchers might have become stale during network partition. And reset synced watcher group to restart watcher routines. Previously, there was a bug when moving from synced watcher group to unsynced, thus client would miss events when the watcher was requested to the network-partitioned node.
- Fix `mvcc` server panic from restore operation.
  - Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, etcd server panicked during snapshot restore operation.
  - Now, this server-side panic has been fixed.
- Fix server panic on invalid Election Proclaim/Resign HTTP(S) requests.
  - Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
  - e.g. `curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}'`, `curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'`.
- Fix revision-based compaction retention parsing.
  - Previously, `etcd --auto-compaction-mode revision --auto-compaction-retention 1` was translated to revision retention 3600000000000.
  - Now, `etcd --auto-compaction-mode revision --auto-compaction-retention 1` is correctly parsed as revision retention 1.
- Prevent overflow by large `TTL` values for `Lease` `Grant`.
  - `TTL` parameter to `Grant` request is in units of seconds.
  - Leases with too large `TTL` values exceeding `math.MaxInt64` expire in unexpected ways.
  - Server now returns `rpctypes.ErrLeaseTTLTooLarge` to client, when the requested `TTL` is larger than 9,000,000,000 seconds (which is >285 years).
  - Again, etcd `Lease` is meant for short-periodic keepalives or sessions, in the range of seconds or minutes. Not for hours or days!
- Enable etcd server `raft.Config.CheckQuorum` when starting with `ForceNewCluster`.
- Allow non-WAL files in `etcd --wal-dir` directory.
  - Previously, existing files such as `lost+found` in WAL directory prevent etcd server boot.
  - Now, WAL directory that contains only `lost+found` or a file that's not suffixed with `.wal` is considered non-initialized.
- Add snapshot package for snapshot restore/save operations (see godoc.org/github.com/etcd/clientv3/snapshot for more).
- Add watch_id field to etcdserverpb.WatchCreateRequest to allow user-provided watch ID to mvcc.
- Corresponding watch_id is returned via etcdserverpb.WatchResponse, if any.
- Add fragment field to etcdserverpb.WatchCreateRequest to request etcd server to split watch events when the total size of events exceeds etcd --max-request-bytes flag value plus gRPC-overhead 512 bytes.
- The default server-side request bytes limit is embed.DefaultMaxRequestBytes which is 1.5 MiB plus gRPC-overhead 512 bytes.
- If watch response events exceed this server-side request limit and watch request is created with fragment field true, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
- Useful when client-side has limited bandwidths.
- For example, watch response contains 10 events, where each event is 1 MiB. And server etcd --max-request-bytes flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
- For example, watch response contains 5 events, where each event is 2 MiB. And server etcd --max-request-bytes flag value is 1 MiB and clientv3.Config.MaxCallRecvMsgSize is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with "code = ResourceExhausted desc = grpc: received message larger than max (...)".
- Client must implement fragmented watch event merge (which clientv3 does in etcd v3.4).
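The two examples above can be reproduced with a simple greedy grouping by size. This is only a sketch under stated assumptions: fragment is a hypothetical helper that works on event sizes rather than real watch events, and the server's actual splitting logic may differ.

```go
package main

import "fmt"

// grpcOverheadBytes is the 512 byte overhead added on top of
// --max-request-bytes, as described in the text above.
const grpcOverheadBytes = 512

// fragment is a hypothetical helper that greedily groups event sizes
// into chunks whose total stays within maxRequestBytes+grpcOverheadBytes.
// An event larger than the limit still gets a chunk of its own.
func fragment(eventSizes []int, maxRequestBytes int) [][]int {
	limit := maxRequestBytes + grpcOverheadBytes
	var chunks [][]int
	var cur []int
	curSize := 0
	for _, sz := range eventSizes {
		// Close the current chunk if adding this event would exceed the limit.
		if len(cur) > 0 && curSize+sz > limit {
			chunks = append(chunks, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, sz)
		curSize += sz
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	const mib = 1 << 20
	ten := make([]int, 10)
	for i := range ten {
		ten[i] = mib
	}
	// Ten 1 MiB events with a 1 MiB limit: one event per fragment.
	fmt.Println(len(fragment(ten, mib)))
}
```

In the second example from the list, each 2 MiB event ends up alone in a fragment that is still larger than a 1 MiB client receive limit, which is why the client reports ResourceExhausted.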
- Add raftAppliedIndex field to etcdserverpb.StatusResponse for current Raft applied index.
- Add errors field to etcdserverpb.StatusResponse for server-side error.
- e.g. "etcdserver: no leader", "NOSPACE", "CORRUPT"
- Add dbSizeInUse field to etcdserverpb.StatusResponse for actual DB size after compaction.
- Add WatchRequest.WatchProgressRequest.
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as WithProgressNotify that can be triggered manually.
Note: v3.5 will deprecate etcd --log-package-levels flag for capnslog; etcd --logger=zap --log-outputs=stderr will be the default. v3.5 will deprecate [CLIENT-URL]/config/local/log endpoint.
- Add embed.Config.CipherSuites to specify a list of supported cipher suites for TLS handshake between client/server and peers.
- If empty, Go auto-populates the list.
- Both embed.Config.ClientTLSInfo.CipherSuites and embed.Config.CipherSuites cannot be non-empty at the same time.
- If not empty, specify either embed.Config.ClientTLSInfo.CipherSuites or embed.Config.CipherSuites.
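The mutual-exclusion rule for the two cipher suite fields could be checked along these lines. validateCipherSuites is a hypothetical helper operating on plain string slices, not the validation code in the embed package.

```go
package main

import (
	"errors"
	"fmt"
)

// validateCipherSuites is a hypothetical check modelling the rule above:
// the TLS-info list and the top-level list must not both be set.
func validateCipherSuites(tlsInfoSuites, configSuites []string) error {
	if len(tlsInfoSuites) > 0 && len(configSuites) > 0 {
		return errors.New("cipher suites set in both places; specify only one")
	}
	return nil
}

func main() {
	suites := []string{"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"}
	fmt.Println(validateCipherSuites(nil, nil))       // both empty: Go picks defaults
	fmt.Println(validateCipherSuites(nil, suites))    // one side set: accepted
	fmt.Println(validateCipherSuites(suites, suites)) // both set: rejected
}
```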
- Add embed.Config.InitialElectionTickAdvance to enable/disable initial election tick fast-forward.
- embed.NewConfig() would return *embed.Config with InitialElectionTickAdvance as true by default.
- Define embed.CompactorModePeriodic for compactor.ModePeriodic.
- Define embed.CompactorModeRevision for compactor.ModeRevision.
- Change embed.Config.CorsInfo in *cors.CORSInfo type to embed.Config.CORS in map[string]struct{} type.
- Remove embed.Config.SetupLogging.
- Now logger is set up automatically based on embed.Config.Logger, embed.Config.LogOutputs, embed.Config.Debug fields.
- Add embed.Config.Logger to support structured logger zap in server-side.
- Rename embed.Config.SnapCount field to embed.Config.SnapshotCount, to be consistent with the flag name etcd --snapshot-count.
- Rename embed.Config.LogOutput to embed.Config.LogOutputs to support multiple log outputs.
- Change embed.Config.LogOutputs type from string to []string to support multiple log outputs.
- Add embed.Config.BackendBatchLimit field.
- Add embed.Config.BackendBatchInterval field.
- Add CLUSTER_DEBUG to enable test cluster logging.
- Deprecated capnslog in integration tests.
- Client may receive rpctypes.ErrLeaderChanged from server.
- Now linearizable requests with read index would fail fast when there is a leadership change, instead of waiting until context timeout.
- Add WithFragment OpOption to support watch events fragmentation when the total size of events exceeds etcd --max-request-bytes flag value plus gRPC-overhead 512 bytes.
- Watch fragmentation is disabled by default.
- The default server-side request bytes limit is embed.DefaultMaxRequestBytes which is 1.5 MiB plus gRPC-overhead 512 bytes.
- If watch response events exceed this server-side request limit and watch request is created with fragment field true, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
- Useful when client-side has limited bandwidths.
- For example, watch response contains 10 events, where each event is 1 MiB. And server etcd --max-request-bytes flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
- For example, watch response contains 5 events, where each event is 2 MiB. And server etcd --max-request-bytes flag value is 1 MiB and clientv3.Config.MaxCallRecvMsgSize is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with "code = ResourceExhausted desc = grpc: received message larger than max (...)".
- Add Watcher.RequestProgress method.
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as WithProgressNotify that can be triggered manually.
- Fix lease keepalive interval updates when response queue is full.
- If <-chan *clientv3.LeaseKeepAliveResponse from clientv3.Lease.KeepAlive was never consumed or channel is full, client was sending keepalive request every 500ms instead of expected rate of every "TTL / 3" duration.
- Change snapshot file permissions: On Linux, the snapshot file changes from readable by all (mode 0644) to readable by the user only (mode 0600).
- Client may choose to send keepalive pings to server using PermitWithoutStream.
- By setting PermitWithoutStream to true, client can send keepalive pings to server without any active streams (RPCs). In other words, it allows sending keepalive pings with unary or simple RPC calls.
- PermitWithoutStream is set to false by default.
- Fix logic on release lock key if cancelled in clientv3/concurrency package.
- Fix (*Client).Endpoints() method race condition.
- Make ETCDCTL_API=3 etcdctl default.
- Now, etcdctl set foo bar must be ETCDCTL_API=2 etcdctl set foo bar.
- Now, ETCDCTL_API=3 etcdctl put foo bar could be just etcdctl put foo bar.
- Add etcdctl --password flag.
- To support : character in user name.
- e.g. etcdctl --user user --password password get foo
- Add etcdctl user add --new-user-password flag.
- Add etcdctl check datascale command.
- Add etcdctl check datascale --auto-compact, --auto-defrag flags.
- Add etcdctl check perf --auto-compact, --auto-defrag flags.
- Add etcdctl defrag --cluster flag.
- Add "raft applied index" field to endpoint status.
- Add "errors" field to endpoint status.
- Add etcdctl endpoint health --write-out support.
- Fix etcdctl watch [key] [range_end] -- [exec-command…] parsing.
- Previously, ETCDCTL_API=3 etcdctl watch foo -- echo watch event received panicked.
- Fix etcdctl move-leader command for TLS-enabled endpoints.
- Add progress command to etcdctl watch --interactive.
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as WithProgressNotify that can be triggered manually.
- Add timeout to etcdctl snapshot save.
- User can specify timeout of etcdctl snapshot save command using flag --command-timeout.
- Fix etcdctl to strip out insecure endpoints from DNS SRV records when using discovery.
- Fix etcd server panic from restore operation.
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, etcd server panicked during snapshot restore operation.
- Especially, gRPC proxy was affected, since it detects a leader loss with a key "proxy-namespace__lostleader" and a watch revision "int64(math.MaxInt64 - 2)".
. - Now, this server-side panic has been fixed.
- Fix memory leak in cache layer.
- Change gRPC proxy to expose etcd server endpoint /metrics.
- Previously, the metrics exposed via the proxy were those of the proxy itself, not of the etcd server members.
- Replace gRPC gateway endpoint /v3beta with /v3.
- Deprecated /v3alpha.
- To deprecate /v3beta in v3.5.
- In v3.4, curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}' still works as a fallback to curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}', but curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}' won't work in v3.5. Use curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}' instead.
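The key and value in these request bodies are base64-encoded: "Zm9v" is "foo" and "YmFy" is "bar". A small sketch of building the same JSON body with the standard library follows; putBody is a hypothetical helper introduced here for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// putBody is a hypothetical helper that builds the JSON body used in
// the curl examples above, base64-encoding the key and the value.
func putBody(key, value string) (string, error) {
	body := map[string]string{
		"key":   base64.StdEncoding.EncodeToString([]byte(key)),
		"value": base64.StdEncoding.EncodeToString([]byte(value)),
	}
	b, err := json.Marshal(body)
	return string(b), err
}

func main() {
	s, err := putBody("foo", "bar")
	fmt.Println(s, err)
}
```

The resulting string can be passed as the -d argument to the /v3/kv/put request shown above.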
- Add API endpoints /{v3beta,v3}/lease/leases, /{v3beta,v3}/lease/revoke, /{v3beta,v3}/lease/timetolive.
- Support etcd --cors in v3 HTTP requests (gRPC gateway).
- Fix deadlock during PreVote migration process.
- Add raft.ErrProposalDropped.
- Now (r *raft) Step returns raft.ErrProposalDropped if a proposal has been ignored.
- e.g. a node is removed from cluster, or raftpb.MsgProp arrives at current leader while there is an ongoing leadership transfer.
- Improve Raft becomeLeader and stepLeader by keeping track of latest pb.EntryConfChange index.
- Previously, recording the pendingConf boolean field required scanning the entire tail of the log, which could delay heartbeat sends.
- Fix missing learner nodes on (n *node) ApplyConfChange.
- Add raft.Config.MaxUncommittedEntriesSize to limit the total size of the uncommitted entries in bytes.
- Once exceeded, raft returns raft.ErrProposalDropped error.
- Prevent unbounded Raft log growth.
- There was a bug in PR#10167 but fixed via PR#10199.
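The idea of bounding uncommitted entries can be modelled with a simple byte budget. This is a simplified sketch, not the raft package's implementation: proposer, propose, commit, and errProposalDropped are hypothetical names, and the real accounting may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
)

// errProposalDropped is a stand-in for raft.ErrProposalDropped.
var errProposalDropped = errors.New("raft proposal dropped")

// proposer is a simplified model of a leader's uncommitted-entry budget.
type proposer struct {
	maxUncommittedSize int // models raft.Config.MaxUncommittedEntriesSize
	uncommittedSize    int // bytes proposed but not yet committed
}

// propose rejects an entry when accepting it would exceed the budget,
// which keeps the uncommitted tail of the log bounded.
func (p *proposer) propose(entrySize int) error {
	if p.uncommittedSize+entrySize > p.maxUncommittedSize {
		return errProposalDropped
	}
	p.uncommittedSize += entrySize
	return nil
}

// commit releases budget once entries of the given size are committed.
func (p *proposer) commit(size int) {
	p.uncommittedSize -= size
	if p.uncommittedSize < 0 {
		p.uncommittedSize = 0
	}
}

func main() {
	p := &proposer{maxUncommittedSize: 100}
	fmt.Println(p.propose(60)) // accepted
	fmt.Println(p.propose(60)) // dropped: would exceed 100 bytes
	p.commit(60)
	fmt.Println(p.propose(60)) // accepted again after commit
}
```

A caller that receives the dropped error would typically retry the proposal later, once earlier entries have been committed.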
- Add raft.Ready.CommittedEntries pagination using raft.Config.MaxSizePerMsg.
- This prevents out-of-memory errors if the raft log has become very large and commits all at once.
- Fix correctness bug in CommittedEntries pagination.
- Optimize message send flow control.
- Leader now sends more append entries if it has more non-empty entries to send after updating flow control information.
- Now, Raft allows multiple in-flight append messages.
- Optimize memory allocation when boxing slice in maybeCommit.
- By boxing a heap-allocated slice header instead of the slice header on the stack, we can avoid an allocation when passing through the sort.Interface interface.
- Avoid memory allocation in Raft entry String method.
- Avoid multiple memory allocations when merging stable and unstable log.
- Extract progress tracking into own component.
- Add etcd-dump-logs --entry-type flag to support WAL log filtering by entry type.
- Add etcd-dump-logs --stream-decoder flag to support custom decoder.