Upgrade etcd from 3.2 to 3.3
Processes, checklists, and notes on upgrading etcd from 3.2 to 3.3
In the general case, upgrading from etcd 3.2 to 3.3 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v3.2 processes and replace them with etcd v3.3 processes
- after running all v3.3 processes, new features in v3.3 are available to the cluster
Before starting an upgrade, read through the rest of this guide to prepare.
Upgrade checklists
NOTE: When migrating from v2 with no v3 data, etcd server v3.2+ panics when it restores from existing snapshots but finds no v3 ETCD_DATA_DIR/member/snap/db file. This happens when the server had migrated from v2 with no previous v3 data. It also prevents accidental v3 data loss (e.g. the db file might have been moved). etcd requires that post-v3 migration happens only with v3 data present. Do not upgrade to newer v3 versions until the v3.0 server contains v3 data.
NOTE: If auth is enabled and leases are used with a small lease TTL, there is a high probability of hitting an issue that results in data inconsistency. It is strongly recommended to upgrade to 3.2.31+ first to fix this problem, and then upgrade to 3.3. In addition, if a user without permission sends a LeaseRevoke request to a 3.3 node during the upgrade, it may still cause data corruption, so make sure no such abnormal calls occur in your environment before upgrading. See #11691 for details.
Highlighted breaking changes in 3.3.
Changed value type of etcd --auto-compaction-retention flag to string
Changed the --auto-compaction-retention flag to accept string values with finer granularity. Because --auto-compaction-retention now accepts string values, the auto-compaction-retention field in the etcd configuration YAML file must be changed to string type. Previously, --config-file etcd.config.yaml could contain auto-compaction-retention: 24; it must now be auto-compaction-retention: "24" or auto-compaction-retention: "24h". If configured as --auto-compaction-mode periodic --auto-compaction-retention "24h", the duration value for the --auto-compaction-retention flag must be valid for the time.ParseDuration function in Go.
# etcd.config.yaml
+auto-compaction-mode: periodic
-auto-compaction-retention: 24
+auto-compaction-retention: "24"
+# Or
+auto-compaction-retention: "24h"
Changed *etcdserver.ServerConfig to etcdserver.ServerConfig
etcdserver.EtcdServer has changed the type of its member field from *etcdserver.ServerConfig to etcdserver.ServerConfig, and etcdserver.NewServer now takes etcdserver.ServerConfig instead of *etcdserver.ServerConfig.
Before and after (e.g. k8s.io/kubernetes/test/e2e_node/services/etcd.go)
import "github.com/coreos/etcd/etcdserver"

type EtcdServer struct {
    *etcdserver.EtcdServer
-   config *etcdserver.ServerConfig
+   config etcdserver.ServerConfig
}

func NewEtcd(dataDir string) *EtcdServer {
-   config := &etcdserver.ServerConfig{
+   config := etcdserver.ServerConfig{
        DataDir: dataDir,
        ...
    }
    return &EtcdServer{config: config}
}

func (e *EtcdServer) Start() error {
    var err error
    e.EtcdServer, err = etcdserver.NewServer(e.config)
    ...
Added embed.Config.LogOutput field
Note that this field has been renamed to embed.Config.LogOutputs with []string type in v3.4. Please see the v3.4 upgrade guide for more details.
Field LogOutput is added to embed.Config:
package embed

type Config struct {
    Debug bool `json:"debug"`
    LogPkgLevels string `json:"log-package-levels"`
+   LogOutput string `json:"log-output"`
    ...
Before 3.3, gRPC server warnings were logged in etcdserver:
WARNING: 2017/11/02 11:35:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {localhost:2379 <nil>}
WARNING: 2017/11/02 11:35:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {localhost:2379 <nil>}
From v3.3, gRPC server logs are disabled by default.
Note that the embed.Config.SetupLogging method has been deprecated in v3.4. Please see the v3.4 upgrade guide for more details.
import "github.com/coreos/etcd/embed"

cfg := &embed.Config{Debug: false}
cfg.SetupLogging()
Set the embed.Config.Debug field to true to enable gRPC server logs.
Changed /health endpoint response
Previously, [endpoint]:[client-port]/health returned a manually marshaled JSON value. 3.3 now defines the etcdhttp.Health struct.
Note that in v3.3.0-rc.0, v3.3.0-rc.1, and v3.3.0-rc.2, etcdhttp.Health has boolean type "health" and "errors" fields. For backward compatibility, we reverted the "health" field to string type and removed the "errors" field. Further health information will be provided in separate APIs.
$ curl http://localhost:2379/health
{"health":"true"}
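Because "health" is a string rather than a boolean in the final 3.3 release, a client that decodes the response has to compare against "true". The following is a minimal sketch using only the standard library; healthResponse and isHealthy are hypothetical names that mirror the JSON shown above, not the etcdhttp.Health type itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthResponse is a hypothetical local type mirroring the /health JSON
// body shown above; it is not the etcdhttp.Health struct.
type healthResponse struct {
	Health string `json:"health"`
}

// isHealthy decodes a /health response body and reports whether the
// string-typed "health" field equals "true".
func isHealthy(body []byte) (bool, error) {
	var h healthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return false, err
	}
	return h.Health == "true", nil
}

func main() {
	ok, err := isHealthy([]byte(`{"health":"true"}`))
	fmt.Println(ok, err)

	// A boolean "health" value (the release-candidate shape) does not
	// decode into a string field, so this call returns an error.
	_, err = isHealthy([]byte(`{"health":true}`))
	fmt.Println(err != nil)
}
```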
Changed gRPC gateway HTTP endpoints (replaced /v3alpha with /v3beta)
Before
curl -L http://localhost:2379/v3alpha/kv/put \
  -X POST -d '{"key": "Zm9v", "value": "YmFy"}'
After
curl -L http://localhost:2379/v3beta/kv/put \
  -X POST -d '{"key": "Zm9v", "value": "YmFy"}'
Requests to /v3alpha endpoints will redirect to /v3beta, and /v3alpha will be removed in the 3.4 release.
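The key and value in the request bodies above are base64-encoded: "Zm9v" is "foo" and "YmFy" is "bar". As a rough sketch, a body like this can be built with the standard library; putRequest and putBody are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// putRequest is a hypothetical local type for the JSON body sent to the
// gateway's kv/put endpoint; both fields carry base64-encoded bytes.
type putRequest struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// putBody base64-encodes the raw key and value and marshals them to JSON.
func putBody(key, value string) (string, error) {
	b, err := json.Marshal(putRequest{
		Key:   base64.StdEncoding.EncodeToString([]byte(key)),
		Value: base64.StdEncoding.EncodeToString([]byte(value)),
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := putBody("foo", "bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"key":"Zm9v","value":"YmFy"}
}
```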
Changed maximum request size limits
3.3 now allows custom request size limits for both server and client side. In previous versions (v3.2.10, v3.2.11), client response size was limited to only 4 MiB.
Server-side request limits can be configured with the --max-request-bytes flag:
# limits request size to 1.5 KiB
etcd --max-request-bytes 1536

# client writes exceeding 1.5 KiB will be rejected
etcdctl put foo [LARGE VALUE...]
# etcdserver: request is too large
Or configure the embed.Config.MaxRequestBytes field:
import "github.com/coreos/etcd/embed"
import "github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"

// limit requests to 5 MiB
cfg := embed.NewConfig()
cfg.MaxRequestBytes = 5 * 1024 * 1024

// client writes exceeding 5 MiB will be rejected
_, err := cli.Put(ctx, "foo", [LARGE VALUE...])
err == rpctypes.ErrRequestTooLarge
If not specified, server-side limit defaults to 1.5 MiB.
Client-side request limits must be configured based on server-side limits.
# limits request size to 1 MiB
etcd --max-request-bytes 1048576

import "github.com/coreos/etcd/clientv3"

cli, _ := clientv3.New(clientv3.Config{
    Endpoints:          []string{"127.0.0.1:2379"},
    MaxCallSendMsgSize: 2 * 1024 * 1024,
    MaxCallRecvMsgSize: 3 * 1024 * 1024,
})

// client writes exceeding "--max-request-bytes" will be rejected from etcd server
_, err := cli.Put(ctx, "foo", strings.Repeat("a", 1*1024*1024+5))
err == rpctypes.ErrRequestTooLarge

// client writes exceeding "MaxCallSendMsgSize" will be rejected from client-side
_, err = cli.Put(ctx, "foo", strings.Repeat("a", 5*1024*1024))
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (5242890 vs. 2097152)"

// some writes under limits
for i := range []int{0,1,2,3,4} {
    _, err = cli.Put(ctx, fmt.Sprintf("foo%d", i), strings.Repeat("a", 1*1024*1024-500))
    if err != nil {
        panic(err)
    }
}

// client reads exceeding "MaxCallRecvMsgSize" will be rejected from client-side
_, err = cli.Get(ctx, "foo", clientv3.WithPrefix())
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5240509 vs. 3145728)"
If not specified, the client-side send limit defaults to 2 MiB (1.5 MiB + gRPC overhead bytes) and the receive limit to math.MaxInt32. Please see the clientv3 godoc for more detail.
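One way to read "configured based on server-side limits" is that the client send limit should not be smaller than the server request limit, otherwise writes the server would accept are rejected on the client first. The sketch below encodes that reading as a plain check. checkLimits and both constants are hypothetical names; the numeric defaults are the ones stated above.

```go
package main

import "fmt"

const (
	// Defaults as stated in this guide: 1.5 MiB on the server,
	// 2 MiB for client sends. The constant names are illustrative only.
	defaultServerMaxRequestBytes = 1536 * 1024     // 1.5 MiB = 1572864 bytes
	defaultClientMaxSendBytes    = 2 * 1024 * 1024 // 2 MiB = 2097152 bytes
)

// checkLimits is a hypothetical helper. It assumes the client send limit
// should be at least the server request limit, so that any write the
// server accepts is not rejected earlier on the client side.
func checkLimits(serverMaxRequestBytes, clientMaxSendBytes int) error {
	if clientMaxSendBytes < serverMaxRequestBytes {
		return fmt.Errorf("client send limit %d is below server request limit %d",
			clientMaxSendBytes, serverMaxRequestBytes)
	}
	return nil
}

func main() {
	// Defaults are compatible with each other.
	fmt.Println(checkLimits(defaultServerMaxRequestBytes, defaultClientMaxSendBytes))

	// Raising only the server limit to 5 MiB leaves the client too small.
	fmt.Println(checkLimits(5*1024*1024, defaultClientMaxSendBytes))
}
```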
Changed raw gRPC client wrapper function signatures
3.3 changes the function signatures of the clientv3 gRPC client wrapper. This change was needed to support custom grpc.CallOption on message size limits.
Before and after
-func NewKVFromKVClient(remote pb.KVClient) KV {
+func NewKVFromKVClient(remote pb.KVClient, c *Client) KV {

-func NewClusterFromClusterClient(remote pb.ClusterClient) Cluster {
+func NewClusterFromClusterClient(remote pb.ClusterClient, c *Client) Cluster {

-func NewLeaseFromLeaseClient(remote pb.LeaseClient, keepAliveTimeout time.Duration) Lease {
+func NewLeaseFromLeaseClient(remote pb.LeaseClient, c *Client, keepAliveTimeout time.Duration) Lease {

-func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient) Maintenance {
+func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient, c *Client) Maintenance {

-func NewWatchFromWatchClient(wc pb.WatchClient) Watcher {
+func NewWatchFromWatchClient(wc pb.WatchClient, c *Client) Watcher {
Changed clientv3 Snapshot API error type
Previously, the clientv3 Snapshot API returned a raw grpc/*status.statusError type error. v3.3 now translates those errors to the corresponding public error types, to be consistent with other APIs.
Before
import "context"

// reading snapshot with canceled context should error out
ctx, cancel := context.WithCancel(context.Background())
rc, _ := cli.Snapshot(ctx)
cancel()
_, err := io.Copy(f, rc)
err.Error() == "rpc error: code = Canceled desc = context canceled"

// reading snapshot with deadline exceeded should error out
ctx, cancel = context.WithTimeout(context.Background(), time.Second)
defer cancel()
rc, _ = cli.Snapshot(ctx)
time.Sleep(2 * time.Second)
_, err = io.Copy(f, rc)
err.Error() == "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
After
import "context"

// reading snapshot with canceled context should error out
ctx, cancel := context.WithCancel(context.Background())
rc, _ := cli.Snapshot(ctx)
cancel()
_, err := io.Copy(f, rc)
err == context.Canceled

// reading snapshot with deadline exceeded should error out
ctx, cancel = context.WithTimeout(context.Background(), time.Second)
defer cancel()
rc, _ = cli.Snapshot(ctx)
time.Sleep(2 * time.Second)
_, err = io.Copy(f, rc)
err == context.DeadlineExceeded
Changed etcdctl lease timetolive command output
Previously, the lease timetolive LEASE_ID command on an expired lease printed -1s for the remaining seconds. 3.3 now outputs a clearer message.
Before
lease 2d8257079fa1bc0c granted with TTL(0s), remaining(-1s)
After
lease 2d8257079fa1bc0c already expired
Changed golang.org/x/net/context imports
clientv3 has deprecated golang.org/x/net/context. If a project vendors golang.org/x/net/context in other code (e.g. etcd generated protocol buffer code) and imports github.com/coreos/etcd/clientv3, it requires Go 1.9+ to compile.
Before
import "golang.org/x/net/context"
cli.Put(context.Background(), "f", "v")
After
import "context"
cli.Put(context.Background(), "f", "v")
Changed gRPC dependency
3.3 now requires grpc/grpc-go v1.7.5.
Deprecated grpclog.Logger
grpclog.Logger has been deprecated in favor of grpclog.LoggerV2. clientv3.Logger is now grpclog.LoggerV2.
Before
import "github.com/coreos/etcd/clientv3"
clientv3.SetLogger(log.New(os.Stderr, "grpc: ", 0))
After
import "github.com/coreos/etcd/clientv3"
import "google.golang.org/grpc/grpclog"
clientv3.SetLogger(grpclog.NewLoggerV2(os.Stderr, os.Stderr, os.Stderr))

// log.New above cannot be used (it does not implement the grpclog.LoggerV2 interface)
Deprecated grpc.ErrClientConnTimeout
Previously, the grpc.ErrClientConnTimeout error was returned on client dial time-outs. 3.3 instead returns context.DeadlineExceeded (see #8504).
Before
// expect dial time-out on ipv4 blackhole
_, err := clientv3.New(clientv3.Config{
    Endpoints:   []string{"http://254.0.0.1:12345"},
    DialTimeout: 2 * time.Second,
})
if err == grpc.ErrClientConnTimeout {
    // handle errors
}
After
_, err := clientv3.New(clientv3.Config{
    Endpoints:   []string{"http://254.0.0.1:12345"},
    DialTimeout: 2 * time.Second,
})
if err == context.DeadlineExceeded {
    // handle errors
}
Changed official container registry
etcd now uses gcr.io/etcd-development/etcd as the primary container registry, and quay.io/coreos/etcd as secondary.
Before
docker pull quay.io/coreos/etcd:v3.2.5
After
docker pull gcr.io/etcd-development/etcd:v3.3.0
Upgrades to >= v3.3.14
v3.3.14 had to include some features from 3.4, while trying to minimize the difference between client balancer implementation. This release fixes “kube-apiserver 1.13.x refuses to work when first etcd-server is not available” (kubernetes#72102).
grpc.ErrClientConnClosing has been deprecated in gRPC >= 1.10.
import (
+   "go.etcd.io/etcd/clientv3"

    "google.golang.org/grpc"
+   "google.golang.org/grpc/codes"
+   "google.golang.org/grpc/status"
)

_, err := kvc.Get(ctx, "a")
-if err == grpc.ErrClientConnClosing {
+if clientv3.IsConnCanceled(err) {

// or
+s, ok := status.FromError(err)
+if ok {
+   if s.Code() == codes.Canceled
The new client balancer uses an asynchronous resolver to pass endpoints to the gRPC dial function. As a result, v3.3.14 or later requires the grpc.WithBlock dial option to wait until the underlying connection is up.
import (
    "time"
    "go.etcd.io/etcd/clientv3"
+   "google.golang.org/grpc"
)

+// "grpc.WithBlock()" to block until the underlying connection is up
ccfg := clientv3.Config{
    Endpoints:            []string{"localhost:2379"},
    DialTimeout:          time.Second,
+   DialOptions:          []grpc.DialOption{grpc.WithBlock()},
    DialKeepAliveTime:    time.Second,
    DialKeepAliveTimeout: 500 * time.Millisecond,
}
Please see CHANGELOG for a full list of changes.
Server upgrade checklists
Upgrade requirements
To upgrade an existing etcd deployment to 3.3, the running cluster must be 3.2 or greater. If it’s before 3.2, please upgrade to 3.2 before upgrading to 3.3.
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the etcdctl endpoint health command before proceeding.
Preparation
Always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
Before beginning, back up the etcd data. Should something go wrong with the upgrade, this backup can be used to downgrade back to the existing etcd version. Please note that the snapshot command only backs up the v3 data. For v2 data, see backing up v2 datastore.
Mixed versions
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.3. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
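The "lowest common version" rule can be illustrated with a small calculation: the cluster version is the lowest major.minor among the member versions. The sketch below is an illustration of that rule only, not etcd's negotiation code; parseMajorMinor and clusterVersion are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds only the major and minor parts, which is what the
// cluster version in this guide is expressed in (for example 3.2).
type version struct{ major, minor int }

// parseMajorMinor extracts major and minor from strings like "3.2.7".
func parseMajorMinor(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) < 2 {
		return version{}, fmt.Errorf("malformed version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, err
	}
	return version{major, minor}, nil
}

// clusterVersion is a hypothetical helper returning the lowest
// major.minor among the given member versions.
func clusterVersion(members []string) (string, error) {
	if len(members) == 0 {
		return "", fmt.Errorf("no members")
	}
	var low version
	for i, m := range members {
		v, err := parseMajorMinor(m)
		if err != nil {
			return "", err
		}
		if i == 0 || v.major < low.major || (v.major == low.major && v.minor < low.minor) {
			low = v
		}
	}
	return fmt.Sprintf("%d.%d", low.major, low.minor), nil
}

func main() {
	// Two members upgraded, one still on 3.2.7: the cluster stays at 3.2.
	v, err := clusterVersion([]string{"3.3.0", "3.2.7", "3.3.0"})
	fmt.Println(v, err)
}
```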
Limitations
Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.
If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster, so it is safest to wait 2 minutes between upgrading each member. Check the size of a recent snapshot to estimate the total data size.
For a much larger total data size, 100MB or more, this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the etcd team before upgrading, and we’ll be happy to provide advice on the procedure.
Downgrade
If all members have been upgraded to v3.3, the cluster will be upgraded to v3.3, and downgrade from this completed state is not possible. If any single member is still v3.2, however, the cluster and its operations remain “v3.2”, and it is possible from this mixed cluster state to return to using a v3.2 etcd binary on all members.
Please back up the data directory of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
Upgrade procedure
This example shows how to upgrade a 3-member v3.2 etcd cluster running on a local machine.
1. Check upgrade requirements
Is the cluster healthy and running v3.2.x?
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms

$ curl http://localhost:2379/version
{"etcdserver":"3.2.7","etcdcluster":"3.2.0"}
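The version check can also be scripted. The sketch below decodes a /version response body like the one above and reports whether the cluster version is a 3.2.x release. versionResponse and readyFor33 are hypothetical names, and the body is passed in as bytes rather than fetched over HTTP.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// versionResponse is a hypothetical local type mirroring the /version
// JSON body shown above.
type versionResponse struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

// readyFor33 reports whether the cluster version in the response body
// is a 3.2.x release, the starting point this guide requires.
func readyFor33(body []byte) (bool, error) {
	var v versionResponse
	if err := json.Unmarshal(body, &v); err != nil {
		return false, err
	}
	return strings.HasPrefix(v.Cluster, "3.2."), nil
}

func main() {
	ok, err := readyFor33([]byte(`{"etcdserver":"3.2.7","etcdcluster":"3.2.0"}`))
	fmt.Println(ok, err)
}
```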
2. Stop the existing etcd process
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
It’s a good idea at this point to back up the etcd data to provide a downgrade path should any problems occur:
$ etcdctl snapshot save backup.db
3. Drop in the etcd v3.3 binary and start the new etcd process
The new v3.3 etcd will publish its information to the cluster:
14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
Verify that each member, and then the entire cluster, becomes healthy with the new v3.3 etcd binary:
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.3:
14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.3.0
14:15:21.073110 W | etcdserver: the local etcd version 3.2.7 is not up-to-date
14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.3.0
14:15:21.073157 W | etcdserver: the local etcd version 3.2.7 is not up-to-date
14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.3.0
4. Repeat steps 2 and 3 for all other members
5. Finish
When all members are upgraded, the cluster will report that it has upgraded to 3.3 successfully:
14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.2 to 3.3
14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.3

$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
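When the procedure is automated, the health output can be checked line by line before moving to the next member. The sketch below parses text in the format shown above and returns every endpoint not reported as healthy. unhealthyEndpoints is a hypothetical helper, and it assumes each line has the form "ENDPOINT is healthy: ..." for a healthy endpoint; any other line is treated as a failure.

```go
package main

import (
	"fmt"
	"strings"
)

// unhealthyEndpoints is a hypothetical helper. It assumes each non-empty
// line of `etcdctl endpoint health` output looks like
// "ENDPOINT is healthy: ..." when the endpoint is healthy, and returns
// the endpoint (or the whole line if it cannot be split) otherwise.
func unhealthyEndpoints(output string) []string {
	var bad []string
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		endpoint, rest, found := strings.Cut(line, " is ")
		if !found || !strings.HasPrefix(rest, "healthy") {
			bad = append(bad, endpoint)
		}
	}
	return bad
}

func main() {
	out := `localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms`
	fmt.Println(len(unhealthyEndpoints(out)) == 0)
}
```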
Last modified August 19, 2023: etcd-io/website#479 Use new and better canonical link to Google Groups (cd8b01f)