etcd 集群维护

2019/03/25 etcd

节点维护

etcd 有3个节点,其中 一台节点 服务器硬盘损坏, 重新添加新节点过程

  1. 在集群中删除旧的 etcd 服务节点

     $ ./etcdctl member remove ea71bc5f701d5cdb
     Removed member ea71bc5f701d5cdb from cluster
    
  2. 删除新节点中数据目录

     rm -rf /data/etcd/etcd-server
    
  3. 添加新节点

     $ ./etcdctl member add k8s-m2 https://192.168.8.51:2380
     Added member named k8s-m2 with ID 41e5030442e2e626 to cluster
    	
     ETCD_NAME="k8s-m2"
     ETCD_INITIAL_CLUSTER="k8s-m1=https://192.168.8.50:2380,k8s-m2=https://192.168.8.51:2380,k8s-m3=https://192.168.8.52:2380"
     ETCD_INITIAL_CLUSTER_STATE="existing"
    
  4. 启动新节点

    注意: 这里的initial标记一定要指定为existing,如果为new则会自动生成一个新的member ID,和前面添加节点时生成的ID不一致,故日志中会报节点ID不匹配的错

     /opt/etcd/bin/etcd  \
         --name k8s-m2       \
         --data-dir /data/etcd/etcd-server      \
         --listen-peer-urls https://192.168.8.51:2380      \
         --listen-client-urls https://192.168.8.51:2379,http://127.0.0.1:2379      \
         --quota-backend-bytes 8000000000      \
         --initial-advertise-peer-urls https://192.168.8.51:2380      \
         --advertise-client-urls https://192.168.8.51:2379,http://127.0.0.1:2379      \
         --initial-cluster k8s-m1=https://192.168.8.50:2380,k8s-m2=https://192.168.8.51:2380,k8s-m3=https://192.168.8.52:2380      \
         --ca-file ./certs/ca.pem \
         --cert-file ./certs/etcd-peer.pem \
         --key-file ./certs/etcd-peer-key.pem \
         --client-cert-auth \
         --trusted-ca-file ./certs/ca.pem \
         --peer-ca-file ./certs/ca.pem \
         --peer-cert-file ./certs/etcd-peer.pem \
         --peer-key-file ./certs/etcd-peer-key.pem \
         --peer-client-cert-auth \
         --peer-trusted-ca-file ./certs/ca.pem \
         --initial-cluster-state existing \
         --log-output stdout
    

常用维护命令

说明 命令
集群成员查看 etcdctl member list
删除集群成员 etcdctl member remove b200a8bec19bd22e
$ curl http://127.0.0.1:2379/v2/keys/message -XPUT -d value="set by h104"
{"action":"set","node":{"key":"/message","value":"set by h104","modifiedIndex":32,"createdIndex":32}}

$ curl http://127.0.0.1:2379/v2/keys/message
{"action":"get","node":{"key":"/message","value":"set by h104","modifiedIndex":32,"createdIndex":32}}

$ curl http://127.0.0.1:2379/v2/keys/message -XDELETE -d value="set by h104"
{"action":"delete","node":{"key":"/message","modifiedIndex":31,"createdIndex":30},"prevNode":{"key":"/message","value":"set by h104","modifiedIndex":30,"createdIndex":30}}

错误记录

  1. 新增一个etcd节点报错,提示证书 过期或违反

     etcdserver: could not get cluster response from https://192.168.8.54:2380: Get https://192.168.8.54:2380/members: x509: certificate has expired or is not yet valid
    

    解决:

    发现是该节点时间未同步导致

Search

    Table of Contents