Kubernetes Cluster Deployment Tutorial: 3 Masters + 3 Workers + kube-vip High Availability

Servers run Ubuntu 22.04/24.04; Kubernetes is the current official release, v1.35.3.

I. Planning

Role      IP             Hostname
Master1   192.167.8.20   k8s-master-01
Master2   192.167.8.21   k8s-master-02
Master3   192.167.8.22   k8s-master-03
Worker1   192.167.8.23   k8s-node-01
Worker2   192.167.8.24   k8s-node-02
Worker3   192.167.8.25   k8s-node-03
kube-vip  192.167.8.26   k8s-vip (API VIP)

Note: all nodes must sit on the same layer-2 network, because kube-vip's ARP mode relies on L2 broadcast. Per the kube-vip documentation, it provides a virtual IP for the control plane, and in ARP mode it announces via broadcast which node currently holds the VIP.

Disk planning

Node type                     System disk                     Data disk              Why
Master / control plane        SSD required/strongly advised   a small SSD is enough  etcd, the apiserver and containerd all issue frequent small IO; HDDs stall easily
Worker                        SSD recommended                 depends on workload    pulling images, pod writes, logs and scratch files are noticeably snappier on SSD
Storage / database node       SSD first                       depends on data type   MySQL, PostgreSQL, Redis, MinIO, log search, etc. should run on SSD
Static files / backup / NFS   HDD or ZFS is fine              HDD acceptable         fine for capacity storage, not for heavy random IO

The steps below assume the NIC is named ens18. Confirm it on every machine first:

ip -br addr

If yours is not ens18, replace every later INTERFACE=ens18 with your actual NIC name, e.g. eth0.

II. Run on all 6 nodes

1. Set the hostname; run the matching command on each node.

hostnamectl set-hostname k8s-master-01

On the other nodes:

hostnamectl set-hostname k8s-master-02
hostnamectl set-hostname k8s-master-03
hostnamectl set-hostname k8s-node-01
hostnamectl set-hostname k8s-node-02
hostnamectl set-hostname k8s-node-03
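If you push one script to every node, the planning table above can drive the hostname choice with a small case statement. This is a sketch: the IP is hard-coded here and the hostnamectl line is commented out; on a real node detect the IP and uncomment it.

```shell
# Pick this node's hostname from its IP, per the planning table.
# Sketch: IP is hard-coded; on a real node detect it instead, e.g.:
#   IP=$(ip -4 route get 1.1.1.1 | awk '{print $7; exit}')
IP="192.167.8.20"

case "$IP" in
  192.167.8.20) NAME=k8s-master-01 ;;
  192.167.8.21) NAME=k8s-master-02 ;;
  192.167.8.22) NAME=k8s-master-03 ;;
  192.167.8.23) NAME=k8s-node-01 ;;
  192.167.8.24) NAME=k8s-node-02 ;;
  192.167.8.25) NAME=k8s-node-03 ;;
  *) echo "unknown node IP: $IP" >&2; exit 1 ;;
esac

echo "would set hostname: $NAME"
# hostnamectl set-hostname "$NAME"   # uncomment on the real node
```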
2. Populate /etc/hosts

On all nodes:

cat >> /etc/hosts <<'EOF'
192.167.8.20 k8s-master-01
192.167.8.21 k8s-master-02
192.167.8.22 k8s-master-03
192.167.8.23 k8s-node-01
192.167.8.24 k8s-node-02
192.167.8.25 k8s-node-03
192.167.8.26 k8s-vip
EOF
3. Disable swap, and comment out the swap entry in /etc/fstab so it stays off after a reboot.

swapoff -a
cp /etc/fstab /etc/fstab.bak
sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab

# Check: Swap should read 0B.
free -h

# The swap line should now be commented out:
cat /etc/fstab
#/swap.img none swap sw 0 0
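To see exactly what that sed does before touching the real file, here is a dry run against a throwaway copy (the sample fstab content below is made up):

```shell
# Dry run of the swap-commenting sed on a sample fstab (hypothetical content)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/dev/sda1 / ext4 defaults 0 0
/swap.img none swap sw 0 0
EOF

# Same command as above, pointed at the copy instead of /etc/fstab
sed -ri '/\sswap\s/s/^#?/#/' "$tmp"
cat "$tmp"
```

The swap line comes out commented while the root filesystem line is untouched, and re-running the sed is safe (a line that already starts with # is left with a single #).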

PS: a common pitfall on Ubuntu 24: cloud-init recreates swap automatically.

ls /etc/cloud/cloud.cfg.d/

# If cloud-init is present, add a config that disables swap creation:
cat >/etc/cloud/cloud.cfg.d/99-disable-swap.cfg <<'EOF'
swap:
  filename: ""
  size: 0
EOF
# If the system has a swap file, delete it as well:
ls -lh /swap.img
ls -lh /swapfile
# if present:
rm -f /swap.img
rm -f /swapfile
4. Load the kernel modules

cat >/etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter
5. Configure kernel parameters

cat >/etc/sysctl.d/99-kubernetes-cri.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

sysctl --system
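A quick way to confirm the settings took effect is to read each key back from the running kernel and compare. This sketch parses a sample copy of the conf so the mechanics are visible; on a real node point CONF at /etc/sysctl.d/99-kubernetes-cri.conf.

```shell
# Read each key from the conf and compare it with the running kernel value.
# CONF points at a sample copy here; use /etc/sysctl.d/99-kubernetes-cri.conf on a node.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

REPORT=$(mktemp)
while IFS='=' read -r key want; do
  key=$(echo "$key" | xargs)    # trim surrounding whitespace
  want=$(echo "$want" | xargs)
  have=$(sysctl -n "$key" 2>/dev/null || echo "unreadable")
  printf '%s expected=%s actual=%s\n' "$key" "$want" "$have"
done < "$CONF" | tee "$REPORT"
```

Every line should report actual=1; the bridge keys read back only after br_netfilter is loaded, which is why the modules step comes first.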
6. Install base tools

apt update
apt install -y apt-transport-https ca-certificates curl gpg chrony nfs-common
systemctl enable --now chrony

Time sync: chrony is already installed above; point it at NTP servers close to you (mainland-China examples below).

nano /etc/chrony/chrony.conf

# Comment out the default pool lines and add:
server ntp.aliyun.com iburst
server time1.aliyun.com iburst
server cn.ntp.org.cn iburst

systemctl restart chrony
chronyc sources
timedatectl set-timezone Asia/Shanghai
# Verify the time
chronyc tracking
date

III. Install containerd on all nodes

apt install -y containerd
mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml

Switch the cgroup driver to systemd:

sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# Restart
systemctl enable --now containerd
systemctl restart containerd
systemctl status containerd --no-pager

IV. Install kubelet/kubeadm/kubectl v1.35.3 on all nodes

The new Kubernetes package repositories are split per minor version; for v1.35 use core:/stable:/v1.35. The old apt.kubernetes.io repo is officially deprecated and frozen.

mkdir -p /etc/apt/keyrings

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key \
| gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

cat >/etc/apt/sources.list.d/kubernetes.list <<'EOF'
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /
EOF

apt update
apt-cache madison kubeadm | head

Once you can see 1.35.3, install:

apt install -y kubelet=1.35.3-* kubeadm=1.35.3-* kubectl=1.35.3-*
apt-mark hold kubelet kubeadm kubectl
systemctl enable kubelet

If it reports that 1.35.3-* cannot be found, first run:

apt-cache madison kubeadm

Note the exact full version string it prints, then install by that version, e.g.:

root@template:~# apt-cache madison kubeadm | head
kubeadm | 1.35.4-1.1 | https://pkgs.k8s.io/core:/stable:/v1.35/deb Packages
kubeadm | 1.35.3-1.1 | https://pkgs.k8s.io/core:/stable:/v1.35/deb Packages
kubeadm | 1.35.2-1.1 | https://pkgs.k8s.io/core:/stable:/v1.35/deb Packages
kubeadm | 1.35.1-1.1 | https://pkgs.k8s.io/core:/stable:/v1.35/deb Packages
kubeadm | 1.35.0-1.1 | https://pkgs.k8s.io/core:/stable:/v1.35/deb Packages

On all 6 nodes:

apt install -y \
  kubelet=1.35.3-1.1 \
  kubeadm=1.35.3-1.1 \
  kubectl=1.35.3-1.1

After installing, check the versions:

kubelet --version
kubeadm version
kubectl version --client

V. Pin the Kubernetes package versions

On all 6 nodes:

apt-mark hold kubelet kubeadm kubectl

# Check that the hold took effect:
apt-mark showhold

# You should see:
kubeadm
kubectl
kubelet

From now on, apt update / apt upgrade will not upgrade Kubernetes automatically.

VI. Configure crictl, to avoid constant warnings later

On all 6 nodes:

cat >/etc/crictl.yaml <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Verify with:

crictl info | head

If it prints containerd runtime information, you're set.

VII. Start kubelet

On all 6 nodes:

systemctl enable kubelet
systemctl restart kubelet

kubelet may be in a failed/restarting state at this point. That is normal, because kubeadm init/join has not run yet.

Check the status:

systemctl status kubelet --no-pager

As long as the command itself is found, you're fine.

VIII. Sanity checks before initialization

On all 6 nodes:

# Confirm swap is 0
free -h

# Check containerd:
systemctl status containerd --no-pager

# Check the cgroup driver:
grep SystemdCgroup /etc/containerd/config.toml
# must return
SystemdCgroup = true

# Check the NIC name:
ip -br addr

# kube-vip needs the real NIC name later, e.g.:
ens18
eth0
enp1s0

IX. Configure kube-vip on master-01 only

My master-01 is:

192.167.8.20
hostname: k8s-master-01
VIP: 192.167.8.26
NIC name: ens18

The example below uses ens18:

export VIP=192.167.8.26
export INTERFACE=ens18
export KVVERSION=v0.8.9

Generate the kube-vip static Pod manifest:

mkdir -p /etc/kubernetes/manifests

ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION

ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip manifest pod \
  --interface $INTERFACE \
  --address $VIP \
  --controlplane \
  --services \
  --arp \
  --leaderElection \
  > /etc/kubernetes/manifests/kube-vip.yaml

Check:

ls -l /etc/kubernetes/manifests/kube-vip.yaml
grep -E "192.167.8.26|vip_interface|vip_leaderelection|vip_arp" /etc/kubernetes/manifests/kube-vip.yaml

X. Initialize the cluster on master-01 only

On k8s-master-01:

cat >/root/kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  name: k8s-master-01
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.35.3
controlPlaneEndpoint: "192.167.8.26:6443"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
apiServer:
  certSANs:
    - "192.167.8.26"
    - "k8s-vip"
    - "192.167.8.20"
    - "192.167.8.21"
    - "192.167.8.22"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF

Initialize (if it hangs, see Section XI):

kubeadm init --config=/root/kubeadm-config.yaml --upload-certs

On success it prints two kinds of join commands.

One for adding masters:

kubeadm join 192.167.8.26:6443 --token xxx \
  --discovery-token-ca-cert-hash sha256:xxx \
  --control-plane --certificate-key xxx

One for adding workers:

kubeadm join 192.167.8.26:6443 --token xxx \
  --discovery-token-ca-cert-hash sha256:xxx

XI. If init hangs, the images most likely can't be pulled from inside mainland China

1. Interrupt init with Ctrl+C.
2. Then clean up:

kubeadm reset -f
rm -rf /etc/kubernetes/pki
rm -rf /var/lib/etcd
systemctl restart containerd kubelet

3. Edit kubeadm-config.yaml and add a mainland-China mirror registry.

vi /root/kubeadm-config.yaml

# Inside ClusterConfiguration add one line:
imageRepository: registry.aliyuncs.com/google_containers

# Also point containerd's sandbox (pause) image at the mirror:
vi /etc/containerd/config.toml

# containerd 2.x (config version 3):
[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10.1'

# containerd 1.7 (config version 2, which Ubuntu's apt package may still ship) uses instead:
# [plugins."io.containerd.grpc.v1.cri"]
#   sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.10.1"

# Restart containerd afterwards:
systemctl restart containerd

The result should look roughly like this:

apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.35.3
imageRepository: registry.aliyuncs.com/google_containers
controlPlaneEndpoint: "192.167.8.26:6443"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"

Mind the indentation: imageRepository sits at the same level as kubernetesVersion.
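If you'd rather patch the file non-interactively than open vi, a one-line sed can insert the mirror line right below kubernetesVersion; since both keys are top-level, no indentation is needed. A sketch against a throwaway copy (swap in /root/kubeadm-config.yaml on the real node):

```shell
# Insert imageRepository right below kubernetesVersion (demo on a sample copy)
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.35.3
controlPlaneEndpoint: "192.167.8.26:6443"
EOF

sed -i '/^kubernetesVersion:/a imageRepository: registry.aliyuncs.com/google_containers' "$CFG"
cat "$CFG"
```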

4. First list the images that are needed:

kubeadm config images list --config=/root/kubeadm-config.yaml

# It should print something like:
registry.aliyuncs.com/google_containers/kube-apiserver:v1.35.3
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.35.3
registry.aliyuncs.com/google_containers/kube-scheduler:v1.35.3
registry.aliyuncs.com/google_containers/kube-proxy:v1.35.3
registry.aliyuncs.com/google_containers/coredns:xxx
registry.aliyuncs.com/google_containers/pause:xxx
registry.aliyuncs.com/google_containers/etcd:xxx

5. Pre-pull the images manually:

kubeadm config images pull --config=/root/kubeadm-config.yaml

If that succeeds, check again; once you can see these images, run init.

crictl images

6. Copy kubeadm-config.yaml to master-02 and master-03:

scp kubeadm-config.yaml root@192.167.8.21:/root
scp kubeadm-config.yaml root@192.167.8.22:/root
# on each of them, run:
kubeadm config images pull --config=/root/kubeadm-config.yaml

Then go back to k8s-master-01 and run init:

kubeadm init --config=/root/kubeadm-config.yaml --upload-certs

On success it prints the two kinds of join commands; save them. If it fails, see Section XVIII.

XII. Configure kubectl

On master-01 only:

mkdir -p $HOME/.kube
cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Check:

kubectl get nodes
kubectl get pods -A

XIII. Install the Flannel network plugin

On master-01:

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# After 1-2 minutes, master-01 should become Ready
kubectl get pods -A -o wide
kubectl get nodes -o wide

XIV. Copy the kube-vip manifest to master-02/master-03

On master-01:

scp /etc/kubernetes/manifests/kube-vip.yaml root@192.167.8.21:/etc/kubernetes/manifests/
scp /etc/kubernetes/manifests/kube-vip.yaml root@192.167.8.22:/etc/kubernetes/manifests/

If the target directory does not exist, first run on master-02/master-03:

mkdir -p /etc/kubernetes/manifests

XV. Join master-02/master-03 to the cluster

On master-02 and master-03 respectively, run the master join command you saved, e.g.:

kubeadm join 192.167.8.26:6443 --token xxx \
  --discovery-token-ca-cert-hash sha256:xxx \
  --control-plane --certificate-key xxx

# Afterwards, check on master-01:
kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide

XVI. Join the worker nodes

On the 3 workers, run the worker join command you saved, e.g.:

kubeadm join 192.167.8.26:6443 --token xxx \
  --discovery-token-ca-cert-hash sha256:xxx

# Then check on master-01:
kubectl get nodes -o wide

The end state should be:

k8s-master-01   Ready   control-plane
k8s-master-02   Ready   control-plane
k8s-master-03   Ready   control-plane
k8s-node-01     Ready
k8s-node-02     Ready
k8s-node-03     Ready

XVII. If the token expires

On master-01, regenerate the worker join command:

kubeadm token create --print-join-command

# Regenerate the certificate key for joining masters:
kubeadm init phase upload-certs --upload-certs

# It prints a new certificate-key; combine them into:
kubeadm join 192.167.8.26:6443 --token xxx \
  --discovery-token-ca-cert-hash sha256:xxx \
  --control-plane --certificate-key <the new key>
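Putting the two outputs together: the control-plane join is simply the printed worker join plus the two extra flags. A sketch with placeholder token/hash/key values standing in for the real ones:

```shell
# Placeholders standing in for the real values printed by the two commands above
WORKER_JOIN='kubeadm join 192.167.8.26:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:1111'
CERT_KEY='2222'

# A worker runs WORKER_JOIN as-is; a new master appends the control-plane flags:
CP_JOIN="$WORKER_JOIN --control-plane --certificate-key $CERT_KEY"
echo "$CP_JOIN"
```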

XVIII. Reset commands for a failed initialization

On the failed node:

kubeadm reset -f
rm -rf /etc/kubernetes
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet
rm -rf ~/.kube
systemctl restart containerd
systemctl restart kubelet

mkdir -p /etc/kubernetes/manifests

Leave kube-vip out for now and get init to succeed first: in /root/kubeadm-config.yaml, temporarily change the VIP to this node's own IP.

controlPlaneEndpoint: "192.167.8.26:6443"
# change 192.167.8.26:6443 to 192.167.8.20:6443
controlPlaneEndpoint: "192.167.8.20:6443"

Then init:

kubeadm init --config=/root/kubeadm-config.yaml --upload-certs

After it succeeds, configure kubectl:

mkdir -p ~/.kube
cp -f /etc/kubernetes/admin.conf ~/.kube/config
chown $(id -u):$(id -g) ~/.kube/config
kubectl get nodes

Once init has succeeded, deploy kube-vip.

cat >/root/kube-vip-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-vip
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-vip-role
rules:
  - apiGroups: [""]
    resources: ["services", "services/status", "endpoints", "nodes"]
    verbs: ["list", "get", "watch", "create", "update", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["list", "get", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-vip-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-vip-role
subjects:
  - kind: ServiceAccount
    name: kube-vip
    namespace: kube-system
EOF

Apply the kube-vip RBAC:

kubectl apply -f /root/kube-vip-rbac.yaml

Regenerate kube-vip; replace the IP address and NIC name with your own.

# First remove any leftover manually-added VIP
ip addr del 192.167.8.26/32 dev ens18 2>/dev/null || true
ip addr show ens18 | grep 192.167.8.26 || echo "VIP removed"

mkdir -p /etc/kubernetes/manifests

ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:v0.8.9 vip /kube-vip manifest pod \
  --interface ens18 \
  --address 192.167.8.26 \
  --controlplane \
  --services \
  --arp \
  --leaderElection \
  > /etc/kubernetes/manifests/kube-vip.yaml

Add the ServiceAccount to the manifest:

sed -i '/^spec:/a\  serviceAccountName: kube-vip' /etc/kubernetes/manifests/kube-vip.yaml
systemctl restart kubelet
sleep 20
crictl ps -a | grep kube-vip
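That sed relies on GNU sed's `a\` form inserting the line with its two-space indent directly under spec:. Here is a self-contained check against a made-up minimal Pod manifest, so you can convince yourself before touching the real static Pod file:

```shell
# Check the append-under-spec sed on a throwaway manifest (made-up Pod spec)
POD=$(mktemp)
cat > "$POD" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - name: kube-vip
EOF

# Same sed as above, against the copy
sed -i '/^spec:/a\  serviceAccountName: kube-vip' "$POD"
cat "$POD"
```

serviceAccountName should appear as the first line under spec:, at the same indent level as containers:.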

Verify the VIP:

sleep 20
ip addr show ens18 | grep 192.167.8.26
curl -k https://192.167.8.26:6443/livez
crictl logs $(crictl ps -a --name kube-vip -q | head -n1) | tail -n 50

Once that works, install the CNI network plugin:

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Wait a minute and check:

kubectl get pods -A -o wide
kubectl get nodes -o wide

If it can't be pulled from inside mainland China, inspect where it is stuck:

kubectl get pods -A
kubectl describe pod -n kube-flannel -l app=flannel

Once the VIP is healthy, point kubeconfig at it:

grep server ~/.kube/config

# If it still says https://192.167.8.20:6443
# change it to the VIP:
sed -i 's#https://192.167.8.20:6443#https://192.167.8.26:6443#g' ~/.kube/config

kubectl get nodes

kube-vip has now taken over the VIP; the next step is to switch controlPlaneEndpoint in the cluster config back to the VIP.

grep controlPlaneEndpoint /root/kubeadm-config.yaml
# If it reads controlPlaneEndpoint: "192.167.8.20:6443"
# change it back:
sed -i 's#controlPlaneEndpoint: "192.167.8.20:6443"#controlPlaneEndpoint: "192.167.8.26:6443"#' /root/kubeadm-config.yaml

kubeadm init phase upload-config kubeadm --config=/root/kubeadm-config.yaml

Then fix the local kubeconfig as well:

sed -i 's#https://192.167.8.20:6443#https://192.167.8.26:6443#g' ~/.kube/config

Verify:

grep server ~/.kube/config
kubectl get nodes
curl -k https://192.167.8.26:6443/livez

Once that all checks out, join the remaining master nodes.

On master-02 and master-03:

mkdir -p /etc/kubernetes/manifests

On master-01, copy kube-vip.yaml over:

scp /etc/kubernetes/manifests/kube-vip.yaml root@192.167.8.21:/etc/kubernetes/manifests/
scp /etc/kubernetes/manifests/kube-vip.yaml root@192.167.8.22:/etc/kubernetes/manifests/

Join as control-plane nodes:

kubeadm join 192.167.8.26:6443 \
  --token uduzmn.xxx \
  --discovery-token-ca-cert-hash sha256:xxx \
  --control-plane \
  --certificate-key xxx

Then join the remaining worker nodes, node-01 through node-03:

kubeadm join 192.167.8.26:6443 --token zjron3.xxx \
  --discovery-token-ca-cert-hash sha256:xxx
