Kubernetes troubleshooting notes

Network issues

Pod stuck in ContainerCreating, reporting "cni0" already has an IP address different from 10.42.0.1/24

Running kubectl describe pod <pod-name> shows the Pod's events:

Events:
Type Reason Age From Message


Normal Scheduled 89s default-scheduler Successfully assigned local-path-storage/local-path-provisioner-ccbdd96dc-cbthj to ip-172-31-9-78
Warning FailedCreatePodSandBox 88s kubelet, ip-172-31-9-78 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dbe0dc21f80b8778ceff11a98de477e59f5c3fa982563626ed0c01eba5eaed2c" network for pod "local-path-provisioner-ccbdd96dc-cbthj": NetworkPlugin cni failed to set up pod "local-path-provisioner-ccbdd96dc-cbthj_local-path-storage" network: failed to set bridge addr: "cni0" already has an IP address different from 10.42.0.1/24

The kubelet log shows the same error:

E1216 17:30:30.675697 22632 cni.go:331] Error adding local-path-storage_local-path-provisioner-ccbdd96dc-cbthj/0d2b1cd6de25ac114e2075f70f8ac25ef72b299048e728038086f3e7324f400a to network flannel/cbr0: failed to set bridge addr: "cni0" already has an IP address different from 10.42.0.1/24
E1216 17:30:30.922504 22632 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "0d2b1cd6de25ac114e2075f70f8ac25ef72b299048e728038086f3e7324f400a" network for pod "local-path-provisioner-ccbdd96dc-cbthj": NetworkPlugin cni failed to set up pod "local-path-provisioner-ccbdd96dc-cbthj_local-path-storage" network: failed to set bridge addr: "cni0" already has an IP address different from 10.42.0.1/24

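This error occurs when the cni0 bridge already carries an IP address from a different subnet than the one the network plugin wants to assign (10.42.0.1/24 here). The fix is to delete cni0 and let the network plugin recreate it automatically. Because the containers run by Docker are attached to the cni0 bridge, stop the containers on that machine first: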

systemctl stop docker
ip link set cni0 down
brctl delbr cni0
systemctl start docker
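
Before deleting the bridge, it can be worth confirming that the addresses really differ. The following is only a minimal sketch: it assumes flannel records the expected bridge address as FLANNEL_SUBNET in /run/flannel/subnet.env (the usual location, but verify it on your node), and the helper names bridge_cidr and compare_cidr are made up for this example.

```shell
# Read `ip -4 -o addr show dev cni0` output on stdin and print the first IPv4 CIDR found.
bridge_cidr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# Compare the address the plugin expects ($1) with the one on the bridge ($2).
compare_cidr() {
  expected=$1
  actual=$2
  if [ "$expected" = "$actual" ]; then
    echo "match"
  else
    echo "mismatch: cni0 has ${actual:-no address}, expected $expected"
  fi
}

# Usage on the affected node (paths are assumptions, see above):
#   . /run/flannel/subnet.env
#   compare_cidr "$FLANNEL_SUBNET" "$(ip -4 -o addr show dev cni0 | bridge_cidr)"
```

If the result is a mismatch, the delete-and-recreate steps above apply; if it reports a match, the cause is probably elsewhere.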

CoreDNS in CrashLoopBackOff

Logs:

kubectl -n kube-system logs coredns-6998d84bf5-r4dbk  
E1028 06:36:35.489403 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-8686dcc4fd-7fwcz.unknownuser.log.ERROR.20191028-063635.1: no such file or directory
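
The telling part of this log is "no route to host" for 10.96.0.1:443, the in-cluster address CoreDNS uses to reach the API server. When the log is long, a small filter can pull out which endpoints were unreachable. This is a rough sketch; the function name unreachable_endpoints is hypothetical, and it matches only the message format shown above.

```shell
# Read log text on stdin and print each distinct ip:port that failed
# with "no route to host".
unreachable_endpoints() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\): connect: no route to host.*/\1/p' | sort -u
}

# Usage (pod name taken from the command above):
#   kubectl -n kube-system logs coredns-6998d84bf5-r4dbk 2>&1 | unreachable_endpoints
```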

This is caused by corrupted or stale firewall (iptables) rules. The fix is to flush them:

iptables --flush
iptables -t nat --flush

Note: this operation discards the existing firewall rules.
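
Since the flush cannot be undone, it may be safer to save the current rules first with iptables-save, so they can be loaded back with iptables-restore if needed. Here is a minimal sketch that only prints the commands instead of running them; the function name plan_flush and the backup path are placeholders for this example.

```shell
# Print the commands that back up the current rules and then flush the given tables.
# Nothing is executed here; review the output, then pipe it to `sh` as root to apply it.
plan_flush() {
  backup_file=$1
  shift
  echo "iptables-save > $backup_file"
  for table in "$@"; do
    echo "iptables -t $table --flush"
  done
}

# Usage (hypothetical backup path):
#   plan_flush /root/iptables.backup filter nat        # review the plan
#   plan_flush /root/iptables.backup filter nat | sh   # apply it
#   iptables-restore < /root/iptables.backup           # roll back if needed
```

Printing the plan first keeps the destructive step explicit, which is useful on a node where other tools also manage iptables rules.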
