A Docker build failure caused by a CentOS 7.8 kernel bug

Symptom: docker build fails with an "OCI runtime create failed" error:

docker build -t slog-recall .
Sending build context to Docker daemon 52.56MB

Step 1/14 : FROM xxxx:v0.1 as builder
...
---> 89c701e8140a
Step 7/14 : RUN chmod +x /root/mptools/protoc-gen-go && python gen.py
---> Running in 1fe927a37721

OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown
ERROR: Job failed: exit status 1

Following the messages log:

$ tail -f /var/log/messages

Jan 11 08:14:54 test-gitlab-runner kernel: runc:[1:CHILD]: page allocation failure: order:7, mode:0xc0d0
Jan 11 08:14:54 test-gitlab-runner kernel: CPU: 0 PID: 2139292 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1127.el7.x86_64 #1
Jan 11 08:14:54 test-gitlab-runner kernel: Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Jan 11 08:14:54 test-gitlab-runner kernel: Call Trace:
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b17ff85>] dump_stack+0x19/0x1b
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abc4ac0>] warn_alloc_failed+0x110/0x180
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b17b4a0>] __alloc_pages_slowpath+0x6bb/0x729
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abc9146>] __alloc_pages_nodemask+0x436/0x450
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8ac18e18>] alloc_pages_current+0x98/0x110
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abe5748>] kmalloc_order+0x18/0x40
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8ac243d6>] kmalloc_order_trace+0x26/0xa0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8ac28361>] __kmalloc+0x211/0x230
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8ac41211>] memcg_alloc_cache_params+0x81/0xb0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abe53f4>] do_kmem_cache_create+0x74/0xf0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abe5572>] kmem_cache_create+0x102/0x1b0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffffc05aede0>] nf_conntrack_init_net+0x100/0x270 [nf_conntrack]
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffffc05af6e4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b048074>] ops_init+0x44/0x150
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b04823b>] setup_net+0xbb/0x170
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b0489d5>] copy_net_ns+0xb5/0x180
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aacb909>] create_new_namespaces+0xf9/0x180
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aacbb4a>] unshare_nsproxy_namespaces+0x5a/0xc0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aa9b24b>] SyS_unshare+0x1cb/0x340
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b192ed2>] system_call_fastpath+0x25/0x2a
Jan 11 08:14:54 test-gitlab-runner kernel: Mem-Info:
Jan 11 08:14:54 test-gitlab-runner kernel: active_anon:529037 inactive_anon:42965 isolated_anon:0#012 active_file:452366 inactive_file:522095 isolated_file:0#012 unevictable:0 dirty:136 writeback:0 unstable:0#012 slab_reclaimable:58275 slab_unreclaimable:231111#012 mapped:78597 shmem:98825 pagetables:5028 bounce:0#012 free:69876 free_pcp:0 free_cma:0
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 11 08:14:54 test-gitlab-runner kernel: lowmem_reserve[]: 0 2812 7801 7801
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 DMA32 free:183584kB min:24312kB low:30388kB high:36468kB active_anon:236324kB inactive_anon:23636kB active_file:840008kB inactive_file:1090544kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129200kB managed:2882772kB mlocked:0kB dirty:276kB writeback:0kB mapped:130040kB shmem:47248kB slab_reclaimable:107268kB slab_unreclaimable:285836kB kernel_stack:5440kB pagetables:5588kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 11 08:14:54 test-gitlab-runner kernel: lowmem_reserve[]: 0 0 4989 4989
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 Normal free:80012kB min:43136kB low:53920kB high:64704kB active_anon:1879824kB inactive_anon:148224kB active_file:969456kB inactive_file:997836kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5242880kB managed:5109356kB mlocked:0kB dirty:268kB writeback:0kB mapped:184348kB shmem:348052kB slab_reclaimable:125832kB slab_unreclaimable:638608kB kernel_stack:12608kB pagetables:14524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 11 08:14:54 test-gitlab-runner kernel: lowmem_reserve[]: 0 0 0 0
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 DMA32: 16230*4kB (UEM) 5005*8kB (UEM) 1748*16kB (UEM) 760*32kB (UM) 273*64kB (M) 69*128kB (M) 5*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 184832kB
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 Normal: 8666*4kB (UEM) 3748*8kB (UEM) 1017*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80920kB
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 11 08:14:54 test-gitlab-runner kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 11 08:14:54 test-gitlab-runner kernel: 1073190 total pagecache pages
Jan 11 08:14:54 test-gitlab-runner kernel: 0 pages in swap cache
Jan 11 08:14:54 test-gitlab-runner kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 11 08:14:54 test-gitlab-runner kernel: Free swap = 0kB
Jan 11 08:14:54 test-gitlab-runner kernel: Total swap = 0kB
Jan 11 08:14:54 test-gitlab-runner kernel: 2097018 pages RAM
Jan 11 08:14:54 test-gitlab-runner kernel: 0 pages HighMem/MovableOnly
Jan 11 08:14:54 test-gitlab-runner kernel: 95009 pages reserved
Jan 11 08:14:54 test-gitlab-runner kernel: kmem_cache_create(nf_conntrack_153504) failed with error -12
Jan 11 08:14:54 test-gitlab-runner kernel: CPU: 0 PID: 2139292 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1127.el7.x86_64 #1
Jan 11 08:14:54 test-gitlab-runner kernel: Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Jan 11 08:14:54 test-gitlab-runner kernel: Call Trace:
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b17ff85>] dump_stack+0x19/0x1b
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8abe55f7>] kmem_cache_create+0x187/0x1b0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffffc05aede0>] nf_conntrack_init_net+0x100/0x270 [nf_conntrack]
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffffc05af6e4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b048074>] ops_init+0x44/0x150
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b04823b>] setup_net+0xbb/0x170
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b0489d5>] copy_net_ns+0xb5/0x180
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aacb909>] create_new_namespaces+0xf9/0x180
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aacbb4a>] unshare_nsproxy_namespaces+0x5a/0xc0
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8aa9b24b>] SyS_unshare+0x1cb/0x340
Jan 11 08:14:54 test-gitlab-runner kernel: [<ffffffff8b192ed2>] system_call_fastpath+0x25/0x2a
Jan 11 08:14:54 test-gitlab-runner kernel: Unable to create nf_conn slab cache
Jan 11 08:14:54 test-gitlab-runner containerd: time="2021-01-11T16:14:54.439755765+08:00" level=info msg="shim reaped" id=9468bd981a0c1e82ddd06a90cfc9d1ecdafeaec4ba6f94e76980a18b30873292
Jan 11 08:14:54 test-gitlab-runner dockerd: time="2021-01-11T16:14:54.453909645+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 11 08:14:54 test-gitlab-runner dockerd: time="2021-01-11T16:14:54.454256700+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 11 08:14:54 test-gitlab-runner kernel: docker0: port 3(veth4fb05a8) entered disabled state
Jan 11 08:14:54 test-gitlab-runner kernel: device veth4fb05a8 left promiscuous mode
Jan 11 08:14:54 test-gitlab-runner kernel: docker0: port 3(veth4fb05a8) entered disabled state

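The cause turns out to be "kmem_cache_create(nf_conntrack_153504) failed with error -12" (-12 is ENOMEM): the kernel could not create the nf_conn slab cache for the container's new network namespace, so runc exited and dockerd then reported "reading from a closed fifo".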
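The first line of the trace reports "page allocation failure: order:7", meaning a request for 2^7 = 128 contiguous 4kB pages (512kB). The per-zone free lists in the log show 0*512kB and nothing larger in both DMA32 and Normal, so no such block was available even though plenty of smaller pages were free. Below is a minimal sketch for checking this, assuming the usual /proc/buddyinfo layout (free-block counts for order 0 upward starting at the fifth field); the function name free_blocks_at_or_above is hypothetical, and the sample input is copied from the DMA32 and Normal lines of the log above.

```shell
# Hypothetical helper: for each zone, count free blocks at or above a given order.
# Reads /proc/buddyinfo-formatted text on stdin; field 5 is order 0, field 6 is order 1, ...
free_blocks_at_or_above() {
    awk -v order="$1" '{
        n = 0
        for (i = 5 + order; i <= NF; i++) n += $i
        print $4, n
    }'
}

# Counts taken from the "Node 0 DMA32" and "Node 0 Normal" lines in the log above
printf 'Node 0, zone DMA32 16230 5005 1748 760 273 69 5 0 0 0 0\nNode 0, zone Normal 8666 3748 1017 0 0 0 0 0 0 0 0\n' \
    | free_blocks_at_or_above 7

# On a live host:
# free_blocks_at_or_above 7 < /proc/buddyinfo
```

A count of 0 for a zone means an order-7 allocation cannot be satisfied from that zone without reclaim or compaction.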

A Google search turned up these related issues:
https://github.com/docker/for-linux/issues/856
https://github.com/moby/moby/issues/37722

Checking with slabtop:

$ slabtop

Active / Total Objects (% used) : 5094496 / 5557266 (91.7%)
Active / Total Slabs (% used) : 148098 / 148098 (100.0%)
Active / Total Caches (% used) : 114 / 136 (83.8%)
Active / Total Size (% used) : 945322.98K / 1128995.96K (83.7%)
Minimum / Average / Maximum Object : 0.01K / 0.20K / 8.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
2850866 2849117 99% 0.12K 83849 34 335396K kernfs_node_cache
375186 366084 97% 0.09K 8933 42 35732K kmalloc-96
259456 208228 80% 0.03K 2027 128 8108K kmalloc-32
192736 82978 43% 1.00K 6037 32 193184K kmalloc-1024
175440 118687 67% 0.04K 1720 102 6880K selinux_inode_security
172032 117449 68% 0.02K 672 256 2688K kmalloc-16
168441 168441 100% 0.10K 4319 39 17276K buffer_head
162944 146478 89% 0.06K 2546 64 10184K kmalloc-64
154870 154360 99% 0.02K 911 170 3644K fsnotify_mark_connector
133632 118510 88% 0.01K 261 512 1044K kmalloc-8
116529 95659 82% 0.19K 5549 21 22196K dentry
99036 34979 35% 0.57K 3537 28 56592K radix_tree_node

Slab usage is already at 100%, and kernfs_node_cache alone accounts for about 335MB.
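To see which caches dominate, each cache's size can be compared with the total from the "Active / Total Size" line. The sketch below assumes the slabtop column layout shown above (cache size in the seventh column with a K suffix, name in the eighth); the helper name slab_share is hypothetical, and the two input rows and the total are copied from the output above.

```shell
# Hypothetical helper: print each cache's share of the total slab size.
# $1 is the total size in KB; stdin holds slabtop data rows.
slab_share() {
    awk -v total_kb="$1" '{
        sub(/K$/, "", $7)                       # strip the K suffix from CACHE SIZE
        printf "%s %.1f%%\n", $8, 100 * $7 / total_kb
    }'
}

# Rows for the two largest caches, total taken from "Active / Total Size"
printf '2850866 2849117 99%% 0.12K 83849 34 335396K kernfs_node_cache\n192736 82978 43%% 1.00K 6037 32 193184K kmalloc-1024\n' \
    | slab_share 1128995.96

# On a live host, slabtop -o -s c prints one snapshot sorted by cache size.
```

For the rows above this gives roughly 29.7% for kernfs_node_cache and 17.1% for kmalloc-1024, so these two caches hold close to half of all slab memory.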

Solutions

  1. Free memory manually:

    To free pagecache:
    echo 1 > /proc/sys/vm/drop_caches

    To free reclaimable slab objects (includes dentries and inodes):
    echo 2 > /proc/sys/vm/drop_caches

    To free slab objects and pagecache:
    echo 3 > /proc/sys/vm/drop_caches
  2. Apply the patch

Red Hat has released a fix, so the kernel can be patched:
https://bugzilla.redhat.com/show_bug.cgi?id=1507149
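
The manual workaround in item 1 can be wrapped in a small script. This is only a sketch under some assumptions: the function name reclaim_and_compact and its optional directory argument are hypothetical (the argument exists so the function can be dry-run against a scratch directory), and the write to compact_memory is an extra step not mentioned above, which only exists on kernels built with CONFIG_COMPACTION. Running sync first writes out dirty pages so that more of the page cache can be dropped.

```shell
# Hypothetical helper: flush dirty pages, drop caches, then request memory compaction.
# $1 overrides the sysctl directory (default /proc/sys/vm) so the function can be dry-run.
reclaim_and_compact() {
    vm_dir="${1:-/proc/sys/vm}"
    sync                                    # flush dirty pages so they become droppable
    echo 3 > "$vm_dir/drop_caches"          # free page cache plus reclaimable slab objects
    if [ -w "$vm_dir/compact_memory" ]; then
        echo 1 > "$vm_dir/compact_memory"   # assumption: ask the kernel to compact all zones
    fi
}

# As root on the affected host:
# reclaim_and_compact
```

This only relieves the symptom for a while; the caches grow back, so the kernel fix in item 2 is the lasting solution.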
