【Title】: Understanding a kernel panic / advice
【Posted】: 2017-06-06 06:49:14
【Question】:

I am running Debian jessie with kernel 3.16.39-1:

# apt-cache policy linux-image-3.16.0-4-amd64
linux-image-3.16.0-4-amd64:
Installed: 3.16.39-1
Candidate: 3.16.39-1
Version table:
*** 3.16.39-1 0
    500 http://ftp.fr.debian.org/debian/ jessie/main amd64 Packages
    100 /var/lib/dpkg/status

The machine uses two bonded interfaces:

  • bond0: 2*10Gb/s ixgbe X520
  • bond1: 2*10Gb/s ixgbe X520

irqbalance is running on this machine.

Under network load (12Gb/s on bond1) I get the following kernel panic:

kernel: [26339.017497] Call Trace:
kernel: [26339.017499]  <IRQ>  [<ffffffff81514c11>] ?     dump_stack+0x5d/0x78
kernel: [26339.017509]  [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513]  [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515]  [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519]  [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524]  [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528]  [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531]  [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534]  [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558]  [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564]  [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567]  [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0
kernel: [26339.017570]  [<ffffffff81425509>] ? net_rx_action+0x129/0x250
kernel: [26339.017573]  [<ffffffff8106d911>] ? __do_softirq+0xf1/0x2d0
kernel: [26339.017575]  [<ffffffff8106dd25>] ? irq_exit+0x95/0xa0
kernel: [26339.017578]  [<ffffffff8151dbe2>] ? do_IRQ+0x52/0xe0
kernel: [26339.017582]  [<ffffffff8151ba2d>] ? common_interrupt+0x6d/0x6d
kernel: [26339.017583]  <EOI>  [<ffffffff8108c31d>] ? __hrtimer_start_range_ns+0x1cd/0x3a0
kernel: [26339.017588]  [<ffffffff813e32a2>] ? cpuidle_enter_state+0x52/0xc0
kernel: [26339.017590]  [<ffffffff813e3298>] ? cpuidle_enter_state+0x48/0xc0
kernel: [26339.017592]  [<ffffffff810a9b28>] ? cpu_startup_entry+0x328/0x470
kernel: [26339.017595]  [<ffffffff81043fdf>] ? start_secondary+0x20f/0x2d0
[....]
kernel: [26339.017647] swapper/13: page allocation failure: order:0, mode:0x20
kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
kernel: [26339.017667]  active_file:284318 inactive_file:151176 isolated_file:0
kernel: [26339.017667]  unevictable:20736 dirty:24804 writeback:4297 unstable:0
kernel: [26339.017667]  free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
kernel: [26339.017667]  mapped:22343 shmem:413 pagetables:10111 bounce:0
kernel: [26339.017667]  free_cma:0
kernel: [26339.017670] Node 0 DMA free:15896kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: [26339.017675] lowmem_reserve[]: 0 3191 16016 16016
kernel: [26339.017680] Node 0 DMA32 free:56312kB min:13456kB low:16820kB high:20184kB active_anon:589468kB inactive_anon:141384kB active_file:1132312kB inactive_file:597576kB unevictable:16616kB isolated(anon):0kB isolated(file):0kB present:3345344kB managed:3270860kB mlocked:16616kB dirty:33860kB writeback:4288kB mapped:18616kB shmem:180kB slab_reclaimable:17036kB slab_unreclaimable:83696kB kernel_stack:34016kB pagetables:8384kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [26339.017686] lowmem_reserve[]: 0 0 12824 12824
kernel: [26339.017691] Node 0 Normal free:20108kB min:54060kB low:67572kB high:81088kB active_anon:10853680kB inactive_anon:1020528kB active_file:4960kB inactive_file:7128kB unevictable:66328kB isolated(anon):62892kB isolated(file):0kB present:13369344kB managed:13131968kB mlocked:66328kB dirty:65356kB writeback:12900kB mapped:70756kB shmem:1472kB slab_reclaimable:92136kB slab_unreclaimable:262992kB kernel_stack:10880kB pagetables:32060kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4275 all_unreclaimable? no
kernel: [26339.017696] lowmem_reserve[]: 0 0 0 0
kernel: [26339.017701] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15896kB
kernel: [26339.017723] Node 0 DMA32: 250*4kB (EM) 967*8kB (UEM) 2628*16kB (UM) 83*32kB (UMR) 15*64kB (R) 8*128kB (R) 4*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56448kB
kernel: [26339.017747] swapper/0: page allocation failure: order:0, mode:0x20
kernel: [26339.017751] Node 0 Normal: 3653*4kB (M) 0*8kB 0*16kB 1*32kB (R) 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 17332kB
kernel: [26339.017767] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kernel: [26339.017768] 466495 total pagecache pages
kernel: [26339.017769] 10046 pages in swap cache
kernel: [26339.017771] Swap cache stats: add 4415081, delete 4405035, find 1682225/2488531
kernel: [26339.017772] Free swap  = 19301256kB
kernel: [26339.017773] Total swap = 19764220kB
kernel: [26339.017774] 4182667 pages RAM
kernel: [26339.017775] 0 pages HighMem/MovableOnly
kernel: [26339.017776] 59344 pages reserved
kernel: [26339.017777] 0 pages hwpoisoned

The kernel panic shows messages about IRQs and ixgbe.

Can anyone give me some advice to solve this problem? The server had been running fine for 2 hours under the same network load without any issue.

Regards,

【Comments】:

  • swapper/13: page allocation failure

Tags: linux-kernel


【Solution 1】:

The call trace by itself does not reveal debug information that pinpoints the cause of the crash:

kernel: [26339.017509]  [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513]  [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515]  [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519]  [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524]  [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528]  [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531]  [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534]  [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558]  [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564]  [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567]  [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0

Rather than the call trace above, it is the signature below that shows the real symptom: page starvation.

kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
kernel: [26339.017667]  active_file:284318 inactive_file:151176 isolated_file:0
kernel: [26339.017667]  unevictable:20736 dirty:24804 writeback:4297 unstable:0
kernel: [26339.017667]  free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
kernel: [26339.017667]  mapped:22343 shmem:413 pagetables:10111 bounce:0
kernel: [26339.017667]  free_cma:0

As the "inactive_anon:290478, inactive_file:151176" counters suggest, there is a high likelihood of page starvation in the DMA zone. By following the steps below you can find out whether the system is experiencing a kernel memory leak.
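Aside from leak hunting: the failing allocation is order:0, mode:0x20 (GFP_ATOMIC) in softirq context, so it is also worth watching how fragmented the free lists get under load. A minimal read-only sketch (the 262144 kB value in the comment is purely illustrative, not a recommendation from this answer):

```shell
# Per-zone free-page fragmentation: column N is the number of free blocks
# of order N. Atomic order-0 allocations, like the ones failing in this
# trace, must be served from these pools without sleeping to reclaim.
cat /proc/buddyinfo

# The reserve the kernel tries to keep free; atomic allocations start
# failing when free memory approaches this floor.
cat /proc/sys/vm/min_free_kbytes

# A common mitigation is to raise the reserve (requires root); 262144 kB
# here is only an illustrative figure for a ~16 GB machine:
#   echo 262144 > /proc/sys/vm/min_free_kbytes
```

Raising `vm.min_free_kbytes` trades a little usable memory for more headroom for GFP_ATOMIC allocations in the receive path.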

  1. Kernel: add the kmemleak-related configuration. The diff below is against the answerer's own `pompeii_defconfig` (an ARM tree); apply the equivalent options to your kernel config:

diff --git a/arch/arm/configs/pompeii_defconfig b/arch/arm/configs/pompeii_defconfig
index 2e97f97..aac678a 100644
--- a/arch/arm/configs/pompeii_defconfig
+++ b/arch/arm/configs/pompeii_defconfig
@@ -754,8 +754,8 @@
 CONFIG_SLUB_DEBUG_PANIC_ON=y
 CONFIG_SLUB_DEBUG_ON=y
 CONFIG_DEBUG_KMEMLEAK=y
-CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=4000
-CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
+CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=40000
+# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_MEMORY_INIT=y

  2. Make sure `kmemleak=on` is added to the kernel command line.

  3. After the system has run under load for about 10 minutes, trigger a scan with the following command: echo scan > /sys/kernel/debug/kmemleak

The kernel memory leak report can then be displayed with (read the file, do not redirect into it): cat /sys/kernel/debug/kmemleak
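Put together, the runtime part of the procedure above amounts to the short session below (assuming a kernel built with CONFIG_DEBUG_KMEMLEAK=y, booted with `kmemleak=on`, and debugfs mounted at /sys/kernel/debug):

```shell
# Trigger an on-demand scan after ~10 minutes of the usual network load:
echo scan > /sys/kernel/debug/kmemleak

# Read back the suspected leaks; each entry lists an unreferenced object
# with its size and an allocation backtrace:
cat /sys/kernel/debug/kmemleak

# Optionally clear the current list before the next measurement window:
echo clear > /sys/kernel/debug/kmemleak
```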

【Discussion】:
