Warning: hit due WQ_MEM_RECLAIM for ib_addr work queue
Sagi Grimberg
sagi at grimberg.me
Sun Feb 20 04:59:53 PST 2022
> Hi Team,
>
> We are seeing the following warning:
>
> 13:25:49 kernel: workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
> 13:25:49 kernel: WARNING: CPU: 7 PID: 1067276 at kernel/workqueue.c:2620 check_flush_dependency+0x110/0x130
> 13:25:49 kernel: CPU: 7 PID: 1067276 Comm: kworker/u65:3 Kdump: loaded Not tainted 4.18.0-348.9.1.el8_5.test.x86_64 #1
> 13:25:49 kernel: Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> 13:25:49 kernel: RIP: 0010:check_flush_dependency+0x110/0x130
> 13:25:49 kernel: Code: ff ff 48 8b 50 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 50 b6 8c 82 c6 05 d3 1e 6d 01 01 e8 09 40 fe ff <0f> 0b e9 0a ff ff ff 80 3d c1 1e 6d 01 00 75 95 e9 41 ff ff ff 66
> 13:25:49 kernel: RSP: 0018:ffffa9e4072b3c70 EFLAGS: 00010082
> 13:25:49 kernel: RAX: 0000000000000000 RBX: ffff8c87c5136800 RCX: 0000000000000000
> 13:25:49 kernel: RDX: 0000000000000001 RSI: ffffffff836dffa9 RDI: 0000000000000046
> 13:25:49 kernel: RBP: ffffffffc06fc380 R08: ffffffff836dff20 R09: 0000000000029a00
> 13:25:49 kernel: R10: 0002ee301d21f585 R11: 000000000010490c R12: ffff8c844c5b4740
> 13:25:49 kernel: R13: ffff8c87c3d9bb00 R14: 0000000000000001 R15: ffff8c84538da178
> 13:25:49 kernel: FS: 0000000000000000(0000) GS:ffff8c87af1c0000(0000) knlGS:0000000000000000
> 13:25:49 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 13:25:49 kernel: CR2: 00007f8e63a4f000 CR3: 0000000160b3c000 CR4: 0000000000350ee0
> 13:25:49 kernel: Call Trace:
> 13:25:49 kernel: flush_work+0x8b/0x1c0
> 13:25:49 kernel: ? lock_timer_base+0x67/0x80
> 13:25:49 kernel: ? try_to_del_timer_sync+0x4d/0x80
> 13:25:49 kernel: ? work_busy+0x80/0x80
> 13:25:49 kernel: __cancel_work_timer+0x105/0x190
> 13:25:49 kernel: rdma_addr_cancel+0xa3/0xc0 [ib_core]
> 13:25:49 kernel: _destroy_id+0x17/0x240 [rdma_cm]
> 13:25:49 kernel: nvme_rdma_alloc_queue+0x193/0x200 [nvme_rdma]
> 13:25:49 kernel: nvme_rdma_setup_ctrl+0x34/0xa70 [nvme_rdma]
> 13:25:49 kernel: ? __switch_to_asm+0x41/0x70
> 13:25:49 kernel: ? __switch_to+0x10c/0x470
> 13:25:49 kernel: ? finish_task_switch+0xaa/0x2e0
> 13:25:49 kernel: nvme_rdma_reconnect_ctrl_work+0x22/0x70 [nvme_rdma]
> 13:25:49 kernel: process_one_work+0x1a7/0x360
> 13:25:49 kernel: ? create_worker+0x1a0/0x1a0
> 13:25:49 kernel: worker_thread+0x30/0x390
> 13:25:49 kernel: ? create_worker+0x1a0/0x1a0
> 13:25:49 kernel: kthread+0x116/0x130
> 13:25:49 kernel: ? kthread_flush_work_fn+0x10/0x10
> 13:25:49 kernel: ret_from_fork+0x22/0x40
>
> It seems that WQ_MEM_RECLAIM was removed from the ib_addr workqueue in ib_core by commit 39baf10310e6. Per the description of that patch:
>
> "The ib_addr workqueue is not memory reclaim path due to its
> nature of invoking callback that might allocate memory or don't free any
> memory under memory pressure."
>
> I also see another patch that was made after hitting a similar WQ_MEM_RECLAIM warning:
>
> commit 659e3c71d106b0dd4ce6520e4dcd6e
> Author: Jack Wang <jinpu.wang at cloud.ionos.com>
> Date: Fri Jul 24 16:45:08 2020 +0530
>
> RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
>
> Do you think nvme-core should do the same to stop this warning? If you agree with removing the flag, I can send a patch.
> Any other suggestion for avoiding the warning would also be helpful.
The wq that hosts reconnect/reset work does allocate memory though.
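
For context, the warning fires when a work item running on a WQ_MEM_RECLAIM workqueue flushes (and so waits on) work that belongs to a workqueue without that flag. The flushed queue has no rescuer thread, so it is not guaranteed to make forward progress under memory pressure. Below is a minimal user-space sketch of that rule. It is only a model: `model_wq`, `MODEL_WQ_MEM_RECLAIM`, `flush_would_warn` and `run_examples` are hypothetical names made up for illustration, not the kernel's structures or its actual check, and other conditions the real check looks at are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's definition. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Simplified stand-in for a workqueue: just a name and its flags. */
struct model_wq {
    const char *name;
    unsigned int flags;
};

/*
 * Returns true when flushing work on `target` from a worker running on
 * `current_wq` would trip the dependency warning in this model.
 * `current_wq` is NULL when the caller is not a workqueue worker.
 */
static bool flush_would_warn(const struct model_wq *current_wq,
                             const struct model_wq *target)
{
    /* Flushing a reclaim-safe queue is always fine. */
    if (target->flags & MODEL_WQ_MEM_RECLAIM)
        return false;

    /* A non-worker caller is outside the scope of this model. */
    if (current_wq == NULL)
        return false;

    /* Reclaim-safe worker waiting on a non-reclaim queue: warn. */
    return (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}

/* The combination from the trace above, plus two non-warning cases. */
static void run_examples(void)
{
    struct model_wq nvme_wq = { "nvme-wq", MODEL_WQ_MEM_RECLAIM };
    struct model_wq ib_addr = { "ib_addr", 0 };

    assert(flush_would_warn(&nvme_wq, &ib_addr));   /* reported case */
    assert(!flush_would_warn(&ib_addr, &nvme_wq));  /* target is reclaim-safe */
    assert(!flush_would_warn(&ib_addr, &ib_addr));  /* neither has the flag */
}
```

In this model the warning goes away either when the flusher drops the flag or when the flushed queue gains it, which is why both directions come up in this thread.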