Warning: hit due WQ_MEM_RECLAIM for ib_addr work queue

Sagi Grimberg sagi at grimberg.me
Sun Feb 20 04:59:53 PST 2022


> Hi Team,
> 
> We are seeing the following warning:
> 
>   13:25:49  kernel: workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>   13:25:49  kernel: WARNING: CPU: 7 PID: 1067276 at kernel/workqueue.c:2620 check_flush_dependency+0x110/0x130
>   13:25:49  kernel: CPU: 7 PID: 1067276 Comm: kworker/u65:3 Kdump: loaded Not tainted 4.18.0-348.9.1.el8_5.test.x86_64 #1
>   13:25:49  kernel: Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
>   13:25:49  kernel: RIP: 0010:check_flush_dependency+0x110/0x130
>   13:25:49  kernel: Code: ff ff 48 8b 50 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 50 b6 8c 82 c6 05 d3 1e 6d 01 01 e8 09 40 fe ff <0f> 0b e9 0a ff ff ff 80 3d c1 1e 6d 01 00 75 95 e9 41 ff ff ff 66
>   13:25:49  kernel: RSP: 0018:ffffa9e4072b3c70 EFLAGS: 00010082
>   13:25:49  kernel: RAX: 0000000000000000 RBX: ffff8c87c5136800 RCX: 0000000000000000
>   13:25:49  kernel: RDX: 0000000000000001 RSI: ffffffff836dffa9 RDI: 0000000000000046
>   13:25:49  kernel: RBP: ffffffffc06fc380 R08: ffffffff836dff20 R09: 0000000000029a00
>   13:25:49  kernel: R10: 0002ee301d21f585 R11: 000000000010490c R12: ffff8c844c5b4740
>   13:25:49  kernel: R13: ffff8c87c3d9bb00 R14: 0000000000000001 R15: ffff8c84538da178
>   13:25:49  kernel: FS:  0000000000000000(0000) GS:ffff8c87af1c0000(0000) knlGS:0000000000000000
>   13:25:49  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   13:25:49  kernel: CR2: 00007f8e63a4f000 CR3: 0000000160b3c000 CR4: 0000000000350ee0
>   13:25:49  kernel: Call Trace:
>   13:25:49  kernel: flush_work+0x8b/0x1c0
>   13:25:49  kernel: ? lock_timer_base+0x67/0x80
>   13:25:49  kernel: ? try_to_del_timer_sync+0x4d/0x80
>   13:25:49  kernel: ? work_busy+0x80/0x80
>   13:25:49  kernel: __cancel_work_timer+0x105/0x190
>   13:25:49  kernel: rdma_addr_cancel+0xa3/0xc0 [ib_core]
>   13:25:49  kernel: _destroy_id+0x17/0x240 [rdma_cm]
>   13:25:49  kernel: nvme_rdma_alloc_queue+0x193/0x200 [nvme_rdma]
>   13:25:49  kernel: nvme_rdma_setup_ctrl+0x34/0xa70 [nvme_rdma]
>   13:25:49  kernel: ? __switch_to_asm+0x41/0x70
>   13:25:49  kernel: ? __switch_to+0x10c/0x470
>   13:25:49  kernel: ? finish_task_switch+0xaa/0x2e0
>   13:25:49  kernel: nvme_rdma_reconnect_ctrl_work+0x22/0x70 [nvme_rdma]
>   13:25:49  kernel: process_one_work+0x1a7/0x360
>   13:25:49  kernel: ? create_worker+0x1a0/0x1a0
>   13:25:49  kernel: worker_thread+0x30/0x390
>   13:25:49  kernel: ? create_worker+0x1a0/0x1a0
>   13:25:49  kernel: kthread+0x116/0x130
>   13:25:49  kernel: ? kthread_flush_work_fn+0x10/0x10
>   13:25:49  kernel: ret_from_fork+0x22/0x40
>   
>   It seems that WQ_MEM_RECLAIM was removed from the ib_addr workqueue in ib_core by commit 39baf10310e6. As per the description of that patch:
> 
>      "The ib_addr workqueue is not memory reclaim path due to its
>      nature of invoking callback that might allocate memory or don't free any
>      memory under memory pressure."
>   
>   I see another patch that addressed a similar warning related to the WQ_MEM_RECLAIM flag:
>   
> commit 659e3c71d106b0dd4ce6520e4dcd6e
> Author: Jack Wang <jinpu.wang at cloud.ionos.com>
> Date:   Fri Jul 24 16:45:08 2020 +0530
> 
>      RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
>      
> Do you think nvme-core should also do the same to stop this warning? If you agree with removing the flag, I can send a patch;
> any other suggestion for avoiding the warning would also be helpful.

The wq that hosts reconnect/reset work does allocate memory though.
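
For reference, the condition behind the warning can be modeled in user space. This is only a rough sketch of what check_flush_dependency() appears to test, not the kernel code itself: the struct, the function name flush_would_warn(), and the flag values are made up for illustration, and the kernel's separate PF_MEMALLOC check is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are chosen for this sketch and are
 * NOT the kernel's actual definitions. */
enum {
    MODEL_WQ_MEM_RECLAIM = 1u << 0, /* wq may be used on the reclaim path */
    MODEL_WQ_LEGACY      = 1u << 1, /* stands in for the legacy-wq exemption */
};

/* Hypothetical stand-in for a workqueue: just a name and its flags. */
struct model_wq {
    const char *name;
    unsigned int flags;
};

/*
 * Returns true when flushing work on target_wq from a worker that belongs
 * to current_wq would trip the dependency warning.
 * current_wq is NULL when the caller is not a workqueue worker at all.
 */
bool flush_would_warn(const struct model_wq *current_wq,
                      const struct model_wq *target_wq)
{
    /* Flushing a reclaim-safe workqueue is always allowed. */
    if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
        return false;

    /* Not running from a workqueue worker: nothing to check here. */
    if (current_wq == NULL)
        return false;

    /* A reclaim-safe (non-legacy) wq waiting on a non-reclaim-safe wq
     * could stall forward progress under memory pressure. */
    return (current_wq->flags & (MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY))
           == MODEL_WQ_MEM_RECLAIM;
}
```

With nvme-wq modeled as having MODEL_WQ_MEM_RECLAIM set and ib_addr modeled with no flags, flush_would_warn(&nvme_wq, &ib_addr) returns true, which matches the direction reported in the log. Clearing the flag on nvme-wq would make the function return false, but as noted above that does not change the fact that the reconnect/reset work allocates memory.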


