BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
Michal Kalderon
mkalderon at marvell.com
Sun May 30 00:33:18 PDT 2021
Hi Christoph, Sagi,
We're testing some device error-recovery scenarios and hit the BUG below.
In the failing scenario, nvmet_rdma_queue_response receives an error from the device when posting a WR.
This causes nvmet_rdma_release_rsp to be called from softirq context, eventually
reaching blk_mq_delay_run_hw_queue, which tries to schedule while in softirq (full stack below).
Could you please advise what the correct fix would be in this case?
thanks,
Michal
[ 8790.082863] nvmet_rdma: post_recv cmd failed
[ 8790.083484] nvmet_rdma: sending cmd response failed
[ 8790.084131] ------------[ cut here ]------------
[ 8790.084140] WARNING: CPU: 7 PID: 46 at block/blk-mq.c:1422 __blk_mq_run_hw_queue+0xb7/0x100
[ 8790.084619] Modules linked in: null_blk nvmet_rdma nvmet nvme_rdma nvme_fabrics nvme_core netconsole qedr(OE) qede(OE) qed(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM nft_chain_nat xt_MASQUERADE nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nft_counter nft_compat tun bridge stp llc nf_tables nfnetlink ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma rdma_ucm ib_iser rdma_cm iw_cm intel_rapl_msr intel_rapl_common ib_cm sb_edac libiscsi scsi_transport_iscsi kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc rapl ib_uverbs ib_core cirrus drm_kms_helper drm virtio_balloon i2c_piix4 pcspkr crc32c_intel virtio_net serio_raw net_failover failover floppy crc8 ata_generic pata_acpi qemu_fw_cfg [last unloaded: qedr]
[ 8790.084748] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G OE 5.8.10 #1
[ 8790.084749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 8790.084752] RIP: 0010:__blk_mq_run_hw_queue+0xb7/0x100
[ 8790.084753] Code: 00 48 89 ef e8 ea 34 c8 ff 48 89 df 41 89 c4 e8 1f 7f 00 00 f6 83 a8 00 00 00 20 74 b1 41 f7 c4 fe ff ff ff 74 b7 0f 0b eb b3 <0f> 0b eb 86 48 83 bf 98 00 00 00 00 48 c7 c0 df 81 3f 82 48 c7 c2
[ 8790.084754] RSP: 0018:ffffc9000020ba60 EFLAGS: 00010206
[ 8790.084755] RAX: 0000000000000100 RBX: ffff88809fe8c400 RCX: 00000000ffffffff
[ 8790.084756] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88809fe8c400
[ 8790.084756] RBP: ffff888137b81a50 R08: ffffffffffffffff R09: 0000000000000020
[ 8790.084757] R10: 0000000000000001 R11: ffff8881365d4968 R12: 0000000000000000
[ 8790.084758] R13: ffff888137b81a40 R14: ffff88811e2b9e80 R15: ffff8880b3d964f0
[ 8790.084759] FS: 0000000000000000(0000) GS:ffff88813bbc0000(0000) knlGS:0000000000000000
[ 8790.084759] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8790.084760] CR2: 000055ca53900da8 CR3: 000000012b83e006 CR4: 0000000000360ee0
[ 8790.084763] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8790.084763] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8790.084764] Call Trace:
[ 8790.084767] __blk_mq_delay_run_hw_queue+0x140/0x160
[ 8790.084768] blk_mq_get_tag+0x1d1/0x270
[ 8790.084771] ? finish_wait+0x80/0x80
[ 8790.084773] __blk_mq_alloc_request+0xb1/0x100
[ 8790.084774] blk_mq_make_request+0x144/0x5d0
[ 8790.084778] generic_make_request+0x2db/0x340
[ 8790.084779] ? bvec_alloc+0x82/0xe0
[ 8790.084781] submit_bio+0x43/0x160
[ 8790.084781] ? bio_add_page+0x39/0x90
[ 8790.084794] nvmet_bdev_execute_rw+0x28c/0x360 [nvmet]
[ 8790.084800] nvmet_rdma_execute_command+0x72/0x110 [nvmet_rdma]
[ 8790.084802] nvmet_rdma_release_rsp+0xc1/0x1e0 [nvmet_rdma]
[ 8790.084804] nvmet_rdma_queue_response.cold.63+0x14/0x19 [nvmet_rdma]
[ 8790.084806] nvmet_req_complete+0x11/0x40 [nvmet]
[ 8790.084809] nvmet_bio_done+0x27/0x100 [nvmet]
[ 8790.084811] blk_update_request+0x23e/0x3b0
[ 8790.084812] blk_mq_end_request+0x1a/0x120
[ 8790.084814] blk_done_softirq+0xa1/0xd0
[ 8790.084818] __do_softirq+0xe4/0x2f8
[ 8790.084821] ? sort_range+0x20/0x20
[ 8790.084824] run_ksoftirqd+0x26/0x40
[ 8790.084825] smpboot_thread_fn+0xc5/0x160
[ 8790.084827] kthread+0x116/0x130
[ 8790.084828] ? kthread_park+0x80/0x80
[ 8790.084832] ret_from_fork+0x22/0x30
[ 8790.084833] ---[ end trace 16ec813ee3f82b56 ]---
[ 8790.085314] BUG: scheduling while atomic: ksoftirqd/7/46/0x00000100
More information about the Linux-nvme mailing list