[PATCH] nvme-pci: fix sleeping function called from interrupt context

Keith Busch kbusch at kernel.org
Fri Dec 15 09:25:29 PST 2023


On Fri, Dec 15, 2023 at 03:31:36PM +0100, Maurizio Lombardi wrote:
> the nvme_handle_cqe() interrupt handler calls nvme_complete_async_event()
> but the latter may call some blocking functions. Sleeping functions
> can't be called in interrupt context.
> 
> BUG: sleeping function called from invalid context
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/15
>  Call Trace:
>  <IRQ>
>   __cancel_work_timer+0x31e/0x460
>   ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
>   ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
>   nvme_complete_async_event+0x365/0x480 [nvme_core]
>   nvme_poll_cq+0x262/0xe50 [nvme]
> 
> Fix the bug by deferring the call to nvme_complete_async_event() to
> the nvme_wq workqueue, add a wait_queue to be sure there are no async
> events waiting to be completed before stopping the controller.

What async event occurred to cause this? It looks like the only AEN
handling that cancels anything is the FW activation one, but that only
applies to fabrics, not pci. Still, that AEN handler sets the state to
"RESETTING", but we only cancel work when transitioning to "LIVE" from a
CONNECTING state.


