[PATCH] nvme-pci: fix sleeping function called from interrupt context
Yi Zhang
yi.zhang at redhat.com
Mon Dec 18 18:28:45 PST 2023
On Sat, Dec 16, 2023 at 1:33 AM Maurizio Lombardi <mlombard at redhat.com> wrote:
>
> pá 15. 12. 2023 v 18:25 odesílatel Keith Busch <kbusch at kernel.org> napsal:
> >
> > What async event occured to cause this? It looks like the only AEN
> > handling that cancels anything is the FW activation one, but that only
> > applies to fabrics, not pci. Still, that AEN hanlder sets the state to
> > "RESETTING", but we only cancel work when transitioning to "LIVE" from a
> > CONNECTING state.
> >
>
> It has been hit by Yi Zhang (added to CC now, if he wants to provide
> more details)
> during an nvme drive firmware update.
> He also said that he is able to reproduce it reliably.
Yes, it was observed during NVMe SSD FW update, add the full log here
in case it helps.
[ 432.740505] BUG: sleeping function called from invalid context at
kernel/workqueue.c:3341
[ 432.748751] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid:
0, name: swapper/3
[ 432.756582] preempt_count: 10001, expected: 0
[ 432.760951] RCU nest depth: 0, expected: 0
[ 432.765059] no locks held by swapper/3/0.
[ 432.769081] irq event stamp: 505030
[ 432.772582] hardirqs last enabled at (505029):
[<ffffffff8c789e61>] cpuidle_idle_call+0x1f1/0x280
[ 432.781550] hardirqs last disabled at (505030):
[<ffffffff8ec3e059>] common_interrupt+0x19/0xd0
[ 432.790252] softirqs last enabled at (505010):
[<ffffffff8ec7d59b>] __do_softirq+0x5db/0x8f8
[ 432.798779] softirqs last disabled at (505003):
[<ffffffff8c657afc>] __irq_exit_rcu+0xbc/0x210
[ 432.807394] Preemption disabled at:
[ 432.807399] [<ffffffff8c56c8e0>] unwind_next_frame+0xb0/0x2030
[ 432.816736] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.7.0-rc5+ #2
[ 432.823011] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS
2.9.3 08/05/2022
[ 432.830584] Call Trace:
[ 432.833048] <IRQ>
[ 432.835075] dump_stack_lvl+0x64/0xb0
[ 432.838754] __might_resched+0x367/0x590
[ 432.842702] ? __flush_work+0xb3/0x180
[ 432.846463] start_flush_work+0x2e/0x870
[ 432.850399] ? __flush_work+0xb3/0x180
[ 432.854167] ? __flush_work+0xb3/0x180
[ 432.857931] __flush_work+0xef/0x180
[ 432.861516] ? __flush_work+0xb3/0x180
[ 432.865284] ? __pfx___flush_work+0x10/0x10
[ 432.869519] ? srso_return_thunk+0x5/0x5f
[ 432.873535] ? find_held_lock+0x33/0x120
[ 432.877473] ? srso_return_thunk+0x5/0x5f
[ 432.881491] ? local_clock_noinstr+0xd/0xc0
[ 432.885688] ? srso_return_thunk+0x5/0x5f
[ 432.889708] ? find_held_lock+0x33/0x120
[ 432.893644] ? srso_return_thunk+0x5/0x5f
[ 432.897664] ? local_clock_noinstr+0xd/0xc0
[ 432.901866] ? srso_return_thunk+0x5/0x5f
[ 432.905887] ? try_to_grab_pending.part.0+0x23/0x5d0
[ 432.910862] ? __pfx___lock_acquired+0x10/0x10
[ 432.915328] __cancel_work_timer+0x31a/0x460
[ 432.919612] ? __pfx___cancel_work_timer+0x10/0x10
[ 432.924419] ? __wake_up+0x44/0x60
[ 432.927840] ? nvme_change_ctrl_state+0x148/0x370 [nvme_core]
[ 432.933626] ? srso_return_thunk+0x5/0x5f
[ 432.937651] ? do_raw_spin_unlock+0x59/0x1f0
[ 432.941935] ? srso_return_thunk+0x5/0x5f
[ 432.945953] ? _raw_spin_unlock_irqrestore+0x46/0x80
[ 432.950928] ? srso_return_thunk+0x5/0x5f
[ 432.954948] ? nvme_change_ctrl_state+0x148/0x370 [nvme_core]
[ 432.960735] nvme_complete_async_event+0x270/0x4e0 [nvme_core]
[ 432.966600] ? srso_return_thunk+0x5/0x5f
[ 432.970633] nvme_poll_cq+0x253/0x10d0 [nvme]
[ 432.975042] ? __pfx_nvme_irq+0x10/0x10 [nvme]
[ 432.979508] nvme_irq+0x91/0x100 [nvme]
[ 432.983364] ? __pfx_nvme_irq+0x10/0x10 [nvme]
[ 432.987820] ? __pfx___lock_release+0x10/0x10
[ 432.992197] ? __lock_acquired+0x207/0x7b0
[ 432.996314] ? srso_return_thunk+0x5/0x5f
[ 433.000333] ? rcu_is_watching+0x15/0xb0
[ 433.004274] __handle_irq_event_percpu+0x1f3/0x630
[ 433.009104] handle_irq_event+0xaa/0x1c0
[ 433.013047] handle_edge_irq+0x211/0xb70
[ 433.017005] __common_interrupt+0x99/0x180
[ 433.021126] common_interrupt+0xa8/0xd0
[ 433.024976] </IRQ>
[ 433.027088] <TASK>
[ 433.029212] asm_common_interrupt+0x26/0x40
[ 433.033405] RIP: 0010:mwait_idle+0x53/0x80
[ 433.037514] Code: 31 d2 48 89 d1 65 48 8b 04 25 40 54 20 00 0f 01
c8 48 8b 00 a8 08 75 13 eb 07 0f 00 2d 86 de 3e 00 31 c0 48 89 c1 fb
0f 01 c9 <fa> 65 48 8b 04 25 40 54 20 00 f0 80 60 02 df e9 09 a5 03 00
0f ae
[ 433.056268] RSP: 0018:ffffc9000035fde0 EFLAGS: 00000246
[ 433.061502] RAX: 0000000000000000 RBX: 1ffff9200006bfbe RCX: 0000000000000000
[ 433.068641] RDX: 0000000000000000 RSI: ffffffff8f0e82e0 RDI: ffffffff8f381020
[ 433.075786] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c79c009d
[ 433.082924] R10: ffff888e3ce004eb R11: 0000000000000000 R12: 0000000000000000
[ 433.090067] R13: ffff888100d437c0 R14: dffffc0000000000 R15: ffffffff911ca560
[ 433.097248] default_idle_call+0x67/0xa0
[ 433.101185] cpuidle_idle_call+0x1f1/0x280
[ 433.105298] ? __pfx_cpuidle_idle_call+0x10/0x10
[ 433.109933] ? srso_return_thunk+0x5/0x5f
[ 433.113950] ? tsc_verify_tsc_adjust+0x23/0x2e0
[ 433.118491] ? rcu_is_watching+0x15/0xb0
[ 433.122439] do_idle+0x13f/0x220
[ 433.125690] cpu_startup_entry+0x55/0x60
[ 433.129623] start_secondary+0x210/0x290
[ 433.133560] ? __pfx_start_secondary+0x10/0x10
[ 433.138018] ? srso_return_thunk+0x5/0x5f
[ 433.142048] ? soft_restart_cpu+0x15/0x15
[ 433.146078] secondary_startup_64_no_verify+0x17d/0x18b
[ 433.151354] </TASK>
[ 433.848442] nvme nvme2: controller capabilities changed, reset may
be required to take effect.
[ 447.145607] usbcore: registered new interface driver usb-storage
[ 448.847188] usb 3-1.1.2: new high-speed USB device number 5 using xhci_hcd
[ 448.944597] usb 3-1.1.2: New USB device found, idVendor=413c,
idProduct=0003, bcdDevice= 0.00
[ 448.953264] usb 3-1.1.2: New USB device strings: Mfr=1, Product=2,
SerialNumber=3
[ 448.960842] usb 3-1.1.2: Product: iDRAC VIRTUAL MEDIA
[ 448.965943] usb 3-1.1.2: Manufacturer: DELL INC.
[ 448.970595] usb 3-1.1.2: SerialNumber: 1028 123456
[ 448.985241] usb-storage 3-1.1.2:1.0: USB Mass Storage device detected
[ 448.995572] scsi host10: usb-storage 3-1.1.2:1.0
[ 449.057487] usbcore: registered new interface driver uas
[ 450.045215] scsi 10:0:0:0: Direct-Access Linux SECUPD
0001 PQ: 0 ANSI: 0
[ 450.061871] sd 10:0:0:0: Attached scsi generic sg1 type 0
[ 450.062322] sd 10:0:0:0: Power-on or device reset occurred
[ 450.075276] sd 10:0:0:0: [sdb] 61440 512-byte logical blocks: (31.5
MB/30.0 MiB)
[ 450.203155] sd 10:0:0:0: [sdb] Write Protect is off
[ 450.208140] sd 10:0:0:0: [sdb] Mode Sense: 23 00 00 00
[ 450.322600] sd 10:0:0:0: [sdb] No Caching mode page found
[ 450.328074] sd 10:0:0:0: [sdb] Assuming drive cache: write through
[ 450.584913] sdb: sdb1
[ 450.590688] sd 10:0:0:0: [sdb] Attached SCSI removable disk
[ 470.432864] usb 3-1.1.2: USB disconnect, device number 5
>
> Maurizio
>
--
Best Regards,
Yi Zhang
More information about the Linux-nvme
mailing list