[PATCH] nvme-multipath: fix double initialization of ANA state

Fri May 14 16:05:47 PDT 2021

> 
> Our partner reported a crash during NVMe controller initialization with
> the kernel I built with this patch applied. I'm still looking at the
> dump, and it's not impossible that I made a mistake backporting your
> patch. But I thought I should inform you anyway.

Strange, the log points at the discovery controller, which should not
even have any ANA-related activity...

> 
> [ 1010.869437] nvme-fabrics ctl: Failed to read smart log (error -5)
> [ 1010.869444] nvme nvme0: queue_size 128 > ctrl sqsize 32, clamping down
> [ 1010.879383] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.1.14:4420
> [ 1010.929700] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> [ 1011.041659] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [ 1011.041665] #PF: supervisor write access in kernel mode
> [ 1011.041666] #PF: error_code(0x0002) - not-present page
> [ 1011.041668] PGD 0 P4D 0
> [ 1011.041672] Oops: 0002 [#1] SMP PTI
> [ 1011.041675] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Tainted: G               X    5.3.18-6.g7ea043c-default #1 SLE15-SP2 (unreleased)
> [ 1011.041678] Hardware name: FUJITSU PRIMERGY RX2530 M2/D3279-B1, BIOS V5.0.0.11 R1.20.0 for D3279-B1x                    06/15/2018
> [ 1011.041689] RIP: 0010:bio_copy_kern_endio_read+0xc6/0x130
> [ 1011.041691] Code: c0 75 87 8b 4e 0c 44 89 df 89 ca 81 e1 ff 0f 00 00 c1 ea 0c 29 cf 48 c1 e2 06 89 f9 48 03 16 e9 6f ff ff ff 48 8b 3e 4c 89 c5 <49> 89 38 4a 8b 7c 0e f8 4b 89 7c 08 f8 49 8d 78 08 4d 01 c8 48 83
> [ 1011.041695] RSP: 0018:ffffab41804c8ee8 EFLAGS: 00010212
> [ 1011.041697] RAX: 0000000000000000 RBX: ffff9ff1b73e1500 RCX: 0000000000001000
> [ 1011.041699] RDX: fffff2b810ce1240 RSI: ffff9ff1b3849000 RDI: 0000000000000000
> [ 1011.041701] RBP: 0000000000000010 R08: 0000000000000010 R09: 0000000000001000
> [ 1011.041702] R10: 0000000000000001 R11: 0000000000001000 R12: ffff9ff168ace140
> [ 1011.041703] R13: 0000000000004810 R14: 0000000000000000 R15: 0000000000000000
> [ 1011.041705] FS:  0000000000000000(0000) GS:ffff9ff1ff2c0000(0000) knlGS:0000000000000000
> [ 1011.041707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1011.041708] CR2: 0000000000000010 CR3: 00000001de60a006 CR4: 00000000003606e0
> [ 1011.041710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1011.041711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1011.041713] Call Trace:
> [ 1011.041716]  <IRQ>
> [ 1011.041722]  blk_update_request+0x8a/0x3a0
> [ 1011.041726]  blk_mq_end_request+0x1a/0x130
> [ 1011.041729]  blk_done_softirq+0x8f/0xc0
> [ 1011.041736]  __do_softirq+0xe3/0x2dc
> [ 1011.041744]  irq_exit+0xd5/0xe0
> [ 1011.041747]  call_function_single_interrupt+0xf/0x20
> [ 1011.041749]  </IRQ>
> 
> bio_copy_kern_endio_read() means that this was a command sent via
> __nvme_submit_sync_cmd(). I don't know yet which one.
> 
> Regards,
> Martin
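
For anyone reading the oops above, here is a minimal sketch of how the
"#PF: error_code(0x0002)" value maps to the "supervisor write access" and
"not-present page" lines in the log. The bit layout is the architectural
x86 page-fault error code; the table and function names (PF_BITS,
decode_pf_error_code) are made up for illustration and are not kernel
symbols.

```python
# Low bits of the x86 page-fault error code.
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable properties for a #PF error code."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


if __name__ == "__main__":
    # Value taken from the log line "#PF: error_code(0x0002)".
    print(decode_pf_error_code(0x0002))
```

For 0x0002 this gives a supervisor-mode write to a not-present page, which
agrees with CR2 = 0x10 and R08 = 0x10 in the register dump. If I read the
marked bytes in the Code: line correctly (<49> 89 38), the faulting
instruction is a store through %r8.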