[PATCH] nvme-multipath: fix double initialization of ANA state

Martin Wilck mwilck at suse.com
Fri May 14 14:08:25 PDT 2021


Hello Christoph,

On Wed, 2021-05-12 at 16:53 +0200, Martin Wilck wrote:
> On Thu, 2021-05-06 at 15:48 +0200, Christoph Hellwig wrote:
> > nvme_init_identify and thus nvme_mpath_init can be called multiple
> > times and thus must not overwrite potentially initialized or in-use
> > fields.  Split out a helper for the basic initialization when the
> > controller is initialized and make sure the init_identify path does
> > not blindly change in-use data structures.
> > 
> > Fixes: 0d0b660f214d ("nvme: add ANA support")
> > Reported-by: Martin Wilck <mwilck at suse.com>
> > Signed-off-by: Christoph Hellwig <hch at lst.de>
> 
> Thank you. I'll prepare another test kernel for our partner.

Our partner reported a crash during NVMe controller initialization on
the test kernel I built with this patch applied. I'm still analyzing
the dump, and I may have made a mistake backporting your patch, but I
thought I should let you know anyway.

[ 1010.869437] nvme-fabrics ctl: Failed to read smart log (error -5)
[ 1010.869444] nvme nvme0: queue_size 128 > ctrl sqsize 32, clamping down
[ 1010.879383] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.1.14:4420
[ 1010.929700] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 1011.041659] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 1011.041665] #PF: supervisor write access in kernel mode
[ 1011.041666] #PF: error_code(0x0002) - not-present page
[ 1011.041668] PGD 0 P4D 0 
[ 1011.041672] Oops: 0002 [#1] SMP PTI
[ 1011.041675] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Tainted: G               X    5.3.18-6.g7ea043c-default #1 SLE15-SP2 (unreleased)
[ 1011.041678] Hardware name: FUJITSU PRIMERGY RX2530 M2/D3279-B1, BIOS V5.0.0.11 R1.20.0 for D3279-B1x                    06/15/2018
[ 1011.041689] RIP: 0010:bio_copy_kern_endio_read+0xc6/0x130
[ 1011.041691] Code: c0 75 87 8b 4e 0c 44 89 df 89 ca 81 e1 ff 0f 00 00 c1 ea 0c 29 cf 48 c1 e2 06 89 f9 48 03 16 e9 6f ff ff ff 48 8b 3e 4c 89 c5 <49> 89 38 4a 8b 7c 0e f8 4b 89 7c 08 f8 49 8d 78 08 4d 01 c8 48 83
[ 1011.041695] RSP: 0018:ffffab41804c8ee8 EFLAGS: 00010212
[ 1011.041697] RAX: 0000000000000000 RBX: ffff9ff1b73e1500 RCX: 0000000000001000
[ 1011.041699] RDX: fffff2b810ce1240 RSI: ffff9ff1b3849000 RDI: 0000000000000000
[ 1011.041701] RBP: 0000000000000010 R08: 0000000000000010 R09: 0000000000001000
[ 1011.041702] R10: 0000000000000001 R11: 0000000000001000 R12: ffff9ff168ace140
[ 1011.041703] R13: 0000000000004810 R14: 0000000000000000 R15: 0000000000000000
[ 1011.041705] FS:  0000000000000000(0000) GS:ffff9ff1ff2c0000(0000) knlGS:0000000000000000
[ 1011.041707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1011.041708] CR2: 0000000000000010 CR3: 00000001de60a006 CR4: 00000000003606e0
[ 1011.041710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1011.041711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1011.041713] Call Trace:
[ 1011.041716]  <IRQ>
[ 1011.041722]  blk_update_request+0x8a/0x3a0
[ 1011.041726]  blk_mq_end_request+0x1a/0x130
[ 1011.041729]  blk_done_softirq+0x8f/0xc0
[ 1011.041736]  __do_softirq+0xe3/0x2dc
[ 1011.041744]  irq_exit+0xd5/0xe0
[ 1011.041747]  call_function_single_interrupt+0xf/0x20
[ 1011.041749]  </IRQ>

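bio_copy_kern_endio_read() means that this was a read command with a
kernel data buffer, sent via __nvme_submit_sync_cmd(): it is the
completion handler for the bounce buffers that blk_rq_map_kern() sets
up when it has to copy the data. I don't know yet which command it was.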

Regards,
Martin

More information about the Linux-nvme mailing list