Kernel panic is seen while running iozone over multiple ports and toggling the link on TOT kernel

Sagi Grimberg sagi at grimberg.me
Tue Sep 29 04:05:40 EDT 2020


> Hi all,
> I observed the following trace with the TOT Linux kernel while running NVMF over multiple ports and toggling the link/interface. The issue is observed only very intermittently.

What is a TOT linux kernel?

> Attached is the target configuration, backed by actual disks.
> 
> On the host, I started iozone and then toggled multiport interface1 for 5 seconds and interface2 for 8 seconds, one after the other in a loop.
> I observed the kernel panic below after a couple of hours.
> 
> [142799.524961] BUG: kernel NULL pointer dereference, address: 0000000000000198
> [142799.524965] #PF: supervisor write access in kernel mode
> [142799.524966] #PF: error_code(0x0002) - not-present page
> [142799.524967] PGD 0 P4D 0
> [142799.524970] Oops: 0002 [#1] SMP PTI
> [142799.524973] CPU: 1 PID: 16 Comm: ksoftirqd/1 Kdump: loaded Tainted: G S      W         5.9.0-rc6 #1
> [142799.524974] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0 01/28/2016
> [142799.524980] RIP: 0010:blk_mq_free_request+0x80/0x110
> [142799.524982] Code: 00 00 00 00 8b 53 18 b8 01 00 00 00 84 d2 74 0b 31 c0 81 e2 00 08 06 00 0f 95 c0 48 83 84 c5 80 00 00 00 01 f6 43 1c 40 74 08 <f0> 41 ff 8d 98 01 00 00 8b 05 5a 4a c4 01 85 c0 75 5e 49 8b 7c 24
> [142799.524983] RSP: 0018:ffffbb96c0123dc0 EFLAGS: 00010202
> [142799.524984] RAX: 0000000000000000 RBX: ffff9e6b70f60280 RCX: 0000000000000018
> [142799.524986] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff9e6b70f60280
> [142799.524987] RBP: ffffdb96be8fd400 R08: 0000000000000000 R09: 0000000000000000
> [142799.524988] R10: 00513cf381a1baf1 R11: 0000000000000000 R12: ffff9e6b590f7698
> [142799.524989] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff948050c0
> [142799.524990] FS:  0000000000000000(0000) GS:ffff9e6befc40000(0000) knlGS:0000000000000000
> [142799.524992] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [142799.524993] CR2: 0000000000000198 CR3: 00000001d6a0a001 CR4: 00000000003706e0
> [142799.524994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [142799.524995] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [142799.524995] Call Trace:
> [142799.525006]  nvme_keep_alive_end_io+0x15/0x70 [nvme_core]
> [142799.525011]  nvme_rdma_complete_rq+0x68/0xc0 [nvme_rdma]
> [142799.525014]  ? set_next_entity+0xae/0x1f0
> [142799.525016]  blk_done_softirq+0x95/0xc0
> [142799.525021]  __do_softirq+0xde/0x2ec
> [142799.525025]  ? sort_range+0x20/0x20
> [142799.525029]  run_ksoftirqd+0x1a/0x20
> [142799.525031]  smpboot_thread_fn+0xc5/0x160
> [142799.525034]  kthread+0x116/0x130
> [142799.525036]  ? kthread_park+0x80/0x80
> [142799.525040]  ret_from_fork+0x22/0x30
> 
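
The code bytes at the RIP decode to lock decl 0x198(%r13), and the
register dump has %r13 == 0 with CR2 == 0x198, so this is an atomic
decrement through a NULL pointer. If I am reading the 5.9 source
correctly, that matches this part of blk_mq_free_request() (abbreviated
from block/blk-mq.c):

	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
	...
	/* RQF_MQ_INFLIGHT is (1 << 6) == 0x40, the testb in the dump */
	if (rq->rq_flags & RQF_MQ_INFLIGHT)
		atomic_dec(&hctx->nr_active);	/* faults when hctx is NULL */

So rq->mq_hctx was already NULL when the keep-alive end_io freed the
request, which fits a request that was completed or torn down underneath
us by the link toggle.
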
> The keep-alive request structure is freed and then accessed by nvme_keep_alive_end_io. This looks like a probable race to me.
> 
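If that is what is happening, the race would be two contexts completing
the same keep-alive request: the normal completion (the softirq in the
trace) and the error recovery kicked off by the link toggle. Below is a
userspace model of that pattern, just to make the failure mode concrete.
It is not the kernel code; every name in it is a hypothetical stand-in
(ka_request for struct request, ka_end_io for nvme_keep_alive_end_io
plus blk_mq_free_request, and so on).

	/* Build with: cc -pthread -fsanitize=address race.c */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdlib.h>

	struct ka_request {
		atomic_int done;	/* 0 = in flight, 1 = completed */
		int status;
	};

	/* Stand-in for nvme_keep_alive_end_io + blk_mq_free_request. */
	static void ka_end_io(struct ka_request *rq, int status)
	{
		rq->status = status;	/* use-after-free if already freed */
		free(rq);		/* double free on the second call */
	}

	/* What a fix must guarantee: exactly one completer touches rq.
	 * Modeled with an atomic flag here; in the driver this would
	 * have to come from serializing teardown against in-flight
	 * completions rather than from a flag like this. */
	static void ka_end_io_once(struct ka_request *rq, int status)
	{
		if (atomic_exchange(&rq->done, 1))
			return;		/* lost the race, rq is not ours */
		rq->status = status;
		free(rq);
	}

	static void *normal_completion(void *rq)	/* softirq path */
	{
		ka_end_io(rq, 0);
		return NULL;
	}

	static void *teardown_cancel(void *rq)		/* link toggle */
	{
		ka_end_io(rq, -1);
		return NULL;
	}

	int main(void)
	{
		struct ka_request *rq = calloc(1, sizeof(*rq));
		pthread_t a, b;

		pthread_create(&a, NULL, normal_completion, rq);
		pthread_create(&b, NULL, teardown_cancel, rq);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}

ASan flags the double free immediately; switching both threads to
ka_end_io_once() makes it go away. In the driver the equivalent would be
making sure the keep-alive request is reaped exactly once before the
queues and hw contexts disappear, but that is for whoever owns that code
path to confirm.
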
> Thanks,
> Dakshaja
> 


