blktests failures with v6.10-rc1 kernel

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Mon Jun 3 23:57:20 PDT 2024


On May 31, 2024 / 13:46, Bart Van Assche wrote:
> On 5/31/24 13:35, Zhu Yanjun wrote:
> > On Fri, May 31, 2024 at 10:08 PM Bart Van Assche <bvanassche at acm.org> wrote:
> > > 
> > > On 5/31/24 13:06, Zhu Yanjun wrote:
> > > > On Fri, May 31, 2024 at 10:01 PM Bart Van Assche <bvanassche at acm.org> wrote:
> > > > > 
> > > > > On 5/31/24 07:35, Zhu Yanjun wrote:
> > > > > > IIRC, the problem with srp/002 and srp/011 also occurs with the siw driver. Did
> > > > > > you run tests with the siw driver to verify whether the problem with srp/002
> > > > > > and srp/011 is also fixed or not?
> > > > > 
> > > > > I have not yet seen any failures of any of the SRP tests when using the siw driver.
> > > > > What am I missing?
> > > >
> > > > (left out a bunch of forwarded emails)
> > > 
> > > Forwarding emails is not useful, especially if these emails do not answer the question
> > > that I asked.
> > 
> > Bob had run tests with siw. From his mail, it seems that a similar
> > problem also occurs with siw.
> 
> I'm not aware of anyone other than Bob having reported failures of the SRP tests
> in combination with the siw driver.

I had the same understanding as Bart and was not aware of any failure with the
siw driver. To confirm, I ran the srp/002 test repeatedly with the siw driver
and kernel v6.10-rc2. I expected no failure, but alas, a different failure
symptom was observed: KASAN reported a slab-use-after-free [1].
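For reference, the repeated run was driven by a small loop along these lines
(a sketch, not the exact invocation; it assumes blktests is checked out in the
current directory and that the siw driver is selected via the usual blktests
configuration):

```shell
#!/bin/sh
# run_repeated: run a command repeatedly, stopping at the first failure.
# Used here to flush out the intermittent srp/002 failure.
run_repeated() {
    cmd=$1
    runs=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        # Word-splitting of $cmd is intentional so that "./check srp/002"
        # expands into the command and its argument.
        $cmd || { echo "failed on iteration $i"; return 1; }
        i=$((i + 1))
    done
    echo "all $runs iterations passed"
}

# Intended use from the blktests tree (hypothetical example):
#   run_repeated "./check srp/002" 20
```

The KASAN splat below appeared partway through such a loop rather than on the
first iteration, which matches the intermittent nature of the failure.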

I'm not sure whether this is the problem that Bob observed. I suspect it is the
same old issue that I reported in the past [2]. I will send a bug report with
some more details to the linux-rdma list.

[1]

...
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000006d1c31fe with status 5
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000916ce050 with status 5
Jun 04 09:23:11 testnode2 kernel: ==================================================================
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001770ef1b with status 5
Jun 04 09:23:11 testnode2 kernel: BUG: KASAN: slab-use-after-free in __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: Read of size 8 at addr ffff888131a3e418 by task kworker/u16:6/1345
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: CPU: 1 PID: 1345 Comm: kworker/u16:6 Not tainted 6.10.0-rc2+ #288
Jun 04 09:23:11 testnode2 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Jun 04 09:23:11 testnode2 kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Jun 04 09:23:11 testnode2 kernel: Call Trace:
Jun 04 09:23:11 testnode2 kernel:  <TASK>
Jun 04 09:23:11 testnode2 kernel:  dump_stack_lvl+0x6a/0x90
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f727e5c2 with status 5
Jun 04 09:23:11 testnode2 kernel:  ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel:  print_report+0x174/0x505
Jun 04 09:23:11 testnode2 kernel:  ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel:  ? __virt_addr_valid+0x1b9/0x400
Jun 04 09:23:11 testnode2 kernel:  ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel:  kasan_report+0xa7/0x180
Jun 04 09:23:11 testnode2 kernel:  ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel:  __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel:  ? cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel:  ? __lock_acquire+0x139d/0x5d60
Jun 04 09:23:11 testnode2 kernel:  ? __pfx___mutex_lock+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  ? mark_lock+0xf5/0x1580
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_mark_lock+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000009bc71497 with status 5
Jun 04 09:23:11 testnode2 kernel:  ? cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 0000000041c0fa4b with status 5
Jun 04 09:23:11 testnode2 kernel:  cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_cma_iw_handler+0x10/0x10 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel:  ? mark_held_locks+0x94/0xe0
Jun 04 09:23:11 testnode2 kernel:  ? _raw_spin_unlock_irqrestore+0x4c/0x60
Jun 04 09:23:11 testnode2 kernel:  cm_work_handler+0xb54/0x1c50 [iw_cm]
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_cm_work_handler+0x10/0x10 [iw_cm]
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_lock_release+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f48094cb with status 5
Jun 04 09:23:11 testnode2 kernel:  process_one_work+0x865/0x1410
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001c3faa8a with status 5
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_lock_acquire+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_process_one_work+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  ? assign_work+0x16c/0x240
Jun 04 09:23:11 testnode2 kernel:  ? lock_is_held_type+0xd5/0x130
Jun 04 09:23:11 testnode2 kernel:  worker_thread+0x5e2/0x1010
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_worker_thread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  kthread+0x2d1/0x3a0
Jun 04 09:23:11 testnode2 kernel:  ? _raw_spin_unlock_irq+0x24/0x50
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_kthread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  ret_from_fork+0x30/0x70
Jun 04 09:23:11 testnode2 kernel:  ? __pfx_kthread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel:  ret_from_fork_asm+0x1a/0x30
Jun 04 09:23:11 testnode2 kernel:  </TASK>
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: Allocated by task 75327:
Jun 04 09:23:11 testnode2 kernel:  kasan_save_stack+0x2c/0x50
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001bd9ea09 with status 5
Jun 04 09:23:11 testnode2 kernel:  kasan_save_track+0x10/0x30
Jun 04 09:23:11 testnode2 kernel:  __kasan_kmalloc+0xa6/0xb0
Jun 04 09:23:11 testnode2 kernel:  __rdma_create_id+0x5b/0x5d0 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel:  __rdma_create_kernel_id+0x12/0x40 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel:  srp_new_rdma_cm_id+0x7c/0x200 [ib_srp]
Jun 04 09:23:11 testnode2 kernel:  add_target_store+0x135e/0x29f0 [ib_srp]
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000005afc8065 with status 5
Jun 04 09:23:11 testnode2 kernel:  kernfs_fop_write_iter+0x3a4/0x5a0
Jun 04 09:23:11 testnode2 kernel:  vfs_write+0x5e3/0xe70
Jun 04 09:23:11 testnode2 kernel:  ksys_write+0xf7/0x1d0
Jun 04 09:23:11 testnode2 kernel:  do_syscall_64+0x93/0x180
Jun 04 09:23:11 testnode2 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: Freed by task 66344:
Jun 04 09:23:11 testnode2 kernel:  kasan_save_stack+0x2c/0x50
Jun 04 09:23:11 testnode2 kernel:  kasan_save_track+0x10/0x30
Jun 04 09:23:11 testnode2 kernel:  kasan_save_free_info+0x37/0x60
Jun 04 09:23:11 testnode2 kernel:  poison_slab_object+0x109/0x180
Jun 04 09:23:11 testnode2 kernel:  __kasan_slab_free+0x2e/0x50
Jun 04 09:23:11 testnode2 kernel:  kfree+0x11a/0x390
Jun 04 09:23:11 testnode2 kernel:  srp_free_ch_ib+0x895/0xc80 [ib_srp]
Jun 04 09:23:11 testnode2 kernel:  srp_remove_work+0x309/0x6c0 [ib_srp]
Jun 04 09:23:11 testnode2 kernel:  process_one_work+0x865/0x1410
Jun 04 09:23:11 testnode2 kernel:  worker_thread+0x5e2/0x1010
Jun 04 09:23:11 testnode2 kernel:  kthread+0x2d1/0x3a0
Jun 04 09:23:11 testnode2 kernel:  ret_from_fork+0x30/0x70
Jun 04 09:23:11 testnode2 kernel:  ret_from_fork_asm+0x1a/0x30
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the object at ffff888131a3e000
                                   which belongs to the cache kmalloc-2k of size 2048
Jun 04 09:23:11 testnode2 kernel: The buggy address is located 1048 bytes inside of
                                   freed 2048-byte region [ffff888131a3e000, ffff888131a3e800)
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the physical page:
Jun 04 09:23:11 testnode2 kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888131a38000 pfn:0x131a38
Jun 04 09:23:11 testnode2 kernel: head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
Jun 04 09:23:11 testnode2 kernel: flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
Jun 04 09:23:11 testnode2 kernel: page_type: 0xffffefff(slab)
Jun 04 09:23:11 testnode2 kernel: raw: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010
Jun 04 09:23:11 testnode2 kernel: raw: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010
Jun 04 09:23:11 testnode2 kernel: head: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000003 ffffea0004c68e01 ffffffffffffffff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: page dumped because: kasan: bad access detected
Jun 04 09:23:11 testnode2 kernel: 
Jun 04 09:23:11 testnode2 kernel: Memory state around the buggy address:
Jun 04 09:23:11 testnode2 kernel:  ffff888131a3e300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel:  ffff888131a3e380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: >ffff888131a3e400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel:                             ^
Jun 04 09:23:11 testnode2 kernel:  ffff888131a3e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel:  ffff888131a3e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: ==================================================================
Jun 04 09:23:11 testnode2 kernel: Disabling lock debugging due to kernel taint
Jun 04 09:23:11 testnode2 kernel: device-mapper: multipath: 253:2: Failing path 8:80.
Jun 04 09:23:11 testnode2 kernel: device-mapper: uevent: dm_send_uevents: skipping sending uevent for lost device
...

[2] https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/

