[bug report][regression][bisected] most of blktests nvme/tcp failed with the last linux code

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Mon Sep 23 23:38:36 PDT 2024


On Sep 23, 2024 / 08:31, Hannes Reinecke wrote:
[...]
> How utterly curious.
> This mentioned patch moves some sysfs attributes to a different location in
> the code. The stacktrace you've posted indicates that we're creating a
> controller while the previous one is still present in sysfs, i.e. that the
> lifetime of the controller has changed.
> I find it difficult to understand how the cited patch could have changed
> the lifetime of the controller object, but will continue to check.

I tried to recreate the failure and observed a very similar but different
symptom: the kernel reported a KASAN global-out-of-bounds BUG in
create_files() [3]. I confirmed that this symptom is triggered by commit
1e48b34c9bc7.

> 
> Does the error disappear if you just revert the cited patch?

As for the KASAN BUG observed on my test system, yes. To be precise, I needed to
revert two dependent commits 02a3688c53d6 and f5eb7397471 together with
1e48b34c9bc7. With these reverts, the BUG went away.


The BUG message provides more clues. The BUG happened at line 54 of
fs/sysfs/group.c, and it indicated that the loop over grp->attrs read beyond
the end of the array. I also noticed that commit 1e48b34c9bc7 introduced the
array nvme_tls_attrs, but the array is not NULL-terminated. I created a quick
fix patch that adds the NULL terminator [4] and confirmed that the BUG goes
away.
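To illustrate why the missing terminator matters, here is a minimal user-space
sketch. It assumes, as the BUG report suggests, that the attribute loop in
create_files() takes no length and stops only when it loads a NULL entry. The
demo_* types and names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct attribute. */
struct demo_attribute {
	const char *name;
};

static struct demo_attribute demo_tls_key = { "tls_key" };
static struct demo_attribute demo_tls_configured_key = { "tls_configured_key" };
static struct demo_attribute demo_tls_keyring = { "tls_keyring" };

/* Same shape as nvme_tls_attrs after the fix in [4]: three entries plus NULL. */
static struct demo_attribute *demo_tls_attrs[] = {
	&demo_tls_key,
	&demo_tls_configured_key,
	&demo_tls_keyring,
	NULL,
};

/*
 * Sentinel-terminated walk: there is no length parameter, so the loop stops
 * only when it loads a NULL pointer. Without the NULL entry, the load after
 * the last element reads past the end of the array.
 */
static size_t demo_count_attrs(struct demo_attribute **attrs)
{
	size_t n = 0;

	assert(attrs != NULL);
	for (struct demo_attribute **attr = attrs; *attr; attr++)
		n++;
	return n;
}

/* Byte offset of the slot that should hold the NULL terminator. */
static size_t demo_sentinel_offset(size_t n_entries)
{
	return n_entries * sizeof(struct demo_attribute *);
}
```

On a 64-bit kernel, demo_sentinel_offset(3) is 0x18, which matches the
nvme_tls_attrs+0x18 location and the 8-byte read in the KASAN report [3]: the
loop loaded the fourth slot, where the NULL terminator should have been.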

Yi, I suggest trying the patch [4] to see whether it avoids the failure you
observe in the CKI system.



[3]

[  717.749561] [   T1361] run blktests nvme/003 at 2024-09-20 10:48:30
[  717.947434] [   T1407] loop0: detected capacity change from 0 to 2097152
[  718.007387] [   T1410] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  718.200391] [   T1417] block nvme0n1: No UUID available providing old NGUID
[  718.246928] [   T1417] ==================================================================
[  718.254856] [   T1417] BUG: KASAN: global-out-of-bounds in create_files+0x3d8/0x4c8
[  718.262257] [   T1417] Read of size 8 at addr ffffdcc298356e38 by task nvme/1417

[  718.271572] [   T1417] CPU: 13 UID: 0 PID: 1417 Comm: nvme Not tainted 6.11.0-kts+ #3
[  718.279140] [   T1417] Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Aug  9 2021
[  718.287919] [   T1417] Call trace:
[  718.291053] [   T1417]  dump_backtrace+0xdc/0x138
[  718.295496] [   T1417]  show_stack+0x20/0x38
[  718.299502] [   T1417]  dump_stack_lvl+0x70/0x98
[  718.303859] [   T1417]  print_address_description.constprop.0+0x90/0x320
[  718.310300] [   T1417]  print_report+0x108/0x1f8
[  718.314654] [   T1417]  kasan_report+0xb8/0x110
[  718.318922] [   T1417]  __asan_report_load8_noabort+0x20/0x30
[  718.324404] [   T1417]  create_files+0x3d8/0x4c8
[  718.328757] [   T1417]  internal_create_group+0x354/0x7c8
[  718.333891] [   T1417]  internal_create_groups+0x88/0x140
[  718.339025] [   T1417]  sysfs_create_groups+0x20/0x40
[  718.343811] [   T1417]  device_add_attrs+0x35c/0x478
[  718.348514] [   T1417]  device_add+0x540/0xfc8
[  718.352693] [   T1417]  cdev_device_add+0xdc/0x208
[  718.357221] [   T1417]  nvme_add_ctrl+0x120/0x238 [nvme_core]
[  718.362754] [   T1417]  nvme_loop_create_ctrl+0x210/0xad0 [nvme_loop]
[  718.368934] [   T1417]  nvmf_create_ctrl+0x318/0x840 [nvme_fabrics]
[  718.374945] [   T1417]  nvmf_dev_write+0xdc/0x170 [nvme_fabrics]
[  718.380692] [   T1417]  vfs_write+0x188/0x5a8
[  718.384785] [   T1417]  ksys_write+0xfc/0x1f8
[  718.388877] [   T1417]  __arm64_sys_write+0x74/0xb8
[  718.393490] [   T1417]  invoke_syscall+0xd8/0x260
[  718.397929] [   T1417]  el0_svc_common.constprop.0+0xb4/0x240
[  718.403411] [   T1417]  do_el0_svc+0x48/0x68
[  718.407416] [   T1417]  el0_svc+0x50/0x150
[  718.411247] [   T1417]  el0t_64_sync_handler+0x120/0x130
[  718.416294] [   T1417]  el0t_64_sync+0x194/0x198

[  718.422826] [   T1417] The buggy address belongs to the variable:
[  718.428651] [   T1417]  nvme_tls_attrs+0x18/0xffffffffffff21e0 [nvme_core]

[  718.437514] [   T1417] The buggy address belongs to the virtual mapping at
                           [ffffdcc29834a000, ffffdcc298374000) created by:
                           move_module+0x33c/0x5c8

[  718.456989] [   T1417] The buggy address belongs to the physical page:
[  718.463249] [   T1417] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20b8b73
[  718.472033] [   T1417] flags: 0x2fffff00000000(node=0|zone=2|lastcpupid=0xfffff)
[  718.479170] [   T1417] raw: 002fffff00000000 0000000000000000 dead000000000122 0000000000000000
[  718.487602] [   T1417] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  718.496033] [   T1417] page dumped because: kasan: bad access detected

[  718.504471] [   T1417] Memory state around the buggy address:
[  718.509950] [   T1417]  ffffdcc298356d00: f9 f9 f9 f9 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
[  718.517860] [   T1417]  ffffdcc298356d80: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 00 00 00 f9
[  718.525769] [   T1417] >ffffdcc298356e00: f9 f9 f9 f9 00 00 00 f9 f9 f9 f9 f9 00 00 00 00
[  718.533678] [   T1417]                                         ^
[  718.539418] [   T1417]  ffffdcc298356e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  718.547327] [   T1417]  ffffdcc298356f00: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
[  718.555236] [   T1417] ==================================================================
[  718.563206] [   T1417] Disabling lock debugging due to kernel taint
[  718.570790] [    T477] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  718.588532] [   T1417] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
[  730.521978] [   T1514] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
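The addresses above are consistent with a read one slot past the end of a
three-entry array. A small sketch of the arithmetic, assuming generic KASAN
(one shadow byte describes 8 bytes of memory, and 0xf9 marks a global
redzone). The demo_* names are hypothetical; the shadow row is copied from the
">ffffdcc298356e00" line of the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define DEMO_SHADOW_SCALE 8u
/* Shadow value generic KASAN uses for global redzones. */
#define DEMO_GLOBAL_REDZONE 0xf9u

/* The ">ffffdcc298356e00" shadow row from the report, copied verbatim. */
static const uint64_t demo_row_base = 0xffffdcc298356e00ull;
static const uint8_t demo_row[16] = {
	0xf9, 0xf9, 0xf9, 0xf9, 0x00, 0x00, 0x00, 0xf9,
	0xf9, 0xf9, 0xf9, 0xf9, 0x00, 0x00, 0x00, 0x00,
};

/* Index of the shadow byte (within the row) that covers addr. */
static size_t demo_shadow_index(uint64_t addr)
{
	assert(addr >= demo_row_base);
	size_t idx = (size_t)((addr - demo_row_base) / DEMO_SHADOW_SCALE);
	assert(idx < sizeof(demo_row));
	return idx;
}

/*
 * Start address of the run of fully addressable (0x00) granules that ends
 * just before the shadow byte covering addr, i.e. the object being overrun.
 */
static uint64_t demo_object_start(uint64_t addr)
{
	size_t idx = demo_shadow_index(addr);

	while (idx > 0 && demo_row[idx - 1] == 0x00)
		idx--;
	return demo_row_base + (uint64_t)idx * DEMO_SHADOW_SCALE;
}
```

For the faulting address ffffdcc298356e38, the covering shadow byte is index 7
(0xf9, a redzone), and the preceding three 0x00 bytes cover ffffdcc298356e20 to
ffffdcc298356e37: 24 bytes, exactly three 8-byte pointers. The read therefore
lands at nvme_tls_attrs+0x18, with no room left for a NULL entry.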


[4]

diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index eb345551d6fe..b68a9e5f1ea3 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -767,6 +767,7 @@ static struct attribute *nvme_tls_attrs[] = {
 	&dev_attr_tls_key.attr,
 	&dev_attr_tls_configured_key.attr,
 	&dev_attr_tls_keyring.attr,
+	NULL,
 };
 
 static umode_t nvme_tls_attrs_are_visible(struct kobject *kobj,
-- 
2.46.1



More information about the Linux-nvme mailing list