[bug report][regression][bisected] most of blktests nvme/tcp failed with the last linux code
Hannes Reinecke
hare at suse.de
Sun Sep 22 23:31:19 PDT 2024
On 9/20/24 16:20, Yi Zhang wrote:
> + Hannes
> I bisected it, and it seems to have been introduced with the commit below:
>
> commit 1e48b34c9bc79aa36700fccbfdf87e61e4431d2b
> Author: Hannes Reinecke <hare at suse.de>
> Date: Mon Jul 22 14:02:22 2024 +0200
>
> nvme: split off TLS sysfs attributes into a separate group
>
>
> On Thu, Sep 19, 2024 at 12:09 AM Yi Zhang <yi.zhang at redhat.com> wrote:
>>
>> Hello
>>
>> CKI reported that most of the blktests nvme/tcp tests failed on the linux
>> tree [1]. The reproducer and dmesg log are below [2]. The issue cannot be
>> reproduced with 6.11.0, so it seems to have been introduced with the
>> latest block code merge. Please help check it, and let me know if you
>> need any info or testing. Thanks.
>>
>>
>> [1]
>> https://datawarehouse.cki-project.org/kcidb/tests/14394423
>>
>> [2]
>> # nvme_trtype=tcp ./check nvme/003
>> nvme/003 (tr=tcp) (test if we're sending keep-alives to a discovery
>> controller) [failed]
>> runtime 11.280s ... 11.188s
>> --- tests/nvme/003.out 2024-09-18 11:30:11.243366401 -0400
>> +++ /root/blktests/results/nodev_tr_tcp/nvme/003.out.bad
>> 2024-09-18 11:52:32.977112834 -0400
>> @@ -1,3 +1,3 @@
>> Running nvme/003
>> -disconnected 1 controller(s)
>> +disconnected 0 controller(s)
>> Test complete
>> # dmesg
>> [ 447.213539] run blktests nvme/003 at 2024-09-18 11:52:21
>> [ 447.229285] loop0: detected capacity change from 0 to 2097152
>> [ 447.233104] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [ 447.242398] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>> [ 447.251089] sysfs: cannot create duplicate filename
>> '/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller'
>> [ 447.251810] CPU: 2 UID: 0 PID: 5241 Comm: nvme Kdump: loaded Not
>> tainted 6.12.0-0.rc0.adfc3ded5c33.2.test.el10.aarch64 #1
>> [ 447.252540] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>> [ 447.253006] Call trace:
>> [ 447.253171] dump_backtrace+0xd8/0x130
>> [ 447.253432] show_stack+0x20/0x38
>> [ 447.253657] dump_stack_lvl+0x80/0xa8
>> [ 447.253925] dump_stack+0x18/0x30
>> [ 447.254152] sysfs_warn_dup+0x6c/0x90
>> [ 447.254406] sysfs_add_file_mode_ns+0x12c/0x138
>> [ 447.254713] create_files+0xa8/0x1f8
>> [ 447.254973] internal_create_group+0x18c/0x358
>> [ 447.255274] internal_create_groups+0x58/0xe0
>> [ 447.255558] sysfs_create_groups+0x20/0x40
>> [ 447.255826] device_add_attrs+0x19c/0x218
>> [ 447.256093] device_add+0x310/0x6d0
>> [ 447.256327] cdev_device_add+0x58/0xc0
>> [ 447.256579] nvme_add_ctrl+0x78/0xd0 [nvme_core]
>> [ 447.256895] nvme_tcp_create_ctrl+0x3c/0x178 [nvme_tcp]
>> [ 447.257248] nvmf_create_ctrl+0x150/0x288 [nvme_fabrics]
>> [ 447.257614] nvmf_dev_write+0x98/0xf8 [nvme_fabrics]
>> [ 447.257948] vfs_write+0xdc/0x380
>> [ 447.258174] ksys_write+0x7c/0x120
>> [ 447.258408] __arm64_sys_write+0x24/0x40
>> [ 447.258673] invoke_syscall.constprop.0+0x74/0xd0
>> [ 447.258994] do_el0_svc+0xb0/0xe8
>> [ 447.259225] el0_svc+0x44/0x1a0
>> [ 447.259449] el0t_64_sync_handler+0x120/0x130
>> [ 447.259745] el0t_64_sync+0x1a4/0x1a8
>>
>> --
>> Best Regards,
>> Yi Zhang
>
>
>
How utterly curious.
The mentioned patch moves some sysfs attributes to a different location
in the code. The stack trace you've posted indicates that we're creating
a controller while the previous one is still present in sysfs, i.e. that
the lifetime of the controller has changed.
I find it difficult to understand how the cited patch could have changed
the lifetime of the controller object, but will continue to check.
Does the error disappear if you just revert the cited patch?
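
A rough sketch of how the revert check could be done, in case it helps.
The commit ID, the test invocation and the warning text are taken from
the report above; the KSRC and BLKTESTS variables and the function names
are placeholders made up for illustration, and rebuilding and booting
the reverted kernel is left to whatever workflow is already in use.

```shell
# Sketch only. KSRC (kernel git tree) and BLKTESTS (blktests checkout)
# are hypothetical variable names, not part of blktests or the kernel.

# Return 0 if the log on stdin contains the sysfs duplicate warning.
has_sysfs_dup() {
    grep -q 'sysfs: cannot create duplicate filename'
}

# Revert the bisected commit in the kernel tree.
# Rebuilding and rebooting afterwards is up to the tester.
revert_suspect() {
    git -C "$KSRC" revert --no-edit 1e48b34c9bc79aa36700fccbfdf87e61e4431d2b
}

# On the reverted kernel: re-run the failing test, then check dmesg.
retest() {
    (cd "$BLKTESTS" && nvme_trtype=tcp ./check nvme/003)
    if dmesg | has_sysfs_dup; then
        echo "duplicate sysfs entry still reported"
    else
        echo "no duplicate sysfs entry reported"
    fi
}
```

Nothing runs when the snippet is sourced; revert_suspect and retest
would be called by hand, before and after booting the rebuilt kernel
respectively.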
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich