[PATCH] nvme: test nvmet-wq sysfs interface

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Thu Nov 7 21:00:29 PST 2024


On Nov 06, 2024 / 13:13, Daniel Wagner wrote:
> On Wed, Nov 06, 2024 at 03:46:26AM GMT, Chaitanya Kulkarni wrote:
> > On 11/4/24 23:02, Daniel Wagner wrote:
> > > On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> > >> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload 
> > > What do you want to test here? What's the objective? I don't see any
> > > block-layer-related parts, or am I missing something?
> > 
> > For the NVMeOF target we have exported nvmet-wq to userspace with a
> > recent patch. This behavior is new, and I don't think any test exists
> > that runs a fio workload while changing the cpumask randomly to ensure
> > the stability of nvmet-wq while an application is using the block
> > layer. Hence we need a test for the latest patch we have added for
> > 6.13.
> 
> Though this is not a performance measurement, just a functional test of
> whether the scheduler works. I don't think we should add such tests
> because they add to the overall runtime for little benefit. I am
> pretty sure there are already tests (e.g. kunit) which do more elaborate
> scheduler tests.

When I ran the test case on my test system using kernel v6.12-rc6 plus the
patch "nvmet: make nvmet_wq visible in sysfs", I observed a kernel INFO splat
[1]. Lockdep's "trying to register non-static key" message usually means a
lock was used before it was initialized, so this INFO looks like it indicates
a bug in the nvmet driver. Is this a known issue? The INFO is reliably
recreated with the loop, rdma and tcp transports on my test system.
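
For reference, recreating it on my setup is as simple as running the test case
against each transport (assuming the test lands as nvme/055 in my local
blktests tree, matching the log below):

    for t in loop rdma tcp; do
        nvme_trtype=$t ./check nvme/055
    done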

If this is a new, unknown issue, this test case is a good way to exercise the
nvmet driver. It looks worth adding to blktests.

As for the runtime cost, I think the test case can be modified to respect the
TIMEOUT variable that users set, as in the sketch below. That would allow users
to control the runtime cost to some extent.
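
A rough sketch of what I have in mind follows. The sysfs path, the 30 second
fallback, and the fio target device are assumptions on my side (the path is
based on how WQ_SYSFS workqueues are exposed in general), not the actual test
code:

    # Toggle the nvmet-wq cpumask at random while fio runs, for at most
    # $TIMEOUT seconds (fall back to 30 if the user did not set it).
    cpumask_file=/sys/devices/virtual/workqueue/nvmet-wq/cpumask
    runtime=${TIMEOUT:-30}

    fio --name=randrw --filename=/dev/nvme1n1 --rw=randrw --bs=4k \
        --direct=1 --time_based --runtime="$runtime" &
    fio_pid=$!

    ncpus=$(nproc)
    end=$((SECONDS + runtime))
    while [ "$SECONDS" -lt "$end" ]; do
        # The attribute takes a hex cpumask; pick a random non-empty
        # subset of CPUs 0..ncpus-1 (RANDOM is enough for a small VM).
        mask=$(( (RANDOM % ((1 << ncpus) - 1)) + 1 ))
        printf '%x\n' "$mask" > "$cpumask_file"
        sleep 1
    done

    wait "$fio_pid"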

[1]

[   87.365354][  T963] run blktests nvme/055 at 2024-11-08 09:32:42
[   87.433572][ T1011] loop0: detected capacity change from 0 to 2097152
[   87.452511][ T1014] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   87.478018][ T1019] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   87.565565][  T226] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.20.
[   87.570700][ T1026] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port dev.
[   87.573064][ T1026] nvme nvme1: creating 4 I/O queues.
[   87.577041][ T1026] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[   87.580437][ T1026] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nq9
[  101.337422][ T1073] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  101.383814][  T104] INFO: trying to register non-static key.
[  101.384456][  T104] The code is fine but needs lockdep annotation, or maybe
[  101.385136][  T104] you didn't initialize this object before use?
[  101.385758][  T104] turning off the locking correctness validator.
[  101.386392][  T104] CPU: 1 UID: 0 PID: 104 Comm: kworker/u16:4 Not tainted 6.12.0-rc6+ #361
[  101.387197][  T104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/204
[  101.388109][  T104] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[  101.388868][  T104] Call Trace:
[  101.389203][  T104]  <TASK>
[  101.389484][  T104]  dump_stack_lvl+0x6a/0x90
[  101.389927][  T104]  register_lock_class+0xe2a/0x10a0
[  101.390444][  T104]  ? __lock_acquire+0xd1b/0x5f20
[  101.390921][  T104]  ? __pfx_register_lock_class+0x10/0x10
[  101.391479][  T104]  __lock_acquire+0x81e/0x5f20
[  101.391939][  T104]  ? lock_is_held_type+0xd5/0x130
[  101.392434][  T104]  ? find_held_lock+0x2d/0x110
[  101.392891][  T104]  ? __pfx___lock_acquire+0x10/0x10
[  101.393399][  T104]  ? lock_release+0x460/0x7a0
[  101.393853][  T104]  ? __pfx_lock_release+0x10/0x10
[  101.394351][  T104]  lock_acquire.part.0+0x12d/0x360
[  101.394841][  T104]  ? xa_erase+0xd/0x30
[  101.395255][  T104]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  101.395793][  T104]  ? rcu_is_watching+0x11/0xb0
[  101.396271][  T104]  ? trace_lock_acquire+0x12f/0x1a0
[  101.396770][  T104]  ? __pfx___flush_work+0x10/0x10
[  101.397266][  T104]  ? xa_erase+0xd/0x30
[  101.397658][  T104]  ? lock_acquire+0x2d/0xc0
[  101.398089][  T104]  ? xa_erase+0xd/0x30
[  101.398501][  T104]  _raw_spin_lock+0x2f/0x40
[  101.398933][  T104]  ? xa_erase+0xd/0x30
[  101.400167][  T104]  xa_erase+0xd/0x30
[  101.401359][  T104]  nvmet_ctrl_destroy_pr+0x10e/0x1c0 [nvmet]
[  101.402767][  T104]  ? __pfx_nvmet_ctrl_destroy_pr+0x10/0x10 [nvmet]
[  101.404235][  T104]  ? __pfx___might_resched+0x10/0x10
[  101.405558][  T104]  nvmet_ctrl_free+0x2f0/0x830 [nvmet]
[  101.406900][  T104]  ? lockdep_hardirqs_on+0x78/0x100
[  101.408192][  T104]  ? __cancel_work+0x166/0x230
[  101.409398][  T104]  ? __pfx_nvmet_ctrl_free+0x10/0x10 [nvmet]
[  101.410726][  T104]  ? rcu_is_watching+0x11/0xb0
[  101.411938][  T104]  ? kfree+0x13e/0x4a0
[  101.413073][  T104]  ? lockdep_hardirqs_on+0x78/0x100
[  101.414318][  T104]  nvmet_sq_destroy+0x1f2/0x3a0 [nvmet]
[  101.415556][  T104]  nvmet_tcp_release_queue_work+0x4c0/0xe40 [nvmet_tcp]
[  101.416912][  T104]  process_one_work+0x85a/0x1460
[  101.418064][  T104]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  101.419266][  T104]  ? __pfx_process_one_work+0x10/0x10
[  101.420427][  T104]  ? assign_work+0x16c/0x240
[  101.421486][  T104]  ? lock_is_held_type+0xd5/0x130
[  101.422555][  T104]  worker_thread+0x5e2/0xfc0
[  101.423574][  T104]  ? __pfx_worker_thread+0x10/0x10
[  101.424639][  T104]  kthread+0x2d1/0x3a0
[  101.425587][  T104]  ? _raw_spin_unlock_irq+0x24/0x50
[  101.426644][  T104]  ? __pfx_kthread+0x10/0x10
[  101.427637][  T104]  ret_from_fork+0x30/0x70
[  101.428596][  T104]  ? __pfx_kthread+0x10/0x10
[  101.429562][  T104]  ret_from_fork_asm+0x1a/0x30
[  101.430535][  T104]  </TASK>

