nvme-6.3: blktests nvme/003 null pointer for fc transport

Daniel Wagner dwagner at suse.de
Mon Feb 20 08:04:15 PST 2023


On Wed, Feb 01, 2023 at 09:10:03AM +0100, Daniel Wagner wrote:
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port

I've retested v6.2.0 with the following debug patch applied:

--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -1607,6 +1607,11 @@ static void __exit fcloop_exit(void)
                tport = __unlink_target_port(nport);
                rport = __unlink_remote_port(nport);

+                if (!tport && !rport) {
+                        pr_warn("%s: target port and remote port have already been freed\n", __func__);
+                        break;
+                }
+
                spin_unlock_irqrestore(&fcloop_lock, flags);

                ret = __targetport_unreg(nport, tport);

With that in place I was not able to trigger the null pointer dereference anymore. My guess is that

0cab4404874f ("nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association")

fixed the problem.

However, I ran into a lockdep warning:


 ================================
 WARNING: inconsistent lock state
 6.2.0+ #4 Not tainted
 --------------------------------
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 ksoftirqd/1/23 [HC0[0]:SC1[1]:HE0:SE0] takes:
 ffff88817e63a160 (&fq->mq_flush_lock){?.-.}-{2:2}, at: flush_end_io+0x69/0x460
 {IN-HARDIRQ-W} state was registered at:
   lock_acquire+0x128/0x270
   _raw_spin_lock_irqsave+0x55/0x90
   flush_end_io+0x69/0x460
   __blk_mq_end_request+0x16c/0x190
   virtblk_done+0x12e/0x220 [virtio_blk]
   vring_interrupt+0xc0/0x140
   __handle_irq_event_percpu+0x10c/0x230
   handle_irq_event+0x4e/0xb0
   handle_edge_irq+0x14a/0x3f0
   __common_interrupt+0x8a/0x120
   common_interrupt+0x7b/0x90
   asm_common_interrupt+0x22/0x40
   native_safe_halt+0xb/0x10
   default_idle+0xa/0x10
   default_idle_call+0x70/0xc0
   do_idle+0x185/0x450
   cpu_startup_entry+0x25/0x30
   start_secondary+0xbd/0xc0
   secondary_startup_64_no_verify+0xe6/0xeb
 irq event stamp: 781843
 hardirqs last  enabled at (781842): [<ffffffff8e69f6a1>] seqcount_lockdep_reader_access+0x91/0xa0
 hardirqs last disabled at (781843): [<ffffffff8f99ca0c>] _raw_spin_lock_irqsave+0x2c/0x90
 softirqs last  enabled at (781834): [<ffffffff8e52b143>] run_ksoftirqd+0x33/0x70
 softirqs last disabled at (781839): [<ffffffff8e52b143>] run_ksoftirqd+0x33/0x70

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&fq->mq_flush_lock);
   <Interrupt>
     lock(&fq->mq_flush_lock);

  *** DEADLOCK ***

 1 lock held by ksoftirqd/1/23:
  #0: ffff88817e63a160 (&fq->mq_flush_lock){?.-.}-{2:2}, at: flush_end_io+0x69/0x460

 stack backtrace:
 CPU: 1 PID: 23 Comm: ksoftirqd/1 Kdump: loaded Not tainted 6.2.0+ #4 7ac3eece8c11633fa2815efb5a20b1c5b55011f5
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x80
  print_usage_bug+0x6db/0x6f0
  ? print_lockdep_off+0x30/0x30
  ? pvclock_clocksource_read+0xec/0x180
  ? kvm_sched_clock_read+0x14/0x40
  ? sched_clock_cpu+0x6f/0x1f0
  ? rcu_read_lock_sched_held+0x34/0x70
  ? trace_irq_disable+0x47/0x110
  ? lockdep_hardirqs_on_prepare+0x17e/0x2e0
  ? mod_find+0x100/0x120
  ? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  ? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  ? __module_address+0xe1/0x100
  ? ftrace_graph_ret_addr+0x1d/0xa0
  ? orc_find+0x164/0x4f0
  ? pvclock_clocksource_read+0xec/0x180
  ? __update_load_avg_cfs_rq+0x34a/0x500
  ? kvm_sched_clock_read+0x14/0x40
  ? pvclock_clocksource_read+0xec/0x180
  ? kvm_sched_clock_read+0x14/0x40
  ? sched_clock_cpu+0x6f/0x1f0
  ? pvclock_clocksource_read+0xec/0x180
  ? mod_find+0x100/0x120
  ? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  ? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  ? __module_address+0xe1/0x100
  ? ftrace_graph_ret_addr+0x1d/0xa0
  ? orc_find+0x164/0x4f0
  ? unwind_next_frame+0xd55/0xd90
  ? ret_from_fork+0x1f/0x30
  ? stack_trace_save+0x130/0x130
  mark_lock_irq+0x7d0/0x7e0
  ? save_trace+0x4c0/0x4c0
  ? stack_trace_save+0xe0/0x130
  ? stack_trace_snprint+0xe0/0xe0
  ? lock_is_held_type+0x8b/0x110
  ? fcloop_fcp_op+0x92/0x310 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
  ? save_trace+0x447/0x4c0
  mark_lock+0x19c/0x220
  lockdep_hardirqs_on_prepare+0x169/0x2e0
  trace_hardirqs_on+0x34/0x40
  _raw_spin_unlock_irq+0x24/0x40
  fcloop_fcp_op+0x92/0x310 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
  ? fcloop_h2t_xmt_ls_rsp+0x150/0x150 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
  nvmet_fc_fcp_nvme_cmd_done+0x325/0x480 [nvmet_fc 14b6afa92b89f97b59495a77b43c19ce2c6642ff]
  __nvmet_req_complete+0x20e/0x270 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  ? nvmet_req_complete+0x30/0x30 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  nvmet_req_complete+0xe/0x30 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  nvmet_bio_done+0x38/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
  blk_update_request+0x343/0x6a0
  blk_mq_end_request+0x22/0x40
  blk_flush_complete_seq+0x22b/0x530
  flush_end_io+0x424/0x460
  __blk_mq_end_request+0x16c/0x190
  blk_done_softirq+0x95/0xb0
  __do_softirq+0x14b/0x369
  ? run_ksoftirqd+0x33/0x70
  ? ksoftirqd_should_run+0x20/0x20
  run_ksoftirqd+0x33/0x70
  smpboot_thread_fn+0x294/0x390
  ? cpu_report_death+0x170/0x170
  kthread+0x16d/0x190
  ? kthread_blkcg+0x60/0x60
  ret_from_fork+0x1f/0x30
  </TASK>
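
Reading the trace: fq->mq_flush_lock is registered as taken in hard-IRQ context
(via virtblk_done()), but the softirq completion chain ends up re-enabling hard
IRQs in fcloop_fcp_op() (the _raw_spin_unlock_irq above) while flush_end_io()
still holds that same lock. Below is a minimal, self-contained sketch of that
pattern with hypothetical locks (outer_lock/inner_lock); it is not the actual
fcloop code, just an illustration of why lockdep reports
{IN-HARDIRQ-W} -> {HARDIRQ-ON-W}:

/*
 * Sketch only: hypothetical locks standing in for fq->mq_flush_lock
 * (outer_lock) and the lock released in fcloop_fcp_op() (inner_lock).
 */
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(outer_lock);
static DEFINE_SPINLOCK(inner_lock);

/* Hard-IRQ path: registers outer_lock as IN-HARDIRQ-W. */
static irqreturn_t demo_irq_handler(int irq, void *data)
{
	unsigned long flags;

	spin_lock_irqsave(&outer_lock, flags);	/* like flush_end_io() from the virtio-blk IRQ */
	/* ... complete flush requests ... */
	spin_unlock_irqrestore(&outer_lock, flags);
	return IRQ_HANDLED;
}

/* Completion callback reached while outer_lock is already held. */
static void demo_cmd_op(void)
{
	spin_lock_irq(&inner_lock);
	/* ... */
	spin_unlock_irq(&inner_lock);	/*
					 * Unconditionally re-enables hard IRQs:
					 * outer_lock is now held with IRQs on, so
					 * demo_irq_handler() can fire on this CPU
					 * and spin on outer_lock forever.
					 */
}

/* Softirq path: holds outer_lock around the completion chain. */
static void demo_flush_end_io(void)
{
	unsigned long flags;

	spin_lock_irqsave(&outer_lock, flags);
	demo_cmd_op();	/* like flush_end_io() -> ... -> fcloop_fcp_op() */
	spin_unlock_irqrestore(&outer_lock, flags);
}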

This looks similar to the reports/fixes from the commits below; a sketch of that lock-class approach follows the list:

b3c6a5997541 ("block: Fix a lockdep complaint triggered by request queue flushing")
fb01a2932e81 ("blk-mq: add new API of blk_mq_hctx_set_fq_lock_class")
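
fb01a2932e81 added blk_mq_hctx_set_fq_lock_class() so a driver can give the
flush queue's mq_flush_lock of its own hctxs a separate lockdep class, which is
how nvme-loop deals with this kind of report. If the FC/fcloop path needs the
same treatment, a rough sketch could look like the following; demo_init_hctx()
and the key name are made up here and would have to be wired up as the tag
set's .init_hctx callback:

#include <linux/blk-mq.h>

/* Dedicated lockdep class for this driver's flush-queue locks. */
static struct lock_class_key demo_hctx_fq_lock_key;

static int demo_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
			  unsigned int hctx_idx)
{
	/*
	 * Put the hctx's fq->mq_flush_lock into its own class so lockdep
	 * no longer conflates it with the backend block device's class.
	 */
	blk_mq_hctx_set_fq_lock_class(hctx, &demo_hctx_fq_lock_key);
	return 0;
}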

Daniel