lockdep warning: fs_reclaim_acquire vs tcp_sendpage
Daniel Wagner
dwagner at suse.de
Wed Oct 19 00:51:38 PDT 2022
Hi Sagi,
While working on something else I got the lockdep splat below. As this
is a dirty tree and not the latest and greatest, it might be a false alarm.
I haven't really looked into it yet; this is just to let you know that
there might be something going on.
Cheers,
Daniel
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc2+ #25 Tainted: G W
------------------------------------------------------
kswapd0/92 is trying to acquire lock:
ffff888114003240 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_sendpage+0x23/0xa0
but task is already holding lock:
ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0x11e/0x160
kmem_cache_alloc_node+0x44/0x530
__alloc_skb+0x158/0x230
tcp_send_active_reset+0x7e/0x730
tcp_disconnect+0x1272/0x1ae0
__tcp_close+0x707/0xd90
tcp_close+0x26/0x80
inet_release+0xfa/0x220
sock_release+0x85/0x1a0
nvme_tcp_free_queue+0x1fd/0x470 [nvme_tcp]
nvme_do_delete_ctrl+0x130/0x13d [nvme_core]
nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
kernfs_fop_write_iter+0x356/0x530
vfs_write+0x4e8/0xce0
ksys_write+0xfd/0x1d0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
__lock_acquire+0x2a0c/0x5690
lock_acquire+0x18e/0x4f0
lock_sock_nested+0x37/0xc0
tcp_sendpage+0x23/0xa0
inet_sendpage+0xad/0x120
kernel_sendpage+0x156/0x440
nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp]
nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp]
__blk_mq_try_issue_directly+0x452/0x660
blk_mq_plug_issue_direct.constprop.0+0x207/0x700
blk_mq_flush_plug_list+0x6f5/0xc70
__blk_flush_plug+0x264/0x410
blk_finish_plug+0x4b/0xa0
shrink_lruvec+0x1263/0x1ea0
shrink_node+0x736/0x1a80
balance_pgdat+0x740/0x10d0
kswapd+0x5f2/0xaf0
kthread+0x256/0x2f0
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(sk_lock-AF_INET-NVME);
                               lock(fs_reclaim);
  lock(sk_lock-AF_INET-NVME);
*** DEADLOCK ***
3 locks held by kswapd0/92:
#0: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
#1: ffff88811f21b0b0 (q->srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x6b3/0xc70
#2: ffff888170b11470 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp]
stack backtrace:
CPU: 7 PID: 92 Comm: kswapd0 Tainted: G W 6.0.0-rc2+ #25 910779b354c48f37d01f55ab57fbca0c616a47fd
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x77
check_noncircular+0x26e/0x320
? lock_chain_count+0x20/0x20
? print_circular_bug+0x1e0/0x1e0
? kvm_sched_clock_read+0x14/0x40
? sched_clock_cpu+0x69/0x240
? __bfs+0x317/0x6f0
? usage_match+0x110/0x110
? lockdep_lock+0xbe/0x1c0
? call_rcu_zapped+0xc0/0xc0
__lock_acquire+0x2a0c/0x5690
? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
? lock_chain_count+0x20/0x20
lock_acquire+0x18e/0x4f0
? tcp_sendpage+0x23/0xa0
? lock_downgrade+0x6c0/0x6c0
? __lock_acquire+0xd3f/0x5690
lock_sock_nested+0x37/0xc0
? tcp_sendpage+0x23/0xa0
tcp_sendpage+0x23/0xa0
inet_sendpage+0xad/0x120
kernel_sendpage+0x156/0x440
nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp 9175a0e5b6247ff4e2c0da5432ec9d6d589fc288]
? lock_downgrade+0x6c0/0x6c0
? lock_release+0x6cd/0xd30
? nvme_tcp_state_change+0x150/0x150 [nvme_tcp 9175a0e5b6247ff4e2c0da5432ec9d6d589fc288]
? mutex_trylock+0x204/0x330
? nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp 9175a0e5b6247ff4e2c0da5432ec9d6d589fc288]
? ww_mutex_unlock+0x270/0x270
nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp 9175a0e5b6247ff4e2c0da5432ec9d6d589fc288]
? kvm_sched_clock_read+0x14/0x40
__blk_mq_try_issue_directly+0x452/0x660
? __blk_mq_get_driver_tag+0x980/0x980
? lock_downgrade+0x6c0/0x6c0
blk_mq_plug_issue_direct.constprop.0+0x207/0x700
? __mem_cgroup_uncharge+0x140/0x140
blk_mq_flush_plug_list+0x6f5/0xc70
? blk_mq_flush_plug_list+0x6b3/0xc70
? blk_mq_insert_requests+0x450/0x450
__blk_flush_plug+0x264/0x410
? memset+0x1f/0x40
? __mem_cgroup_uncharge_list+0x84/0x150
? blk_start_plug_nr_ios+0x280/0x280
blk_finish_plug+0x4b/0xa0
shrink_lruvec+0x1263/0x1ea0
? reclaim_throttle+0x790/0x790
? sched_clock_cpu+0x69/0x240
? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
? lock_is_held_type+0xa9/0x120
? mem_cgroup_iter+0x2b2/0x780
shrink_node+0x736/0x1a80
balance_pgdat+0x740/0x10d0
? shrink_node+0x1a80/0x1a80
? lock_is_held_type+0xa9/0x120
? find_held_lock+0x34/0x120
? lock_is_held_type+0xa9/0x120
? reacquire_held_locks+0x4f0/0x4f0
kswapd+0x5f2/0xaf0
? balance_pgdat+0x10d0/0x10d0
? destroy_sched_domains_rcu+0x60/0x60
? trace_hardirqs_on+0x2d/0x110
? __kthread_parkme+0x83/0x140
? balance_pgdat+0x10d0/0x10d0
kthread+0x256/0x2f0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
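For context, here is a minimal sketch of one way such a "sk_lock ->
fs_reclaim" edge is commonly broken, namely putting the allocation-prone
socket teardown under a memalloc_noio scope. This is purely illustrative
and untested, not a proposed patch; example_release_socket() is a made-up
name:

#include <linux/sched/mm.h>	/* memalloc_noio_save()/memalloc_noio_restore() */
#include <linux/net.h>		/* struct socket, sock_release() */

/*
 * Illustrative sketch only: with a memalloc_noio scope active, any
 * allocation done while sk_lock is held (see the #1 chain above,
 * sock_release() -> ... -> tcp_send_active_reset() -> __alloc_skb())
 * is implicitly degraded to GFP_NOIO and cannot recurse into the
 * fs/block side of direct reclaim.
 */
static void example_release_socket(struct socket *sock)
{
	unsigned int noio_flags;

	noio_flags = memalloc_noio_save();
	sock_release(sock);	/* skb allocation under sk_lock happens in here */
	memalloc_noio_restore(noio_flags);
}

Whether a scope like this (or forcing the offending allocation to
GFP_ATOMIC) is the right answer for nvme-tcp is something I haven't
looked into either.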