[PATCH net v3 1/8] net/handshake: Use spin_lock_bh for hn_lock
Hannes Reinecke
hare at suse.de
Wed May 27 10:32:21 PDT 2026
On 5/27/26 16:24, Chuck Lever wrote:
>
>
> On Wed, May 27, 2026, at 4:59 AM, Hannes Reinecke wrote:
>> On 5/25/26 18:51, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever at oracle.com>
>>>
>>> nvmet_tcp_state_change(), a socket callback that runs in BH context,
>>> can reach handshake_req_cancel() via nvmet_tcp_schedule_release_queue()
>>> and tls_handshake_cancel(). handshake_req_cancel() acquires
>>> hn->hn_lock with plain spin_lock(). If a process-context thread on
>>> the same CPU holds hn->hn_lock when a softirq invokes the cancel path,
>>> the lock attempt deadlocks. This is the only caller that invokes
>>> tls_handshake_cancel() from BH context; every other consumer calls it
>>> from process context.
>>>
>>> Deferring the cancel to process context in the NVMe target is not
>>> straightforward: nvmet_tcp_schedule_release_queue() must call
>>> tls_handshake_cancel() atomically with its state transition to
>>> DISCONNECTING. If the cancel were deferred, the handshake completion
>>> callback could fire in the window before the cancel runs, observe the
>>> unexpected state, and return without dropping its kref on the queue.
>>> Reworking that interlock is considerably more invasive than hardening
>>> the handshake lock. Convert all hn->hn_lock acquisitions from
>>> spin_lock/spin_unlock to spin_lock_bh/spin_unlock_bh so the lock is
>>> never taken with softirqs enabled.
>>>
>>> Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall")
>>> Signed-off-by: Chuck Lever <chuck.lever at oracle.com>
>>> ---
>>> net/handshake/netlink.c | 4 ++--
>>> net/handshake/request.c | 14 +++++++-------
>>> net/handshake/tlshd.c | 2 ++
>>> 3 files changed, 11 insertions(+), 9 deletions(-)
>>>
>> ... and there is always the question whather we should be calling
>> tls_handshake_cancel() in the first place.
>> We already call 'tls_handshake_cancel()' from the handshake timeout
>> handler, and this instance of 'tls_handshake_cancel()' is called
>> from a socket callback (ie when the soecket is closed or something).
>> I would have expected that the handshake code cleans up outstanding
>> requests on socket close already; if so, we can delete the cancel
>> here.
>
> The handshake layer doesn't auto-cancel on socket close.
>
> handshake_sk_destruct() fires when the sock refcount reaches
> zero, but it only frees the handshake_req and removes it from
> the hash table. It does not set HANDSHAKE_F_REQ_COMPLETED or
> invoke the consumer's done callback. It assumes the request
> was already completed or canceled before the socket reached
> final destruction.
>
> The timeout handler and the state-change path cover different
> failure modes. The timeout fires when tlshd is slow or absent.
> The state-change callback fires when the remote peer resets or
> closes the connection while the handshake is still in
> progress -- that can happen well before the timeout expires.
>
> Without the cancel in nvmet_tcp_schedule_release_queue(), the
> race plays out as follows: the state-change callback
> transitions the queue to DISCONNECTING and drops its kref.
> Then tlshd finishes and the handshake layer calls
> nvmet_tcp_tls_handshake_done(). That callback sees the queue
> is not in TLS_HANDSHAKE state, hits the WARN_ON, and returns
> without dropping its own kref -- leaking the queue.
>
> The cancel makes the teardown race-free: either cancel wins
> and the completion callback never fires, or completion already
> won and cancel returns false. Either way, exactly one side
> drops the kref.
>
> The SunRPC consumers (client and server) avoid this problem by
> blocking a thread on wait_for_completion() during the
> handshake. A socket state change wakes the waiter indirectly
> through the completion callback, and the waiter handles
> cleanup. nvme-tcp's handshake is fully asynchronous -- there
> is no blocked thread to wake, so the state-change callback is
> the only context that can cancel the handshake before queue
> teardown begins.
>
Thanks for the clarification. That's what I had (implicitely)
assumed, so the tls_handshake_cancel() needs to stay.
So you can add my:
Reviewed-by: Hannes Reinecke <hare at kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
More information about the Linux-nvme
mailing list