[PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch

Christoph Hellwig hch at lst.de
Mon Feb 24 06:13:49 PST 2025


On Sun, Feb 23, 2025 at 09:28:45AM +0200, Meir Elisha wrote:
> The order in which queue->cmd and rcv_state are updated is crucial.
> If these assignments are reordered by the compiler, the worker might not
> get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the
> the correct reordering, set rcv_state using smp_store_release().
> 
> Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
> 
> Signed-off-by: Meir Elisha <meir.elisha at volumez.com>
> ---
> Changes from v2:
> 	- Fix barrier semantics
> 	- Use rcv_state instead of state variable
> 
>  drivers/nvme/target/tcp.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 7c51c2a8c109..714d920d14e1 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -571,10 +571,13 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
>  	struct nvmet_tcp_cmd *cmd =
>  		container_of(req, struct nvmet_tcp_cmd, req);
>  	struct nvmet_tcp_queue	*queue = cmd->queue;
> +	/* Pairs with store_release in nvmet_prepare_receive_pdu() */
> +	enum nvmet_tcp_recv_state queue_state = smp_load_acquire(&queue->rcv_state);

Ovely long line.

And another thing purely cosmetic: while I generally like initializing
variables at declaration time, doing that for something like
smp_load_acquire which should go with a comment looks kinda weird.

So maybe just split the assignment out from the declaration?

> -- 
> 2.34.1
> 
> This ordering is critical on weakly ordered architectures (such as ARM)

Something weird is going on with this description below the actual
patch.  This should normally go above.




More information about the Linux-nvme mailing list