[PATCH] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
Caleb Sander Mateos
csander at purestorage.com
Thu Feb 20 10:58:01 PST 2025
On Thu, Feb 20, 2025 at 10:29 AM Meir Elisha <meir.elisha at volumez.com> wrote:
>
> Hi Caleb
>
> Thanks for the review. I'll resend the patch after testing.
>
> On 20/02/2025 19:17, Caleb Sander Mateos wrote:
> > On Thu, Feb 20, 2025 at 3:56 AM Meir Elisha <meir.elisha at volumez.com> wrote:
> >>
> >> The order in which queue->cmd and rcv_state are updated is crucial.
> >> If these assignments are reordered by the compiler, the worker might not
> >> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
> >> correct ordering, set rcv_state using smp_store_release().
> >>
> >> Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
> >> Signed-off-by: Meir Elisha <meir.elisha at volumez.com>
> >> ---
> >> drivers/nvme/target/tcp.c | 15 +++++++++++----
> >> 1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> >> index 7c51c2a8c109..4021468c8857 100644
> >> --- a/drivers/nvme/target/tcp.c
> >> +++ b/drivers/nvme/target/tcp.c
> >> @@ -571,10 +571,16 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
> >> struct nvmet_tcp_cmd *cmd =
> >> container_of(req, struct nvmet_tcp_cmd, req);
> >> struct nvmet_tcp_queue *queue = cmd->queue;
> >> + enum nvmet_tcp_recv_state queue_state = READ_ONCE(queue->state);
> >
> > Why did this change from queue->rcv_state to queue->state? Doesn't
> > look like enum nvmet_tcp_recv_state is the correct type for
> > queue->state either.
> That was a mistake. It should be queue->rcv_state.
> >
> >> + /*
> >> + * Use an acquire load to ensure that any updates to queue->state are visible
> >> + * before loading queue->cmd.
> >> + */
> >> + struct nvmet_tcp_cmd *queue_cmd = smp_load_acquire(&queue->cmd);
> >
> > Acquire ordering prevents memory operations that come *after* from
> > being reordered *before*. It does not prevent earlier operations (such
> > as the load of queue->state) from being reordered after the acquire
> > load. Additionally, an acquire must pair with a release store *on the
> > same value* to have any effect. But the release store is to
> > queue->rcv_state, not queue->cmd.
> >
> > Correct uses of release-acquire ordering generally look something like this:
> > Thread 1:
> > Non-atomic store to A
> > Release-ordering store to B
> >
> > Thread 2:
> > Acquire-ordering load from B
> > Non-atomic load from A
> >
> > This ensures that if thread 2 observes the new value thread 1 stored
> > in B, it will also observe the new value in A.
>
> Thanks for noticing that.
> >
> >> struct nvme_sgl_desc *sgl;
> >> u32 len;
> >>
> >> - if (unlikely(cmd == queue->cmd)) {
> >> + if (unlikely(cmd == queue_cmd)) {
> >> sgl = &cmd->req.cmd->common.dptr.sgl;
> >> len = le32_to_cpu(sgl->length);
> >>
> >> @@ -583,7 +589,7 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
> >> * Avoid using helpers, this might happen before
> >> * nvmet_req_init is completed.
> >> */
> >> - if (queue->rcv_state == NVMET_TCP_RECV_PDU &&
> >> + if (queue_state == NVMET_TCP_RECV_PDU &&
> >> len && len <= cmd->req.port->inline_data_size &&
> >> nvme_is_write(cmd->req.cmd))
> >> return;
> >> @@ -847,8 +853,9 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
> >> {
> >> queue->offset = 0;
> >> queue->left = sizeof(struct nvme_tcp_hdr);
> >> - queue->cmd = NULL;
> >> - queue->rcv_state = NVMET_TCP_RECV_PDU;
> >> + WRITE_ONCE(queue->cmd, NULL);
> >> + /* Ensure rcv_state is visible only after queue->cmd is set */
> >> + smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
> >
> > Is this also needed in the other places updating queue->rcv_state and
> > queue->cmd, e.g. nvmet_tcp_handle_h2c_data_pdu()?
> nvmet_tcp_handle_h2c_data_pdu() doesn't set rcv_state to NVMET_TCP_RECV_PDU.
> The other context won't exit early in nvmet_tcp_queue_response(), so I don't
> think we need it.
Okay. Even if the memory ordering is not required, all stores to
queue->cmd and queue->rcv_state should probably at least be using
WRITE_ONCE() since those values may be concurrently read by
nvmet_tcp_queue_response() on another thread. But that's a
pre-existing issue, so no need to fix it in this patch.
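
For reference, the release-acquire pattern quoted above (thread 1 stores
A then release-stores B; thread 2 acquire-loads B then loads A) can be
sketched in userspace C11 roughly as follows. This is only an
illustration under assumptions: payload, ready, producer(), consumer()
and run_message_passing() are hypothetical names, not anything in
drivers/nvme/target/tcp.c, and the C11 atomics stand in for the kernel's
smp_store_release()/smp_load_acquire() rather than reproducing them.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; these are not fields of struct nvmet_tcp_queue. */
static int payload;        /* "A": plain, non-atomic data */
static atomic_int ready;   /* "B": flag published with release ordering */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;                                   /* non-atomic store to A */
	atomic_store_explicit(&ready, 1,
			      memory_order_release);    /* release store to B */
	return NULL;
}

static void *consumer(void *arg)
{
	int *seen = arg;

	/* Acquire load from B: spin until the new flag value is observed. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	/* This load is ordered after the acquire load of B. */
	*seen = payload;
	return NULL;
}

/* Runs both threads once and returns the value the consumer read from A. */
int run_message_passing(void)
{
	pthread_t prod, cons;
	int seen = -1;

	payload = 0;
	atomic_store(&ready, 0);

	if (pthread_create(&prod, NULL, producer, NULL))
		return -1;
	if (pthread_create(&cons, NULL, consumer, &seen)) {
		pthread_join(prod, NULL);
		return -1;
	}
	pthread_join(prod, NULL);
	pthread_join(cons, NULL);
	return seen;
}
```

Because the acquire load and the release store are on the same variable
(ready), a consumer that observes ready == 1 should also observe
payload == 42. Putting the acquire on a different variable than the
release, as the posted patch does, gives no such guarantee.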
Best,
Caleb