[PATCH] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
Meir Elisha
meir.elisha at volumez.com
Sun Feb 16 07:08:04 PST 2025
The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered (by the compiler, or by a weakly
ordered CPU), the worker might not get queued in
nvmet_tcp_queue_response(), hanging the I/O. To enforce the correct
ordering, set rcv_state using smp_store_release().
Signed-off-by: Meir Elisha <meir.elisha at volumez.com>
---
drivers/nvme/target/tcp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..b66aa93baaf4 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
queue->offset = 0;
queue->left = sizeof(struct nvme_tcp_hdr);
queue->cmd = NULL;
- queue->rcv_state = NVMET_TCP_RECV_PDU;
+ /* Ensure rcv_state is visible only after queue->cmd is set */
+ smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
}
static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
@@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
cmd->pdu_recv = 0;
nvmet_tcp_build_pdu_iovec(cmd);
queue->cmd = cmd;
- queue->rcv_state = NVMET_TCP_RECV_DATA;
+ /* Ensure rcv_state is visible only after queue->cmd is set */
+ smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
return 0;
--
2.34.1
This ordering is critical on weakly ordered architectures (such as ARM)
so that any observer that sees the new rcv_state is guaranteed to also
see the updated cmd. Without this guarantee (i.e., if the two stores
were reordered), a parallel context might see the new state while
queue->cmd still holds a stale value. This could cause the inline-data
check in nvmet_tcp_queue_response() to return early and ultimately hang
the I/O.

Additionally, I reviewed the generated assembly on ARM and confirmed
that the instructions were indeed reordered (unlike on x86),
reinforcing the need for this change.
This scenario was encountered during fio testing, which involved
running 2 min of 4K random writes using an ARM-based machine as the
target. We observed hanging I/O typically after 10-20 iterations.
fio config used:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/mnt/volumez/vol0
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite
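To reproduce, the job file above can be saved locally (e.g. as
randwrite.fio; the name is arbitrary) and run with "fio randwrite.fio",
presumably from the initiator side, with /mnt/volumez/vol0 being a
filesystem mounted on the connected namespace.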