[PATCH v2] nvmet-rdma: handle inline data with a nonzero offset
Bryam Vargas
hexlabsecurity at proton.me
Thu Jun 4 12:36:54 PDT 2026
nvmet_rdma_use_inline_sg() maps the host-controlled inline data offset
into the per-command inline scatterlist. The bounds check admits any
offset with off + len <= inline_data_size, but the mapping still assumes
the data begins in the first inline page:

    sg->offset = off;
    sg->length = min_t(int, len, PAGE_SIZE - off);

When a port is configured with inline_data_size > PAGE_SIZE (settable up
to max(SZ_16K, PAGE_SIZE)), an offset in (PAGE_SIZE, inline_data_size]
makes "PAGE_SIZE - off" underflow, so sg->length is set to ~4 GiB and
the block backend reads far past the first inline page. num_pages(len)
also ignores the offset, so an in-bounds offset whose [off, off+len)
span crosses a page boundary under-counts the scatterlist.
Map the offset properly: split it into a page index and an in-page
offset, start the scatterlist at that page, and size the page count from
page_off + len. Because the request scatterlist may now start at
inline_sg[page_idx] rather than inline_sg[0], generalize the inline-SGL
identity test in nvmet_rdma_release_rsp() to a range test; otherwise the
persistent inline scatterlist is mistaken for an allocated one and
nvmet_req_free_sgls() frees an inline page (and warns in
free_large_kmalloc()).
Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable at vger.kernel.org
Suggested-by: Keith Busch <kbusch at kernel.org>
Reported-by: Bryam Vargas <hexlabsecurity at proton.me>
Signed-off-by: Bryam Vargas <hexlabsecurity at proton.me>
---
v1 rejected a nonzero offset; per Keith's note a nonzero in-capsule SGL
offset is legitimate (it is the per-command SGL Offset field, distinct
from the controller ICDOFF attribute that nvme_rdma_setup_ctrl() refuses
when nonzero), so v2 handles it instead, using Keith's suggested
page_idx/page_off form for nvmet_rdma_use_inline_sg().
Review context (not for the commit log):
Bound safety: with off + len <= inline_data_size the highest inline_sg[]
index touched is page_idx + sg_count - 1 = floor((off + len - 1) /
PAGE_SIZE) <= num_pages(inline_data_size) - 1 = inline_page_count - 1
(<= NVMET_RDMA_MAX_INLINE_SGE - 1), and page_off < PAGE_SIZE so
PAGE_SIZE - page_off cannot underflow. The release_rsp range test is a
strict generalization of the old "!= inline_sg" test: inline_sg[0] is in
range (unchanged: not freed), allocated/keyed SGLs are outside it (still
freed), and only the new inline_sg[1..] starts are additionally treated
as inline.
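
The bound argument above can be checked mechanically. What follows is a minimal userspace sketch, not driver code. The model_* names, the 4 KiB page size, the len == 0 rejection and the ceiling-division stand-in for num_pages() are assumptions made for illustration; inline_data_size = 16384 matches the A/B configuration below.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE        4096u   /* assumption: 4 KiB pages */
#define MODEL_INLINE_DATA_SIZE 16384u  /* value used in the A/B run */

/* Ceiling division; assumed equivalent to the driver's num_pages() for len >= 1. */
static int model_num_pages(uint64_t len)
{
	return (int)((len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
}

/*
 * Highest inline_sg[] index the reworked mapping touches, or -1 when the
 * model's bounds check (len != 0 and off + len <= inline_data_size)
 * rejects the request.
 */
static int model_highest_index(uint32_t len, uint64_t off)
{
	uint64_t page_off, page_idx;

	if (len == 0 || off + len > MODEL_INLINE_DATA_SIZE)
		return -1;
	page_off = off % MODEL_PAGE_SIZE;
	page_idx = off / MODEL_PAGE_SIZE;
	return (int)page_idx + model_num_pages(page_off + len) - 1;
}

/*
 * Walk accepted (off, len) pairs with the given stride (1 = exhaustive).
 * Returns 1 if every pair ends at floor((off + len - 1) / PAGE_SIZE) and
 * stays within the inline pages, 0 otherwise.
 */
static int model_bound_holds(uint32_t stride)
{
	int max_index = model_num_pages(MODEL_INLINE_DATA_SIZE) - 1;
	uint64_t off;
	uint32_t len;

	assert(stride > 0);
	for (off = 0; off < MODEL_INLINE_DATA_SIZE; off += stride) {
		for (len = 1; off + len <= MODEL_INLINE_DATA_SIZE; len += stride) {
			int idx = model_highest_index(len, off);

			if (idx != (int)((off + len - 1) / MODEL_PAGE_SIZE) ||
			    idx > max_index)
				return 0;
		}
	}
	return 1;
}
```

Calling model_bound_holds(1) walks every accepted pair; a larger stride trades coverage for run time. model_highest_index(512, 8192) corresponds to the offset-8192 case from the A/B run, assuming a 512-byte write.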
The fix behaves identically on 32- and 64-bit builds: off is u64, so the
offset arithmetic and PAGE_SIZE - page_off are evaluated in 64-bit on both
ABIs; num_pages() sees page_off + len <= 16384 (positive, int-safe on both);
the release_rsp comparison is a pointer comparison, with identical semantics
on ILP32 and LP64. (Model output is identical when built with -m32 and -m64.)
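
Since the release_rsp check is only a pointer range comparison, it can be modeled in userspace. The sketch below is illustrative: struct model_sg and struct model_cmd are hypothetical stand-ins for struct scatterlist and the per-command structure, MODEL_MAX_INLINE_SGE is assumed to mirror NVMET_RDMA_MAX_INLINE_SGE, and the comparison goes through uintptr_t so the model does not rely on relational comparison of unrelated pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_INLINE_SGE 4  /* assumption: mirrors NVMET_RDMA_MAX_INLINE_SGE */

/* Hypothetical stand-in for struct scatterlist. */
struct model_sg {
	unsigned int offset;
	unsigned int length;
};

/* Hypothetical stand-in for the per-command inline scatterlist holder. */
struct model_cmd {
	struct model_sg inline_sg[MODEL_MAX_INLINE_SGE];
};

/*
 * Returns 1 when sg points at one of the first page_count entries of
 * cmd->inline_sg (treated as inline, so not freed), 0 otherwise.
 */
static int model_sg_is_inline(const struct model_cmd *cmd,
			      const struct model_sg *sg, size_t page_count)
{
	uintptr_t first, end, p;

	assert(cmd != NULL && page_count <= MODEL_MAX_INLINE_SGE);
	first = (uintptr_t)&cmd->inline_sg[0];
	end = (uintptr_t)(cmd->inline_sg + page_count);
	p = (uintptr_t)sg;
	return p >= first && p < end;
}
```

In this model inline_sg[0] is still classified as inline, a separately allocated list is not, and starts at inline_sg[1..] are the only new inline cases.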
A/B on a KASAN build (inline_data_size = 16384) over an rdma_rxe
loopback nvmet-rdma target with a block backend, inline write:
- offset 0: succeeds, clean (control + no regression).
- offset 8192: before this patch the block backend reads out of bounds:
    BUG: KASAN: slab-out-of-bounds in copy_folio_from_iter_atomic
  (sg->length = 0xfffff000); with this patch it is served from the
  correct inline page, in bounds, no KASAN and no free_large_kmalloc
  warning.
- the use_inline_sg() rework alone (without the release_rsp change)
  trips on offset 8192:
    WARNING: ... free_large_kmalloc ... Not a kmalloc allocation
    nvmet_req_free_sgls <- nvmet_rdma_release_rsp <- nvmet_rdma_send_done
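
The sg->length = 0xfffff000 value seen before the patch follows from the old expression alone. The sketch below models that arithmetic in userspace. It assumes 4 KiB pages and a compiler that truncates on u64-to-int conversion (as GCC and Clang do); old_first_sg_length() is a hypothetical name, and the 512-byte length in the usage note is an assumption, since any positive length gives the same result once off exceeds PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Userspace model of the pre-patch expression
 *     sg->length = min_t(int, len, PAGE_SIZE - off);
 * min_t(int, a, b) casts both operands to int before comparing, and the
 * result is stored in an unsigned int length field.
 */
static unsigned int old_first_sg_length(uint32_t len, uint64_t off)
{
	int a, b;

	assert(len <= 0x7fffffffu);  /* model only covers lengths that fit in int */
	a = (int)len;
	/* The u64 subtraction wraps when off > PAGE_SIZE, then truncates to int. */
	b = (int)(MODEL_PAGE_SIZE - off);
	return (unsigned int)(a < b ? a : b);
}
```

With off = 8192 the subtraction yields -4096 as an int, which wins the min against any positive len and becomes 0xfffff000 once stored as unsigned; old_first_sg_length(512, 8192) returns that value, while offsets of 0 still give min(len, PAGE_SIZE).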
drivers/nvme/target/rdma.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 565183a20007..eb975fbd74a1 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -666,7 +666,8 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	if (rsp->n_rdma)
 		nvmet_rdma_rw_ctx_destroy(rsp);
 
-	if (rsp->req.sg != rsp->cmd->inline_sg)
+	if (rsp->req.sg < rsp->cmd->inline_sg ||
+	    rsp->req.sg >= rsp->cmd->inline_sg + queue->dev->inline_page_count)
 		nvmet_req_free_sgls(&rsp->req);
 
 	if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
@@ -821,24 +822,25 @@ static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc)
 static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
 		u64 off)
 {
-	int sg_count = num_pages(len);
+	u64 page_off = off % PAGE_SIZE;
+	u64 page_idx = off / PAGE_SIZE;
+	int sg_count = num_pages(page_off + len);
 	struct scatterlist *sg;
 	int i;
 
-	sg = rsp->cmd->inline_sg;
+	sg = &rsp->cmd->inline_sg[page_idx];
 	for (i = 0; i < sg_count; i++, sg++) {
 		if (i < sg_count - 1)
 			sg_unmark_end(sg);
 		else
 			sg_mark_end(sg);
-		sg->offset = off;
-		sg->length = min_t(int, len, PAGE_SIZE - off);
+		sg->offset = page_off;
+		sg->length = min_t(u64, len, PAGE_SIZE - page_off);
 		len -= sg->length;
-		if (!i)
-			off = 0;
+		page_off = 0;
 	}
 
-	rsp->req.sg = rsp->cmd->inline_sg;
+	rsp->req.sg = &rsp->cmd->inline_sg[page_idx];
 	rsp->req.sg_cnt = sg_count;
 }
 
--
2.43.0