[RFC] mipi-i3c-hci: Support for DMA Ring Pipelining / High-throughput Streaming

Sam Agazaryan samagazaryan at google.com
Wed Jun 3 00:43:40 PDT 2026


On Sat, May 23, 2026 at 7:48 AM Frank Li <Frank.li at nxp.com> wrote:
>
> On Fri, May 22, 2026 at 03:37:35PM -0700, Sam Agazaryan wrote:
> > Hello all,
> >
> > I am working on a project using the mipi-i3c-hci driver that involves
> > large packet bursts (exceeding the physical hardware DMA ring size,
> > such as large MCTP-over-I3C payloads).
>
> How large to exceed DMA ring size, can you increase ring size?
>
> And if large transfer, it will defer IBI handle for while. IBI check only
> happen at every START phase.
>

Right now we are working with the DMA ring maxed out at 255 entries.
The MIPI I3C HCI controller I'm working with allows up to 128 bytes per
transaction due to hardware limitations of the SoC.
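
To put numbers on this, here is a small standalone sketch (user-space C,
not driver code). It assumes one ring entry carries one transaction of at
most 128 bytes; the helper names descs_needed() and ring_capacity() are
made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: one descriptor moves at most max_per_desc bytes. */
static size_t descs_needed(size_t payload_len, size_t max_per_desc)
{
	assert(max_per_desc > 0);
	/* Round up so a trailing partial block still gets its own descriptor. */
	return (payload_len + max_per_desc - 1) / max_per_desc;
}

/* Largest payload that fits in one fill of a ring with ring_entries slots. */
static size_t ring_capacity(size_t ring_entries, size_t max_per_desc)
{
	return ring_entries * max_per_desc;
}
```

Under those assumptions, ring_capacity(255, 128) is 32640 bytes, and a
64000-byte payload would need descs_needed(64000, 128) = 500 descriptors,
which is roughly twice what a single ring fill can hold.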

To my understanding, with the current driver implementation, if I send
down 255 entries, TOC gets set on the final (255th) transaction, and
according to the MIPI HCI standard v1.2, section 6.12, this means that a
STOP will be sent on the bus after that final transaction. My design
proposes a change to this:

For transfers with no more transfer descriptors than the maximum ring
size, the driver functions exactly as it does today.

For transfers involving more descriptors than the ring size, say ring
size = 255 and we send down 500 descriptors to the driver, we set
TOC = 1 on every 127th descriptor and on the last descriptor in the
buffer. This lets us wake up the thread responsible for sending data
down the bus so it can queue up another 127 descriptors and then go back
to waiting for completion, until the 127 descriptors the DMA controller
is currently handling are done. This is repeated until all 500
descriptors are transferred. In theory, we should still have relatively
good IBI responsiveness compared to the current state of the driver.
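
The chunking described above can be sketched as a standalone user-space
function (an illustration only, not the driver code). The name plan_toc()
and the toc[] flag array are hypothetical; the chunk size of 127 matches
the (ring_size - 1) / 2 value used in the patch for a 255-entry ring.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mark which descriptors would carry TOC = 1 when a transfer of nxfers
 * descriptors is split into chunks of at most chunk_limit descriptors.
 * toc[] must have room for nxfers entries. Returns the number of chunks.
 */
static size_t plan_toc(size_t nxfers, size_t chunk_limit, unsigned char *toc)
{
	size_t queued = 0, chunks = 0;

	assert(chunk_limit > 0);

	for (size_t i = 0; i < nxfers; i++)
		toc[i] = 0;

	while (queued < nxfers) {
		size_t left = nxfers - queued;
		size_t n = left < chunk_limit ? left : chunk_limit;

		toc[queued + n - 1] = 1;	/* last descriptor of this chunk */
		queued += n;
		chunks++;
	}
	return chunks;
}
```

For 500 descriptors and a chunk limit of 127 this gives four chunks
(127, 127, 127 and 119 descriptors), with TOC on the 127th, 254th, 381st
and 500th descriptor, so the bus sees a STOP at each of those points.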

> >
> > I noticed that the current driver implementation treats these as
> > discrete batches.
> >
> > I am considering implementing a ring pipelining or DMA streaming
> > mechanism to allow for asynchronous refills while the ring is running.
> > This would leverage the
> > standard ENQ_PTR doorbell mechanism (per MIPI HCI v1.2, Section 6.8.2)
> > to continuously feed the hardware. I figured in that case it may be
> > worth while to see how the upstream community feels about this
> > feature.
> >
> > Before I dive into the implementation for upstream, I wanted to check:
> > 1. Is there any existing work or a roadmap for DMA
> > streaming/pipelining in the HCI driver?
> > 2. Is a generic dma streaming mechanism for large transfers something
> > you would be interested in seeing as a contribution to the mainline
> > driver?
>
> Usb\network\storage is async. The I3C's framework is sync.
>

I think I made this sound like an asynchronous operation where we
return before finishing all transactions; the driver would still be
synchronous in this case.

> >
> > Currently, my proof-of-concept handles the pipelining at the core
> > transfer level, but I
>
> Can you send your patch as RFC to check what you already did?
>

Yes, below is an implementation of the streaming logic in core.c.

I initially considered moving this into dma.c to support multiple Ring
Bundles. However, because dma.c expects pre-translated hci_xfer structs,
moving the logic there would require core.c to allocate the entire
transfer array upfront (e.g., a massive kmalloc for 500+ structs, or
however big the transfers end up being). This leads to heap exhaustion
and "Sleep while Atomic" crashes if an IBI triggers a readback
concurrently. By keeping the sliding window in core.c, we can reuse a
single, pre-allocated 255-entry hci_xfer array.
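
As a rough illustration of how the fixed array gets reused, here is a
standalone sketch of the slot arithmetic (user-space C, not the driver
code). struct window and the two helpers are hypothetical names; the
formulas mirror chunk_limit, sw_ring_size and the "queued % sw_ring_size"
indexing in the patch below.

```c
#include <assert.h>
#include <stddef.h>

struct window {
	size_t chunk_limit;	/* descriptors per chunk */
	size_t sw_ring_size;	/* slots actually used: two chunks */
};

static struct window window_init(size_t ring_entries)
{
	struct window w;

	assert(ring_entries >= 3);
	w.chunk_limit = (ring_entries - 1) / 2;
	w.sw_ring_size = w.chunk_limit * 2;
	return w;
}

/* First array slot of the chunk that starts at descriptor index 'queued'. */
static size_t window_slot(const struct window *w, size_t queued)
{
	return queued % w->sw_ring_size;
}
```

With 255 ring entries this yields a chunk limit of 127 and 254 usable
slots, so chunks starting at descriptors 0, 127, 254 and 381 land at
slots 0, 127, 0 and 127: the two halves of the array alternate, and a
half is only overwritten after its previous chunk has completed.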


Signed-off-by: Sam Agazaryan <samagazaryan at google.com>
---
drivers/i3c/master/mipi-i3c-hci/core.c | 133 +++++++++++++++++++------
drivers/i3c/master/mipi-i3c-hci/dma.c | 1 -
drivers/i3c/master/mipi-i3c-hci/hci.h | 6 ++
3 files changed, 106 insertions(+), 34 deletions(-)

diff --git a/drivers/i3c/master/mipi-i3c-hci/core.c
b/drivers/i3c/master/mipi-i3c-hci/core.c
index b781dbed2165..77e04585150b 100644
--- a/drivers/i3c/master/mipi-i3c-hci/core.c
+++ b/drivers/i3c/master/mipi-i3c-hci/core.c
@@ -290,9 +290,12 @@ static int i3c_hci_send_ccc_cmd(struct
i3c_master_controller *m,
dev_dbg(&hci->master.dev, "cmd=%#x rnw=%d ndests=%d data[0].len=%d",
ccc->id, ccc->rnw, ccc->ndests, ccc->dests[0].payload.len);
+ mutex_lock(&hci->xfer_lock);
xfer = hci_alloc_xfer(nxfers);
- if (!xfer)
+ if (!xfer) {
+ mutex_unlock(&hci->xfer_lock);
return -ENOMEM;
+ }
if (prefixed) {
xfer->data = NULL;
@@ -346,6 +349,7 @@ static int i3c_hci_send_ccc_cmd(struct
i3c_master_controller *m,
ccc->dests[0].payload.len, ccc->dests[0].payload.data);
out:
+ mutex_unlock(&hci->xfer_lock);
hci_free_xfer(xfer, nxfers);
return ret;
}
@@ -363,53 +367,100 @@ static int i3c_hci_i3c_xfers(struct i3c_dev_desc *dev,
{
struct i3c_master_controller *m = i3c_dev_get_master(dev);
struct i3c_hci *hci = to_i3c_hci(m);
- struct hci_xfer *xfer;
- DECLARE_COMPLETION_ONSTACK(done);
+ struct hci_xfer *xfer = hci->xfer_ring;
unsigned int size_limit;
- int i, last, ret = 0;
+ int i, ret = 0;
+ int processed = 0;
+ int queued = 0;
+ int ring_size = XFER_RING_ENTRIES;
+ int chunk_limit = (ring_size - 1) / 2;
+ int sw_ring_size = chunk_limit * 2;
dev_dbg(&hci->master.dev, "nxfers = %d", nxfers);
- xfer = hci_alloc_xfer(nxfers);
if (!xfer)
- return -ENOMEM;
+ return -ENODEV;
+
+ mutex_lock(&hci->xfer_lock);
+ reinit_completion(&hci->xfer_done);
size_limit = 1U << (16 + FIELD_GET(HC_CAP_MAX_DATA_LENGTH, hci->caps));
- for (i = 0; i < nxfers; i++) {
- xfer[i].data_len = i3c_xfers[i].len;
- ret = -EFBIG;
- if (xfer[i].data_len >= size_limit)
- goto out;
- xfer[i].rnw = i3c_xfers[i].rnw;
- if (i3c_xfers[i].rnw) {
- xfer[i].data = i3c_xfers[i].data.in;
- } else {
- /* silence the const qualifier warning with a cast */
- xfer[i].data = (void *) i3c_xfers[i].data.out;
+ /* 1. Prime the ring */
+ while (queued < nxfers && (queued - processed) < sw_ring_size) {
+ int n = min(nxfers - queued, chunk_limit);
+ struct hci_xfer *chunk_start = xfer + (queued % sw_ring_size);
+
+ for (i = 0; i < n; i++) {
+ int idx = queued + i;
+ chunk_start[i].data_len = i3c_xfers[idx].len;
+ chunk_start[i].rnw = i3c_xfers[idx].rnw;
+ chunk_start[i].data = i3c_xfers[idx].rnw ?
+ i3c_xfers[idx].data.in :
+ (void *)i3c_xfers[idx].data.out;
+ hci->cmd->prep_i3c_xfer(hci, dev, &chunk_start[i]);
+ chunk_start[i].cmd_desc[0] |= CMD_0_ROC;
+ chunk_start[i].completion = NULL;
}
- hci->cmd->prep_i3c_xfer(hci, dev, &xfer[i]);
- xfer[i].cmd_desc[0] |= CMD_0_ROC;
+ chunk_start[n - 1].cmd_desc[0] |= CMD_0_TOC;
+ chunk_start[n - 1].completion = &hci->xfer_done;
+
+ ret = hci->io->queue_xfer(hci, chunk_start, n);
+ if (ret)
+ goto out;
+ queued += n;
}
- last = i - 1;
- xfer[last].cmd_desc[0] |= CMD_0_TOC;
- xfer[last].completion = &done;
- xfer[last].timeout = HZ;
- ret = i3c_hci_process_xfer(hci, xfer, nxfers);
- if (ret)
- goto out;
- for (i = 0; i < nxfers; i++) {
- if (i3c_xfers[i].rnw)
- i3c_xfers[i].len = RESP_DATA_LENGTH(xfer[i].response);
- if (RESP_STATUS(xfer[i].response) != RESP_SUCCESS) {
- ret = -EIO;
+ /* 2. Sliding Window Loop (Counting Semaphore) */
+ while (processed < nxfers) {
+ if (!wait_for_completion_timeout(&hci->xfer_done, HZ)) {
+ hci->io->dequeue_xfer(hci, xfer, ring_size);
+ ret = -ETIME;
goto out;
}
+
+ int n_done = min(nxfers - processed, chunk_limit);
+ struct hci_xfer *done_chunk = xfer + (processed % sw_ring_size);
+
+ for (i = 0; i < n_done; i++) {
+ int idx = processed + i;
+ if (i3c_xfers[idx].rnw)
+ i3c_xfers[idx].len = RESP_DATA_LENGTH(done_chunk[i].response);
+ if (RESP_STATUS(done_chunk[i].response) != RESP_SUCCESS) {
+ ret = -EIO;
+ goto out;
+ }
+ }
+ processed += n_done;
+
+ /* 3. Refill the ring */
+ if (queued < nxfers) {
+ int n_next = min(nxfers - queued, chunk_limit);
+ struct hci_xfer *next_chunk = xfer + (queued % sw_ring_size);
+
+ for (i = 0; i < n_next; i++) {
+ int idx = queued + i;
+ next_chunk[i].data_len = i3c_xfers[idx].len;
+ next_chunk[i].rnw = i3c_xfers[idx].rnw;
+ next_chunk[i].data = i3c_xfers[idx].rnw ?
+ i3c_xfers[idx].data.in :
+ (void *)i3c_xfers[idx].data.out;
+ hci->cmd->prep_i3c_xfer(hci, dev, &next_chunk[i]);
+ next_chunk[i].cmd_desc[0] |= CMD_0_ROC;
+ next_chunk[i].completion = NULL;
+ }
+ next_chunk[n_next - 1].cmd_desc[0] |= CMD_0_TOC;
+ next_chunk[n_next - 1].completion = &hci->xfer_done;
+
+ ret = hci->io->queue_xfer(hci, next_chunk, n_next);
+ if (ret)
+ goto out;
+ queued += n_next;
+ }
}
out:
- hci_free_xfer(xfer, nxfers);
+ mutex_unlock(&hci->xfer_lock);
return ret;
}
@@ -424,9 +475,12 @@ static int i3c_hci_i2c_xfers(struct i2c_dev_desc *dev,
dev_dbg(&hci->master.dev, "nxfers = %d", nxfers);
+ mutex_lock(&hci->xfer_lock);
xfer = hci_alloc_xfer(nxfers);
- if (!xfer)
+ if (!xfer) {
+ mutex_unlock(&hci->xfer_lock);
return -ENOMEM;
+ }
for (i = 0; i < nxfers; i++) {
xfer[i].data = i2c_xfers[i].buf;
@@ -451,6 +505,7 @@ static int i3c_hci_i2c_xfers(struct i2c_dev_desc *dev,
}
out:
+ mutex_unlock(&hci->xfer_lock);
hci_free_xfer(xfer, nxfers);
return ret;
}
@@ -1019,6 +1074,18 @@ static int i3c_hci_probe(struct platform_device *pdev)
if (hci->quirks & HCI_QUIRK_RPM_IBI_ALLOWED)
hci->master.rpm_ibi_allowed = true;
+ /* Pre-allocate ring for high-throughput sliding window transfers */
+
+ hci->xfer_ring = devm_kcalloc(&pdev->dev, XFER_RING_ENTRIES,
sizeof(struct hci_xfer), GFP_KERNEL);
+
+ if (!hci->xfer_ring)
+
+ return -ENOMEM;
+
+ init_completion(&hci->xfer_done);
+
+ mutex_init(&hci->xfer_lock);
+
return i3c_master_register(&hci->master, &pdev->dev, &i3c_hci_ops, false);
}
diff --git a/drivers/i3c/master/mipi-i3c-hci/dma.c
b/drivers/i3c/master/mipi-i3c-hci/dma.c
index e4daaa612055..2cfd6ff25040 100644
--- a/drivers/i3c/master/mipi-i3c-hci/dma.c
+++ b/drivers/i3c/master/mipi-i3c-hci/dma.c
@@ -26,7 +26,6 @@
*/
#define XFER_RINGS 1 /* max: 8 */
-#define XFER_RING_ENTRIES 16 /* max: 255 */
#define IBI_RINGS 1 /* max: 8 */
#define IBI_STATUS_RING_ENTRIES 32 /* max: 255 */
diff --git a/drivers/i3c/master/mipi-i3c-hci/hci.h
b/drivers/i3c/master/mipi-i3c-hci/hci.h
index f17f43494c1b..c6c8eabbcba8 100644
--- a/drivers/i3c/master/mipi-i3c-hci/hci.h
+++ b/drivers/i3c/master/mipi-i3c-hci/hci.h
@@ -37,6 +37,8 @@ struct dat_words {
};
/* Our main structure */
+#define XFER_RING_ENTRIES 255 /* max: 255 */
+
struct i3c_hci {
struct i3c_master_controller master;
void __iomem *base_regs;
@@ -70,6 +72,10 @@ struct i3c_hci {
u32 vendor_version_id;
u32 vendor_product_id;
void *vendor_data;
+ /* High-throughput sliding window support */
+ struct hci_xfer *xfer_ring;
+ struct completion xfer_done;
+ struct mutex xfer_lock;
};
/*
--


