[PATCH 1/3] i3c: mipi-i3c-hci: Make bounce buffer code generic to all DMA transfers
Frank Li
Frank.li at nxp.com
Tue Jun 17 08:09:41 PDT 2025
On Tue, Jun 10, 2025 at 01:08:12PM -0400, Frank Li wrote:
> On Tue, Jun 10, 2025 at 04:42:35PM +0300, Jarkko Nikula wrote:
> > On 6/9/25 6:04 PM, Frank Li wrote:
> > > On Mon, Jun 09, 2025 at 05:15:44PM +0300, Jarkko Nikula wrote:
> > > > On 6/6/25 6:02 PM, Frank Li wrote:
> > > > > On Fri, Jun 06, 2025 at 10:16:44AM +0300, Jarkko Nikula wrote:
> > > > > > Hi
> > > > > >
> > > > > > On 6/5/25 6:13 PM, Frank Li wrote:
> > > > > > > On Thu, Jun 05, 2025 at 05:07:19PM +0300, Jarkko Nikula wrote:
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > On 6/4/25 6:00 PM, Frank Li wrote:
> > > > > > > > > On Wed, Jun 04, 2025 at 03:55:11PM +0300, Jarkko Nikula wrote:
> > > > > > > > > > Move DMA bounce buffer code for I3C private transfers to be generic for
> > > > > > > > > > all DMA transfers, and round up the receive bounce buffer size to a
> > > > > > > > > > multiple of DWORDs.
> > > > > > > > > >
> > > > > > > > > > It was observed that when the device DMA is IOMMU mapped and the receive
> > > > > > > > > > length is not a multiple of DWORDs, the last DWORD is padded with stale
> > > > > > > > > > data from the RX FIFO, corrupting 1-3 bytes beyond the expected data.
> > > > > > > > > >
> > > > > > > > > > A similar issue, though less severe, occurs when an I3C target returns
> > > > > > > > > > less data than requested. In this case, the padding does not exceed the
> > > > > > > > > > requested number of bytes, assuming the device DMA is not IOMMU mapped.
> > > > > > > > > >
> > > > > > > > > > Therefore, all I3C private transfer, CCC command payload and I2C
> > > > > > > > > > transfer receive buffers must be properly sized for the DMA being IOMMU
> > > > > > > > > > mapped. Even if those buffers are already DMA safe, their size may not
> > > > > > > > > > be, and I don't have a clear idea how to guarantee this other than
> > > > > > > > > > using a local bounce buffer.
> > > > > > > > > >
> > > > > > > > > > To prepare for the device DMA being IOMMU mapped and to address the
> > > > > > > > > > above issue, implement a local, properly sized bounce buffer for all
> > > > > > > > > > DMA transfers. For now, allocate it only when the buffer is in the
> > > > > > > > > > vmalloc() area to avoid unnecessary copying with CCC commands and
> > > > > > > > > > DMA-safe I2C transfers.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Jarkko Nikula <jarkko.nikula at linux.intel.com>
> > > > > > > > > > ---
> > > > > > > > > > drivers/i3c/master/mipi-i3c-hci/core.c | 34 -------------------
> > > > > > > > > > drivers/i3c/master/mipi-i3c-hci/dma.c | 47 +++++++++++++++++++++++++-
> > > > > > > > > > 2 files changed, 46 insertions(+), 35 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/i3c/master/mipi-i3c-hci/core.c b/drivers/i3c/master/mipi-i3c-hci/core.c
> > > > > > > > > > index bc4538694540..24c5e7d5b439 100644
> > > > > > > > > > --- a/drivers/i3c/master/mipi-i3c-hci/core.c
> > > > > > > > > > +++ b/drivers/i3c/master/mipi-i3c-hci/core.c
> > > > > > > > > > @@ -272,34 +272,6 @@ static int i3c_hci_daa(struct i3c_master_controller *m)
> > > > > > > > > > return hci->cmd->perform_daa(hci);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > ...
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > +static void *hci_dma_alloc_safe_xfer_buf(struct i3c_hci *hci,
> > > > > > > > > > + struct hci_xfer *xfer)
> > > > > > > > > > +{
> > > > > > > > > > + if (!is_vmalloc_addr(xfer->data))
> > > > > > > > > > + return xfer->data;
> > > > > > > > > > +
> > > > > > > > > > + if (xfer->rnw)
> > > > > > > > > > + /*
> > > > > > > > > > + * Round up the receive bounce buffer length to a multiple of
> > > > > > > > > > + * DWORDs. Independently of buffer alignment, DMA_FROM_DEVICE
> > > > > > > > > > + * transfers may corrupt the last DWORD when transfer length is
> > > > > > > > > > + * not a multiple of DWORDs. This was observed when the device
> > > > > > > > > > + * DMA is IOMMU mapped or when an I3C target device returns
> > > > > > > > > > + * less data than requested. Latter case is less severe and
> > > > > > > > > > + * does not exceed the requested number of bytes, assuming the
> > > > > > > > > > + * device DMA is not IOMMU mapped.
> > > > > > > > > > + */
> > > > > > > > > > + xfer->bounce_buf = kzalloc(ALIGN(xfer->data_len, 4),
> > > > > > > > > > + GFP_KERNEL);
> > > > > > > > > > + else
> > > > > > > > > > + xfer->bounce_buf = kmemdup(xfer->data, xfer->data_len,
> > > > > > > > > > + GFP_KERNEL);
> > > > > > > > > > +
> > > > > > > > > > + return xfer->bounce_buf;
> > > > > > > > >
> > > > > > > > > Why do you need a bounce_buf? Why not use dma_map_single(), which checks
> > > > > > > > > alignment and size to decide whether to use swiotlb as a bounce buffer?
> > > > > > > > >
> > > > > > > > We do pass the buffer to dma_map_single(). My understanding is that swiotlb
> > > > > > > > is transparently used when the DMA cannot directly access the memory, but
> > > > > > > > that's not the case here.
> > > > > > >
> > > > > > > Why not? Even if you have to use an internal buffer, why not use
> > > > > > > dma_alloc_coherent()?
> > > > > > >
> > > > > > I don't think it's suitable for "smallish" per-transfer allocations/mappings,
> > > > > > and the buffer is not accessed concurrently by the CPU and DMA during the
> > > > > > transfer.
> > > > >
> > > > > I still can't understand why i3c needs this special handling. Basically it
> > > > > uses DMA transfers to move data. Can you briefly describe the HCI's data flow?
> > > > >
> > > > Unfortunately my knowledge of the HW doesn't go deep enough, but I'd guess
> > > > the issue is somewhere in the MIPI I3C HCI DMA engine implementation and/or
> > > > the HCI IP to memory bridge/bus integration.
> > > >
> > > > If it were a cache line issue, I would expect to see corruption regardless
> > > > of whether the DMA is IOMMU mapped, and in more than just the last 1-3
> > > > bytes.
> > > >
> > > > Another finding: if the I3C target returns less data than expected, the
> > > > last DWORD is also padded along with the real data, but not beyond the
> > > > requested length. I.e. if 2 bytes are expected and the target returns 1,
> > > > only the 2nd byte contains stale data and bytes 3-4 are untouched. If 8
> > > > bytes are expected and the target returns 1, bytes 2-4 contain stale data
> > > > and bytes 5-8 are untouched.
> > > >
> > > > So, as far as I can guess, it's something around trailing-byte handling in
> > > > the HW when the transfer length is not a multiple of DWORDs.
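[Editorial aside: the DWORD rounding the patch applies to the receive bounce buffer can be sketched in plain userspace C. ALIGN_UP below mirrors the kernel's ALIGN() macro; the example lengths are hypothetical, not taken from the patch.]

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ALIGN(): round x up to the
 * power-of-two boundary a. The RX bounce buffer is sized this way so the
 * HW's padding of the last DWORD lands inside the buffer, not beyond it. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Bounce buffer size for a receive of data_len bytes */
static size_t rx_bounce_size(size_t data_len)
{
	return ALIGN_UP(data_len, 4);
}
```

With this, a 6-byte GETPID payload gets an 8-byte bounce buffer, so the 2 stale padding bytes stay inside the bounce buffer.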
> > >
> > > That's the reason why you should not manually handle the DMA buffer here. It
> > > is a small amount of memory with low performance requirements.
> > >
> > > If you use the DMA-provided API, you should not have to take care of such
> > > details. A DMA buffer should be at least one cache line in size; otherwise
> > > DMA may corrupt some data, which is hard to find.
> > >
> > Do you know of a solution that would solve the problem here?
> >
> > Let's look at a CCC command, which is the easiest case here. For example
> > GETPID, whose 6-byte payload is allocated using kzalloc(), which guarantees
> > the buffer is cache-line aligned and ought to be DMA-safe.
> >
> > i3c_master_getpid_locked()
> > i3c_ccc_cmd_dest_init() // Allocates 6-byte payload buffer
> > i3c_master_send_ccc_cmd_locked()
> > i3c_hci_send_ccc_cmd() // mipi-i3c-hci
> > xfer[i].data = ccc->dests[i].payload.data;
> > hci_dma_queue_xfer()
> > // bounce buffer stuff, this patchset
> > dma_map_single()
> > // setup hw with DMA address
> > // and queue the transfer
> >
> > The DMA streaming API dma_map_single() maps the buffer's CPU virtual
> > address to a DMA address, and we use that to set up the HW and do the DMA.
> >
> > Now assume the DMA is IOMMU mapped. The DMA correctly writes 6 bytes to the
> > payload buffer plus 2 extra stale bytes. In this case it's perhaps harmless
> > (due to the kzalloc() allocated buffer), but it's nevertheless corruption,
> > which KFENCE may be able to detect.
> >
> > More difficult territory is I3C private transfers and especially I2C
> > transfers. I3C private transfers are expected to pass a DMA-able buffer,
> > but I've noticed that's not always true, especially when I3C support is
> > added to existing I2C or SPI target device drivers that use
> > regmap_bulk_read(). Those use cases may pass a stack variable (not DMA-able,
> > and rejected by the DMA API), a structure member, or an index into a
> > buffer, with the pointer arriving as-is via regmap_bulk_read().
> >
> > I2C message buffers are not even expected to be DMA-able.
> > drivers/i2c/i2c-core-base.c has the helper i2c_get_dma_safe_msg_buf(), but
> > I see it has the same problem as the CCC command payload buffer.
> >
> > So far I haven't figured out any better solution than using a local bounce
> > buffer for all of these transfer types.
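[Editorial aside: the end-to-end protection a local bounce buffer provides can be mocked in plain userspace C. fake_dma_rx() simulates the observed stale-byte padding; all names here are illustrative, not driver code.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simulate the observed HW behavior: an RX transfer writes the payload and
 * then pads the last DWORD with stale FIFO bytes (0xAA here). */
static void fake_dma_rx(unsigned char *dst, size_t len)
{
	size_t padded = (len + 3) & ~(size_t)3;
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = (unsigned char)i;	/* requested payload */
	for (i = len; i < padded; i++)
		dst[i] = 0xAA;			/* stale padding */
}

/* Receive via a DWORD-rounded bounce buffer and copy back only the
 * requested bytes, so the caller's buffer is never overrun. */
static void bounce_rx(unsigned char *user_buf, size_t len)
{
	size_t padded = (len + 3) & ~(size_t)3;
	unsigned char *bounce = malloc(padded);

	assert(bounce);
	fake_dma_rx(bounce, len);
	memcpy(user_buf, bounce, len);	/* padding never leaves the bounce */
	free(bounce);
}
```

A 6-byte read through bounce_rx() leaves bytes 7-8 of the caller's buffer untouched, which is exactly the property the 6-byte GETPID example above is missing without the bounce buffer.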
>
> Thanks, I finally understand the problem you're facing. I think other master
> controllers face a similar issue if they use DMA. Let me think more. If the
> allocated buffer size is not aligned to a cache line, swiotlb will bounce it
> again.
>
> Rough idea: the i3c core layer could provide i3c_dma_(un)map_single() helpers
> to do that.
struct i3c_dma {
	void *bounce;
	dma_addr_t addr;
	enum dma_data_direction dir;
	struct device *dev;
	void *data;
	size_t data_size;
	size_t bounce_size;
};

struct i3c_dma *i3c_dma_map_single(struct device *dev, void *data, size_t sz,
				   enum dma_data_direction dir)
{
	struct i3c_dma *p;

	p = kzalloc(sizeof(*p), GFP_KERNEL);
	if (!p)
		return NULL;

	p->dev = dev;
	p->data = data;
	p->data_size = sz;
	p->dir = dir;

	if (is_vmalloc_addr(data)) {
		/* align up to a cache line to avoid swiotlb bouncing again */
		p->bounce_size = ALIGN(sz, dma_get_cache_alignment());
		p->bounce = kmalloc(p->bounce_size, GFP_KERNEL);
		if (dir != DMA_FROM_DEVICE)
			memcpy(p->bounce, data, sz);
		...
	} else {
		p->bounce = data;
		p->bounce_size = sz;
	}

	p->addr = dma_map_single(dev, p->bounce, p->bounce_size, dir);
	...
}

void i3c_dma_unmap_single(struct i3c_dma *p)
{
	dma_unmap_single(p->dev, p->addr, p->bounce_size, p->dir);
	if (p->data != p->bounce) {
		if (p->dir != DMA_TO_DEVICE)
			memcpy(p->data, p->bounce, p->data_size);
		kfree(p->bounce);
	}
	kfree(p);
}

DEFINE_FREE(i3c_dma_unmap_single, struct i3c_dma *,
	    if (_T) i3c_dma_unmap_single(_T))
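[Editorial aside: the DEFINE_FREE() line above builds on linux/cleanup.h, which in turn uses the compiler's cleanup attribute. The unmap-on-scope-exit pattern it gives callers can be mocked in userspace C (GCC/Clang); every name below is a stand-in, not the proposed kernel API.]

```c
#include <assert.h>
#include <stdlib.h>

struct fake_dma {
	void *bounce;
};

static int unmap_calls;	/* counts unmaps, for the demo only */

static void fake_unmap(struct fake_dma *p)
{
	if (p) {
		unmap_calls++;
		free(p->bounce);
		free(p);
	}
}

/* What DEFINE_FREE() generates: a hook that runs fake_unmap() when the
 * annotated variable goes out of scope. */
static void fake_unmap_cleanup(struct fake_dma **pp)
{
	fake_unmap(*pp);
}
#define auto_unmap __attribute__((cleanup(fake_unmap_cleanup)))

static void do_transfer(void)
{
	auto_unmap struct fake_dma *p = calloc(1, sizeof(*p));

	assert(p);
	p->bounce = malloc(16);
	/* ... queue the transfer and wait for completion ... */
}	/* fake_unmap(p) runs automatically here, even on early return */
```

The payoff is that every return path in a transfer function unmaps and frees exactly once, which is what makes the helper hard to misuse.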
Frank
>
> Frank
>
> >
> > --
> > linux-i3c mailing list
> > linux-i3c at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-i3c