[PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode

Wed Nov 26 00:31:16 PST 2025

On 26.11.2025 07:42:36, Carlos Song wrote:

[...]

> > > +			if (bytes_per_word == 1)
> > > +				swab32s(temp + i);
> > > +			else if (bytes_per_word == 2)
> > > +				swahw32s(temp + i);
> >
> > Why do you do first a swap in place and then a memcpy()
> >
> When dynamic burst enabled, DMA copy data always using buswidth=4 bytes.
> CPU is little endian so bytes order actually little endian in dma_buf.
> But for bytes_per_word=1, every bytes should be the same order with mem order, it should be big endian so swap every bytes.
> In the same reason, bytes per word = 2, swap half word.
> Bytes per word = 4 don't need do any thing.
> (SPI is not ruled bytes order, so SPI bytes order always follow CPU bytes order, here still follow)

Thanks for the explanation. I think my question was not completely
clear. I want to know why you touch every byte twice, first you do the
swap on the existing buffer, then you do a memcpy(). You might do both
operations in one go, i.e. read the bytes from the original buffer and
write them swapped to the bounce buffer.

> > > +		}
> > > +	}
> > > +#endif
> > > +
> > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> >
> > I think this deserves a comment, why you do a re-alignment of the data here.
> >
> Yes. I can add one comment here.
>
> In fact it is not re-alignment, it is trim data.
> When dynamic burst enabled, DMA copy data always using bus width=4 bytes.
> So DMA always get 4 bytes data from RXFIFO. But if data lens is not 4-byte alignment,
> the data in the DMA bounce buffer contains extra garbage bytes, so it needs to be trimmed before memcopy to xfer buffer.
>
> Why is the first word?
> It is from HW behavior. When dynamic burst enabled, BURST_LENGTH will be set same with actual data len,
> It helps maintain correct bit count.
>
> As RM shows:
> "
> In master mode, it controls the number of bits per SPI burst. Since the shift register always loads 32-bit
> data from transmit FIFO, only the n least-significant (n = BURST LENGTH + 1) will be shifted out. The
> remaining bits will be ignored.
>
> Number of Valid Bits in a SPI burst.
>
> 0x000 A SPI burst contains the 1 LSB in a word.
> 0x001 A SPI burst contains the 2 LSB in a word.
> 0x002 A SPI burst contains the 3 LSB in a word.
> ...
> 0x01F A SPI burst contains all 32 bits in a word.
> 0x020 A SPI burst contains the 1 LSB in first word and all 32 bits in second word.
> 0x021 A SPI burst contains the 2 LSB in first word and all 32 bits in second word.
> "
> When data len is not 4 bytes-align, so the first word is always include some garbage bytes(if transfer 7 bytes. first word include one garbage byte and 3 valid bytes, four bytes in second word).
>
> > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > +		copy_ptr = (u8 *)dma_data->dma_rx_buf +
> > BYTES_PER_32BITS_WORD - unaligned;
> > > +	} else {
> > > +		copy_ptr = dma_data->dma_rx_buf;
> >
> > Why do you use the bounce buffer if the data is aligned correct?
> >
> Whatever data is 4 bytes align, when CPU is little endian, bytes swap should be done additionally according to different bytes per word setting.

But for 32 bits per word or word delay you do a not needed bounce
buffers and memcpy()?

We still need the bounce buffer for length not multiple of 4, because a
direct DMA write would overflow the destination buffer, right? And of
course bounce buffers for LE 8 and 16 bit per word for the byte
swapping.

> Summary whole solution about dynamic burst for DMA mode:
> 1. Always read/write SPI FIFO with DMA buswidth = 4, so DMA bounce buffer always 4-bytes alignment:
>   swap bytes/half word according to bytes per word=8/16 when in CPU little endian.
> 2. BURST_LENGTH setting is important, it helps maintain correct bit count(HW trim: don't shift out bits to TXFIFO also don't shift in bits in RXFIFO):
>   * TX: Although DMA put 4 byte-alignment data to FIFO, but in bounce buffer we put valid data in valid LSB of first word, it can make sure ECSPI only shift out valid data in bonus buffer.
>   * RX: Although DMA get 4bytes- alignment data from RXFIFO to bounce buffer, but trim it with valid LSB according actual xfer->len, it can make rx_buf is right data.
>
> > > +	}
> > > +
> > > +	memcpy(rx_buf, copy_ptr, dma_data->data_len); }
> > > +
> > > +static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
> > > +			   struct dma_data_package *dma_data) {
> > > +	struct spi_controller *controller = spi_imx->controller;
> > > +	struct device *tx_dev = controller->dma_tx->device->dev;
> > > +	struct device *rx_dev = controller->dma_rx->device->dev;
> > > +
> > > +	dma_data->dma_tx_addr = dma_map_single(tx_dev,
> > dma_data->dma_tx_buf,
> > > +
> > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +					       DMA_TO_DEVICE);
> > > +	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
> > > +		dev_err(spi_imx->dev, "DMA TX map failed\n");
> > > +		return -EINVAL;
> >
> > please propagate the error value of dma_mapping_error()
> >
>
> Will do this in V2
> > > +	}
> > > +
> > > +	dma_data->dma_rx_addr = dma_map_single(rx_dev,
> > dma_data->dma_rx_buf,
> > > +
> > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +					       DMA_FROM_DEVICE);
> > > +	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
> > > +		dev_err(spi_imx->dev, "DMA RX map failed\n");
> > > +		goto rx_map_err;
> >
> > there's only one user of the dma_unmap_single(), so no need for the goto.
> >
> This goto is to unmap previous TX, not this RX. TX has been mapped then start to map RX, now RX mapping error, Do we really don't need to
> rollback for TX?

Sorry there was a misunderstanding, I mean you should directly call
dma_unmap_single() and remove the goto.

>
> > > +	}
> > > +
> > > +	return 0;
> > > +
> > > +rx_map_err:
> > > +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> > > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +			 DMA_TO_DEVICE);
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
> > > +				      struct dma_data_package *dma_data,
> > > +				      const void *tx_buf,
> > > +				      bool word_delay)
> > > +{
> > > +#ifdef __LITTLE_ENDIAN
> > > +	unsigned int bytes_per_word =
> > spi_imx_bytes_per_word(spi_imx->bits_per_word);
> > > +	u32 *temp;
> > > +#endif
> >
> > move into scope of if()
> >
> Will do this in V2.
> > > +	void *copy_ptr;
> > > +	int unaligned;
> > > +
> > > +	if (word_delay) {
> > > +		dma_data->dma_len = dma_data->data_len;
> > > +	} else {
> > > +		/*
> > > +		 * As per the reference manual, when burst length = 32*n + m bits,
> > ECSPI
> > > +		 * sends m LSB bits in the first word, followed by n full 32-bit words.
> > > +		 * Since actual data may not be 4-byte aligned, allocate DMA TX/RX
> > buffers
> > > +		 * to ensure alignment. For TX, DMA pushes 4-byte aligned words to
> > TXFIFO,
> > > +		 * while ECSPI uses BURST_LENGTH settings to maintain correct bit
> > count.
> > > +		 * For RX, DMA receives 32-bit words from RXFIFO; after transfer
> > completes,
> > > +		 * trim the DMA RX buffer and copy the actual data to rx_buf.
> > > +		 */
> >
> > Ahh, please add the corresponding description for rx.
> >
> Will do this in V2.
> > > +		dma_data->dma_len = ALIGN(dma_data->data_len,
> > BYTES_PER_32BITS_WORD);
> > > +	}
> > > +
> > > +	dma_data->dma_tx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> > > +__GFP_ZERO);
> >
> > kzalloc()?
> >
> Yes. Will do this in V2
> > > +	if (!dma_data->dma_tx_buf)
> > > +		return -ENOMEM;
> > > +
> > > +	dma_data->dma_rx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> > __GFP_ZERO);
> > > +	if (!dma_data->dma_rx_buf) {
> > > +		kfree(dma_data->dma_tx_buf);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > +		copy_ptr = (u8 *)dma_data->dma_tx_buf +
> > BYTES_PER_32BITS_WORD - unaligned;
> > > +	} else {
> > > +		copy_ptr = dma_data->dma_tx_buf;
> > > +	}
> > > +
> > > +	memcpy(copy_ptr, tx_buf, dma_data->data_len);
> > > +
> > > +	/*
> > > +	 * When word_delay is enabled, DMA transfers an entire word in one
> > minor loop.
> > > +	 * In this case, no data requires additional handling.
> > > +	 */
> > > +	if (word_delay)
> > > +		return 0;
> > > +
> > > +#ifdef __LITTLE_ENDIAN
> > > +	/*
> > > +	 * On little-endian CPUs, adjust byte order:
> > > +	 * - Swap bytes when bpw = 8
> > > +	 * - Swap half-words when bpw = 16
> > > +	 * This ensures correct data ordering for DMA transfers.
> > > +	 */
> > > +	temp = dma_data->dma_tx_buf;
> > > +	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len,
> > BYTES_PER_32BITS_WORD); i++) {
> > > +		if (bytes_per_word == 1)
> > > +			swab32s(temp + i);
> > > +		else if (bytes_per_word == 2)
> > > +			swahw32s(temp + i);
> > > +	}
> > > +#endif
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> > > +				    struct spi_transfer *transfer,
> > > +				    bool word_delay)
> > > +{
> > > +	u32 pre_bl, tail_bl;
> > > +	u32 ctrl;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds
> > 512
> > > +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> > > +	 * an extra DMA request is issued for the remaining data.
> >
> > Why do you need an extra transfer in this case?
> >
>
> BURST_LEGTH is used for SPI HW to maintain correct bit count. So BURST_LENGTH should update with
> data length. After DMA request submit, SPI can not update the BURST_LENGTH, when needed, we must
> split two package, update the register then setup second DMA transfer.
>
> ECSPI HW can update BURST_LENGTH auto, but it always update this using previous value.
> When len > 512 but len is not 512-unaligned, we need two packages, second for tail data.
> For example len is 512 *3 + 511. So first transfer using BURST_LENGTH = 512 bytes(auto update 3 times), DMA transfer len = 512 *3,
> second package BURST_LENGTH = 511 bytes, DMA transfer len = 511.(If here we use 512 bytes as BURST_LENGTH,
> SPI will shift out/in extra bits, it previous isn't acceptable!)

What happens if you keep the Burst Length at 512 and only transfer 511
bytes with the DMA engine?

Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20251126/3639c75c/attachment-0001.sig>