[PATCH v7 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8

Catalin Marinas catalin.marinas at arm.com
Tue Apr 1 09:43:37 PDT 2025


On Fri, Mar 28, 2025 at 04:41:05PM +0000, John Ernberg wrote:
> On 6/12/23 5:31 PM, Catalin Marinas wrote:
> > That's v7 of the series reducing the kmalloc() minimum alignment on
> > arm64 to 8 (from 128). There's no new/different functionality, mostly
> > cosmetic changes and acks/tested-bys.
> > 
> > Andrew, if there are no further comments or objections to this version,
> > are you ok to take the series through the mm tree? The arm64 changes are
> > fairly small. Alternatively, I can push it into linux-next now to give
> > it some wider exposure and decide whether to upstream it when the
> > merging window opens. Thanks.
> > 
> > The updated patches are also available on this branch:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign
> > 
[...]
> Seen on Linux 6.12.20, it is not trivial for us to test later kernels so
> if the issue is potentially fixed we are more than happy to cherry-pick
> the potential fixes and give them a go.

I'm not aware of any recent fix for this, so I doubt testing a newer
kernel would make a difference.

> Having an SMSC9512 (smsc95xx) USB Ethernet/Hub chip attached to the armv8
> SoC iMX8QXP over the Cadence USB3 USB2 interface (cdns3-imx) will since
> the patch set at [0] cause random interrupt storms over the SMSC9512 INT
> EP.
> 
> The reason for the storm is that the async URBs queued at [1] right before
> the interrupt configuration [2] in the driver.
> With [0] applied, those async URBs are likely clobbering any URB located
> after them in memory somewhere in the xhci memory space.
> The memory corruption only happens if there is more than one URB in the
> queue at the same time, making these async URBs a good trigger of the
> problem.
> If we force those URBs to be sync or use the hack inlined below, the
> problem goes away.

I'm not really familiar with this area. My only drivers/usb/ change
related to ARCH_KMALLOC_MINALIGN was commit 075efe7c1656 ("drivers/usb:
use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN"). I wouldn't be
surprised if I missed other things that rely on the kmalloc() alignment
rather than explicit macros.

> The content of read_buf in the interrupt configuration read at [2] looks
> to be the lo-part of a pointer +-20 bytes distance from the pointers
> present in the async URBs queued from [1] when we dumped the URB structures
> instead of the expected register contents.

It might be worth enabling CONFIG_DMA_API_DEBUG to see if it complains.
I lost myself in the call paths on how read_buf gets populated. In
principle, the DMA API should handle bouncing (swiotlb) even if you pass
it a buffer smaller than the required alignment

Random shot, untested and not an actual fix but some ideas for
debugging:

------------------8<-------------------------------
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 44179f4e807f..06d5f9bfef75 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -2024,7 +2024,7 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
 		   cmd, reqtype, value, index, size);

 	if (size) {
-		buf = kmalloc(size, GFP_NOIO);
+		buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_NOIO);
 		if (!buf)
 			goto out;
 	}
@@ -2171,12 +2171,13 @@ int usbnet_write_cmd_async(struct usbnet *dev, u8 cmd, u8 reqtype,
 		goto fail;

 	if (data) {
-		buf = kmemdup(data, size, GFP_ATOMIC);
+		buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_ATOMIC);
 		if (!buf) {
 			netdev_err(dev->net, "Error allocating buffer"
 				   " in %s!\n", __func__);
 			goto fail_free_urb;
 		}
+		memcpy(buf, data, size);
 	}

 	req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
diff --git a/drivers/usb/cdns3/cdnsp-mem.c b/drivers/usb/cdns3/cdnsp-mem.c
index 97866bfb2da9..226ac7af6511 100644
--- a/drivers/usb/cdns3/cdnsp-mem.c
+++ b/drivers/usb/cdns3/cdnsp-mem.c
@@ -45,6 +45,7 @@ static struct cdnsp_segment *cdnsp_segment_alloc(struct cdnsp_device *pdev,
 		return NULL;
 	}

+	max_packet = ALIGN(max_packet, dma_get_cache_alignment());
 	if (max_packet) {
 		seg->bounce_buf = kzalloc(max_packet, flags | GFP_DMA);
 		if (!seg->bounce_buf)
------------------8<-------------------------------

Even without the above, my reading of the code is that it is safe since
the buffers eventually end up in dma_map_single() which would do
bouncing via an aligned buffer.

Try to track down call paths from smsc95xx_read_reg() and
smsc95xx_write_reg_async() to usbnet_{read,wrote}_cmd* etc. and see how
the DMA transfers happen, whether it's missing some dma_map_* call. The
dma_map_* bouncing logic relies on the size, see
dma_kmalloc_needs_bounce().

Is there an iommu between the usb host controller and memory? The iommu
code should do similar bouncing but it's had minimal testing.

-- 
Catalin



More information about the linux-arm-kernel mailing list