[PATCH v7 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
John Ernberg
john.ernberg at actia.se
Fri May 2 05:02:16 PDT 2025
Hi Catalin,
On 4/1/25 6:43 PM, Catalin Marinas wrote:
> On Fri, Mar 28, 2025 at 04:41:05PM +0000, John Ernberg wrote:
>> On 6/12/23 5:31 PM, Catalin Marinas wrote:
>>> That's v7 of the series reducing the kmalloc() minimum alignment on
>>> arm64 to 8 (from 128). There's no new/different functionality, mostly
>>> cosmetic changes and acks/tested-bys.
>>>
>>> Andrew, if there are no further comments or objections to this version,
>>> are you ok to take the series through the mm tree? The arm64 changes are
>>> fairly small. Alternatively, I can push it into linux-next now to give
>>> it some wider exposure and decide whether to upstream it when the
>>> merging window opens. Thanks.
>>>
>>> The updated patches are also available on this branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign
>>>
> [...]
>> Seen on Linux 6.12.20, it is not trivial for us to test later kernels so
>> if the issue is potentially fixed we are more than happy to cherry-pick
>> the potential fixes and give them a go.
>
> I'm not aware of any recent fix for this, so I doubt testing a newer
> kernel would make a difference.
>
>> Having an SMSC9512 (smsc95xx) USB Ethernet/Hub chip attached to the armv8
>> SoC iMX8QXP over the Cadence USB3 USB2 interface (cdns3-imx) will since
>> the patch set at [0] cause random interrupt storms over the SMSC9512 INT
>> EP.
>>
>> The reason for the storm is that the async URBs queued at [1] right before
>> the interrupt configuration [2] in the driver.
>> With [0] applied, those async URBs are likely clobbering any URB located
>> after them in memory somewhere in the xhci memory space.
>> The memory corruption only happens if there is more than one URB in the
>> queue at the same time, making these async URBs a good trigger of the
>> problem.
>> If we force those URBs to be sync or use the hack inlined below, the
>> problem goes away.
>
> I'm not really familiar with this area. My only drivers/usb/ change
> related to ARCH_KMALLOC_MINALIGN was commit 075efe7c1656 ("drivers/usb:
> use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN"). I wouldn't be
> surprised if I missed other things that rely on the kmalloc() alignment
> rather than explicit macros.
>
>> The content of read_buf in the interrupt configuration read at [2] looks
>> to be the lo-part of a pointer +-20 bytes distance from the pointers
>> present in the async URBs queued from [1] when we dumped the URB structures
>> instead of the expected register contents.
>
> It might be worth enabling CONFIG_DMA_API_DEBUG to see if it complains.
> I lost myself in the call paths on how read_buf gets populated. In
> principle, the DMA API should handle bouncing (swiotlb) even if you pass
> it a buffer smaller than the required alignment
>
> Random shot, untested and not an actual fix but some ideas for
> debugging:
>
> ------------------8<-------------------------------
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 44179f4e807f..06d5f9bfef75 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -2024,7 +2024,7 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
> cmd, reqtype, value, index, size);
>
> if (size) {
> - buf = kmalloc(size, GFP_NOIO);
> + buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_NOIO);
> if (!buf)
> goto out;
> }
> @@ -2171,12 +2171,13 @@ int usbnet_write_cmd_async(struct usbnet *dev, u8 cmd, u8 reqtype,
> goto fail;
>
> if (data) {
> - buf = kmemdup(data, size, GFP_ATOMIC);
> + buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_ATOMIC);
> if (!buf) {
> netdev_err(dev->net, "Error allocating buffer"
> " in %s!\n", __func__);
> goto fail_free_urb;
> }
> + memcpy(buf, data, size);
> }
>
> req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
> diff --git a/drivers/usb/cdns3/cdnsp-mem.c b/drivers/usb/cdns3/cdnsp-mem.c
> index 97866bfb2da9..226ac7af6511 100644
> --- a/drivers/usb/cdns3/cdnsp-mem.c
> +++ b/drivers/usb/cdns3/cdnsp-mem.c
> @@ -45,6 +45,7 @@ static struct cdnsp_segment *cdnsp_segment_alloc(struct cdnsp_device *pdev,
> return NULL;
> }
>
> + max_packet = ALIGN(max_packet, dma_get_cache_alignment());
> if (max_packet) {
> seg->bounce_buf = kzalloc(max_packet, flags | GFP_DMA);
> if (!seg->bounce_buf)
> ------------------8<-------------------------------
>
> Even without the above, my reading of the code is that it is safe since
> the buffers eventually end up in dma_map_single() which would do
> bouncing via an aligned buffer.
>
> Try to track down call paths from smsc95xx_read_reg() and
> smsc95xx_write_reg_async() to usbnet_{read,wrote}_cmd* etc. and see how
> the DMA transfers happen, whether it's missing some dma_map_* call. The
> dma_map_* bouncing logic relies on the size, see
> dma_kmalloc_needs_bounce().
>
> Is there an iommu between the usb host controller and memory? The iommu
> code should do similar bouncing but it's had minimal testing.
>
> --
> Catalin
>
(Finally finished up this issue after being assigned some higher
priority tasks)
Thank you again for the very helpful debug advice, I was able to track
down the problem to be a Xen swiotlb problem (I had not realized Xen had
its own swiotlb), and I managed to identify another DMA issue as well
due to that. Series at [1].
Thanks! // John Ernberg
[1]:
https://lore.kernel.org/xen-devel/20250502114043.1968976-1-john.ernberg@actia.se/
More information about the linux-arm-kernel
mailing list