[PATCH v7 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8

John Ernberg john.ernberg at actia.se
Wed Apr 2 03:35:49 PDT 2025


Hi Catalin,

On 4/1/25 6:43 PM, Catalin Marinas wrote:
> On Fri, Mar 28, 2025 at 04:41:05PM +0000, John Ernberg wrote:
>> On 6/12/23 5:31 PM, Catalin Marinas wrote:
>>> That's v7 of the series reducing the kmalloc() minimum alignment on
>>> arm64 to 8 (from 128). There's no new/different functionality, mostly
>>> cosmetic changes and acks/tested-bys.
>>>
>>> Andrew, if there are no further comments or objections to this version,
>>> are you ok to take the series through the mm tree? The arm64 changes are
>>> fairly small. Alternatively, I can push it into linux-next now to give
>>> it some wider exposure and decide whether to upstream it when the
>>> merging window opens. Thanks.
>>>
>>> The updated patches are also available on this branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign
>>>
> [...]
>> Seen on Linux 6.12.20. It is not trivial for us to test later kernels,
>> so if the issue is potentially fixed we are more than happy to
>> cherry-pick the fixes and give them a go.
> 
> I'm not aware of any recent fix for this, so I doubt testing a newer
> kernel would make a difference.
> 
>> Since the patch set at [0], having an SMSC9512 (smsc95xx) USB
>> Ethernet/Hub chip attached to the armv8 SoC iMX8QXP over the Cadence
>> USB3 USB2 interface (cdns3-imx) causes random interrupt storms on the
>> SMSC9512 INT EP.
>>
>> The reason for the storm is the async URBs queued at [1] right before
>> the interrupt configuration [2] in the driver.
>> With [0] applied, those async URBs are likely clobbering any URB located
>> after them in memory somewhere in the xhci memory space.
>> The memory corruption only happens if there is more than one URB in the
>> queue at the same time, making these async URBs a good trigger of the
>> problem.
>> If we force those URBs to be sync or use the hack inlined below, the
>> problem goes away.
> 
> I'm not really familiar with this area. My only drivers/usb/ change
> related to ARCH_KMALLOC_MINALIGN was commit 075efe7c1656 ("drivers/usb:
> use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN"). I wouldn't be
> surprised if I missed other things that rely on the kmalloc() alignment
> rather than explicit macros.

We tried hacking the result of the function changed by that commit back
to what it was on the vendor 6.1 kernel, and it had no effect on the bug.

> 
>> When we dumped the URB structures, the content of read_buf in the
>> interrupt configuration read at [2] was not the expected register
>> contents. It looks to be the lo-part of a pointer within +-20 bytes of
>> the pointers present in the async URBs queued from [1].
> 
> It might be worth enabling CONFIG_DMA_API_DEBUG to see if it complains.
> I lost myself in the call paths on how read_buf gets populated. In
> principle, the DMA API should handle bouncing (swiotlb) even if you pass
> it a buffer smaller than the required alignment.
> 
> Random shot, untested and not an actual fix but some ideas for
> debugging:
> 
> ------------------8<-------------------------------
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 44179f4e807f..06d5f9bfef75 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -2024,7 +2024,7 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
>                     cmd, reqtype, value, index, size);
> 
>          if (size) {
> -               buf = kmalloc(size, GFP_NOIO);
> +               buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_NOIO);
>                  if (!buf)
>                          goto out;
>          }
> @@ -2171,12 +2171,13 @@ int usbnet_write_cmd_async(struct usbnet *dev, u8 cmd, u8 reqtype,
>                  goto fail;
> 
>          if (data) {
> -               buf = kmemdup(data, size, GFP_ATOMIC);
> +               buf = kmalloc(ALIGN(size, dma_get_cache_alignment()), GFP_ATOMIC);
>                  if (!buf) {
>                          netdev_err(dev->net, "Error allocating buffer"
>                                     " in %s!\n", __func__);
>                          goto fail_free_urb;
>                  }
> +               memcpy(buf, data, size);
>          }
> 
>          req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
> diff --git a/drivers/usb/cdns3/cdnsp-mem.c b/drivers/usb/cdns3/cdnsp-mem.c
> index 97866bfb2da9..226ac7af6511 100644
> --- a/drivers/usb/cdns3/cdnsp-mem.c
> +++ b/drivers/usb/cdns3/cdnsp-mem.c
> @@ -45,6 +45,7 @@ static struct cdnsp_segment *cdnsp_segment_alloc(struct cdnsp_device *pdev,
>                  return NULL;
>          }
> 
> +       max_packet = ALIGN(max_packet, dma_get_cache_alignment());
>          if (max_packet) {
>                  seg->bounce_buf = kzalloc(max_packet, flags | GFP_DMA);
>                  if (!seg->bounce_buf)
> ------------------8<-------------------------------
> 
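For orientation, a minimal user-space sketch of the size padding the
debug patch above applies through ALIGN(size, dma_get_cache_alignment()).
The helper names align_up() and padded_alloc_size() are hypothetical, and
the dma_align argument is only a stand-in for whatever
dma_get_cache_alignment() returns on a given platform.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model of the kernel's ALIGN(): round x up to the next
 * multiple of a. Only valid when a is a power of two.
 */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: the allocation size the debug patch would request
 * for a transfer of `size` bytes. `dma_align` stands in for the value of
 * dma_get_cache_alignment(), which is platform dependent.
 */
static size_t padded_alloc_size(size_t size, size_t dma_align)
{
	return align_up(size, dma_align);
}
```

With an assumed alignment of 64, a 4-byte register read would be padded
to a 64-byte allocation and a 65-byte buffer to 128 bytes, while sizes
that are already a multiple of the alignment are left unchanged.
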
> Even without the above, my reading of the code is that it is safe since
> the buffers eventually end up in dma_map_single() which would do
> bouncing via an aligned buffer.
> 
> Try to track down call paths from smsc95xx_read_reg() and
> smsc95xx_write_reg_async() to usbnet_{read,write}_cmd* etc. and see how
> the DMA transfers happen, whether it's missing some dma_map_* call. The
> dma_map_* bouncing logic relies on the size, see
> dma_kmalloc_needs_bounce().
> 
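A rough user-space model of the size-dependent bounce decision mentioned
above, loosely modelled on dma_kmalloc_needs_bounce() and not a copy of
the kernel code. Everything prefixed model_ or MODEL_ is hypothetical:
the two alignment constants are assumed illustrative values, and
model_kmalloc_roundup() is a simplification of kmalloc_size_roundup()
that ignores the non-power-of-two kmalloc caches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed illustrative values; the real ones are platform dependent. */
#define MODEL_ARCH_DMA_MINALIGN	128u
#define MODEL_CACHE_ALIGN	64u

enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };

/* Simplified bucket model: next power of two, with a minimum of 8 bytes. */
static size_t model_kmalloc_roundup(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Coherent devices and CPU-to-device transfers never need a bounce. */
static bool model_kmalloc_safe(bool coherent, enum model_dir dir)
{
	return coherent || dir == MODEL_TO_DEVICE;
}

/* Large buffers, or buckets that are a multiple of the cache alignment. */
static bool model_size_aligned(size_t size)
{
	return size >= 2 * MODEL_ARCH_DMA_MINALIGN ||
	       model_kmalloc_roundup(size) % MODEL_CACHE_ALIGN == 0;
}

/* Bounce only when the mapping is neither safe nor size-aligned. */
static bool model_needs_bounce(bool coherent, size_t size, enum model_dir dir)
{
	return !model_kmalloc_safe(coherent, dir) && !model_size_aligned(size);
}
```

Under these assumptions, a 4-byte device-to-CPU buffer on a non-coherent
device would be bounced, while the same buffer sent CPU-to-device, or a
64-byte buffer in either direction, would not.
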
> Is there an iommu between the usb host controller and memory? The iommu
> code should do similar bouncing but it's had minimal testing.

The iMX8QXP does not come with an iommu.

>
> --
> Catalin
> 
Thank you for the many debugging pointers. It will take me at least a
few days to get through them all and produce results.

Best regards // John Ernberg

