REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"

Pali Rohár pali at kernel.org
Mon Oct 3 14:35:27 PDT 2022


Hello!

On Monday 03 October 2022 23:30:31 Marcin Wojtas wrote:
> Hi Marek,
> 
> 
> pon., 3 paź 2022 o 17:33 Marek Behún <kabel at kernel.org> napisał(a):
> >
> > On Mon, 3 Oct 2022 15:11:44 +0100
> > "Russell King (Oracle)" <linux at armlinux.org.uk> wrote:
> >
> > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > > It seems that the null pointer dereference comes from the data variable
> > > > > having zero value. We assign
> > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > > >
> > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > I missing?
> > >
> > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > be the virtual address of the buffer.
> > >
> > > Each buffer supplied to the hardware buffer manager is supposed to
> > > contain the virtual address in the first 32-bit word in that buffer.
> > >
> > > This is done by mvneta_bm_construct():
> > >
> > >         /* In order to update buf_cookie field of RX descriptor properly,
> > >          * BM hardware expects buf virtual address to be placed in the
> > >          * first four bytes of mapped buffer.
> > >          */
> > >         *(u32 *)buf = (u32)buf;
> > >
> > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > >
> > > If I had to guess, I would suggest that this write is being lost via
> > > cache invalidation, and given that the hardware BM both reads and
> > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > DMA_BIDIRECTIONAL.
> > >
> 
> I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> the driver - the CPU doesn't access the payload afterward. The BM only
> pushes the pointers back and forth between internal SRAM ('internal
> pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> but afaik it should not touch the buffer contents. But maybe somehow
> it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> About the coherency itself - please see my comment below.
> 
> Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?

Hm... When you are talking about IO coherency... This reminds me that
there is a HW bug on A38x with undocumented errata related to IO
coherency. It is not mentioned in the official A38x errata document but
kernel has implemented some kind of workaround for it. But last time
when I tried to understand it I had feeling that it was implemented
incorrectly and maybe does not do what it is expected to do?

Maybe it should be revisited, now?

> > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > elsewhere in the mvneta_bm and mvneta driver.
> > >
> > > I'm not in a position where I could test that out. Marek?
> > >
> >
> > Hello Russell,
> >
> > thanks for your suggestion!
> >
> > Adding Pali, since he has some information (see at the end of this
> > message).
> >
> > The attached patch seems to solve the null-pointer dereference.
> 
> Did you manage to measure performance impact?
> 
> I have one overall concern here. On all kinds of A38x-based boards I
> worked on, by default, the firmware set all devices (e.g. network,
> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> used). Can you please try adding 'dma-coherent;' property under the
> 'internal-regs' node?
> 
> > I booted into single user mode and enabled eth2. Before it caused the
> > NULL pointer dereference after link got up, not it does not happen.
> >
> > But I am still encountering the freeze after booting into system.
> >
> > Maybe these are different bugs?
> >
> > I am thinking whether we don't need something similar like
> >   7bea67a99430 ("ARM: dts integrator: Fix DMA ranges")
> > also for mvebu.
> > I seem to remember Pali talking about how the ranges defined in some
> > upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> > Pali, what do you remember about this?
> >
> 
> Afaik there should be no issues around MBUS configuration for non-PCIE devices.
> 
> Best regards,
> Marcin



More information about the linux-arm-kernel mailing list