ARM64: Question: How to map non-shareable memory

Catalin Marinas catalin.marinas at arm.com
Thu May 25 01:30:27 PDT 2023


Hi David,

On Wed, May 24, 2023 at 05:33:59PM -0700, David Clear wrote:
> On Wed, 24 May 2023 at 23:59, Ard Biesheuvel <ardb at kernel.org> wrote:
> > Non-shareable cacheable mappings are problematic because they are not
> > covered by the hardware coherency protocol that keeps caches
> > synchronized between CPUs and cluster-level and system-level caches.
> > (IOW, accesses to non-shareable mappings will have snooping disabled).
> >
> > This means that, unless your system only has a single CPU and does not
> > support cache coherent DMA at all, the cached view of those RAM
> > regions will go out of sync between CPUs and wrt other coherent
> > masters, which is probably not what you're after.
> 
> Hi Ard. Thanks for the quick reply.
> 
> I understand your concerns. The general Linux memory within the
> (multi-cluster) system is fully coherent, and there are no surprises
> w.r.t normal SMP system operation and device DMA.
> 
> The non-coherent memories are outside of the general Linux pool, owned
> by autonomous hardware units, and are used for product-specific purposes.
> These memories are either internal to the units (far away from coherence
> machinery) or purposefully avoid the system coherency controllers so as
> to not incur the latency tax in back-to-back dependent transactions. In
> this product it would be a significant performance burden to maintain
> coherence with ARM caches that have essentially nothing to do with these
> units' operations.

Are these memories bus masters themselves? I doubt it. My guess is that
such memory is also accessed by a device that cannot maintain coherency
with the CPU caches. So IIUC you want a cached mapping from the CPU side
for performance reasons but treat it as non-coherent from a DMA perspective.
For some hardware reason, shareable cacheable transactions to such
memory trigger SErrors. Do you know why this is the case? Because any
other non-cacheable transactions are considered shareable anyway. Or is
it that outer shareable is fine but inner shareable is not? The Arm CPUs
don't really distinguish between these AFAIK.

> For the userspace software that needs to access this memory, the current
> non-cached mapping is obtained via a device driver and the goal is
> to minimize the number of discrete memory transactions by supporting
> cached burst-reads and burst-writes, bracketed with appropriate cache
> maintenance ops. There are already private caches within the hardware
> pipelines that software needs to explicitly flush or invalidate,
> so this is just one more thing.

I agree with Ard, such a mapping won't work. When you mark it as
non-shareable, it tells the CPU that the cache lines for that mapping
are not shared with other CPUs, so they don't participate in the cache
coherency protocols. Any cache maintenance to PoC is also limited to
that CPU. See "Effects of instructions that operate by VA to the PoC" in
the latest Arm ARM (page D7-5784).

So let's say that your user process does a DC IVAC (it needs to be in
the kernel) and then starts reading from such a mapping (potentially
speculatively). The process is then migrated by the kernel to another
CPU, which has stale cache lines for that range because the DC IVAC only
affected the first CPU. Similarly with writes, you can't guarantee
that the write and the DC CVAC happen on the same CPU. I also have no
idea how some "transparent" system caches behave here, whether they do
anything on the DC instructions and how shareability changes their
behaviour.
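
The migration problem can be sketched with a toy model in portable C.
This is not real cache maintenance code: it assumes two CPUs, each with
a private one-line cache over a single word of device-owned memory, and
all names (dc_ivac_nonshareable, read_after_invalidate, and so on) are
hypothetical helpers made up for the illustration. The only property it
models is that maintenance on a non-shareable mapping touches the
issuing CPU's cache and nothing else.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 2

/* One word of device-owned memory, outside the coherent pool (toy model). */
static uint32_t device_mem;

/* Each CPU has a private one-line cache covering that word. */
struct cache_line {
    bool valid;
    bool dirty;
    uint32_t data;
};
static struct cache_line cache[NCPUS];

static void reset_model(uint32_t initial)
{
    device_mem = initial;
    for (int c = 0; c < NCPUS; c++)
        cache[c] = (struct cache_line){ .valid = false, .dirty = false, .data = 0 };
}

/* Cached load: a hit returns the local copy, a miss fills from memory. */
static uint32_t cpu_read(int cpu)
{
    if (!cache[cpu].valid)
        cache[cpu] = (struct cache_line){ .valid = true, .dirty = false, .data = device_mem };
    return cache[cpu].data;
}

/* Cached store: the data stays dirty in the local cache until cleaned. */
static void cpu_write(int cpu, uint32_t value)
{
    cache[cpu] = (struct cache_line){ .valid = true, .dirty = true, .data = value };
}

/* Stand-in for DC IVAC on a non-shareable mapping: local CPU only. */
static void dc_ivac_nonshareable(int cpu)
{
    cache[cpu].valid = false;
    cache[cpu].dirty = false;
}

/* Stand-in for DC CVAC on a non-shareable mapping: local CPU only. */
static void dc_cvac_nonshareable(int cpu)
{
    if (cache[cpu].valid && cache[cpu].dirty) {
        device_mem = cache[cpu].data;
        cache[cpu].dirty = false;
    }
}

/* The device accesses its memory directly; nothing snoops the CPU caches. */
static void device_write(uint32_t value) { device_mem = value; }
static uint32_t device_read(void) { return device_mem; }

/*
 * Read path: every CPU holds the old value 1, the device writes 2, the
 * task invalidates on inval_cpu and then reads on read_cpu.
 */
uint32_t read_after_invalidate(int inval_cpu, int read_cpu)
{
    reset_model(1);
    for (int c = 0; c < NCPUS; c++)
        (void)cpu_read(c);
    device_write(2);
    dc_ivac_nonshareable(inval_cpu);
    return cpu_read(read_cpu);
}

/*
 * Write path: the task writes 5 on write_cpu, cleans on clean_cpu, and
 * the device then reads its memory.
 */
uint32_t device_sees_after_clean(int write_cpu, int clean_cpu)
{
    reset_model(1);
    cpu_write(write_cpu, 5);
    dc_cvac_nonshareable(clean_cpu);
    return device_read();
}
```

In this model read_after_invalidate(0, 0) returns the new value 2, but
read_after_invalidate(0, 1), i.e. a migration between the invalidate and
the read, returns the stale 1. Likewise device_sees_after_clean(0, 1)
leaves the device looking at the old value because the dirty line is
still sitting in CPU 0's cache.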

Your best bet is Normal Non-cacheable here. On newer architecture
versions Arm introduced ST64B/LD64B for similar performance reasons
(FEAT_LS64 in Armv8.7) but I don't think there's hardware yet.

-- 
Catalin


