[PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support

David Laight david.laight.linux at gmail.com
Wed Nov 12 06:01:57 PST 2025


On Wed, 12 Nov 2025 12:28:01 +0000
Mark Rutland <mark.rutland at arm.com> wrote:

> On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
> > From: Weili Qian <qianweili at huawei.com>
> > 
> > Starting from ARMv8.4, stp and ldp instructions become atomic.  
> 
> That's not true for accesses to Device memory types.
> 
> Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in
> Armv8.4"):
> 
>   If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load
>   or store two 64-bit registers are single-copy atomic when all of the
>   following conditions are true:
>   • The overall memory access is aligned to 16 bytes.
>   • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
> 
> IIUC when used for Device memory types, those can be split, and a part
> of the access could be replayed multiple times (e.g. due to an
> interrupt).

That can't be right.
IO accesses can reference a hardware FIFO, so they must only happen once.
(Or is 'Device memory' something different from 'Device register'?)
I'm also not sure that the bus cycles could get split by an interrupt;
that would require a mid-instruction interrupt, which is very unlikely.
Interleaving is most likely to come from another CPU.

More interesting would be whether the instructions generate a single
PCIe TLP? (perhaps even only most of the time.)
PCIe reads are high latency, anything that can be done to increase the
size of the TLP improves PIO throughput massively.
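
To put rough numbers on that, here is a back-of-envelope sketch. It
assumes each PIO read is one non-posted TLP, that the CPU waits for the
completion before issuing the next read (no overlapping), and that the
round trip costs the same regardless of payload size. The helper name
and the latency value passed to it are purely illustrative, not
measurements and not an existing kernel API.

```c
#include <stdint.h>

/*
 * Rough PIO read throughput model (illustrative only).
 *
 * bytes_per_tlp: payload returned by one read TLP (8 for a 64-bit
 *                read, 16 if a 128-bit read really is a single TLP).
 * latency_ns:    ASSUMED round-trip time of one read, in nanoseconds.
 *
 * With serialised reads, throughput is simply payload / latency.
 */
static inline uint64_t pio_read_bytes_per_sec(uint32_t bytes_per_tlp,
					      uint32_t latency_ns)
{
	return (uint64_t)bytes_per_tlp * 1000000000ull / latency_ns;
}
```

With an assumed 1000ns round trip, 8-byte reads give about 8MB/s and
16-byte reads about 16MB/s, so under this model doubling the TLP payload
doubles PIO read throughput. If the 128-bit access is split into two
TLPs, the gain disappears.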

	David

> 
> I don't think we can add this generally. It is not atomic, and not
> generally safe.
> 
> Mark.
...



More information about the linux-arm-kernel mailing list