[PATCH v5 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer

Fri May 29 04:20:22 PDT 2026

On 5/29/26 11:25 AM, Arnd Bergmann wrote:
> On Fri, May 29, 2026, at 09:47, Krzysztof Kozlowski wrote:
>> On 28/05/2026 19:44, Arnd Bergmann wrote:
>>> On Tue, May 5, 2026, at 15:13, Tudor Ambarus wrote:
>>>> Sashiko identified a silent data corruption in [1].
>>>>
>>>> In acpm_get_rx(), the driver reads the response payload from SRAM using
>>>> __ioread32_copy() and subsequently updates the hardware RX rear pointer
>>>> via writel().
>>>>
>>>> On weakly ordered architectures like ARM64, writel() provides a write
>>>> memory barrier (wmb()), which strictly orders prior writes against
>>>> subsequent writes. However, it does not order prior reads against
>>>> subsequent writes. Consequently, the CPU is permitted to reorder the
>>>> writel() store to become globally visible before the payload reads
>>>> have completed.
>>>
>>> I am very confused by this after seeing it in the Exynos fixes pull
>>> request. How would anything get reordered here? What I see is that
>>>
>>> - The SRAM is device memory, so any access to it is architecturally
>>>   ordered against other accesses to the same device. Even on
>>>   architectures that don't guarantee this, Linux I/O accessors
>>>   do.
>>
>> Well, __ioread32_copy does not guarantee that, I think. That's the
>> relaxed version.
> 
> __ioread32_copy() certainly does not guarantee the ordering within
> the block, and I think you are right that we don't properly document
> the ordering between a __raw_readl() and following writel(), but
> as far as I can tell all implementations do provide strict
> ordering here because either the MMIO load/store instructions are
> architecturally ordered (x86, arm64, ...) or there are sufficient
> barriers in the writel() to serialize the __raw_readl() as well
> (mips, alpha, ...).
> 
> [side note: there is a difference between __raw_readl() and
> readl_relaxed() here. The _relaxed MMIO operations are required
> to to be serialized with each other but not against memory
> accesses, while the __raw_ version used here provides neither
> guarantee]
> 
>>>   after the read (because of the data dependency).
>>
>> I don't see the data dependency regarding the write. We read 'rx_front'
>> and 'i' in the loop. The 'i' is used for subsequent read (addr = base +
>> mlen*i) and that's dependency, but that 'addr' is not used in any
>> further writes.
> 
> What I meant is that the store into the target memory buffer
> (xfer->rxd or rx_data->cmd) depends on the data being read from
> MMIO first. The writel() guarantees that this buffer is visible
> to all DMA masters in the system and that can only happen when
> the __raw_readl() has provided the data first.
> 
Thanks both for the detailed explanations.

I missed the data dependency chain. I focused too much on the read
part in __ioread32_copy() that I missed the RAM store implications
in it. The RAM store is forced to wait for its SRAM load, and the
writel is forced to wait for all the RAM stores. So the entire
payload is guaranteed to be visible in memory RAM before the writel.

Maybe I thought about the reordering of the final __raw_readl() loop
iteration with the writel(). But the dma_wmb() -> __dma_wmb() ->
dmb(oshst) from writel has a compiler barrier, so the compiler can't
reorder the code. And given the ARM64 device memory accesses ordering,
the ordering is protected.

My bad, sorry. We shall either drop or revert the patch. Please let
me know if you prefer a revert.

Thanks,
ta