[PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path
Tudor Ambarus
tudor.ambarus at linaro.org
Tue Apr 28 05:57:39 PDT 2026
Hi, Krzysztof,
Thanks for the review! I indeed missed something in the acquire path,
details below.
On 4/28/26 12:52 PM, Krzysztof Kozlowski wrote:
> On 27/04/2026 17:04, Tudor Ambarus wrote:
>> Sashiko identified a memory ordering race in RX path [1].
>>
>> When draining the RX queue or reading saved responses, the driver uses
>> clear_bit() to release the sequence number back to the available pool.
>> However, on weakly ordered architectures like ARM64, clear_bit() does
>> not provide implicit memory barriers.
>
> And it does not have to if entire access is synchronized by other locks.
Right.
> You need to analyze also this and mention here path which is not
> synchronized and uses these weakly ordered atomic operations.
>
The TX path is protected by tx_lock, and the RX path is protected by
rx_lock. Because the bitmap_seqnum is modified by the RX thread
(clearing the bit) and the TX thread (setting the bit), the bitmap is
accessed across two different lock domains. Therefore, from the
bitmap's perspective, the synchronization is entirely lockless, and
explicit memory barriers are required. I'll add a comment in the
commit message.
>>
>> This allows the CPU to reorder instructions, making the cleared bit
>> globally visible before the preceding memory operations (memcpy() or
>> __ioread32_copy()) have completed. If a concurrent thread allocates the
>> newly freed sequence number, it can execute acpm_prepare_xfer() and
>> zero out the buffer via memset() while the RX thread is still actively
>> reading from it, leading to silent data corruption.
>>
>> Fix this by replacing clear_bit() with clear_bit_unlock() across the
>> RX path. This provides release semantics, guaranteeing that all prior
>> memory reads and writes are fully completed and visible before the
>> sequence number is marked as free.
>
> Barriers should be paired and release is paired with acquire.
> bitmap_seqnum() is used with test_bit() and a separate set_bit(), which
> do not have acquire semantics, although in some calls it is within lock.
> Problem is I guess acpm_dequeue_by_polling() which is called without any
> locks.
>
> This means that other thread won't see updated values.
>
> I think you also need to investigate and fix that acquire path.
:) thanks for the challenge!
In acpm_dequeue_by_polling(), zero-lenght messages (rxcnt == 0) can have
their bits cleared by a concurrent thread draining the queue. If we have
our thread sitting in the do...while loop and calling test_bit() we risk
speculative execution of the caller's subsequent instructions before the
bit actually flips to zero. The fix is to s/test_bit()/test_bit_acquire().
In what concerns acpm_prepare_xfer(), where we use set_bit(), I find it
safe as it is, I don't think we need an acquire barrier. By the time the
test_bit() loop observes a 0, the RX's clear_bit_unlock() has guaranteed
that the payload was copied out. The rx_data->cmd buffer is dead.
Another thing that I thought about was about the reordering of memset
and set_bit in acpm_prepare_xfer(), but even there, the internal
execution order doesn't matter. Both are guaranteed to be completed
before writel (wmb). One may notice that I even reordered the memset and
set_bit in patch 6/6.
Also, test_bit() in acpm_prepare_xfer() will be replaced in patch 6/6
by find_next_zero_bit(), they are functionally equivalent.
I'll send a v3, s/test_bit()/test_bit_acquire()/ in RX path and update
the commit message with the details described above.
Cheers,
ta
More information about the linux-arm-kernel
mailing list