[PATCH v3 6/6] firmware: samsung: acpm: Fix memory ordering races in RX and polling paths
Tudor Ambarus
tudor.ambarus at linaro.org
Wed Apr 29 06:11:55 PDT 2026
Sashiko identified a memory ordering race in the RX path [1].
Sequence numbers are allocated by the TX thread and freed by the RX
thread. Because the TX path is protected by 'tx_lock' and the RX path
is protected by 'rx_lock', the shared 'bitmap_seqnum' is modified
across two separate lock domains. Thus, the operations acting on the
bitmap are effectively lockless and require explicit memory barriers.
This patch addresses missing barriers in two paths:
1. The Release Path (RX thread):
When draining the RX queue, the driver reads the payload and uses a
relaxed clear_bit() to free the sequence number. On weakly ordered
architectures like ARM64, this allows the CPU to make the cleared bit
globally visible before the preceding memory reads (memcpy or
__ioread32_copy) have completed. If a concurrent TX thread allocates
the newly freed sequence number, it can execute memset() and corrupt
the buffer while the RX thread is still actively reading from it.
Fix this by replacing clear_bit() with clear_bit_unlock() to enforce
Release semantics.
2. The Acquire Path (Polling thread):
In polling mode, zero-length messages (rxcnt == 0) can have their bits
cleared by a concurrent thread that happens to be draining the queue.
The polling thread waits on test_bit(). Because test_bit() lacks an
acquire barrier, the CPU can speculatively execute the client driver's
subsequent instructions before the RX thread's memory updates are
globally visible. Fix this by pairing the release with
test_bit_acquire().
Note that the TX allocation path (acpm_prepare_xfer) is safe as-is
and does not require an explicit acquire barrier (like
test_and_set_bit_lock) for two reasons:
* Address Dependency: The CPU mathematically cannot calculate the
destination pointer for the memset() until the non-atomic
find_next_zero_bit() returns the index, naturally preventing
speculative execution of the buffer wipe.
* Lock Boundaries: The visibility of the atomic set_bit() to the next
TX thread is safely protected by the 'tx_lock' boundaries
(specifically the Release semantics of mutex_unlock).
Cc: stable at vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus at linaro.org>
---
drivers/firmware/samsung/exynos-acpm.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index b8a4978b091b..15627b439838 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -7,7 +7,7 @@
#include <linux/bitfield.h>
#include <linux/bitmap.h>
-#include <linux/bits.h>
+#include <linux/bitops.h>
#include <linux/cleanup.h>
#include <linux/container_of.h>
#include <linux/delay.h>
@@ -207,7 +207,7 @@ static void acpm_get_saved_rx(struct acpm_chan *achan,
if (rx_seqnum == tx_seqnum) {
memcpy(xfer->rxd, rx_data->cmd, xfer->rxcnt * sizeof(*xfer->rxd));
- clear_bit(rx_seqnum - 1, achan->bitmap_seqnum);
+ clear_bit_unlock(rx_seqnum - 1, achan->bitmap_seqnum);
}
}
@@ -268,7 +268,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
if (rx_seqnum == tx_seqnum) {
__ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
rx_set = true;
- clear_bit(seqnum, achan->bitmap_seqnum);
+ clear_bit_unlock(seqnum, achan->bitmap_seqnum);
} else {
/*
* The RX data corresponds to another request.
@@ -280,7 +280,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
rx_data->rxcnt);
}
} else {
- clear_bit(seqnum, achan->bitmap_seqnum);
+ clear_bit_unlock(seqnum, achan->bitmap_seqnum);
}
i = (i + 1) % achan->qlen;
@@ -322,7 +322,14 @@ static int acpm_dequeue_by_polling(struct acpm_chan *achan,
if (ret)
return ret;
- if (!test_bit(seqnum - 1, achan->bitmap_seqnum))
+ /*
+ * For zero-length messages (rxcnt == 0), the bit can be
+ * cleared by a concurrent thread draining the queue. Use
+ * test_bit_acquire() to prevent the CPU from speculatively
+ * executing the caller's subsequent instructions before the
+ * hardware transaction is fully synchronized.
+ */
+ if (!test_bit_acquire(seqnum - 1, achan->bitmap_seqnum))
return 0;
/* Determined experimentally. */
@@ -392,13 +399,24 @@ static int acpm_prepare_xfer(struct acpm_chan *achan,
}
}
- /* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
+ /*
+ * Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62). We do
+ * not need an explicit acquire barrier here because visibility to the
+ * next TX thread is safely protected by the tx_lock boundaries
+ * (mutex_unlock provides Release semantics). The RX thread only
+ * blind-clears bits and doesn't care about this.
+ */
achan->seqnum = bit + 1;
set_bit(bit, achan->bitmap_seqnum);
txd[0] |= FIELD_PREP(ACPM_PROTOCOL_SEQNUM, achan->seqnum);
- /* Clear data for upcoming responses */
+ /*
+ * Clear data for upcoming responses. Speculative execution of memset()
+ * is prevented by the strict Address Dependency (implicit barrier) on
+ * 'bit'. The CPU mathematically cannot calculate the destination
+ * pointer until find_next/first_zero_bit() returns.
+ */
rx_data = &achan->rx_data[bit];
memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
/* zero means no response expected */
--
2.54.0.545.g6539524ca2-goog
More information about the linux-arm-kernel
mailing list