[PATCH] mailbox: Fix NULL message support in mbox_send_message()

Doug Anderson dianders at chromium.org
Fri Mar 27 13:24:38 PDT 2026


Jassi,

On Sun, Mar 22, 2026 at 10:18 AM <jassisinghbrar at gmail.com> wrote:
>
> From: Jassi Brar <jassisinghbrar at gmail.com>
>
> The active_req field serves double duty as both the "is a TX in
> flight" flag (NULL means idle) and the storage for the in-flight
> message pointer. When a client sends NULL via mbox_send_message(),
> active_req is set to NULL, which the framework misinterprets as
> "no active request". This breaks the TX state machine in two ways:
>
>  - tx_tick() short-circuits on (!mssg), skipping the tx_done
>    callback and the tx_complete completion.
>  - txdone_hrtimer() skips the channel entirely since active_req
>    is NULL, so poll-based TX-done detection never fires.
>
> Fix this by introducing a MBOX_NO_MSG sentinel value that means
> "no active request," freeing NULL to be valid message data. The
> sentinel is defined in the subsystem-internal mailbox.h so that
> controller drivers within drivers/mailbox/ can reference it, but
> it is not exposed to clients outside the subsystem.
>
> Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
> Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
> audited for regression:
>
>  - Most already work around the bug via knows_txdone=true with a
>    manual mbox_client_txdone() call, making the framework's
>    tracking irrelevant. These are unaffected.
>
>  - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
>    the poll timer now correctly detects NULL-active channels
>    instead of silently skipping them.
>
>  - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm
>    caller that omitted the knows_txdone + mbox_client_txdone()
>    pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
>    missing mailbox TX done acknowledgment").
>
>  - No caller sets both a tx_done callback and sends NULL, nor
>    combines tx_block=true with NULL sends, so the newly reachable
>    callback/completion paths are never exercised.
>
> Also update tegra-hsp's flush callback, which directly inspects
> active_req to wait for the channel to drain: the old "!= NULL"
> check becomes "!= MBOX_NO_MSG", otherwise flush spins until
> timeout since the sentinel is non-NULL.
>
> The only tradeoff is that 'MBOX_NO_MSG' cannot be used as a message
> by clients.
>
> Signed-off-by: Jassi Brar <jassisinghbrar at gmail.com>
> ---
>  drivers/mailbox/mailbox.c   | 13 +++++++------
>  drivers/mailbox/mailbox.h   |  3 +++
>  drivers/mailbox/tegra-hsp.c |  2 +-
>  3 files changed, 11 insertions(+), 7 deletions(-)
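
For readers following along, the failure mode described in the commit
message can be modeled in a few lines of userspace C. This is only a
sketch, not the kernel code: struct sketch_chan, SKETCH_NO_MSG, and the
two sketch_tx_tick_*() helpers are hypothetical names, the tx_done
callback is reduced to a counter, and the real value of MBOX_NO_MSG is
not shown in this thread, so the sentinel here is assumed to be the
address of a private object.

```c
#include <stddef.h>

/* Hypothetical sentinel: the address of a private object, which no
 * client message can equal by accident. The real MBOX_NO_MSG
 * definition lives in the patch and is not reproduced here. */
static char sketch_idle_token;
#define SKETCH_NO_MSG ((void *)&sketch_idle_token)

/* Simplified stand-in for the per-channel state. */
struct sketch_chan {
	void *active_req;	/* in-flight message, or the idle marker */
	int tx_done_calls;	/* counts simulated tx_done callbacks */
};

/* Old scheme: NULL doubles as "idle", so a NULL message that really
 * was in flight is indistinguishable from "nothing to complete". */
static void sketch_tx_tick_old(struct sketch_chan *chan)
{
	void *mssg = chan->active_req;

	chan->active_req = NULL;
	if (!mssg)
		return;		/* NULL message: completion is skipped */
	chan->tx_done_calls++;
}

/* New scheme: only the sentinel means "idle", so a NULL message is
 * completed like any other. */
static void sketch_tx_tick_new(struct sketch_chan *chan)
{
	void *mssg = chan->active_req;

	chan->active_req = SKETCH_NO_MSG;
	if (mssg == SKETCH_NO_MSG)
		return;		/* genuinely idle */
	chan->tx_done_calls++;
}
```

With active_req holding an in-flight NULL message, the old tick leaves
tx_done_calls at 0, while the new tick increments it once and then
treats a second tick as idle.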

This looks reasonable to me. I have one nit, though. Can you please
add a snippet to the beginning of mbox_send_message() that looks like:

	if (mssg == MBOX_NO_MSG)
		return -EINVAL;

I just want to ensure a client doesn't decide to simulate the
old/weird behavior by sending this sentinel value. ;-)
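
As a rough illustration of what that guard buys, here is a userspace
sketch under stated assumptions: model_chan, MODEL_NO_MSG, and the
model_*() helpers are made-up names, the sentinel value is assumed (the
real definition is in the patch), and the model rejects a second send
with -EBUSY instead of queueing it the way the real framework does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for MBOX_NO_MSG. */
static char model_no_msg_token;
#define MODEL_NO_MSG ((void *)&model_no_msg_token)

/* Simplified channel: a single in-flight slot, no message queue. */
struct model_chan {
	void *active_req;	/* MODEL_NO_MSG when idle; NULL is valid data */
};

static void model_chan_init(struct model_chan *chan)
{
	chan->active_req = MODEL_NO_MSG;
}

static int model_chan_busy(const struct model_chan *chan)
{
	return chan->active_req != MODEL_NO_MSG;
}

/* Send path with the requested guard at the top. */
static int model_send_message(struct model_chan *chan, void *mssg)
{
	if (!chan)
		return -EINVAL;

	/* Without this check, a client could pass the sentinel and leave
	 * the channel looking idle, recreating the old NULL behavior. */
	if (mssg == MODEL_NO_MSG)
		return -EINVAL;

	/* Simplification: the real framework queues instead of failing. */
	if (model_chan_busy(chan))
		return -EBUSY;

	chan->active_req = mssg;
	return 0;
}

/* Marks the transfer finished and returns the channel to idle. */
static void model_tx_done(struct model_chan *chan)
{
	chan->active_req = MODEL_NO_MSG;
}
```

In this model, sending MODEL_NO_MSG fails with -EINVAL and leaves the
channel idle, while sending NULL succeeds and marks the channel busy
until model_tx_done() is called.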

Other than that:

Reviewed-by: Douglas Anderson <dianders at chromium.org>


-Doug


