[PATCH] mailbox: Fix NULL message support in mbox_send_message()
Jassi Brar
jassisinghbrar at gmail.com
Tue Mar 31 22:07:50 PDT 2026
On Tue, Mar 31, 2026 at 5:08 AM Joonwon Kang <joonwonkang at google.com> wrote:
>
> > The active_req field serves double duty as both the "is a TX in
> > flight" flag (NULL means idle) and the storage for the in-flight
> > message pointer. When a client sends NULL via mbox_send_message(),
> > active_req is set to NULL, which the framework misinterprets as
> > "no active request". This breaks the TX state machine by:
> >
> > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > callback and the tx_complete completion
> > - txdone_hrtimer() skips the channel entirely since active_req
> > is NULL, so poll-based TX-done detection never fires.
> >
> > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > "no active request," freeing NULL to be valid message data. The
> > sentinel is defined in the subsystem-internal mailbox.h so that
> > controller drivers within drivers/mailbox/ can reference it, but
> > it is not exposed to clients outside the subsystem.
> >
> > Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
> > Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
> > audited for regression:
> >
> > - Most already work around the bug via knows_txdone=true with a
> > manual mbox_client_txdone() call, making the framework's
> > tracking irrelevant. These are unaffected.
> >
> > - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
> > the poll timer now correctly detects NULL-active channels
> > instead of silently skipping them.
> >
> > - irq-qcom-mpm.c had a pre-existing bug: it was the only Qualcomm
> > caller that omitted the knows_txdone + mbox_client_txdone()
> > pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
> > missing mailbox TX done acknowledgment").
> >
> > - No caller both sets a tx_done callback and sends NULL, nor
> > combines tx_block=true with NULL sends, so the newly reachable
> > callback/completion paths are never exercised.
> >
> > Also update tegra-hsp's flush callback, which directly inspects
> > active_req to wait for the channel to drain: the old "!= NULL"
> > check becomes "!= MBOX_NO_MSG", otherwise flush spins until
> > timeout since the sentinel is non-NULL.
> >
> > The only tradeoff is that MBOX_NO_MSG cannot be used as a message
> > by clients.
> >
> > Reported-by: Joonwon Kang <joonwonkang at google.com>
> > Reviewed-by: Douglas Anderson <dianders at chromium.org>
> > Signed-off-by: Jassi Brar <jassisinghbrar at gmail.com>
>
> Do you have plans to backport this patch to other stable versions?
> If not, I can send backports to the stable versions that I need.
>
Please feel free to do so. Thanks for the help.
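
For anyone preparing a backport, the sentinel scheme described in the
quoted commit message can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not the patch itself:
struct chan_model, the chan_*() helpers and legacy_is_busy() are
hypothetical names, and ((void *)-1) as the value of MBOX_NO_MSG is an
assumption; the real definition is in the subsystem-internal mailbox.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sentinel value for illustration; see mailbox.h for the real one. */
#define MBOX_NO_MSG ((void *)-1)

/* Hypothetical stand-in for the per-channel TX state. */
struct chan_model {
	void *active_req;   /* MBOX_NO_MSG when idle, else the in-flight message */
	int tx_done_calls;  /* how many times the completion path ran */
};

static void chan_init(struct chan_model *c)
{
	c->active_req = MBOX_NO_MSG;
	c->tx_done_calls = 0;
}

/* Old check: NULL doubles as "idle", so a NULL message looks idle. */
static bool legacy_is_busy(const void *active_req)
{
	return active_req != NULL;
}

/* New check: only the sentinel means "idle"; NULL is ordinary data. */
static bool chan_is_busy(const struct chan_model *c)
{
	return c->active_req != MBOX_NO_MSG;
}

/* Start a transfer. Returns 0 on success, -1 if one is already in flight. */
static int chan_submit(struct chan_model *c, void *msg)
{
	/* The stated tradeoff: clients may not send the sentinel itself. */
	assert(msg != MBOX_NO_MSG);

	if (chan_is_busy(c))
		return -1;
	c->active_req = msg;
	return 0;
}

/* Models the TX-done tick: completes the in-flight message, if any. */
static bool chan_tx_tick(struct chan_model *c)
{
	if (!chan_is_busy(c))
		return false;   /* nothing in flight, nothing to complete */

	c->active_req = MBOX_NO_MSG;
	c->tx_done_calls++;     /* stands in for tx_done callback / completion */
	return true;
}
```

In this model, a channel that has accepted a NULL message reports busy
through chan_is_busy(), while legacy_is_busy() reports idle for the same
state. That difference corresponds to the skipped tx_tick() and
txdone_hrtimer() paths listed in the commit message, and to the
"!= MBOX_NO_MSG" check that the tegra-hsp flush callback needs.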