Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
Cristian Marussi
cristian.marussi at arm.com
Thu Sep 22 07:54:24 PDT 2022
On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote:
> Hi Christian,
>
Hi Shivnandan,
>
> Do you have any update or suggestion regarding thread https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019.
>
> We are still getting this issue and wanted to check if there is any fix
> that I can try.
>
Sorry this issue fell to the bottom of my list in these past months...
... but it stil on TODO :D
So today I tried to get my head around this issue again (i.e. mainly
re-reading the above thread to remind me what was the status and wth I
had written... :P)
In summary the racy thing seemed to be caused by the a delayed late
SCMI Base reply happily served on one core by scmi_rx_callback operating
on some well-defined SCMI channel, while on another core we are effectively
shutting down the system and destroying such channels: now this should
be clearly NOT be possible and it is what we have to synchronize.
Looking at the transport layer that you use, mailbox, I see that while
setup/free helpers are synchronized on an internal chan->lock, the RX
path inside the mailbox core is not, so I tried this:
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4229b9b5da98..bb6173c0ad54 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
*/
void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&chan->lock, flags);
/* No buffering the received data */
if (chan->cl->rx_callback)
chan->cl->rx_callback(chan->cl, mssg);
+ spin_unlock_irqrestore(&chan->lock, flags);
}
EXPORT_SYMBOL_GPL(mbox_chan_received_data);
... can you try on your setup ? I dont have a way to easily reproduce
your race as of now...
NOTE THAT, I am not still convinced that the above fix, if it works, will
constitute the final solution to this issue, I could maybe move this same
kind of sync up into the SCMI transport layer to avoid to impact all other
users of the above mailbox interface (since, as of today, nobody has
reported any issue like ours due to the missing spinlock..)..but it
could be helpful to test the above to verify that this is really where
the root issue is.
Thanks,
Cristian
More information about the linux-arm-kernel
mailing list