[PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation

Cristian Marussi cristian.marussi at arm.com
Tue Oct 8 05:10:39 PDT 2024


On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
> 
> 
> On 10/7/24 6:10 AM, Cristian Marussi wrote:
> > On Mon, Oct 07, 2024 at 02:04:10PM +0100, Sudeep Holla wrote:
> > > On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
> > > > The mailbox layer has its own queue. However this confuses the per
> > > > message timeouts since the clock starts ticking the moment the messages
> > > > get queued up. So all messages in the queue have there timeout clocks
> > > > ticking instead of only the message inflight. To fix this, lets move the
> > > > queue back into the SCMI layer.
> > > > 
> > > 
> > > I think this has come up in the past. We have avoided adding addition
> > > locking here as the mailbox layer takes care of it. Has anything changed
> > > recently ?
> > 
> > I asked for an explanation in my reply (we crossed each other mails probably)
> > since it alredy came up in the past a few times and central locking seemed not
> > to be needed...here the difference is about the reason...Justin talks about
> > message timeouts related to the queueing process..so I asked to better
> > explain the detail (and the anbomaly observed) since it still does not
> > seem to me that even in this case the lock is needed....anyway I can
> > definitely be woring of course :D
> > 
> 
> Hello Cristian,
> 

Hi Justin,

> Thanks for the response. I'll try to elaborate.
> 
> When comparing SMC and mailbox transport, we noticed mailbox transport
> timesout much quicker when under load. Originally we thought this was the
> latency of the mailbox implementation, but after debugging we noticed a
> weird behavior. We saw SMCI transactions timing out before the mailbox even
> transmitted the message.
> 
> This issue lies in the SCMI layer. drivers/firmware/arm_scmi/driver.c
> do_xfer() function.
> 
> The fundamental issue is send_message() blocks for SMC transport, but
> doesn't block for mailbox transport. So if send_message() doesn't block we
> can have multiple messages waiting at scmi_wait_for_message_response().
> 

oh...yes...now I can see it...tx_prepare is really never called given
how the mailbox subsystem de-queues messages once at time...so we end up
waiting for a reply to some message that is still to be sent...so the
message inflight is really NOT corrupted because the next remain pending
until the reply in the shmem is read back , BUT the timeout will drift away
if you multiple inflights are pending to be sent...

> SMC looks like this
> CPU #0 SCMI message 0 -> calls send_message() then calls
> scmi_wait_for_message_response(), timesout after 30ms.
> CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI message 0
> to complete.
> 
> Mailbox looks like this
> CPU #0 SCMI message 0 -> calls send_message(), mailbox layer queues up
> message, mailbox layer sees no message is outgoing and sends it. CPU waits
> at scmi_wait_for_message_response(), timesout after 30ms
> CPU #1 SCMI message 1 -> calls send_message(), mailbox layer queues up
> message, mailbox layer sees message pending, hold message in queue. CPU
> waits at scmi_wait_for_message_response(), timesout after 30ms.
> 
> Lets say if transport takes 25ms. The first message would succeed, the
> second message would timeout after 5ms.
> 
> Hopefully this makes sense.

Yes, of course, thanks, for reporting this, and taking time to
explain...

...in general the patch LGTM...I think your patch is good also because it
could be easily backported as a fix....can you add a Fixes tag in your
next version ?

Also can you explain in more detail the issue and the solution in the commit
message....that will help having it merged as a Fix in stables...

...for the future (definitely NOT in this series) we could probably think to
get rid of the sleeping mutex in favour of some other non-sleeping form of
mutual exclusion around the channnel (like in SMC transport) and enable
(optionally) Atomic transmission support AND also review if the shmem
layer busy-waiting in txprepare is anymore needed at all...

Thanks,
Cristian



More information about the linux-arm-kernel mailing list