[PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation
Sudeep Holla
sudeep.holla at arm.com
Tue Oct 8 06:34:59 PDT 2024
On Tue, Oct 08, 2024 at 02:23:00PM +0100, Sudeep Holla wrote:
> On Tue, Oct 08, 2024 at 01:10:39PM +0100, Cristian Marussi wrote:
> > On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
> > > Thanks for the response. I'll try to elaborate.
> > >
> > > When comparing SMC and mailbox transport, we noticed mailbox transport
> > > timesout much quicker when under load. Originally we thought this was the
> > > latency of the mailbox implementation, but after debugging we noticed a
> > > weird behavior. We saw SMCI transactions timing out before the mailbox even
> > > transmitted the message.
> > >
> > > This issue lies in the SCMI layer. drivers/firmware/arm_scmi/driver.c
> > > do_xfer() function.
> > >
> > > The fundamental issue is send_message() blocks for SMC transport, but
> > > doesn't block for mailbox transport. So if send_message() doesn't block we
> > > can have multiple messages waiting at scmi_wait_for_message_response().
> > >
> >
> > oh...yes...now I can see it...tx_prepare is really never called given
> > how the mailbox subsystem de-queues messages once at time...so we end up
> > waiting for a reply to some message that is still to be sent...so the
> > message inflight is really NOT corrupted because the next remain pending
> > until the reply in the shmem is read back , BUT the timeout will drift away
> > if you multiple inflights are pending to be sent...
> >
>
> Indeed.
>
> > > SMC looks like this
> > > CPU #0 SCMI message 0 -> calls send_message() then calls
> > > scmi_wait_for_message_response(), timesout after 30ms.
> > > CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI message 0
> > > to complete.
> > >
> > > Mailbox looks like this
> > > CPU #0 SCMI message 0 -> calls send_message(), mailbox layer queues up
> > > message, mailbox layer sees no message is outgoing and sends it. CPU waits
> > > at scmi_wait_for_message_response(), timesout after 30ms
> > > CPU #1 SCMI message 1 -> calls send_message(), mailbox layer queues up
> > > message, mailbox layer sees message pending, hold message in queue. CPU
> > > waits at scmi_wait_for_message_response(), timesout after 30ms.
> > >
> > > Lets say if transport takes 25ms. The first message would succeed, the
> > > second message would timeout after 5ms.
> > >
> > > Hopefully this makes sense.
> >
> > Yes, of course, thanks, for reporting this, and taking time to
> > explain...
> >
> > ...in general the patch LGTM...I think your patch is good also because it
> > could be easily backported as a fix....can you add a Fixes tag in your
> > next version ?
> >
>
> Are you seeing this issue a lot ? IOW, do we need this to be backported ?
>
> > Also can you explain in more detail the issue and the solution in the commit
> > message....that will help having it merged as a Fix in stables...
> >
> > ...for the future (definitely NOT in this series) we could probably think to
> > get rid of the sleeping mutex in favour of some other non-sleeping form of
> > mutual exclusion around the channnel (like in SMC transport) and enable
> > (optionally) Atomic transmission support AND also review if the shmem
> > layer busy-waiting in txprepare is anymore needed at all...
> >
>
> Agreed, if we are locking the channel in SCMI, we can drop the busy-waiting
> in tx_prepare and the associated details in the comment as this locking
> voids that. It is better have both the changes in the same patch to indicate
> the relation between them.
Actually scratch that last point. The waiting in tx_prepare until the platform
marks it free for agent to use is still needed. One usecase is when agent/OS
times out but platform continues to process and eventually releases the shmem.
Sorry I completely forgot about that.
--
Regards,
Sudeep
More information about the linux-arm-kernel
mailing list