[PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation

Justin Chen justin.chen at broadcom.com
Mon Oct 7 10:58:47 PDT 2024



On 10/7/24 6:10 AM, Cristian Marussi wrote:
> On Mon, Oct 07, 2024 at 02:04:10PM +0100, Sudeep Holla wrote:
>> On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
>>> The mailbox layer has its own queue. However this confuses the per
>>> message timeouts since the clock starts ticking the moment the messages
>>> get queued up. So all messages in the queue have their timeout clocks
>>> ticking instead of only the message inflight. To fix this, let's move the
>>> queue back into the SCMI layer.
>>>
>>
>> I think this has come up in the past. We have avoided adding additional
>> locking here as the mailbox layer takes care of it. Has anything changed
>> recently ?
> 
> I asked for an explanation in my reply (our mails probably crossed)
> since it already came up in the past a few times and central locking seemed not
> to be needed...here the difference is about the reason...Justin talks about
> message timeouts related to the queueing process..so I asked to better
> explain the detail (and the anomaly observed) since it still does not
> seem to me that even in this case the lock is needed....anyway I can
> definitely be wrong of course :D
> 

Hello Cristian,

Thanks for the response. I'll try to elaborate.

When comparing SMC and mailbox transport, we noticed that the mailbox
transport times out much more quickly under load. Originally we thought
this was the latency of the mailbox implementation, but after debugging we
noticed some odd behavior: SCMI transactions were timing out before the
mailbox had even transmitted the message.

The issue lies in the SCMI layer, in the do_xfer() function in
drivers/firmware/arm_scmi/driver.c.

The fundamental issue is send_message() blocks for SMC transport, but 
doesn't block for mailbox transport. So if send_message() doesn't block 
we can have multiple messages waiting at scmi_wait_for_message_response().

SMC looks like this:
CPU #0 SCMI message 0 -> calls send_message(), then calls
scmi_wait_for_message_response(), times out after 30ms.
CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI
message 0 to complete.

Mailbox looks like this:
CPU #0 SCMI message 0 -> calls send_message(), mailbox layer queues up the
message, mailbox layer sees no message is outgoing and sends it. CPU
waits at scmi_wait_for_message_response(), times out after 30ms.
CPU #1 SCMI message 1 -> calls send_message(), mailbox layer queues up the
message, mailbox layer sees a message pending and holds the message in the
queue. CPU waits at scmi_wait_for_message_response(), times out after 30ms.

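Let's say the transport takes 25ms per message. The first message would
succeed. The second message spends those 25ms sitting in the mailbox queue
with its timeout clock already running, so once it is actually sent it has
only 5ms left and times out.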
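
To make the arithmetic concrete, here is a rough standalone sketch (plain
userspace C, not kernel code). The helper names are hypothetical, and it
assumes that every message is queued at t=0 and that each message takes the
same time on the wire. It only models how much of the 30ms budget is left when
a message actually starts transmitting.

```c
#define SCMI_TIMEOUT_MS 30 /* per-message timeout from the example above */

/*
 * Mailbox-like case: the timeout clock starts when the message is queued.
 * Message `idx` (0-based) first waits for the `idx` messages ahead of it,
 * so that waiting time is already spent when it goes on the wire.
 */
static int budget_clock_from_queue(int idx, int transport_ms)
{
	int waited_ms = idx * transport_ms;
	int left_ms = SCMI_TIMEOUT_MS - waited_ms;

	return left_ms > 0 ? left_ms : 0;
}

/*
 * SMC-like case: send_message() blocks until the previous message is done,
 * so the clock only starts once the message is in flight and the full
 * budget is always available.
 */
static int budget_clock_from_send(int idx, int transport_ms)
{
	(void)idx;
	(void)transport_ms;
	return SCMI_TIMEOUT_MS;
}

/* A message completes only if the transport fits in the remaining budget. */
static int completes(int budget_ms, int transport_ms)
{
	return transport_ms <= budget_ms;
}
```

With transport_ms = 25, budget_clock_from_queue() gives 30ms for message 0 and
5ms for message 1, so completes() fails for message 1, while
budget_clock_from_send() leaves both messages the full 30ms.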

Hopefully this makes sense.

Justin




