[PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation
Justin Chen
justin.chen at broadcom.com
Mon Oct 7 10:58:47 PDT 2024
On 10/7/24 6:10 AM, Cristian Marussi wrote:
> On Mon, Oct 07, 2024 at 02:04:10PM +0100, Sudeep Holla wrote:
>> On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
>>> The mailbox layer has its own queue. However this confuses the per
>>> message timeouts since the clock starts ticking the moment the messages
>>> get queued up. So all messages in the queue have there timeout clocks
>>> ticking instead of only the message inflight. To fix this, lets move the
>>> queue back into the SCMI layer.
>>>
>>
>> I think this has come up in the past. We have avoided adding addition
>> locking here as the mailbox layer takes care of it. Has anything changed
>> recently ?
>
> I asked for an explanation in my reply (we crossed each other mails probably)
> since it alredy came up in the past a few times and central locking seemed not
> to be needed...here the difference is about the reason...Justin talks about
> message timeouts related to the queueing process..so I asked to better
> explain the detail (and the anbomaly observed) since it still does not
> seem to me that even in this case the lock is needed....anyway I can
> definitely be woring of course :D
>
Hello Cristian,
Thanks for the response. I'll try to elaborate.
When comparing SMC and mailbox transport, we noticed mailbox transport
timesout much quicker when under load. Originally we thought this was
the latency of the mailbox implementation, but after debugging we
noticed a weird behavior. We saw SMCI transactions timing out before the
mailbox even transmitted the message.
This issue lies in the SCMI layer. drivers/firmware/arm_scmi/driver.c
do_xfer() function.
The fundamental issue is send_message() blocks for SMC transport, but
doesn't block for mailbox transport. So if send_message() doesn't block
we can have multiple messages waiting at scmi_wait_for_message_response().
SMC looks like this
CPU #0 SCMI message 0 -> calls send_message() then calls
scmi_wait_for_message_response(), timesout after 30ms.
CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI
message 0 to complete.
Mailbox looks like this
CPU #0 SCMI message 0 -> calls send_message(), mailbox layer queues up
message, mailbox layer sees no message is outgoing and sends it. CPU
waits at scmi_wait_for_message_response(), timesout after 30ms
CPU #1 SCMI message 1 -> calls send_message(), mailbox layer queues up
message, mailbox layer sees message pending, hold message in queue. CPU
waits at scmi_wait_for_message_response(), timesout after 30ms.
Lets say if transport takes 25ms. The first message would succeed, the
second message would timeout after 5ms.
Hopefully this makes sense.
Justin
More information about the linux-arm-kernel
mailing list