[PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation

Justin Chen justin.chen at broadcom.com
Tue Oct 8 12:40:37 PDT 2024



On 10/8/24 12:23 PM, Justin Chen wrote:
> 
> 
> On 10/8/24 6:37 AM, Cristian Marussi wrote:
>> On Tue, Oct 08, 2024 at 02:34:59PM +0100, Sudeep Holla wrote:
>>> On Tue, Oct 08, 2024 at 02:23:00PM +0100, Sudeep Holla wrote:
>>>> On Tue, Oct 08, 2024 at 01:10:39PM +0100, Cristian Marussi wrote:
>>>>> On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
>>>>>> Thanks for the response. I'll try to elaborate.
>>>>>>
>>>>>> When comparing SMC and mailbox transport, we noticed mailbox 
>>>>>> transport
>>>>>> timesout much quicker when under load. Originally we thought this 
>>>>>> was the
>>>>>> latency of the mailbox implementation, but after debugging we 
>>>>>> noticed a
>>>>>> weird behavior. We saw SMCI transactions timing out before the 
>>>>>> mailbox even
>>>>>> transmitted the message.
>>>>>>
>>>>>> This issue lies in the SCMI layer. drivers/firmware/arm_scmi/driver.c
>>>>>> do_xfer() function.
>>>>>>
>>>>>> The fundamental issue is send_message() blocks for SMC transport, but
>>>>>> doesn't block for mailbox transport. So if send_message() doesn't 
>>>>>> block we
>>>>>> can have multiple messages waiting at 
>>>>>> scmi_wait_for_message_response().
>>>>>>
>>>>>
>>>>> oh...yes...now I can see it...tx_prepare is really never called given
>>>>> how the mailbox subsystem de-queues messages once at time...so we 
>>>>> end up
>>>>> waiting for a reply to some message that is still to be sent...so the
>>>>> message inflight is really NOT corrupted because the next remain 
>>>>> pending
>>>>> until the reply in the shmem is read back , BUT the timeout will 
>>>>> drift away
>>>>> if you multiple inflights are pending to be sent...
>>>>>
>>>>
>>>> Indeed.
>>>>
>>>>>> SMC looks like this
>>>>>> CPU #0 SCMI message 0 -> calls send_message() then calls
>>>>>> scmi_wait_for_message_response(), timesout after 30ms.
>>>>>> CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI 
>>>>>> message 0
>>>>>> to complete.
>>>>>>
>>>>>> Mailbox looks like this
>>>>>> CPU #0 SCMI message 0 -> calls send_message(), mailbox layer 
>>>>>> queues up
>>>>>> message, mailbox layer sees no message is outgoing and sends it. 
>>>>>> CPU waits
>>>>>> at scmi_wait_for_message_response(), timesout after 30ms
>>>>>> CPU #1 SCMI message 1 -> calls send_message(), mailbox layer 
>>>>>> queues up
>>>>>> message, mailbox layer sees message pending, hold message in 
>>>>>> queue. CPU
>>>>>> waits at scmi_wait_for_message_response(), timesout after 30ms.
>>>>>>
>>>>>> Lets say if transport takes 25ms. The first message would succeed, 
>>>>>> the
>>>>>> second message would timeout after 5ms.
>>>>>>
>>>>>> Hopefully this makes sense.
>>>>>
>>>>> Yes, of course, thanks, for reporting this, and taking time to
>>>>> explain...
>>>>>
>>>>> ...in general the patch LGTM...I think your patch is good also 
>>>>> because it
>>>>> could be easily backported as a fix....can you add a Fixes tag in your
>>>>> next version ?
>>>>>
>>>>
>>>> Are you seeing this issue a lot ? IOW, do we need this to be 
>>>> backported ?
>>>>
> 
> I wouldn't say a lot. But we are seeing it with standard use of our 
> devices running over an extended amount of time. Yes we would like this 
> backported.
> 

Any advice on what Fixes tag to add here? Since the driver was moved to 
the transport folder, I assume this is going to be a manual backport. 
Not sure if the Fixes tag should be when the SCMI MBOX transport was 
moved to a standalone driver in the transport folder or the very first 
SCMI commits since the MBOX transport was added then.

Thanks,
Justin

<snip>




More information about the linux-arm-kernel mailing list