[PATCH v2] firmware: arm_scmi: Queue in scmi layer for mailbox implementation

Justin Chen justin.chen at broadcom.com
Fri Oct 11 12:15:07 PDT 2024



On 10/11/24 6:43 AM, Cristian Marussi wrote:
> On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote:
>> send_message() does not block in the MBOX implementation. This is
>> because the mailbox layer has its own queue. However, this confuses
>> the per xfer timeouts as they all start their timeout ticks in
>> parallel.
>>
>> Consider a case where the xfer timeout is 30ms and a SCMI transaction
>> takes 25ms.
>>
>> 0ms: Message #0 is queued in mailbox layer and sent out, then sits
>> at scmi_wait_for_message_response() with a timeout of 30ms
>> 1ms: Message #1 is queued in mailbox layer but not sent out yet.
>> Since send_message() doesn't block, it also sits at
>> scmi_wait_for_message_response() with a timeout of 30ms
>> ...
>> 25ms: Message #0 is completed, txdone is called and Message #1 is
>> sent out
>> 31ms: Message #1 times out since the count started at 1ms. Even
>> though it has only been inflight for 6ms.
>>
>> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
>> Signed-off-by: Justin Chen <justin.chen at broadcom.com>
>> ---
>>
>> Changes in v2:
> 
> Hi Justin,
> 
> thanks.
> 
> A few nitpicks and one remark down below.
> 
>>
>> - Added Fixes tag
>> - Improved commit message to better capture the issue
>>
>>   .../firmware/arm_scmi/transports/mailbox.c    | 21 +++++++++++++------
>>   1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
>> index 1a754dee24f7..30bc2865582f 100644
>> --- a/drivers/firmware/arm_scmi/transports/mailbox.c
>> +++ b/drivers/firmware/arm_scmi/transports/mailbox.c
>> @@ -33,6 +33,7 @@ struct scmi_mailbox {
>>   	struct mbox_chan *chan_platform_receiver;
>>   	struct scmi_chan_info *cinfo;
>>   	struct scmi_shared_mem __iomem *shmem;
>> +	struct mutex chan_lock;
> 
> Missing Doxygen comment....
> 
> arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox
> 
>>   };
>>   
>>   #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
>> @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
>>   	cl->rx_callback = rx_callback;
>>   	cl->tx_block = false;
>>   	cl->knows_txdone = tx;
>> +	mutex_init(&smbox->chan_lock);
> 
> This could be move at the end of this function after the channels are
> requested and it is no more possible to fail and bail out....messages
> wont flow and lock wont be used anyway until this chan_setup completes...
> ...BUT I have NOT string opinion about this....you can leave it here
> too...up to you
>>   
>>   	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
>>   	if (IS_ERR(smbox->chan)) {
>> @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo,
>>   	struct scmi_mailbox *smbox = cinfo->transport_info;
>>   	int ret;
>>   
>> +	/*
>> +	 * The mailbox layer has it's own queue. However the mailbox queue confuses
>   				  its own queue
> 
>> +	 * the per message SCMI timeouts since the clock starts when the message is
>> +	 * submitted into the mailbox queue. So when multiple messages are queued up
>> +	 * the clock starts on all messages instead of only the one inflight.
>> +	 */
>> +	mutex_lock(&smbox->chan_lock);
>> +
>>   	ret = mbox_send_message(smbox->chan, xfer);
>>   
>>   	/* mbox_send_message returns non-negative value on success, so reset */
>>   	if (ret > 0)
>>   		ret = 0;
>> +	else
>> +		mutex_unlock(&smbox->chan_lock);
> 
> I think this should be
> 
> 	else if (ret < 0)
> 		mutex_unlock(&smbox->chan_lock);
> 
> ...since looking at mbox_send_message() and its implementation it returns
> NON-Negative integers on Success...so 0 from mbox_send_mmessage() also means
> SUCCESS and we should not release the mutex (I think the 'ret' returned
> here is the idx from add_to_rbuf...so it will become zero peridiocally
> on normal successfull operation)
> 

Yes, I see the implementation. Looks like it returns the position in the 
ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a 
warning.

What about this?
if (ret >= 0)
	ret = 0
else
	mutex_unlock(&smbox->chan_lock);

A bit easier to read IMO.

Thanks,
Justin

>>   
>>   	return ret;
>>   }
>> @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
>>   {
>>   	struct scmi_mailbox *smbox = cinfo->transport_info;
>>   
>> -	/*
>> -	 * NOTE: we might prefer not to need the mailbox ticker to manage the
>> -	 * transfer queueing since the protocol layer queues things by itself.
>> -	 * Unfortunately, we have to kick the mailbox framework after we have
>> -	 * received our message.
>> -	 */
>>   	mbox_client_txdone(smbox->chan, ret);
>> +
>> +	/* Release channel */
>> +	mutex_unlock(&smbox->chan_lock);
>>   }
>>   
>>   static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
>> -- 
>> 2.34.1
>>
> 
> I gave it a go on a couple of JUNO, without any issues.
> 
> Other than the above, LGTM.
> 
> Reviewed-by: Cristian Marussi <cristian.marussi at arm.com>
> Tested-by: Cristian Marussi <cristian.marussi at arm.com>
> 
> Thanks,
> Cristian
> 




More information about the linux-arm-kernel mailing list