[RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ

Radu Rendec rrendec at redhat.com
Wed Jan 22 13:10:47 PST 2025


Hi Cristian,

On Wed, 2025-01-22 at 12:14 +0000, Cristian Marussi wrote:
> On Tue, Jan 21, 2025 at 09:19:37PM -0500, Radu Rendec wrote:
> 
> > On Tue, 2025-01-21 at 10:49 +0000, Cristian Marussi wrote:
> > > mmm...I dont think you can do this....it is a bit tricky to explain BUT
> > > the optional txdone_irq in the mailbox subsystem is NOT a completion
> > > interrupt as intended in the SCMI world, I'll try to explain why in the
> > > following.
> 
> [snip]
> 
> > > Sorry for the not so short mail :P ... but if I am missing something
> > > and you are seeing anomalies, please let us know.
> > 
> > No need to apologize, and many thanks for taking the time to explain
> > everything in such detail. Much appreciated! What you explained makes
> > perfect sense, and now I clearly understand why the changes I proposed
> > would not work.
> > 
> > I have quite the opposite problem. I'm using a mailbox controller that
> > does *not* provide a txACK interrupt (or any kind of interrupt for that
> > matter), and the problem is that SCMI transfers do not complete because
> > scmi_wait_for_reply() times out.
> 
> Do you mean a unidirectional mailbox controller (like ARM MHUv3) that
> allow for bidirectionality ONLY if you effectively use a pair of them,
> one for each direction ?

Yes, that's what I meant. But I had a very distorted understanding of
how that worked, and now it's much clearer after reading your email,
looking at arm,scmi.yaml and staring more at the code :) Please see
further details below.


[snip]

> > Since xfer->hdr.poll_completion defaults to false, I looked at what
> > could trigger the completion that scmi_wait_for_reply() waits on, in
> > the case of a mailbox transport. The way it's set up is quite
> > convoluted but if I'm reading the code correctly:
> >  * core->rx_callback (in mailbox.c) is set to scmi_rx_callback()
> >  * chan->cl->rx_callback is set to rx_callback() (in mailbox.c)
> > Therefore, the (reverse) call path leading to completion would be:
> > complete(&xfer->done)
> > scmi_handle_response()
> > scmi_rx_callback()
> > rx_callback()
> > mbox_chan_received_data()
> > 
> > If I understand correctly, mbox_chan_received_data() is supposed to be
> > called by the mailbox provider driver when the mailbox receives data
> > e.g. the SCMI server rings the bell to indicate completion. But doesn't
> > this mean that the completion notification (through mailbox doorbell)
> > is expected to be implemented by default?
> 
> Nope because when the SCMI client requires a polling transaction, beside
> the busy-looping on the channel BUSY bit on its side, it will also ask
> for a polling transaction by setting the proper bit in the channel flags
> (as detailed in spec chap 5.1) of the shmem area, so that the server will
> know that the completion IRQ, even if existing is NOT required for that
> transaction: in any case, polling or not, the server will clear the channel
> BUSY bit when it is done, since that is the mechanism that is used by the
> client and server to pass back and forth the 'ownership' of the channel.
> [BIG RED WARNING: I never remember if the channel bit I blabbed about
> above is defined as a BUSY bit or as a FREE bit....so the logic could be
> reversed.]
> 
> The above backtrace is only for the IRQ path...if you are polling, you
> are indeed busy looping inside scmi_wait_for_reply() and if you dont get
> a timeout (or an out of order invalid reply) the reply if picked up at
> the end of the poll loop by a fetch_reponse() inside that same
> wait_for_reply....since polling anyway is meant to pick up Delayed
> Responses OR notification that does on the P2A channel (RX in the SCMI
> stack) which needs to have an associated IRQ to work at all.
> 
> Beside your scenario of a system lacking a completion IRQ, polling
> transactions can be asked for a speficic SCMI command at will like to
> issue a clock_enable(), so polling is generally working as far as I know
> and tested....you can easily test that also by just enaboing
> force_polling flag in the transport desc (for testing/debug purposes
> only)

I'm sorry, I didn't phrase the question very well. Polling does work as
expected, and it was clearly proven by the fact that I actually got the
SCMI communication working with the patch I sent initially - because it
was forcing the polling mode (even though the patch itself was wrong
and had no practical value).

The backtrace I mentioned is clearly for the IRQ path. What I was
getting at was that it seems to be the only way for the completion to
be completed. And since in the default configuration (with the default
flags) scmi_wait_for_reply() takes the second branch, where it waits
for that completion, it means that by default the completion IRQ needs
to exist for the kernel to know that the transfer has been replied to.

What I really meant to ask was this: does it mean that in the default
configuration (with the default flags) the completion IRQ is required?
And I think you already answered, indirectly :)

> > Now I get that the mailbox layer has no knowledge of the client layer
> > (SCMI in this case) and that the txACK IRQ in the mailbox layer is not
> > meaningful or useful to the SCMI layer. But in my (uneducated) opinion,
> > there should still be a way to set the no_completion_irq flag in struct
> > scmi_chan_info to true for the mailbox transport, for the case when the
> > SCMI server does not implement the completion notification. Perhaps
> > through the device tree?
> 
> As I said, the flag is NOT settable, since it was a scenario that was
> never needed....but you are right.. you will have to somehow set that
> flag to handle this kind of setup.
> 
> Now, I was hoping to be able to somehow handle this with the existing
> bindings, because if you look at 'mboxes' definition in:
> 
>   Documentation/devicetree/bindings/firmware/arm,scmi.yaml
> 
> ..you will see that we try to handle all the possible combinations of
> mailbox controller types implicitly by deducing the kind of controller
> by the combination of the number of defined shmem/mbox. (this is because
> unfortunately we cannot rely on naming for backward compatibility issue
> since mbox names was NOT required prop from the start).

That makes perfect sense, and the description in arm,scmi.yaml (which I
should have looked at in the first place) was a real eye opener. Thanks
for that!!

> So, I would have loved to be able to deduce in mailbox_chan_setup() that
> in a specific configuration there is no completion interrupt and set
> accordingly the no_completion_irq flag using the existing bindings....
> 
> ...but seems to me that it is not possible since there is the assumption
> that unidirectional mailbox are always used in pair to provide a completion
> IRQ....and we run out of possibilities to exploit this implicit info
> (again because we address the most common case with the current bindings)

True, but maybe the assumption is wrong. In the scenario I'm describing
there are 2 shmem areas and only one unidirectional mailbox for Tx. The
Rx mailbox does not exist because there is no completion IRQ, and the
only way you can implement a Rx mailbox is through an IRQ. So in theory
you could infer this configuration from a "1 mbox / 2 shmem" setup,
which is not one of the currently supported combinations. But it would
probably not scale well for the more general case.

> At the end, yes I think we should be able to describe this particular HW
> configuration somehow, so a new binding could be the answer, since this
> is exactly HW description, the tricky part would be to add something
> that can cope well with all the above existing implict combinations
> (maybe is not a great issue eh .... I have not though about it...)
> 
> ...BUT...before all of this, just to be paranoid, I would check to make
> sure that your HW is really unidirectional in nature and not that, instead,
> is a bidirectional mailbox controller, where the SCMI server is really not
> taking care to ring the doorbell back ? (which usually is the same
> doorbell used in the client->server direction)

I think you nailed it. Looking at the mailbox driver implementation
(which actually has an IRQ handler that calls mbox_chan_received_data)
and the way the SCMI configuration is described in the device tree, I
think that's exactly the case. And the completion IRQ doesn't work
either because the SCMI server doesn't trigger it or there is a problem
with the IRQ plumbing along the way.

Unfortunately, because the mailboxes can be either unidirectional or
bidirectional and they are not necessarily mapped 1:1 to the SCMI
channels, it can be very confusing for the uninitiated eye (like
myself). If I had known everything I know now, it would have been
pretty obvious from the beginning that there was something wrong with
the SCMI server (or completion IRQ routing).

> (I ignore which server implementation you use...so just to make sure...)

Unfortunately the mailbox driver is not upstream (which is always a sad
thing), and that's why I didn't mention what it was. It's also based on
a 5.10.x kernel but simple enough that it compiles cleanly with the
latest upstream. With everything I have learned, now I can tell that
ironically the driver itself seems to be OK. For what it's worth, the
BSP kernel where this comes from includes an ugly hack that changes the
initialization of xfer->hdr.poll_completion to true instead of false to
force polling mode and cope with the broken completion IRQ :-/

> > Now it's my turn to apologize for a long email :) Hopefully it makes
> > sense and it's clear now what problem I'm trying to solve. Thanks again
> > for all the input provided so far, and I'm looking forward to some more
> > answers.
> 
> ... better stop apologizing here and now, given the average payload size
> of these recent mail exchanges :P

Fair enough :) Instead, let me thank you, again, for your time and
patience, and for all the detailed information you have provided.

-- 
Best regards,
Radu




More information about the linux-arm-kernel mailing list