[PATCH v2 00/11] Drivers for gunyah hypervisor

Tue Aug 9 17:07:39 PDT 2022

On 8/9/2022 6:13 AM, Robin Murphy wrote:
> [drive-by observation since one thing caught my interest...]
>

Appreciate all the comments.

Jassi,

I understand you talked with some of our folks (Trilok and Carl) a few
years ago about using the mailbox APIs. We were steered away from using
mailboxes then. Is that still the recommendation today?

> On 2022-08-09 00:38, Elliot Berman wrote:
>>> I might be completely wrong about this, but if my in-mind picture of
>>> Gunyah is correct, I'd have implemented the gunyah core subsystem as
>>> mailbox provider, RM as a separate platform driver consuming these
>>> mailboxes and in turn being a remoteproc driver, and consoles as
>>> remoteproc subdevices.
>>
>> The mailbox framework can only fit with message queues and not 
>> doorbells or vCPUs.
> 
> Is that so? There was a whole long drawn-out saga around the SCMI 
> protocol using the Arm MHU mailbox as a set of doorbells for 
> shared-memory payloads, but it did eventually get merged as the separate 
> arm_mhu_db.c driver, so unless we're talking about some completely 
> different notion of "doorbell"... :/
> 

Doorbells will be harder to fit into the mailbox API framework.

  - Simple doorbells don't have any TX done acknowledgement model at
    the doorbell layer (see bullet 1 from 
https://lore.kernel.org/all/68e241fd-16f0-96b4-eab8-369628292e03@quicinc.com/).
    Doorbell clients might have a doorbell acknowledgement flow, but the
    only client I have for doorbells doesn't. IRQFDs would send an
    empty message to the mailbox and immediately do a client-triggered
    TX_DONE.

  - Using mailboxes for the more advanced doorbell use case forces the
    client to use doorbells a certain way, because each channel could be
    a bit in the bitmask, or the client could have complete control of
    the entire bitmask. I think implementing the mailbox API would force
    the otherwise-generic doorbell code to make that decision for
    clients.
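
To make the two doorbell usage models concrete, here is a rough
userspace sketch. It is not kernel code and not the Gunyah API: the
struct, the function names, and the 64-bit mask width are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a doorbell: a mask of pending bits. */
struct doorbell {
	uint64_t pending;
};

/* Model A: one mailbox channel per bit; a channel can only ring its own bit. */
static void db_ring_bit(struct doorbell *db, unsigned int bit)
{
	assert(db && bit < 64);
	db->pending |= UINT64_C(1) << bit;
}

/* Model B: one channel owns the whole doorbell; the client picks the mask. */
static void db_ring_mask(struct doorbell *db, uint64_t mask)
{
	assert(db);
	db->pending |= mask;
}

/* Receiver side: read the pending bits and clear them. */
static uint64_t db_recv(struct doorbell *db)
{
	uint64_t flags;

	assert(db);
	flags = db->pending;
	db->pending = 0;
	return flags;
}
```

Both models end up setting the same bits, but a mailbox controller has
to pick one of them when it decides what a "channel" is, which is the
decision I would rather leave to the client.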

Further, I wanted to highlight one other challenge with fitting Gunyah
message queues into the mailbox API:

  - Message queues track a flag which indicates whether there is space
    available in the queue. The flag is returned on msgq_send. When the
    message queue is full, an interrupt is raised once more space
    becomes available. This could be used as a TX_DONE indicator, but
    the mailbox framework's API prevents us from calling
    mbox_chan_txdone inside the send_data channel op.

I think this might be solvable by adding a new txdone mechanism.
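
A rough userspace model of the message queue behaviour described above,
only to illustrate where the TX_DONE signal would come from. The queue
depth, the struct layout, and the function signatures are assumptions
for the sketch and do not match the real hypercall interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSGQ_DEPTH 2 /* arbitrary depth, chosen only for the sketch */

struct msgq {
	size_t count;         /* messages currently queued */
	bool tx_irq_armed;    /* set when a send fills the queue */
	unsigned int tx_irqs; /* number of "space available" interrupts raised */
};

/*
 * Returns 0 on success, -1 if the queue is already full.
 * *ready reports whether another message could be sent right away.
 */
static int msgq_send(struct msgq *q, bool *ready)
{
	assert(q && ready);
	if (q->count == MSGQ_DEPTH)
		return -1;
	q->count++;
	*ready = q->count < MSGQ_DEPTH;
	if (!*ready)
		q->tx_irq_armed = true; /* sender waits for the interrupt */
	return 0;
}

/* Receiver drains one message; raises the TX interrupt if the queue was full. */
static int msgq_recv(struct msgq *q)
{
	assert(q);
	if (q->count == 0)
		return -1;
	q->count--;
	if (q->tx_irq_armed) {
		q->tx_irq_armed = false;
		q->tx_irqs++;
	}
	return 0;
}
```

When msgq_send reports ready, the natural TX_DONE point is the send
itself, i.e. inside send_data; only in the full case does TX_DONE arrive
later via the interrupt. A new txdone mechanism would need to cover both
paths.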

>> The mailbox framework also relies on the mailbox being defined in the 
>> devicetree. RM is an exceptional case in that it is described in the 
>> devicetree. Message queues for other VMs would be dynamically created 
>> at runtime as/when that VM is created. Thus, the client of the message 
>> queue would need to "own" both the controller and client ends of the 
>> mailbox.
> 
> FWIW, if the mailbox API does fit conceptually then it looks like it 
> shouldn't be *too* hard to better abstract the DT details in the 
> framework itself and allow providers to offer additional means to 
> validate channel requests, which might be more productive than inventing 
> a whole new thing.
>

Some notes about fitting mailboxes into Gunyah IPC:

  - A single mailbox controller can't cover all the gunyah devices. The
    number of gunyah devices is not fixed and varies per VM launched.
    Mailbox controller would need to be per-VM or per-device, where each
    channel represents a capability.

  - The other device types (like vCPU) don't fit into a message-based
    framework. I'd like to have a consistent way of binding a
    device's function with the device. If we use mailbox API, some
    devices will use mailbox and others will use some other mechanism.
    I'd prefer to consistently use "some other mechanism" throughout.

  - TX and RX message queues are independent and "combining" a TX and RX
    message queue happens at client layer by the client requesting access
    to two otherwise unassociated message queues. A mailbox channel would
    either be associated with a TX message queue capability or an RX
    message queue capability. This isn't a major hurdle per se, but it
    decreases how cleanly we can use the mailbox APIs IMO.
      - A VM might only have a TX message queue and no RX message queue,
        or vice versa. We won't be able to require coupling a TX and RX
        message queue for the mailbox.

  - TX done acknowledgement doesn't fit Gunyah IPC (see above) and a new
    TX_DONE mode would need to be implemented.

  - Need to make it possible for a client to bind a mailbox channel
    without DT.

I'm getting a bit apprehensive about the tweaks needed to make the
mailbox framework usable for Gunyah. Will there be enough code re-use
and help with abstracting the direct-to-Gunyah APIs? IMO, there won't
be, but opinions are welcome :)

Thanks,
Elliot