[RFC] iommu: arm-smmu: stall support
robdclark at gmail.com
Mon Sep 18 05:11:47 PDT 2017
On Mon, Sep 18, 2017 at 7:13 AM, Jean-Philippe Brucker
<jean-philippe.brucker at arm.com> wrote:
> Hi Rob,
> On 14/09/17 20:44, Rob Clark wrote:
>> Adds a new domain property for iommu clients to opt-in to stalling
>> with asynchronous resume, and for the client to determine if the
>> iommu supports this.
>> Current motivation is that:
>> a) On 8x96/a530, if we don't enable CFCFG (or HUPCF) then non-
>> faulting translations which are happening concurrently with
>> one that faults, fail (or return garbage), which triggers all
>> sorts of fun GPU crashes, which generally have no relation
>> to the root fault. (The CP can be far ahead in the cmdstream
>> from the other parts of the GPU...)
> Would the GPU driver always enable stall for this implementation? Or only
> enable it for specific domains?
I expect for all domains. (Currently that is just a single domain,
but I expect that to change)
> Instead of enabling it at domain level, I wonder if couldn't be left
> entirely to the SMMU driver. I have a proposal (that I'll publish shortly)
> for adding a "can-stall" attribute to device-tree nodes, telling the SMMU
> driver that the device can withstand stalled transactions without locking
> up the system.
> The SMMU would then enable stall for this particular device without
> needing approval from the device driver. I'm doing this for v3, which has
> a more mature stall model, but I suppose we can do the same for v2 as well.
The GPU driver does need to know if stalling is supported/enabled by
the iommu driver (since depending on SoC, drm/msm works with one of
three different iommu drivers currently), and to be in control of
resume.. I'm a bit sceptical about trying to abstract too much at the
For example when the gpu gets a fault, it tends to get 1000s of
faults. On the first fault, I want to kick things off to a wq where I
can snapshot the cmdstream and gpu state. But subsequent faults on
the same submit I ignore.
Btw, apologies that I haven't sent the corresponding drm/msm patches
yet. I haven't had a chance to clean up yet, but you can find
something rough here:
> In any case, the firmware has to tell the OS that a device is capable of
> stalling, because it is unlikely that many platform devices will
> gracefully handle this mode.
>> b) I am working on a debugfs feature to dump submits/batches
>> that cause GPU hangs, and I would like to also use this for
>> faults. But it needs to run in non-atomic context, so I
>> need to toss things off to a workqueue, and then resume
>> the iommu after it finishes.
> Are you relying on stalled transaction to freeze the GPU state and
> allow for introspection? I suppose the debug code would always terminate
> after recording the fault? I'm just trying to get a picture of all
> possible users of a common fault API.
yes, this is what I'm doing now. For SVM, however, we'd retry the
transaction instead of terminating.
>> c) (and ofc at some point in the future for SVM we'd like to
>> be able to pin unpinned pages and things like that, in
>> response to faults.)
> For SVM there will be generic code calling into the mm code to pin pages
> and resume the SMMU. We are working on consolidating this with other
> IOMMUs at the moment and use generic code where possible. Ideally the GPU
> driver shouldn't need to get involved.
> That new API will be based on PASIDs/SSIDs, which doesn't exist in SMMUv2.
> I do believe that we also need to consolidate the API for devices and
> IOMMUs that support page faults but not PASIDs. We could use a common
> fault workqueue in the IOMMU core.
I've no idea qcom's plans for future hw, but pretty sure we are going
to want to implement SVM on v2 iommu, without PASIDs/SSIDs. However
on current hw, there is really only one userspace process active on
the gpu at a time, so we don't really need PASIDs/SSIDs.
> It seems like your use-case (b) could fit in there. If the device driver
> didn't bind to a process but instead registered a fault handler, then we
> could ask it to do something with the fault. And since it's in a wq, the
> call to device driver would be synchronous and we'd pass the return status
> (retry/terminate) to the SMMU.
> This is probably easier to handle than a separate "resume" callback,
> especially with SMMUv3 stall and PRI, where faults are out of order and
> contain a token identifying a fault.
IIRC Will or Robin mentioned wanting a token in earlier stall
discussion.. although not being familiar with v3 I wasn't quite sure
what the use was.
At any rate, adding a token to fault handler callback and
iommu_domain_resume() seems like something that could be added later,
when it is needed.
Anyways, I am interested to see what your proposal is.. although tend
to be a bit sceptical about trying to abstract too much. (And if
there should be something more abstract, maybe it should be at the
dma-mapping layer instead?)
I don't suppose you or someone working on this from ARM side will be
at linaro connect in a couple weeks? Jordan and myself will be there,
and it could be a good time to chat about this, and also per-process
pagetables and gpu switching switching pagetables directly on v2 hw.
>> - For RFC I thought it would be easier to review the idea
>> as a single patch, but it should be split into separate
>> core and arm-smmu parts
>> - I vaguely remember someone (Will?) mentioning that there
>> could be cases with multiple masters sharing a single
>> context bank, and somehow stalling might not work in that
>> case? (How does that even happen, arm-smmu assignes the
>> context banks? Maybe I'm mis-remembering the details.)
>> I think that this probably shouldn't effect the API parts
>> of this RFC, the iommu driver should already know about
>> all the devices that might attach because of ->attach_dev()
>> so it could fail in _set_attr()?
> With VFIO, userspace can decide to put multiple device in the same domain.
> But attach_dev can fail if there are device incompatibilities, and then
> VFIO will use a new domain. It might become relevant with vSVM, forwarding
> recoverable faults from guest-assigned devices.
More information about the linux-arm-kernel