SMMU driver and stall vs terminate mode

Will Deacon will.deacon at arm.com
Tue Jun 21 02:42:37 PDT 2016


On Mon, Jun 20, 2016 at 05:08:45PM +0100, Robin Murphy wrote:
> On 20/06/16 16:28, Stuart Yoder wrote:
> >Right now the SMMU driver is hardcoded to configure 'stall' mode for
> >context faults:
> >
> >       /* SCTLR */
> >       reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
> >
> >We are running into an issue with a device where it seems to behave sanely
> >when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> >unaware that an access violation occurred.
> 
> Does the device keep issuing transactions after the initial faulting one, by
> any chance? Brian (+cc) has seen similar-sounding issues in the past (albeit
> with backports to some horrible Android kernel), and I think we concluded
> that there's an inherent race window between writing RESUME and acking the
> interrupt in which MMU-500 can process another faulting transaction and
> reassert the IRQ without Linux realising, which then gets lost and things go
> out of whack.

Do we not detect this with the MULTI bit in the FSR?
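
For illustration only, a minimal sketch of the kind of check being asked
about. The bit positions are assumptions taken to match the driver's FSR_*
definitions (MULTI in bit 31, SS in bit 30), and the helper names are
hypothetical, not functions from the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the driver's context bank FSR bit definitions. */
#define FSR_MULTI (1u << 31) /* a further fault hit while FSR was non-zero */
#define FSR_SS    (1u << 30) /* the faulting transaction is stalled */

/* Hypothetical helper: true if another fault arrived before FSR was cleared. */
static bool fsr_saw_multiple_faults(uint32_t fsr)
{
	return (fsr & FSR_MULTI) != 0;
}

/* Hypothetical helper: true if a transaction is stalled awaiting RESUME. */
static bool fsr_transaction_stalled(uint32_t fsr)
{
	return (fsr & FSR_SS) != 0;
}
```

A fault handler could read the FSR once, log a warning when
fsr_saw_multiple_faults() is true, and only write RESUME when
fsr_transaction_stalled() is true.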

> >Is there really some assumption that all devices that send transactions
> >through the SMMU _must_ be able to handle stall mode?  I am trying to
> >find out from our hw designers what is going on at the signal level for
> >the device in question, but it seems to me that 'terminate' mode exists
> >for a reason and I wonder what your thoughts are about providing a
> >configuration option to allow configuration of terminate mode if a particular
> >SoC requires it.
> 
> Personally, I'd quite happily leave it turned off (MMU-400/401 don't support
> stalling anyway), but I recall Will having a fairly reasonable-sounding
> argument in favour, which I now can't remember the details of. Hopefully he
> might remind us, unless his conference is too enthralling.

Given that we don't do anything particularly useful in the context fault
handler, I also wouldn't object to turning this off (and removing the
retry/reporting machinery). However, I'd want a better description of
*why* it's causing problems first, so that we can justify the decision
in case anybody is using this out of tree.

If we did make the thing configurable, would that be another command line
option, or something in DT? What about ACPI?
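
As a rough sketch of what a configurable version of the SCTLR setup quoted
above might look like: the bit values are assumptions taken to match the
driver's SCTLR_* definitions, and the function name and its "stall"
parameter are hypothetical. Where that flag would come from (command line,
DT or ACPI) is exactly the open question and is not addressed here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the driver's context bank SCTLR bit definitions. */
#define SCTLR_CFCFG    (1u << 7) /* 1 = stall on context fault, 0 = terminate */
#define SCTLR_CFIE     (1u << 6) /* context fault interrupt enable */
#define SCTLR_CFRE     (1u << 5) /* context fault report enable */
#define SCTLR_AFE      (1u << 2)
#define SCTLR_TRE      (1u << 1)
#define SCTLR_M        (1u << 0) /* MMU enable */
#define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)

/* Hypothetical helper: build the SCTLR value, selecting stall or terminate. */
static uint32_t smmu_build_sctlr(bool stall)
{
	uint32_t reg = SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;

	if (stall)
		reg |= SCTLR_CFCFG; /* leaving this bit clear gives terminate mode */

	return reg;
}
```

With stall set to true this yields the same value as the hardcoded
assignment in the driver today; with false, only CFCFG differs.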

Will


