[PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

Nate Watterson nwatters at codeaurora.org
Mon Jan 30 06:54:46 PST 2017


On 2017-01-30 09:38, Will Deacon wrote:
> On Mon, Jan 30, 2017 at 09:33:50AM -0500, Sinan Kaya wrote:
>> On 1/30/2017 9:23 AM, Nate Watterson wrote:
>> > On 2017-01-30 08:59, Sinan Kaya wrote:
>> >> On 1/30/2017 7:22 AM, Robin Murphy wrote:
>> >>> On 29/01/17 17:53, Sinan Kaya wrote:
>> >>>> On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote:
>> >>>>> [+hanjun, tomasz, sinan]
>> >>>>>
>> >>>>> It is quite a key patchset, I would be glad if they can test on their
>> >>>>> respective platforms with IORT.
>> >>>>>
>> >>>>
>> >>>> Tested on top of 4.10-rc5.
>> >>>>
>> >>>> 1. Platform HIDMA device passed dmatest.
>> >>>> 2. Seeing some USB stalls on a platform USB device.
>> >>>> 3. PCIe NVMe drive probed and worked fine with MSI interrupts after boot.
>> >>>> 4. NVMe driver didn't probe following a hotplug insertion and received an
>> >>>>    SMMU error event during the insertion.
>> >>>
>> >>> What was the SMMU error - a translation/permission fault (implying the
>> >>> wrong DMA ops) or a bad STE fault (implying we totally failed to tell
>> >>> the SMMU about the device at all)?
>> >>>
>> >>
>> >> root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power
>> >>
>> >> [  204.698522] iommu: Removing device 0003:01:00.0 from group 0
>> >> [  204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down
>> >> [  204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event
>> >> ignored; already powering off
>> >>
>> >> root@ubuntu:/sys/bus/pci/slots/4#
>> >>
>> >> [  254.820440] iommu: Adding device 0003:01:00.0 to group 8
>> >> [  254.820599] nvme nvme0: pci function 0003:01:00.0
>> >> [  254.820621] nvme 0003:01:00.0: enabling device (0000 -> 0002)
>> >> [  261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received:
>> >> [  261.948561] arm-smmu-v3 arm-smmu-v3.0.auto:  0x000001000000000a
>> >> [  261.948563] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
>> >> [  261.948564] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
>> >> [  261.948566] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
>> 
>> > Looks like C_BAD_CD. Can you please try with:
>> > iommu/arm-smmu-v3: Clear prior settings when updating STEs
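
For reference, a minimal sketch of how the first word of the event record
quoted above can be decoded. It assumes the SMMUv3 event record layout in
which bits [7:0] of word 0 carry the event ID and bits [63:32] carry the
StreamID. The helper names (evt_id, evt_sid, evt_name) are made up for this
illustration and are not taken from the driver, and the name table covers
only the events mentioned in this thread.

```c
#include <stdint.h>

/* Assumed field positions in word 0 of an SMMUv3 event record. */
#define EVT_ID_MASK    0xffULL  /* bits [7:0]: event ID */
#define EVT_SID_SHIFT  32       /* bits [63:32]: StreamID */

/* Hypothetical helper: extract the event ID from word 0. */
static inline uint8_t evt_id(uint64_t w0)
{
	return (uint8_t)(w0 & EVT_ID_MASK);
}

/* Hypothetical helper: extract the StreamID from word 0. */
static inline uint32_t evt_sid(uint64_t w0)
{
	return (uint32_t)(w0 >> EVT_SID_SHIFT);
}

/* Hypothetical helper: names only the events discussed in this thread. */
static inline const char *evt_name(uint8_t id)
{
	switch (id) {
	case 0x04: return "C_BAD_STE";     /* bad stream table entry */
	case 0x0a: return "C_BAD_CD";      /* bad context descriptor */
	case 0x10: return "F_TRANSLATION"; /* translation fault */
	case 0x13: return "F_PERMISSION";  /* permission fault */
	default:   return "unknown";
	}
}
```

Applied to 0x000001000000000a, this gives event ID 0x0a and StreamID 0x100.
The StreamID would be consistent with 0003:01:00.0 if it is derived directly
from the PCI requester ID, which is an assumption about this platform's IORT
mapping.
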
>> 
>> This resolved the issue. Can we pull Nate's patch into 4.10 so that I
>> don't see this issue again?
> 
> I already sent the pull request to Joerg for 4.11. Do you see this problem
> without Sricharan's patches (i.e. vanilla mainline)? If so, we'll need to
> send the patch to stable after -rc1.

Using vanilla mainline, I see it most commonly when directly assigning
a device to a guest machine. I think I've also seen it after removing and
then re-adding a PCI device. Basically, it can happen any time an STE's
CTX pointer is changed from a non-NULL value while STE[CFG] indicates that
translation will be performed.
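
A rough sketch of that failure mode, under the assumption that the problem
is stale pointer bits surviving an update that starts from the old STE
word. The field layout and function names below are simplified and
hypothetical, chosen only for illustration; this is not the driver code.

```c
#include <stdint.h>

/* Simplified, assumed layout of the first STE word (illustration only). */
#define STE_V         (1ULL << 0)            /* valid bit */
#define STE_CFG_S1    (5ULL << 1)            /* config: stage-1 translation */
#define STE_CTX_MASK  0x0000ffffffffffc0ULL  /* context pointer field */

/* Problem pattern: new fields are OR-ed on top of the previous word,
 * so a non-NULL old context pointer leaks into the new one. */
static inline uint64_t ste_update_keep_old(uint64_t old, uint64_t ctx_ptr)
{
	uint64_t val = old;

	val |= STE_V | STE_CFG_S1 | (ctx_ptr & STE_CTX_MASK);
	return val;
}

/* Pattern suggested by the patch title: rebuild the whole word,
 * discarding any prior settings. */
static inline uint64_t ste_update_rebuild(uint64_t old, uint64_t ctx_ptr)
{
	(void)old; /* prior settings are intentionally ignored */
	return STE_V | STE_CFG_S1 | (ctx_ptr & STE_CTX_MASK);
}

/* Read back the context pointer field. */
static inline uint64_t ste_ctx_ptr(uint64_t ste)
{
	return ste & STE_CTX_MASK;
}
```

With an old pointer of 0x80001000 and a new one of 0x80042000, the first
variant leaves 0x80043000 in the field, a pointer that refers to neither
context descriptor, while the second leaves 0x80042000. If the old pointer
is NULL the two variants agree, which fits the cases listed above.
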

Nate
> 
> Will

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux
Foundation Collaborative Project.


