[ARM IOMMU] IOMMU framework concurrency issue
Robin Murphy
robin.murphy at arm.com
Wed Oct 18 08:34:20 PDT 2023
On 2023-10-17 17:33, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 07:10:23PM +0800, Zhenhua Huang wrote:
>> Dear experts,
>>
>> Saw a few crashes in our projects because of concurrency between (1) and
>> (2).
>>
>> bus notifier or bus_iommu_probe:
>> __iommu_probe_device
>> acquire iommu_probe_device_lock only
>> iommu_init_device()
>> //touch dev->iommu (1)
>> dev_iommu_get()
>> ->probe_device()
>>
>> Client device probing path:
>> of_dma_configure
>> of_iommu_configure
>> //touch dev->iommu (2)
>> ...
>>
>>
>> We already have 01657bc14a39 ("iommu: Avoid races around device probe") and
>> the big comment in __iommu_probe_device() refers to adopt device_lock()
>> further. Notice your big effort to utilize it, and IMO it can address above
>> issue(which protects dev->iommu):
>
> I think something else has gone wrong here, you should not be able to
> get to any really_probe() before the iommu side has done its part.
>
> Even with proper device locking the poor device that is racing isn't
> going to work properly as the IOMMU won't be guarenteed to be
> consistently configured.
>
> This seems like you need to resolve boot time ordering in your
> platform? (I don't know exactly how ARM works here though)
>
> eg make sure the iommu driver is fully registered before allowing any
> concurrent probes. Once the iommu driver is registered it will be able
> to catch the bus notifiers and serialize things properly.
Ugh, I think I see at least how this happens for device which *don't*
have an IOMMU - because iommu_init_device() has to transiently allocate
dev->iommu in order to call ops->probe_device in order to discover that
the device doesn't actually have an IOMMU (and thus free dev->iommu
again). That still leaves a window where dev_iommu_fwspec_get() from
elsewhere could return a pointer which becomes a UAF bomb, I guess
b54240ad4943 didn't close it completely.
However I think my bus ops series might also happen to fix this, since
with that we shouldn't get as far as that dev_iommu_get() unless we
found ops which we can expect to be valid for the given device, so we
should no longer be doing the allocate/free cycle except in the rare
case that ->probe_device() suffers an unexpected genuine failure.
Thanks,
Robin.
More information about the linux-arm-kernel
mailing list