[ARM IOMMU] IOMMU framework concurrency issue

Robin Murphy robin.murphy at arm.com
Wed Oct 18 08:34:20 PDT 2023


On 2023-10-17 17:33, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 07:10:23PM +0800, Zhenhua Huang wrote:
>> Dear experts,
>>
>> Saw a few crashes in our projects because of concurrency between (1) and
>> (2).
>>
>> bus notifier or bus_iommu_probe:
>> 	__iommu_probe_device
>> 		acquire iommu_probe_device_lock only
>> 		iommu_init_device()
>> 			//touch dev->iommu (1)
>> 			dev_iommu_get()
>> 			->probe_device()
>>
>> Client device probing path:
>> of_dma_configure			
>> 	of_iommu_configure
>> 		//touch dev->iommu  (2)
>> 		...
>>
>>
>> We already have 01657bc14a39 ("iommu: Avoid races around device probe") and
>> the big comment in __iommu_probe_device() refers to adopt device_lock()
>> further. Notice your big effort to utilize it, and IMO it can address above
>> issue(which protects dev->iommu):
> 
> I think something else has gone wrong here, you should not be able to
> get to any really_probe() before the iommu side has done its part.
> 
> Even with proper device locking the poor device that is racing isn't
> going to work properly as the IOMMU won't be guarenteed to be
> consistently configured.
> 
> This seems like you need to resolve boot time ordering in your
> platform? (I don't know exactly how ARM works here though)
> 
> eg make sure the iommu driver is fully registered before allowing any
> concurrent probes. Once the iommu driver is registered it will be able
> to catch the bus notifiers and serialize things properly.

Ugh, I think I see at least how this happens for device which *don't* 
have an IOMMU - because iommu_init_device() has to transiently allocate 
dev->iommu in order to call ops->probe_device in order to discover that 
the device doesn't actually have an IOMMU (and thus free dev->iommu 
again). That still leaves a window where dev_iommu_fwspec_get() from 
elsewhere could return a pointer which becomes a UAF bomb, I guess 
b54240ad4943 didn't close it completely.

However I think my bus ops series might also happen to fix this, since 
with that we shouldn't get as far as that dev_iommu_get() unless we 
found ops which we can expect to be valid for the given device, so we 
should no longer be doing the allocate/free cycle except in the rare 
case that ->probe_device() suffers an unexpected genuine failure.

Thanks,
Robin.



More information about the linux-arm-kernel mailing list