[ARM IOMMU] IOMMU framework concurrency issue

Zhenhua Huang quic_zhenhuah at quicinc.com
Wed Oct 18 07:27:53 PDT 2023


Thanks Jason for your reply.

On 2023/10/18 0:33, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 07:10:23PM +0800, Zhenhua Huang wrote:
>> Dear experts,
>>
>> Saw a few crashes in our projects because of concurrency between (1) and
>> (2).
>>
>> bus notifier or bus_iommu_probe:
>> 	__iommu_probe_device
>> 		acquire iommu_probe_device_lock only
>> 		iommu_init_device()
>> 			//touch dev->iommu (1)
>> 			dev_iommu_get()
>> 			->probe_device()
>>
>> Client device probing path:
>> of_dma_configure			
>> 	of_iommu_configure
>> 		//touch dev->iommu  (2)
>> 		...
>>
>>
>> We already have 01657bc14a39 ("iommu: Avoid races around device probe") and
>> the big comment in __iommu_probe_device() refers to adopt device_lock()
>> further. Notice your big effort to utilize it, and IMO it can address above
>> issue(which protects dev->iommu):
> 
> I think something else has gone wrong here, you should not be able to
> get to any really_probe() before the iommu side has done its part.

Regarding "iommu side" Are you talking about client dev->iommu set up or 
arm-smmu driver probing? should be the latter? It seems not 
guaranteed... because 1st we have an option to disable device link check 
which creates dependency, 2nd also, for example, if we call 
of_platform_populate() in ancestor node to add child node device.. 
Ancestor node doesn't know if the dependency of child not is met.

BTW, Even for non-iommu binding client device which having no iommu 
dependency at all, the issue was also reported:
Client device's probing:
of_iommu_configure  ---Thread 1
	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
	if (fwspec) {
		 if (fwspec->ops)
			...

IOMMU probing:
iommu_device_register ---Thread 2
	.. bus_iommu_probe(iommu_buses[i]);
		__iommu_probe_device
			iommu_init_device
				dev_iommu_get
				... *time window of concurrency*  ------(1)
				dev_iommu_free

In above time window (1), dev->iommu allocated but not freed, if it's 
just accessed by client device's probing(Thread 1).. 	crash happens.

Thanks,
Zhenhua

> 
> Even with proper device locking the poor device that is racing isn't
> going to work properly as the IOMMU won't be guarenteed to be
> consistently configured.
> 
> This seems like you need to resolve boot time ordering in your
> platform? (I don't know exactly how ARM works here though)
> 
> eg make sure the iommu driver is fully registered before allowing any
> concurrent probes. Once the iommu driver is registered it will be able
> to catch the bus notifiers and serialize things properly. >
> Jason



More information about the linux-arm-kernel mailing list