[ARM IOMMU] IOMMU framework concurrency issue

Zhenhua Huang quic_zhenhuah at quicinc.com
Fri Oct 20 01:39:17 PDT 2023



On 2023/10/19 23:15, Jason Gunthorpe wrote:
> On Thu, Oct 19, 2023 at 04:21:47PM +0800, Zhenhua Huang wrote:
>> In above time window (1), dev->iommu allocated but not freed, if it's just
>> accessed by client device's probing(Thread 1)..     crash happens.
>>
>> I also want to mention from our side, it's *not only seen for non-iommu*
>> device.
> 
> I see. So there are only two locks we currently have that could
> resolve that - the device_lock or the iommu_probe_device_lock
> 
> The device_lock path is what my prior series did, in your case you
> already have the device_lock on of_dma_configure_id() so you just need
> it on the bus path.

Exactly.

> 
> The iommu_probe_device_lock is what I guess Robin was thinking of with
> the of_xlate rework.. I looked a little and it seems like quite a
> thorny problem.. The ARM SMMU drivers seem to model the correct design
> using the iommu_fwspec to pass data from of_xlate to probe, while a
> whole bunch of other drivers decided to put the first half of their
> probe functions into of_xlate!
> 
> To untangle this to use the iommu_probe_device_lock the of_xlate would
> have to stop using the struct dev (ie so it cannot touch the
> dev->iommu any more) and all the dev->iommu touches in the of/acpi
> code reorganized into function arguments to iommu_probe which would
> then store them into the dev->iommu under the lock. Ie stop using
> dev->iommu as some temporary scratch pad to shuffle data around prior
> to probing. Pass iommu_fwspec as an arg to iommu_probe and a new
> ops->probe_fwspec. Stop calling dev_iommu_priv_set from of_xlate ops.
> 
>> Patch seems good to me and in theory can cover the case I have met. I tested
>> below based on 6.6-rc1 for sanity with minor changes(clean up tags etc). If
>> you're OK I want to propagate into our tree and to see if it fixes issue?
> 
> I don't think this patch can solve the races with of_xlate vs probe,
> that is just wrongly locked.
> 
> I assume my series you linked to comprehensively fixes this?
> 

Yeah, I thought your previous series can comprehensively cover it. But 
it seems based on your discussion with Robin, it needs fix for all wonky 
drivers and I indeed see lots of drivers directly call of_dma_configure 
etc... Instead, Shall we firstly adopt the lockless solution?

1. It actually closed concurrency window for non-iommu device
2. for iommu device, I checked for arm smmu and haven't seen concurrency 
case as well?

But it also needs your expert view if there is any impact from other 
smmu drivers :) For our case, it looks like to be working well..

Thanks,
Zhenhua

> FWIW I don't have the energy to try and fix all the wonky drivers
> properly so I don't plan to revisit it.
> 
> Jason



More information about the linux-arm-kernel mailing list