[PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()

Jason Gunthorpe jgg at nvidia.com
Mon Aug 21 05:51:13 PDT 2023


On Mon, Aug 21, 2023 at 12:06:40PM +0100, Robin Murphy wrote:
> On 2023-08-18 22:32, Jason Gunthorpe wrote:
> > It is subtle and was never documented or enforced, but there has always
> > been an assumption that of_dma_configure_id() is not concurrent. It makes
> > several calls into the iommu layer that require this, including
> > dev_iommu_get(). The majority of cases have been preventing concurrency
> > using the device_lock().
> > 
> > Thus the new lock debugging added exposes an existing problem in
> > drivers. On inspection this looks like a theoretical locking problem as
> > generally the cases are already assuming they are the exclusive (single
> > threaded) user of the target device.
> 
> Sorry to be blunt, but the only problem is that you've introduced an
> idealistic new locking scheme which failed to take into account how things
> currently actually work, and is broken and achieving nothing but causing
> problems.

That's pretty dramatic. I would have prefered this series have some
more testing before Joerg took it, but this is certainly not
"achieving nothing but causing problems". Introducing a real scheme
for how locking is supposed to work here is going to cause some
strain.  We've had a long period where the lack of any locking rules
has yielded a big mess held together with hope and dreams.

Of course there will be crazy stuff. If these 13 drivers are the only
problem then we've done pretty well.

And at the end we get *actual rules about how locking works* Wow!
Certainly not nothing.

What I want to hear from you is a concrete reason why device_lock() is
the *wrong* lock here. I can't think of any reason why we can't obtain
the device_lock in all the places that want to probe the iommu driver.

Nor, can I see a reason why it would be a bad choice of lock after all
the dma_configure logic is reworked someday.

> When their sole purpose was to improve the locking and make it
> easier to reason about, and now the latest "fix" is now to remove
> one of the assertions which forms the fundamental basis for that
> reasoning, then the point has clearly been lost.

I do not want to remove the assertion, I think the assertion should
stay so people running debug kernels on these drivers can get warnings
about the existing problems in these drivers.

It is removed mostly because we are at rc7, otherwise I'd play wack a
mole adding the device_lock and a nasty comment to the drivers. We can
tackle that in the next cycle and put the assertion back.

> All we've done is churned a dodgy and incomplete locking scheme into

Well, at least we agree what we have today is not great.

> And on the subject of idealism, the fact is that doing IOMMU configuration
> based on driver probe via bus->dma_configure is *fundamentally wrong* and
> breaking a bunch of other IOMMU API assumptions, so it is not a robust
> foundation to build anything upon in the first place. 

So now that Joerg has dropped it - what is your big idea to make the
locking actually work right?

Jason



More information about the ath10k mailing list