[PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
Nicolin Chen
nicolinc at nvidia.com
Tue May 19 11:29:23 PDT 2026
On Tue, May 19, 2026 at 09:07:37AM -0300, Jason Gunthorpe wrote:
> On Mon, May 18, 2026 at 08:38:54PM -0700, Nicolin Chen wrote:
> > +void iommu_report_device_broken(struct device *dev)
> > +{
> > + struct group_device *gdev;
> > +
> > + /*
> > + * We cannot hold group->mutex here. Rely on iommu_group_broken_worker()
> > + * to validate dev_has_iommu(). The iommu_group memory is RCU-protected
> > + * via kfree_rcu() in iommu_group_release(), and group->devices is an
> > + * RCU-protected list, so the lookup runs entirely under rcu_read_lock.
> > + *
> > + * Note the device might have been concurrently removed from the group
> > + * (list_del_rcu) before iommu_deinit_device() cleared the dev->iommu.
> > + */
> > + rcu_read_lock();
> > + gdev = __dev_to_gdev_rcu(dev);
> > + if (gdev) {
>
> If this is why the RCU is being added it seems like overkill.
>
> Just add the worker to struct dev_iommu and push it there so it can
> use a mutex. But I'm confused: why are we even adding this function?
>
> The entire design of this series was supposed to have the IOMMU driver
> itself adjust its "STE" to inhibit translated TLPs synchronously
> within its fully locked invalidation loop.

Yes. The surgical STE update is done in the driver. But the core-level
attach state doesn't reflect it, so the driver calls this function
to notify the core (this is in an invalidation context, where taking
a mutex is not possible).

> What's the async worker for?

Then the core needs to block the device using a routine similar to
the reset prepare() path. That requires holding group->mutex, so it
needs an async worker.
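
To illustrate the split, here is a rough userspace sketch, not the patch code. Every demo_* name is hypothetical, the "atomic context" is simulated with a plain flag, and the mutex is modeled as a bool so that a lock attempt from the wrong context trips an assert:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the group; not the real struct iommu_group. */
struct demo_group {
	bool mutex_held;     /* models group->mutex */
	bool broken_pending; /* models "the worker has been queued" */
	bool blocked;        /* models the core-level attach state */
};

/* Simulates being inside the driver's non-sleepable invalidation path. */
static bool demo_in_atomic;

/* A sleeping lock must never be taken from the atomic path. */
static void demo_mutex_lock(struct demo_group *g)
{
	assert(!demo_in_atomic);
	assert(!g->mutex_held);
	g->mutex_held = true;
}

static void demo_mutex_unlock(struct demo_group *g)
{
	assert(g->mutex_held);
	g->mutex_held = false;
}

/* Called by the "driver": only records that work is pending, no locking. */
static void demo_report_broken(struct demo_group *g)
{
	g->broken_pending = true;
}

/* The deferred worker: runs later, where taking the mutex is allowed. */
static void demo_broken_worker(struct demo_group *g)
{
	if (!g->broken_pending)
		return;
	g->broken_pending = false;

	demo_mutex_lock(g);
	g->blocked = true; /* core state now matches what the driver did */
	demo_mutex_unlock(g);
}

/* Drives one report/worker cycle and returns the final blocked state. */
bool demo_run(void)
{
	struct demo_group g = { false, false, false };

	demo_in_atomic = true;  /* enter the invalidation path */
	demo_report_broken(&g); /* notify the core without locking */
	demo_in_atomic = false; /* leave it */

	assert(!g.blocked);     /* nothing changes until the worker runs */
	demo_broken_worker(&g); /* deferred step, may take the mutex */
	return g.blocked;
}
```

Calling demo_mutex_lock() directly from demo_report_broken() would hit the first assert, which is the constraint that forces the deferral. Real kernel code would need proper synchronization on the pending state and a real work item; the sketch leaves both out.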

Do you see a much simpler way?

Thanks
Nicolin