[PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
Ulf Hansson
ulf.hansson at linaro.org
Tue May 5 07:15:49 PDT 2026
On Tue, 5 May 2026 at 14:46, Danilo Krummrich <dakr at kernel.org> wrote:
>
> On Tue May 5, 2026 at 1:12 PM CEST, Ulf Hansson wrote:
> > On Wed, 22 Apr 2026 at 12:59, Danilo Krummrich <dakr at kernel.org> wrote:
> >>
> >> On Wed Apr 22, 2026 at 12:07 PM CEST, Ulf Hansson wrote:
> >> > On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr at kernel.org> wrote:
> >> >> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> >> >> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
> >> >> > if (dev->state_synced)
> >> >> > return;
> >> >> >
> >> >> > + if (dev->driver && dev->driver->queue_sync_state)
> >> >> > + dev->driver->queue_sync_state(dev);
> >> >>
> >> >> This seems to be called without the device lock being held, which seems to allow
> >> >> the queue_sync_state() callback to execute concurrently with remove(). This
> >> >> opens the door for all kinds of UAF conditions in drivers.
> >> >
> >> > If that were the case, this whole function would be unsafe even before
> >> > this change. I assume this isn't because of how the function is being
> >> > called, but I may be wrong.
> >>
> >> This function does not issue any driver callbacks intentionally; the existing
> >> sync_state() callback is deferred to device_links_flush_sync_list(), which is
> >> called without the device_links_write_lock() held, but takes the device_lock()
> >> to protect against other concurrent driver callbacks, such as remove().
> >>
> >> I.e. we can't take the device_lock() when the device_links_write_lock() is held,
> >> as it would be prone to lock inversion.
> >
> > You are right, the device_lock() is needed in
> > device_links_flush_sync_list() to protect against concurrent driver
> > callbacks from being invoked, such as the ->remove().
> >
> > However, as part of the removal procedure in
> > __device_release_driver(), we also need to account for device links.
> > Hence we call device_links_busy() which takes the
> > device_links_write_lock().
>
> I don't think this is sufficient.
>
> I wrapped my head around this about three weeks ago and I think there were
> multiple cases where this falls apart. I think one case was:
>
> 1. Consumer binds, supplier is added to deferred_sync. Consumer unbinds,
> link goes to AVAILABLE.
>
> 2. device_links_busy(supplier) takes and releases device_links_write_lock(),
> finds no ACTIVE consumers, returns false -- drv->remove() starts without
> device_links_write_lock() held.
>
> 3. device_links_supplier_sync_state_resume() takes device_links_write_lock(),
> finds supplier still on deferred_sync -- device_links_driver_cleanup()
> hasn't run yet, blocked on the lock. queue_sync_state() is called
> concurrently with drv->remove().
Thanks for pointing this out! More problems seem to exist regarding the above.
What if drv->remove(supplier) completes before
device_links_supplier_sync_state_resume() is called and the removed
device is still in the deferred_sync list? Probably another
get|put_device() is needed to protect us from UAF.
>
> But again, I think that's what I concluded three weeks ago and this state
> machine is rather tricky.
Indeed it is.
>
> That said, *even if* we could prove that this works in all cases, it would be
> very subtle, pretty fragile and not by design.
>
> The whole state machine around sync state is already rather complicated. So, if
> we really want to make it an invariant that it is valid to call bus device
> callbacks without holding the device lock from
> __device_links_queue_sync_state(), I think this has to be carefully reflected by
> the overall design making it *very explicit* how this invariant holds up in all
> cases.
Right.
I experimented with adding the devices for ->queue_sync_state() into a
separate list and then flushing that list similarly to how the
sync_state list is flushed. It becomes messy, but I guess there is no
other way.
Perhaps Saravana has another suggestion for how to implement this?
>
> Otherwise, I'd be rather concerned that this becomes a source of subtle bugs.
Kind regards
Uffe
More information about the linux-arm-kernel
mailing list