[PATCH 1/2] liveupdate: Reference count outgoing FLB data
David Matlack
dmatlack at google.com
Mon Jun 8 16:37:12 PDT 2026
On 2026-06-08 04:19 PM, Pratyush Yadav wrote:
> On Tue, Jun 02 2026, David Matlack wrote:
>
> > On 2026-06-02 07:15 PM, Pratyush Yadav wrote:
> >> Hi David,
> >>
> >> On Thu, May 28 2026, David Matlack wrote:
> >>
> >> > Increment the outgoing FLB refcount in liveupdate_flb_get_outgoing() so
> >> > that the FLB structure cannot be freed while the caller is actively
> >> > using it. Add an additional liveupdate_flb_put_outgoing() function so
> >> > the caller can explicitly indicate when it is done using the outgoing
> >> > FLB.
> >> >
> >> > During a Live Update, the kernel may need to fetch the outgoing FLB
> >> > outside of the scope of a file handler's preserve() and unpreserve()
> >> > callbacks. In that situation there is no way for the caller to protect
> >> > itself against the outgoing FLB being freed while it is using it.
> >> > Incrementing the reference count in liveupdate_flb_get_outgoing()
> >> > ensures it cannot be freed.
> >>
> >> We grab a reference to the FLB's module when the first file using the
> >> FLB is preserved. So the FLB should never go away while preserved files
> >> exist. Once all preserved files go away, you normally shouldn't be doing
> >> anything with the FLB anyway.
> >>
> >> Can you please elaborate on the use case and why this is a problem?
> >> Using the FLB outside of the standard LUO file callbacks sounds
> >> problematic.
> >
> > The scenario I had in mind was to remove a PCI device from the outgoing
> > FLB if the device is forcibly removed while the file is still preserved,
> > for example someone writes 1 to /sys/bus/pci/devices/.../remove or a
> > device is physically hot-unplugged.
> >
> > Specifically this call here from the patch below:
> >
> > +void pci_liveupdate_cleanup_device(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * It should be safe to READ_ONCE() outside of the rwsem during cleanup
> > +	 * since there should no longer be any references to @dev on the system.
> > +	 */
> > +	if (READ_ONCE(dev->liveupdate.outgoing)) {
> > +		pci_WARN(dev, 1, "Destroying outgoing-preserved device!\n");
> > +		pci_liveupdate_unpreserve(dev);
> > +	}
> > +}
> >
> > https://lore.kernel.org/linux-pci/20260522202410.3104264-3-dmatlack@google.com/
> >
> > I can do this without adding reference counting to
> > liveupdate_flb_get_outgoing(), but the reference counting makes it
> > obvious that the outgoing FLB will not be freed while I am using it
> > here, and also aligns with liveupdate_flb_get_incoming().
>
> The lifecycle of FLB is bound to _preserved_ files. So it is only valid
> as long as preserved files exist. So I think you should only get the FLB
> object when you are inside a file preservation callback for a file with which
> the FLB is registered. Anywhere outside of that, you are not guaranteed
> to get anything sane.

LUO should enforce this then, IMO.

> This refcounting scheme breaks the inherent "file-lifecycle-bound" part
> of FLB, since now anyone can grab a reference and hold the FLB as long
> as they like, even when no preserved files exist.
>
> For the normal case, the VFIO driver gets probed, it registers its
> file handler, then when the device is preserved by VFIO, the VFIO file
> handler's callbacks can get the FLB and do whatever. LUO guarantees the
> FLB exists. Anywhere outside of that, you should _not_ touch the FLB
> because of the reasons above.
>
> Now for hot-unplug, I think that case is not supported right now. When a
> preserved file exists, LUO can only remove it when the user closes the
> session. Trying to clean up the file from any other context will leave
> dangling references to the file and we currently do not handle those.
> Trying to hold the file reference won't help much either since LUO
> callbacks will try to proceed as normal, and normal no longer applies.
>
> For example, say userspace preserved the file for your device in their
> session, then you hot-unplug the device, then userspace triggers a
> kexec. What is the freeze() callback supposed to do? Sure, the FLB
> object still exists, but the device doesn't. Similarly, if you force
> remove the module, the freeze() callback itself no longer exists, and
> you likely get a panic.
>
> We might at some point support "invalidating" preserved files. I imagine
> when you hot-unplug with a preserved device, you tell LUO to invalidate
> all preserved files with that device. They would still exist in their
> sessions, but all operations on them fail immediately, including
> freeze(), which prevents live update from proceeding until user cleans
> them up.
>
> So unless I am missing something, I think this refcounting is a band-aid
> and the real problem is to properly track these "invalidated" files.
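
To make the "invalidated files" idea above concrete, here is a minimal
userspace sketch in C. It is only a model under stated assumptions, not LUO
code: the names preserved_file_model, file_model_freeze(),
session_model_invalidate_device() and session_model_freeze() are hypothetical,
and returning -ENODEV is my choice for illustration, not something the thread
specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a preserved file in a session (not a LUO type). */
struct preserved_file_model {
	int device_id;		/* device backing this preserved file */
	bool invalidated;	/* set when the device goes away underneath it */
};

/* Every operation on an invalidated file fails immediately. */
static int file_model_freeze(const struct preserved_file_model *f)
{
	if (f->invalidated)
		return -ENODEV;	/* assumed error code, for illustration only */
	return 0;
}

/* Hot-unplug path: mark every preserved file that refers to the device. */
static void session_model_invalidate_device(struct preserved_file_model *files,
					    size_t nr, int device_id)
{
	for (size_t i = 0; i < nr; i++) {
		if (files[i].device_id == device_id)
			files[i].invalidated = true;
	}
}

/* Live update proceeds only if every file in the session can be frozen. */
static int session_model_freeze(const struct preserved_file_model *files,
				size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		int err = file_model_freeze(&files[i]);

		if (err)
			return err;
	}
	return 0;
}
```

In this model the invalidated file stays in its session, so the session-wide
freeze keeps failing until userspace removes that file.
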
>
> Also, I think the refcounting on the incoming path is also a mistake.
> Unfortunately for incoming, there is a need for accessing the FLB
> outside of the file handling callbacks, since subsystems need to use it
> to initialize themselves. But I suppose we can have an accessor that
> subsystems can call once on boot/init to get their object. Then they use
> it to initialize their state and refer to the state directly, with all
> later calls going through the usual file handler callbacks.
>
> If you are interested in solving this problem, we can have a chat to
> talk in more detail, or perhaps have a discussion at one of the
> bi-weeklies?

Thanks for the detailed reply, but I think it's hard to discuss all these
as theoretical situations since we can get bogged down in the parts that
aren't clear yet and potential future use-cases.

Can you review the use of the outgoing and incoming FLB in the PCI core
series and let me know what you think I am doing wrong?

https://lore.kernel.org/linux-pci/20260522202410.3104264-1-dmatlack@google.com/

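
For reference, the get/put semantics being debated can be modelled in a few
lines of userspace C. This is a rough sketch under stated assumptions, not the
luo_flb implementation: struct flb_model and the flb_model_*() helpers are
hypothetical names, the model is single-threaded (the kernel would need
locking), and having get() return NULL when nothing is preserved is my
assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the outgoing FLB (not the kernel struct). */
struct flb_model {
	int refcount;	/* one per preserved file plus one per get() caller */
	int *obj;	/* payload, freed when refcount drops to zero */
};

/* A file is preserved: the first one creates the object. */
static int flb_model_preserve(struct flb_model *flb)
{
	if (flb->refcount == 0) {
		flb->obj = malloc(sizeof(*flb->obj));
		if (!flb->obj)
			return -1;
		*flb->obj = 42;
	}
	flb->refcount++;
	return 0;
}

/* Take a reference so the object cannot be freed while in use. */
static int *flb_model_get(struct flb_model *flb)
{
	if (flb->refcount == 0)
		return NULL;	/* assumed behaviour: nothing preserved */
	flb->refcount++;
	return flb->obj;
}

/* Drop a reference (also used for unpreserve); the last put frees the object. */
static void flb_model_put(struct flb_model *flb)
{
	assert(flb->refcount > 0);
	if (--flb->refcount == 0) {
		free(flb->obj);
		flb->obj = NULL;
	}
}

/* Walk through: the object outlives the last preserved file while held. */
static int flb_model_demo(void)
{
	struct flb_model flb = { 0, NULL };
	int *obj;

	if (flb_model_preserve(&flb))
		return -1;
	obj = flb_model_get(&flb);	/* refcount is now 2 */
	if (!obj)
		return -1;
	flb_model_put(&flb);		/* unpreserve: refcount is now 1 */
	if (*obj != 42)			/* still valid, the caller holds a ref */
		return -1;
	flb_model_put(&flb);		/* caller is done: object is freed */
	return (flb.refcount == 0 && flb.obj == NULL) ? 0 : -1;
}
```

The model shows both sides of the argument: the caller is protected against a
free, but the object also stays alive after the last preserved file is gone.
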
>
> >
> >> >
> >> > This change also aligns the outgoing FLB lifecycle management with the
> >> > incoming FLB, since the latter uses the same get/put semantics.
> >> >
> >> > Fixes: cab056f2aae7 ("liveupdate: luo_flb: introduce File-Lifecycle-Bound global state")
> >> > Assisted-by: Gemini:gemini-3-pro-preview
> >> > Signed-off-by: David Matlack <dmatlack at google.com>
> >> [...]
> >>
> >> --
> >> Regards,
> >> Pratyush Yadav
>
> --
> Regards,
> Pratyush Yadav