Forcing devices into idle

Thu Jul 3 07:22:57 PDT 2025

On Thu, Jul 3, 2025 at 4:14 PM Thierry Reding <thierry.reding at gmail.com> wrote:
>
> On Thu, Jul 03, 2025 at 03:46:03PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jul 3, 2025 at 3:32 PM Thierry Reding <thierry.reding at gmail.com> wrote:
> > >
> > > On Thu, Jul 03, 2025 at 02:06:00PM +0200, Thierry Reding wrote:
> > > > On Thu, Jul 03, 2025 at 01:12:15PM +0200, Rafael J. Wysocki wrote:
> > > > > On Thu, Jul 3, 2025 at 12:33 PM Oliver Neukum <oneukum at suse.com> wrote:
> > > > > >
> > > > > > On 03.07.25 12:08, Thierry Reding wrote:
> > > > > >
> > > > > > > Any thoughts on how to solve this? Is the pm_runtime_{put,get}_sync()
> > > > > > > method acceptable? If not, are there other alternatives to achieve the
> > > > > > > same thing that I'm not aware of? Would it be useful to add a new set of
> > > > > > > APIs to force devices into an idle state (which could be semantically
> > > > > > > different from runtime suspend)? Or is this all too specific for any
> > > > > > > kind of generic API?
> > > > > >
> > > > > > Basically what you need is what happens when the system prepares to
> > > > > > do a snapshot for S4. However, if you just perform FREEZE and then THAW,
> > > > > > devices will assume that user space has been frozen. You need a way
> > > > > > to substitute for that assumption.
> > > > >
> > > > > Well, you just need to freeze user space beforehand.
> > > >
> > > > Freezing userspace seems a bit heavy-handed. There are only very few
> > > > devices that need to be put into reset (such as the GPU), so most of
> > > > userspace should be fine to continue to run. For the GPU the idea is
> > > > to block all incoming requests while the device is forced into idle,
> > > > and then to resume processing these requests after the VPR resize.
> > > >
> > > > But maybe freezing userspace isn't the heavy operation that I think it
> > > > is. Ideally we do not want to suspend things for too long to avoid
> > > > stuttering on the userspace side.
> > > >
> > > > Also, I think we'd want the freezing to be triggered by the VPR driver
> > > > because userspace ideally doesn't know when the resizing happens. The
> > > > DMA BUF heap API that I'm trying to use is too simple for that, and
> > > > only the VPR driver knows when a resize needs to happen.
> > > >
> > > > Is it possible to trigger the freeze from a kernel driver? Or localize
> > > > the freezing of userspace to only the processes that are accessing a
> > > > given device?
> > > >
> > > > Other than that, freeze() and thaw() seem like the right callbacks for
> > > > this.
> > >
> > > I've prototyped this using the sledgehammer freeze_processes() and
> > > thaw_processes() functions and the entire process seems to be pretty
> > > quick. I can get through most of it in ~30 ms. This is on a mostly
> > > idle test system, so I expect this to go up significantly if there
> > > is a high load.
> > >
> > > On the other hand, this will drastically simplify the GPU driver
> > > implementation, because by the time ->freeze() is called, all userspace
> > > will be frozen, so there's no need to do any blocking on the kernel
> > > side.
> > >
> > > What I have now is roughly this:
> > >
> > >         freeze_processes();
> > >
> > >         for each VPR device dev:
> > >                 pm_generic_freeze(dev);
> > >
> > >         resize_vpr();
> > >
> > >         for each VPR device dev:
> > >                 pm_generic_thaw(dev);
> > >
> > >         thaw_processes();
> > >
> > > I still can't shake the feeling that this is sketchy, but it seems to
> > > work. Is there anything blatantly wrong about this?
> >
> > There are a few things to take into consideration.
> >
> > First, there are 4 tiers of "freeze" callbacks (->prepare, ->freeze,
> > ->freeze_late, ->freeze_noirq), and analogously for "thaw" callbacks,
> > but you only use one of them.  This may be fine in a particular case,
> > but you need to ensure that the other tiers are not needed and, in
> > particular, the _noirq ones need not be involved.  Also ensure that
> > they don't assume that PM notifiers have run (or that they will run on
> > the resume side).
>
> I think that's something we can accommodate in this case. The primary
> device that needs this is the integrated GPU on Tegra, so it's a very
> narrow set.
>
> > Second, if there are dependencies between the devices being frozen and
> > other devices, they will have to be taken into account.
>
> Fortunately, as far as I know the only dependency, if any, would be via
> the userspace. There's no relationship between the devices from a PM
> point of view.
>
> There might be a video decoder that decodes images into the VPR and the
> GPU would then read data out of the VPR and composite onto the screen.
> So as long as userspace is frozen, there should be no issue.
>
> > Also note that kernel threads are generally not affected by
> > freeze_processes(), but I guess this is not a problem in your use
> > case.
>
> I had read through some of the documentation around this, and yes, the
> kernel threads shouldn't be an issue.
>
> In any case I'll make sure to add comments where necessary to point out
> the peculiarities of the whole thing.
>
> So this sounds like a workable solution. If for whatever reason freezing
> the entire userspace doesn't work out, would it still be possible to
> "abuse" the freeze() and thaw() callbacks like this? The way I imagine
> this is that freeze() would include code that suspends all userspace
> submissions to the GPU (i.e. effectively suspends all GPU-related
> operations).

If you can effectively prevent user space from interacting with any of
the affected devices, directly or indirectly, even via mmapped memory
regions or some such, then this should work.

The freezing of tasks is basically all about getting user space out of the way.
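
For illustration only, the sequence quoted above, with error unwinding
added, might be sketched as follows. This is a userspace mock-up, not
kernel code: struct device, freeze_processes(), thaw_processes(),
pm_generic_freeze() and pm_generic_thaw() are local stand-ins that only
borrow the kernel names, and resize_vpr(), vpr_resize_idle(), the
fail_freeze field and the trace buffer are hypothetical. It also assumes
that a failed freeze_processes() leaves nothing to undo. The point is
only the ordering: freeze tasks first, freeze devices next, and undo
both in reverse order on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct device (hypothetical fields). */
struct device {
	const char *name;
	int fail_freeze;	/* test knob: make the freeze stub fail */
	int frozen;
};

static char trace[256];		/* records the order of operations */

static void note(const char *step, const char *name)
{
	size_t used = strlen(trace);

	assert(used < sizeof(trace));
	snprintf(trace + used, sizeof(trace) - used, "%s%s%s;",
		 step, name ? ":" : "", name ? name : "");
}

/* Stubs that only mimic the names of the kernel functions in the thread. */
static int freeze_processes(void) { note("freeze_processes", NULL); return 0; }
static void thaw_processes(void) { note("thaw_processes", NULL); }
static int resize_vpr(void) { note("resize_vpr", NULL); return 0; }

static int pm_generic_freeze(struct device *dev)
{
	note("freeze", dev->name);
	if (dev->fail_freeze)
		return -EBUSY;
	dev->frozen = 1;
	return 0;
}

static int pm_generic_thaw(struct device *dev)
{
	note("thaw", dev->name);
	dev->frozen = 0;
	return 0;
}

/* Freeze tasks, then devices; resize; undo everything in reverse order. */
int vpr_resize_idle(struct device *devs, int count)
{
	int i, err;

	err = freeze_processes();
	if (err)
		return err;	/* assumption: nothing to undo on failure */

	for (i = 0; i < count; i++) {
		err = pm_generic_freeze(&devs[i]);
		if (err)
			goto thaw_devices;
	}

	err = resize_vpr();

thaw_devices:
	/* Thaw only the devices that were frozen, last one first. */
	while (--i >= 0)
		pm_generic_thaw(&devs[i]);

	thaw_processes();
	return err;
}
```

Calling vpr_resize_idle() on an array of two mock devices walks through
the full sequence; if the second device refuses to freeze, only the
first one is thawed before tasks are released again. Whether the
remaining callback tiers and device dependencies can really be skipped
still has to be checked for the actual drivers involved.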