[PATCH AUTOSEL 6.15 6/8] PM: Restrict swap use to later in the suspend sequence

Wed Jul 9 09:55:47 PDT 2025

On Wed, Jul 9, 2025 at 6:35 PM Mario Limonciello
<mario.limonciello at amd.com> wrote:
>
> On 7/9/2025 12:23 PM, Eric W. Biederman wrote:
> > Sasha Levin <sashal at kernel.org> writes:
> >
> >> On Tue, Jul 08, 2025 at 04:46:19PM -0500, Eric W. Biederman wrote:
> >>> Sasha Levin <sashal at kernel.org> writes:
> >>>
> >>>> On Tue, Jul 08, 2025 at 02:32:02PM -0500, Eric W. Biederman wrote:
> >>>>>
> >>>>> Wow!
> >>>>>
> >>>>> Sasha I think an impersonator has gotten into your account, and
> >>>>> is just making nonsense up.
> >>>>
> >>>> https://lore.kernel.org/all/aDXQaq-bq5BMMlce@lappy/
> >>>
> >>> It is nice that it is giving explanations for its backporting decisions.
> >>>
> >>> It would be nicer if those explanations were clearly marked as
> >>> coming from a non-human agent, and did not read like a human being
> >>> impatient for a patch to be backported.
> >>
> >> That's a fair point. I'll add "LLM Analysis:" before the explanation in
> >> future patches.
> >>
> >>> Further, the machine-given explanations were clearly wrong.  Do you have
> >>> plans to do anything about that?  Using very incorrect justifications
> >>> for backporting patches is scary.
> >>
> >> Just as in the past 8 years, when AUTOSEL ran without any explanation
> >> whatsoever, the patches are manually reviewed and tested prior to being
> >> included in the stable tree.
> >
> > I believe there is some testing done.  However for a lot of what I see
> > go by I would be strongly surprised if there is actually much manual
> > review.
> >
> > I expect a lot of the changes are simply ignored after a quick
> > glance, because people don't know what is going on or because they
> > are of too little consequence to spend time on.
> >
> >> I don't make a point of going back to correct the justification; it's
> >> there more to give some idea as to why this patch was marked for
> >> review, and it may be completely bogus (in which case I'll drop the patch).
> >>
> >> For that matter, I'd often look at the explanation only if I don't fully
> >> understand why a certain patch was selected. Most often I just use it as
> >> a "Yes/No" signal.
> >>
> >> In this instance I honestly haven't read the LLM explanation. I agree
> >> with you that the explanation is flawed, but the patch clearly fixes a
> >> problem:
> >>
> >>      "On AMD dGPUs this can lead to failed suspends under memory
> >>      pressure situations as all VRAM must be evicted to system memory
> >>      or swap."
> >>
> >> So it was included in the AUTOSEL patchset.
> >
> >
> >> Do you have an objection to this patch being included in -stable? So far
> >> your concerns were about the LLM explanation rather than actual patch.
> >
> > Several objections.
> > - The explanation was clearly bogus.
> > - The maintainer is alarmed.
> > - The patch, while small, is not simple and not obviously correct.
> > - The patch has not been thoroughly tested.
> >
> > I object because the code does not appear to have been well tested
> > outside of the realm of fixing the issue.
> >
> > There is no indication that the kexec code path has ever been exercised.
> >
> > So this appears to be one of those changes that was merged under
> > the banner of "Let's see if this causes a regression".
> >
> > To the original authors: I would have appreciated it being called out
> > a little more clearly in the change description that this came in
> > under "Let's see if this causes a regression".
> >
>
> As the original author of this patch, I don't feel this patch is any
> different from any other patch in that regard.
> I don't state the expected risk of a patch in a commit message.
>
> There are always people that find interesting ways to exercise it and
> they could find problems that I didn't envision.
>
> > Such changes should not be backported automatically.  They should be
> > backported with care after they have seen much more usage/testing in
> > the kernel they were merged into, probably after a kernel release or
> > so.  Deciding when a backport is reasonable takes some actual
> > judgment.
>
> TBH, I didn't tag this for stable in the commit message, with the intent
> that after it had baked for a cycle or so we could bring it back later if
> AUTOSEL hadn't picked it up by then.

I actually see an issue in this patch that I overlooked
previously, so Sasha and "stable" folks, please drop this one.

Namely, the change in dpm_resume_end() goes too far.

> It's a real issue that people have complained about for years, and its
> root cause is non-obvious.
>
> Once we're all confident in this, I'd love to discuss bringing it back
> even further to LTS kernels if it's viable.

Sure.