[PATCH] arm64/fpsimd: Ensure that offlined CPUs are not using SME

Mark Brown broonie at kernel.org
Tue Jun 18 08:43:39 PDT 2024


On Tue, Jun 18, 2024 at 03:51:47PM +0100, Mark Rutland wrote:
> On Tue, Jun 18, 2024 at 03:03:50PM +0100, Mark Brown wrote:

> > When we use CPU hotplug to offline a CPU we may transition directly from
> > running a task which was using SME to the CPU being offlined. This means
> > that PSTATE.{SM,ZA} may still be set, indicating to the system that SME is
> > still in use. This could create contention with other still running CPUs if
> > the system uses shared SMCUs.

> Does it actually cause contention if the CPU isn't issuing SME
> instructions?

It was misbehaving, I didn't dig into the specifics of how.  There will
be a power impact too regardless of any instructions being issued.

> Is this theoretical or something you see in practice?

It was inspired by a report; the reporter was able to fix their firmware
to be more sensible and issue the SMSTOP itself, but it seemed like
reasonable defensiveness/politeness for us to release the resource
anyway.
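
As a rough illustration of what "releasing the resource" amounts to, here
is a small userspace model, not the kernel patch itself. The SVCR bit
positions (SM is bit 0, ZA is bit 1) follow the architecture, and SMSTOP
with no operand clears both; the struct and the function names
(model_cpu, model_smstop, model_cpu_offline) are hypothetical and exist
only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SVCR bit positions as defined by the Arm architecture. */
#define SVCR_SM (UINT64_C(1) << 0) /* PSTATE.SM: streaming mode enabled */
#define SVCR_ZA (UINT64_C(1) << 1) /* PSTATE.ZA: ZA storage enabled */

/* Hypothetical model of one CPU; this is not a kernel data structure. */
struct model_cpu {
	uint64_t svcr;
	bool online;
};

/* Models SMSTOP with no operand: clears both PSTATE.SM and PSTATE.ZA. */
static void model_smstop(struct model_cpu *cpu)
{
	cpu->svcr &= ~(SVCR_SM | SVCR_ZA);
}

/*
 * Hypothetical offline path: make sure SME is released before the CPU
 * goes away, rather than relying on firmware to do it.
 */
static void model_cpu_offline(struct model_cpu *cpu)
{
	if (cpu->svcr & (SVCR_SM | SVCR_ZA))
		model_smstop(cpu);

	/* Postcondition: the offlined CPU no longer claims to use SME. */
	assert((cpu->svcr & (SVCR_SM | SVCR_ZA)) == 0);
	cpu->online = false;
}
```

Calling model_cpu_offline() on a CPU whose svcr has SM and/or ZA set
leaves both bits clear, which is the state we would like an offlined CPU
to be in regardless of what firmware does.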

> I don't think spin-table is relevant; there's no support whatsoever for
> offlining CPUs with spin-table (and offlining will be rejected long
> before cpu_die()).

Ah, good - I didn't spend enough time to convince myself there were no
situations where we'd try to take down the CPU anyway.

> > and it is possible that system firmware may not be ideally
> > implemented, so let's explicitly disable SME during the process of
> > offlining the CPU in order to ensure there's no spurious contention.

> If this is an issue, surely it's the same with idle, or any other long
> period spent in the kernel, or any long period where userspace leaves
> the CPU in streaming mode?

> It feels very odd that we should need to do something for cpu offlining
> in particular.

Yes, it's an issue for idle too in the case where we're not using
cpuidle - I sent a separate patch for that.  cpuidle should already
cover this either itself or when it notifies us that register state
will be lost.  

A good chunk of the other users that spend noticeable time in kernel mode
will be using kernel mode floating point, so will disable SME anyway due
to that, and for everything else there's a tricky tradeoff between how
long we're spending in the kernel, how much pressure is being applied and
the likelihood of returning to the same userspace process.  It feels like
we need some more real world experience to see what, if anything, is
needed.