[PATCH 02/19] riscv: cpufeature: Fix thead vector hwcap removal

Conor Dooley conor at kernel.org
Fri Apr 12 13:42:42 PDT 2024


On Fri, Apr 12, 2024 at 01:34:43PM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 08:26:12PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> > > On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > > > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley at microchip.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > > > harts have the same values as the boot hart as a fallback.
> > > > > > >
> > > > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > > >
> > > > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > > > that the commit you're fixing here addresses will have these properties
> > > > > > and if they did have them, they'd then also be new enough to hopefully
> > > > > > not have "v" either - the issue is they're using whatever crap the
> > > > > > vendor shipped.
> > > > > > If we're gonna get the information from DT, we already have something
> > > > > > that we can look at to perform the disable as the cpu compatibles give
> > > > > > us enough information to make the decision.
> > > > > >
> > > > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > > > repeating these ecalls on all systems.
> > > > > >
> > > > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > > > knowledge the systems that this targets all have homogeneous
> > > > > > marchid/mvendorid values of 0x0.
> > > > > 
> > > > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > > > usermode in a per-hart way, it would be better IMO if we really do
> > > > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > > > about it after the fact. Since we reported the wrong information to
> > > > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > > > up.
> > > > 
> > > > You're misinterpreting, we do get the values on all individually as
> > > > they're brought online. This is only used by the code that throws a bone
> > > > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > > > support the unratified version.
> > > 
> > > Not quite,
> > 
> > Remember that this patch stands in isolation and the justification given
> > in your commit message does not mention anything other than fixing my
> > broken patch.
> 
> Fixing the patch in the simplest sense would be to eagerly get the
> mvendorid/marchid without using the cached version. But this assumes
> that all harts have the same mvendorid/marchid. This is not something
> that I am strongly attached to. If it truly is detrimental to Linux to
> allow a user a way to specify different vendorids for different harts
> then I will remove that code.

I think that the simple fix is all that we need to do here, perhaps
updating the comment to point out how naive we are being.
`
> > 
> > > the alternatives are patched before the other cpus are
> > > booted, so the alternatives will have false positives resulting in
> > > broken kernels.
> > 
> > Over-eagerly disabling vector isn't going to break any kernels and
> > really should not break a behaving userspace either.
> > Under-eagerly disabling it (in a way that this approach could solve) is
> > only going to happen on a system where the boot hart has non-zero values
> > and claims support for v but a non-boot hart has zero values and
> > claims support for v but actually doesn't implement the ratified version.
> > If the boot hart doesn't support v, then we currently disable the
> > extension as only homogeneous stuff is supported by Linux. If the boot
> > hart claims support for "v" but doesn't actually implement the ratified
> > version neither the intent of my original patch nor this fix for it are
> > going to help avoid a broken kernel.
> > 
> > I think we do have a problem if the boot cpu having some erratum leads
> > to the kernel being patched in a way that does not work for the other
> > CPUs on the system, but I don't think this series addresses that sort of
> > issue at all as you'd be adding code to the pi section if you were fixing
> > it. I also don't think we should be making pre-emptive changes to the
> > errata patching code either to solve that sort of problem, until an SoC
> > shows up where things don't work.
> > Cheers,
> > Conor.
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20240412/bb3d97d4/attachment.sig>


More information about the linux-arm-kernel mailing list