more /proc/cpuinfo & extension support related woes
Anup Patel
apatel at ventanamicro.com
Wed Jul 26 09:37:52 PDT 2023
On Wed, Jul 26, 2023 at 4:11 PM Conor Dooley <conor.dooley at microchip.com> wrote:
>
> On Wed, Jul 26, 2023 at 11:04:39AM +0200, Andrew Jones wrote:
> > On Tue, Jul 25, 2023 at 06:19:36PM +0100, Conor Dooley wrote:
>
> > > (I've only CCed a few people here, mostly those who were involved in the
> > > riscv_has_extension_likely() stuff earlier this year)
> > >
> > > I know using /proc/cpuinfo for extension detection is a bad idea, that's
> > > not the point of this. We can go on about how it is bad, but people are
> > > using it & we shouldn't knowingly lead them astray. For example, unlord
> > > reported on IRC yesterday that, having built a kernel using a released
> > > version of llvm, they were surprised to see illegal instruction
> > > exceptions given /proc/cpuinfo had a v in the ISA string entry.
> > >
> > > For most extensions that we support in the kernel at the moment, the
> > > general flow is something along the lines of the DT etc gets parsed,
> > > and then we set a bit if the hart supports the extension. For some
> > > select extensions, that have additional properties in the DT, we perform
> > > a check for those before setting the bit. Extensions that the kernel
> > > does not understand get ignored as it does not know what to look for in
> > > the isa string representation.
> > >
> > > As a result, the kernel populates a list of neither the extensions
> > > supported by the hart nor a list of the extensions supported by the
> > > running kernel. Instead, its something awkward in the middle - a list
> > > of extensions that a kernel built from this source code is aware of
> > > available on this platform.
> > >
> > > For the case where the system, but not the kernel, supports vector, we
> > > do what looks like a weird dance*, where we set up vsize, but then mask
> > > off the hwcap bit:
> > >
> > > if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> > > riscv_v_setup_vsize();
> > > /*
> > > * ISA string in device tree might have 'v' flag, but
> > > * CONFIG_RISCV_ISA_V is disabled in kernel.
> > > * Clear V flag in elf_hwcap if CONFIG_RISCV_ISA_V is disabled.
> > > */
> > > if (!IS_ENABLED(CONFIG_RISCV_ISA_V))
> > > elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
> > > }
> > > * riscv_v_setup_vsize() w/o a return check should be a NOP when
> > > the config option is disabled.
> > >
> > > Note that only the hwcap bit is masked, but not the corresponding
> > > extension support tracking bit in riscv_isa. I'd swear I recall talking
> > > to Andy about how this could be problematic w.r.t. false reporting of
> > > some variety, I'll have to go find it. It is (I think) the same for any
> > > of the other extensions parsed from DT that has a Kconfig option that
> > > must be set before userspace can use it, such as F & D.
> > >
> > > Either way, this becomes problematic for /proc/cpuinfo, where we do:
> > > static void print_isa_ext(struct seq_file *f)
> > > {
> > > struct riscv_isa_ext_data *edata;
> > > int i = 0, arr_sz;
> > >
> > > arr_sz = ARRAY_SIZE(isa_ext_arr) - 1;
> > >
> > > /* No extension support available */
> > > if (arr_sz <= 0)
> > > return;
> > >
> > > for (i = 0; i <= arr_sz; i++) {
> > > edata = &isa_ext_arr[i];
> > > if (!__riscv_isa_extension_available(NULL, edata->isa_ext_id))
> > > continue;
> > > seq_printf(f, "_%s", edata->uprop);
> > > }
> > > }
> > >
> > > The issue here is the call to __riscv_isa_extension_available() which,
> > > with a NULL first argument, will use the riscv_isa bitmask that contains,
> > > as described above, a list of extensions that a kernel built from this
> > > source code is aware of that are available on this platform.
> > >
> > > This also impacts riscv_has_extension_[un]likely(), which operates on a
> > > similar premise. has_vector(), has_svnapot & has_fpu(), which are the
> > > users of riscv_has_extension_[un]likely() don't suffer here, because when
> > > their config option is unset, they resolve to `return -EOPNOTSUPP;` or
> > > similar.
> > >
> > > Other users of __riscv_isa_extension_available() I have not evaluated
> > > yet, but there are quite a few, scattered in various places, including
> > > kvm & hwprobe.
> > >
> > > I think we might've just overused this particular bit of information for
> > > various purposes & screwed ourselves a bit. There are a few main places
> > > this information is used:
> > >
> > > 1 Figuring what exactly the hardware supports during boot. Evan has
> > > added per-CPU tracking of this, although the kernel is limited here to
> > > the subset that it knows AND further restricted by a call to
> > > riscv_isa_extension_check(). This we don't actually use yet after
> > > boot, but Evan has a user on the lists. This per-CPU tracking is then
> > > used to create and LCD set of extensions that drives
> > > __riscv_isa_extension_available().
> > >
> > > 2 Telling userspace what extensions are supported. However, by what? The
> > > running-kernel & hw combo? The hardware? I tried to read the hwprobe
> > > doc and couldn't tell. /proc/cpuinfo, as I already said, is a
> > > mixed-mess, but hwprobe should mean "this extension is supported by
> > > the hardware (or firmware) and you can use this without a concern",
> > > right? Evan's proposed per-CPU reporting here adds a third thing to
> > > the mix, but is intended to (AFAIU) represent the set of
> > > known-to-the-kernel extensions that each hart supports, regardless of
> > > whether the kernel itself supports them.
> > >
> > > 3 Setting dynamic code paths in the kernel. For example, if the hardware
> > > supports Zbb, patch in the string manip routines. This is distinct
> > > from telling userspace it can use the extension, since something like
> > > Zbb can be used by userspace even if the kernel doesn't touch it.
> > >
> > > 4 Telling in-kernel users what extensions they can use. This, I figure,
> > > is more along the lines of my theorised meaning for hwprobe.
> > >
> > > I think for case 3, we are mostly okay. Either this sort of code uses
> > > alternatives, that have an in-built config dependency, or the
> > > has_whatever() stuff that has an alternative implementation if the
> > > correct config option is not set.
> > >
> > > For case 2, I think hwprobe is doing the right thing (for now) but
> > > /proc/cpuinfo needs to be changed.
> > >
> > > Case 4 probably needs an audit to make sure that nothing is going
> > > awry. I'll go have a look through the users.
> > >
> > > If other people think splitting this up a worthwhile endeavour, I'll
> > > go write some patches. I do FWIW. Or maybe I am overlooking something
> > > that makes this less of a problem than I think. LMK if so :)
>
> > Here's a summary of my thoughts, which probably aren't adding anything new
> > to this
> >
> > - The kernel needs per-hart bitmaps expressing what extensions are
> > supported for each given hart.
>
> We sorta have this. The per-hart bitmap is a bitmap of extensions that
> have defines in hwcap.h (an understandable limitation) and for which
> riscv_isa_extension_check() passes.
>
> > - The kernel needs a bitmap expressing the LCD of extensions across all
> > harts.
>
> Again, we sorta have this. It's all of the per-hart bitmaps, bitwise
> ANDed.
>
> > (The two bitmaps are clearly expressing the intersection of what's
> > supported by a hart and what the kernel is aware of, since the kernel
> > must allocate the bits.)
>
> Right. With the exception of the riscv_isa_extension_check(), that we
> currently apply to Zicbo{m,z}.
>
> Getting ahead of myself a little, but I wonder if we should remove the
> riscv_isa_extension_check() stuff from from the code that sets the
> bitmaps & instead have it as a separate stage, which would align our
> bitmaps with what you suggest here.
>
> > - Extensions in DT/ACPI that the kernel is unaware of get ignored by
> > the kernel and not exposed to userspace (if the extensions are for
> > U-mode and U-mode has another way of determining their presence to
> > use them, then good for U-mode)
>
> Sure.
>
> > - The kernel will combine information from one of the two bitmaps listed
> > above with other information, such as config information, to decide
> > when / if an extension should be used by the kernel and/or exposed to
> > userspace
>
> Right. This bit (or rather two bits, since I view the kernel and
> userspace bits here separately) is where we are falling short. I think
> hwprobe's limited users do the right thing, but we're not doing this for
> /proc/cpuinfo. For the in-kernel users, grepping shows some suspicious
> looking things, but how many are problematic I do not yet know. As an
> example, does
> static void kvm_riscv_vcpu_update_config(const unsigned long *isa)
> {
> u64 henvcfg = 0;
>
> if (riscv_isa_extension_available(isa, SVPBMT))
> henvcfg |= ENVCFG_PBMTE;
>
> work correctly if the SVPBMT Kconfig option is disabled?
> From a quick check, `isa` here is set from the host ISA
> /* Setup ISA features available to VCPU */
> for (i = 0; i < ARRAY_SIZE(kvm_isa_ext_arr); i++) {
> host_isa = kvm_isa_ext_arr[i];
> if (__riscv_isa_extension_available(NULL, host_isa) &&
> kvm_riscv_vcpu_isa_enable_allowed(i))
> set_bit(host_isa, vcpu->arch.isa);
> }
> so if the check passes for the host ISA, it'll pass for the guest ISA
> too, so henvcfg will end up with the PBMTE bit set. I just haven't yet
> checked what the outcome of this will be, but I figure not good?
Why is the outcome not good?
Guest need Svpbmt to support pass-through devices.
>
> > (it'd be good to have a consistent API for these types of
> > checks which combine extension presence with other information)
>
> I figure the best option might just be to make the
> __riscv_isa_extension_available() into this to avoid disruption? The
> vast majority of users of that function want to know whether or not it
> is safe to use the extension on that hart, not whether the hart itself
> supports it. In fact, outside of Evan's per-hart stuff in /proc/cpuinfo,
> do we have any users that would even want an API that checks the
> "unfiltered" versions? I'll have to check the users to answer that I
> think.
Each guest VCPU will have its own ISA bitmap so KVM needs the
__riscv_isa_extension_available() to check VCPU ISA bitmap.
>
> > - The kernel will expose sanitized extension information to userspace.
> > S-mode only extensions do not need to be exposed and extensions that
> > require kernel support should only be exposed when the kernel is
> > willing and able to support them. Additional properties of extensions,
> > such as block sizes, also need to be exposed (exposure is done through
> > hwprobe)
>
> It gets extra messy too when the prctl etc get involved. What do you
> report in /proc/cpuinfo's isa field when RISCV_ISA_V_DEFAULT_ENABLE is
> set to disabled? It would have to respect the sysctl, right?
> Ditto the prctl, which will interact "nicely" with things like Google's
> cpu_feature library that parses /proc/cpuinfo.
> https://github.com/google/cpu_features
>
> > I think we have most/all the above already, but the audits you've
> > volunteered to do would be welcome, along with maybe documenting
> > it all somewhere. Then, after that, all that's left is the "what to
> > do with /proc/cpuinfo and hwcap?" question.
>
> > Can we simply freeze
> > what extensions are exposed through them and mark them as deprecated?
>
> hwcap is already limited to the single-letter stuff, and is already
> disabling things like F, D & V if the config options are not set.
>
> But for /proc/cpuinfo, that is sorta where I was wondering if we should
> go, but hwprobe isn't a replacement for "casual" checking of what
> is/isn't supported, so it may be premature. Certainly, leaving people
> without an easy way to check by eye what extensions are present seems
> overly painful.
>
> Cheers,
> Conor.
Regards,
Anup
More information about the linux-riscv
mailing list