more /proc/cpuinfo & extension support related woes

Evan Green evan at rivosinc.com
Wed Jul 26 10:18:26 PDT 2023


On Wed, Jul 26, 2023 at 5:55 AM Andrew Jones <ajones at ventanamicro.com> wrote:
>
> On Wed, Jul 26, 2023 at 11:41:05AM +0100, Conor Dooley wrote:
> > On Wed, Jul 26, 2023 at 11:04:39AM +0200, Andrew Jones wrote:
> > > On Tue, Jul 25, 2023 at 06:19:36PM +0100, Conor Dooley wrote:
> >
> > > > (I've only CCed a few people here, mostly those who were involved in the
> > > > riscv_has_extension_likely() stuff earlier this year)
> > > >
> > > > I know using /proc/cpuinfo for extension detection is a bad idea, that's
> > > > not the point of this. We can go on about how it is bad, but people are
> > > > using it & we shouldn't knowingly lead them astray. For example, unlord
> > > > reported on IRC yesterday that, having built a kernel using a released
> > > > version of llvm, they were surprised to see illegal instruction
> > > > exceptions given /proc/cpuinfo had a v in the ISA string entry.
> > > >
> > > > For most extensions that we support in the kernel at the moment, the
> > > > general flow is something along the lines of the DT etc gets parsed,
> > > > and then we set a bit if the hart supports the extension. For some
> > > > select extensions, that have additional properties in the DT, we perform
> > > > a check for those before setting the bit. Extensions that the kernel
> > > > does not understand get ignored as it does not know what to look for in
> > > > the isa string representation.
> > > >
> > > > As a result, the kernel populates a list of neither the extensions
> > > > supported by the hart nor a list of the extensions supported by the
> > > > running kernel. Instead, its something awkward in the middle - a list
> > > > of extensions that a kernel built from this source code is aware of
> > > > available on this platform.
> > > >
> > > > For the case where the system, but not the kernel, supports vector, we
> > > > do what looks like a weird dance*, where we set up vsize, but then mask
> > > > off the hwcap bit:
> > > >
> > > >   if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> > > >           riscv_v_setup_vsize();
> > > >           /*
> > > >            * ISA string in device tree might have 'v' flag, but
> > > >            * CONFIG_RISCV_ISA_V is disabled in kernel.
> > > >            * Clear V flag in elf_hwcap if CONFIG_RISCV_ISA_V is disabled.
> > > >            */
> > > >           if (!IS_ENABLED(CONFIG_RISCV_ISA_V))
> > > >                   elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
> > > >   }
> > > >   * riscv_v_setup_vsize() w/o a return check should be a NOP when
> > > >   the config option is disabled.
> > > >
> > > > Note that only the hwcap bit is masked, but not the corresponding
> > > > extension support tracking bit in riscv_isa. I'd swear I recall talking
> > > > to Andy about how this could be problematic w.r.t. false reporting of
> > > > some variety, I'll have to go find it. It is (I think) the same for any
> > > > of the other extensions parsed from DT that has a Kconfig option that
> > > > must be set before userspace can use it, such as F & D.
> > > >
> > > > Either way, this becomes problematic for /proc/cpuinfo, where we do:
> > > >   static void print_isa_ext(struct seq_file *f)
> > > >   {
> > > >           struct riscv_isa_ext_data *edata;
> > > >           int i = 0, arr_sz;
> > > >
> > > >           arr_sz = ARRAY_SIZE(isa_ext_arr) - 1;
> > > >
> > > >           /* No extension support available */
> > > >           if (arr_sz <= 0)
> > > >                   return;
> > > >
> > > >           for (i = 0; i <= arr_sz; i++) {
> > > >                   edata = &isa_ext_arr[i];
> > > >                   if (!__riscv_isa_extension_available(NULL, edata->isa_ext_id))
> > > >                           continue;
> > > >                   seq_printf(f, "_%s", edata->uprop);
> > > >           }
> > > >   }
> > > >
> > > > The issue here is the call to __riscv_isa_extension_available() which,
> > > > with a NULL first argument, will use the riscv_isa bitmask that contains,
> > > > as described above, a list of extensions that a kernel built from this
> > > > source code is aware of that are available on this platform.
> > > >
> > > > This also impacts riscv_has_extension_[un]likely(), which operates on a
> > > > similar premise. has_vector(), has_svnapot & has_fpu(), which are the
> > > > users of riscv_has_extension_[un]likely() don't suffer here, because when
> > > > their config option is unset, they resolve to `return -EOPNOTSUPP;` or
> > > > similar.
> > > >
> > > > Other users of __riscv_isa_extension_available() I have not evaluated
> > > > yet, but there are quite a few, scattered in various places, including
> > > > kvm & hwprobe.
> > > >
> > > > I think we might've just overused this particular bit of information for
> > > > various purposes & screwed ourselves a bit. There are a few main places
> > > > this information is used:
> > > >
> > > > 1 Figuring what exactly the hardware supports during boot. Evan has
> > > >   added per-CPU tracking of this, although the kernel is limited here to
> > > >   the subset that it knows AND further restricted by a call to
> > > >   riscv_isa_extension_check(). This we don't actually use yet after
> > > >   boot, but Evan has a user on the lists. This per-CPU tracking is then
> > > >   used to create and LCD set of extensions that drives
> > > >   __riscv_isa_extension_available().
> > > >
> > > > 2 Telling userspace what extensions are supported. However, by what? The
> > > >   running-kernel & hw combo? The hardware? I tried to read the hwprobe
> > > >   doc and couldn't tell. /proc/cpuinfo, as I already said, is a

It should be the running kernel & hw. The _BASE_BEHAVIOR key had the language "
user-visible behavior that this kernel supports", but we probably
should have also mentioned that in keys like _FD.

> > > >   mixed-mess, but hwprobe should mean "this extension is supported by
> > > >   the hardware (or firmware) and you can use this without a concern",
> > > >   right? Evan's proposed per-CPU reporting here adds a third thing to
> > > >   the mix, but is intended to (AFAIU) represent the set of
> > > >   known-to-the-kernel extensions that each hart supports, regardless of
> > > >   whether the kernel itself supports them.
> > > >
> > > > 3 Setting dynamic code paths in the kernel. For example, if the hardware
> > > >   supports Zbb, patch in the string manip routines. This is distinct
> > > >   from telling userspace it can use the extension, since something like
> > > >   Zbb can be used by userspace even if the kernel doesn't touch it.
> > > >
> > > > 4 Telling in-kernel users what extensions they can use. This, I figure,
> > > >   is more along the lines of my theorised meaning for hwprobe.
> > > >
> > > > I think for case 3, we are mostly okay. Either this sort of code uses
> > > > alternatives, that have an in-built config dependency, or the
> > > > has_whatever() stuff that has an alternative implementation if the
> > > > correct config option is not set.
> > > >
> > > > For case 2, I think hwprobe is doing the right thing (for now) but
> > > > /proc/cpuinfo needs to be changed.

Will we break userspace by doing this? Or is the argument something
like: in old kernels we falsely reported extensions that no one could
possibly use since the kernel didn't support them. So we know by
removing them we're not breaking anybody, since there was no way to
successfully use the info.

> > > >
> > > > Case 4 probably needs an audit to make sure that nothing is going
> > > > awry. I'll go have a look through the users.
> > > >
> > > > If other people think splitting this up a worthwhile endeavour, I'll
> > > > go write some patches. I do FWIW. Or maybe I am overlooking something
> > > > that makes this less of a problem than I think. LMK if so :)
> >
> > > Here's a summary of my thoughts, which probably aren't adding anything new
> > > to this
> > >
> > >  - The kernel needs per-hart bitmaps expressing what extensions are
> > >    supported for each given hart.
> >
> > We sorta have this. The per-hart bitmap is a bitmap of extensions that
> > have defines in hwcap.h (an understandable limitation) and for which
> > riscv_isa_extension_check() passes.
> >
> > >  - The kernel needs a bitmap expressing the LCD of extensions across all
> > >    harts.
> >
> > Again, we sorta have this. It's all of the per-hart bitmaps, bitwise
> > ANDed.
> >
> > >  (The two bitmaps are clearly expressing the intersection of what's
> > >   supported by a hart and what the kernel is aware of, since the kernel
> > >   must allocate the bits.)
> >
> > Right. With the exception of the riscv_isa_extension_check(), that we
> > currently apply to Zicbo{m,z}.
> >
> > Getting ahead of myself a little, but I wonder if we should remove the
> > riscv_isa_extension_check() stuff from from the code that sets the
> > bitmaps & instead have it as a separate stage, which would align our
> > bitmaps with what you suggest here.
>
> You touch on this later, but tracking "non-filtered" extensions may not be
> useful. riscv_isa_extension_check() drops the tracking of extensions which
> have incomplete descriptions. Zicbom with no or a bad block size described
> is really no Zicbom at all, for example.
>
> >
> > >  - Extensions in DT/ACPI that the kernel is unaware of get ignored by
> > >    the kernel and not exposed to userspace (if the extensions are for
> > >    U-mode and U-mode has another way of determining their presence to
> > >    use them, then good for U-mode)
> >
> > Sure.
> >
> > >  - The kernel will combine information from one of the two bitmaps listed
> > >    above with other information, such as config information, to decide
> > >    when / if an extension should be used by the kernel and/or exposed to
> > >    userspace
> >
> > Right. This bit (or rather two bits, since I view the kernel and
> > userspace bits here separately) is where we are falling short. I think
> > hwprobe's limited users do the right thing, but we're not doing this for
> > /proc/cpuinfo. For the in-kernel users, grepping shows some suspicious
> > looking things, but how many are problematic I do not yet know. As an
> > example, does
> > static void kvm_riscv_vcpu_update_config(const unsigned long *isa)
> > {
> >       u64 henvcfg = 0;
> >
> >       if (riscv_isa_extension_available(isa, SVPBMT))
> >               henvcfg |= ENVCFG_PBMTE;
> >
> > work correctly if the SVPBMT Kconfig option is disabled?
> > From a quick check, `isa` here is set from the host ISA
> >       /* Setup ISA features available to VCPU */
> >       for (i = 0; i < ARRAY_SIZE(kvm_isa_ext_arr); i++) {
> >               host_isa = kvm_isa_ext_arr[i];
> >               if (__riscv_isa_extension_available(NULL, host_isa) &&
> >                   kvm_riscv_vcpu_isa_enable_allowed(i))
> >                       set_bit(host_isa, vcpu->arch.isa);
> >       }
> > so if the check passes for the host ISA, it'll pass for the guest ISA
> > too, so henvcfg will end up with the PBMTE bit set. I just haven't yet
> > checked what the outcome of this will be, but I figure not good?
>
> As long as KVM doesn't depend on the rest of the kernel having the
> extension enabled in order to support the guest's use of it, then the
> extension can be safely exposed to the guest kernel, even when the
> host kernel has opted to avoid its use. Indeed, even if the host
> kernel was compiled without the support of the extension as a
> workaround for errata, exposing the extension to the guest isn't
> horrible, as the guest kernel may want to experiment with the broken
> extension or choose to workaround those errata in another way.
>
> >
> > > (it'd be good to have a consistent API for these types of
> > >    checks which combine extension presence with other information)
> >
> > I figure the best option might just be to make the
> > __riscv_isa_extension_available() into this to avoid disruption? The
> > vast majority of users of that function want to know whether or not it
> > is safe to use the extension on that hart, not whether the hart itself
> > supports it. In fact, outside of Evan's per-hart stuff in /proc/cpuinfo,
> > do we have any users that would even want an API that checks the
> > "unfiltered" versions? I'll have to check the users to answer that I
> > think.

So the idea is to add stuff into __riscv_isa_extension_available()
that filters out extensions if their corresponding Kconfig isn't
enabled? My worry would be that in the future we're going to add
knowledge of an extension but then forget to add a hunk here to
disable it if the Kconfig isn't enabled. I guess for extensions that
are entirely unusable if the Kconfig is disabled, we can call it an
"oops" and fix it with the same rationale as I described above. But if
we ever did this with an extension that was partially usable from
userspace we'd be stuck with it. Maybe I'm speculating too much.

> >
> > >  - The kernel will expose sanitized extension information to userspace.
> > >    S-mode only extensions do not need to be exposed and extensions that
> > >    require kernel support should only be exposed when the kernel is
> > >    willing and able to support them. Additional properties of extensions,
> > >    such as block sizes, also need to be exposed (exposure is done through
> > >    hwprobe)
> >
> > It gets extra messy too when the prctl etc get involved. What do you
> > report in /proc/cpuinfo's isa field when RISCV_ISA_V_DEFAULT_ENABLE is
> > set to disabled? It would have to respect the sysctl, right?
> > Ditto the prctl, which will interact "nicely" with things like Google's
> > cpu_feature library that parses /proc/cpuinfo.
> >       https://github.com/google/cpu_features
>
> Sigh...

The V prctl is kind of a special case though, isn't it? The way I
understood it is as a workaround for breaking userspace ABI by adding
the V registers to the signal context. So we're not expecting
individual applications to worry about how the prctl is set, init sets
it to say "I've recompiled the world and vouch that apps can handle
the larger struct". I'm hoping we can mostly ignore the prctl()'s
interaction with V.

>
> >
> > > I think we have most/all the above already, but the audits you've
> > > volunteered to do would be welcome, along with maybe documenting
> > > it all somewhere. Then, after that, all that's left is the "what to
> > > do with /proc/cpuinfo and hwcap?" question.
> >
> > > Can we simply freeze
> > > what extensions are exposed through them and mark them as deprecated?
> >
> > hwcap is already limited to the single-letter stuff, and is already
> > disabling things like F, D & V if the config options are not set.
> >
> > But for /proc/cpuinfo, that is sorta where I was wondering if we should
> > go, but hwprobe isn't a replacement for "casual" checking of what
> > is/isn't supported, so it may be premature. Certainly, leaving people
> > without an easy way to check by eye what extensions are present seems
> > overly painful.
>
> I guess it depends on the purpose of the easy check. Is it to see
> what's supported+enabled by the kernel? Or what's supported by the
> platform, but may or may not be useable? Or just what's usable for
> U-mode? Maybe we need a sysfs extension listing where each extension
> node has subnodes providing their current status (enabled, disabled,
> etc.). Then lscpu could be taught how to use that information to
> output different listings depending on the user's needs.

I can see people making an argument for keeping it closer to what the
hardware has, rather than "understood and enabled in the kernel", for
cases like reporting hardware info of a fleet being managed. Maybe
there are analogies over in the Intel/ARM world we can use to see what
the exact expectations of /proc/cpuinfo are (eg do they report
features in /proc/cpuinfo even if the Kconfig is off?).

-Evan



More information about the linux-riscv mailing list