[PATCH v1 3/6] RISC-V: hwprobe: Introduce which-cpus flag
Andrew Jones
ajones at ventanamicro.com
Fri Oct 13 00:20:12 PDT 2023
On Thu, Oct 12, 2023 at 10:40:21AM -0700, Evan Green wrote:
> On Wed, Oct 11, 2023 at 6:56 AM Andrew Jones <ajones at ventanamicro.com> wrote:
> >
> > Introduce the first flag for the hwprobe syscall. The flag basically
> > reverses its behavior, i.e. instead of populating the values of keys
> > for a given set of cpus, the set of cpus after the call is the result
> > of finding a set which supports the values of the keys. In order to
> > do this, we implement a pair compare function which takes the type of
> > value (a single value vs. a bitmask of booleans) into consideration.
> > We also implement vdso support for the new flag.
> >
> > Signed-off-by: Andrew Jones <ajones at ventanamicro.com>
> > ---
> > Documentation/riscv/hwprobe.rst | 16 ++++-
> > arch/riscv/include/asm/hwprobe.h | 24 +++++++
> > arch/riscv/include/uapi/asm/hwprobe.h | 3 +
> > arch/riscv/kernel/sys_hwprobe.c | 93 +++++++++++++++++++++++++--
> > arch/riscv/kernel/vdso/hwprobe.c | 68 +++++++++++++++++---
> > 5 files changed, 190 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index c57437e40ffb..576aa03f56bb 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
> > have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> > value returned will be a logical AND of the values for the specified CPUs.
> > Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > -all online CPUs. There are currently no flags, this value must be zero for
> > -future compatibility.
> > +all online CPUs. The currently supported flags are:
> > +
> > +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> > + of sys_riscv_hwprobe(). Instead of populating the values of keys for a given
> > + set of CPUs, the set of CPUs is initially all unset and the values of each key
>
> This isn't quite right anymore, I reckon. The cpuset passed in is used
> as an initial set, and this function removes CPUs from the set that
> have differing values from those given in the array of pairs. If an
> empty cpuset is passed in, then the initial set used is all online
> cpus.
Indeed. I'll add more text describing the "which cpus of this cpuset"
behavior for v2.
>
> > + are given. Upon return, the CPUs which all match each of the given key-value
> > + pairs are set in ``cpus``. How matching is done depends on the key type. For
> > + value-like keys, matching means to be the exact same as the value. For
> > + boolean-like keys, matching means the result of a logical AND of the pair's
> > + value with the CPU's value is exactly the same as the pair's value. ``cpus``
> > + may also initially have set bits, in which case the bits of any CPUs which do
> > + not match the pairs will be cleared, but no other bits will be set.
> > +
> > +All other flags are reserved for future compatibility and must be zero.
> >
> > On success 0 is returned, on failure a negative error code is returned.
> >
> > diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
> > index 7cad513538d8..a68764149e51 100644
> > --- a/arch/riscv/include/asm/hwprobe.h
> > +++ b/arch/riscv/include/asm/hwprobe.h
> > @@ -15,4 +15,28 @@ static inline bool riscv_hwprobe_key_is_valid(__s64 key)
> > return key >= 0 && key <= RISCV_HWPROBE_MAX_KEY;
> > }
> >
> > +static inline bool hwprobe_key_is_bitmask(__s64 key)
> > +{
> > + switch (key) {
> > + case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > + case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > + case RISCV_HWPROBE_KEY_CPUPERF_0:
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> > + struct riscv_hwprobe *other_pair)
> > +{
> > + if (pair->key != other_pair->key)
> > + return false;
> > +
> > + if (hwprobe_key_is_bitmask(pair->key))
> > + return (pair->value & other_pair->value) == other_pair->value;
> > +
> > + return pair->value == other_pair->value;
> > +}
> > +
> > #endif
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > index 006bfb48343d..1d4134befc48 100644
> > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -38,4 +38,7 @@ struct riscv_hwprobe {
> > #define RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)
> > /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> >
> > +/* Flags */
> > +#define RISCV_HWPROBE_WHICH_CPUS (1 << 0)
> > +
> > #endif
> > diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
> > index 69ad5f793374..de294538ca25 100644
> > --- a/arch/riscv/kernel/sys_hwprobe.c
> > +++ b/arch/riscv/kernel/sys_hwprobe.c
> > @@ -166,10 +166,10 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> > }
> > }
> >
> > -static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> > - size_t pair_count, size_t cpusetsize,
> > - unsigned long __user *cpus_user,
> > - unsigned int flags)
> > +static int hwprobe_get_values(struct riscv_hwprobe __user *pairs,
> > + size_t pair_count, size_t cpusetsize,
> > + unsigned long __user *cpus_user,
> > + unsigned int flags)
> > {
> > size_t out;
> > int ret;
> > @@ -223,6 +223,91 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> > return 0;
> > }
> >
> > +static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
> > + size_t pair_count, size_t cpusetsize,
> > + unsigned long __user *cpus_user,
> > + unsigned int flags)
> > +{
> > + cpumask_t cpus, one_cpu;
> > + bool clear_all = false;
> > + size_t i;
> > + int ret;
> > +
> > + if (flags != RISCV_HWPROBE_WHICH_CPUS)
> > + return -EINVAL;
>
> We'll have to be careful if we add another flag to deal with how it
> behaves here too. I think the choice you made here is correct as it's
> the most defensive. It's (usually) easy to change a failure to a
> non-failure, but an ABI compatibility issue to change a
> weird-but-nonfailing behavior to a different behavior. This might be
> worth a comment, but I also tend to love comments more than most, so
> up to you.
I'll probably leave it without a comment. My thinking is that if we
add another flag which applies to which-cpus, then the selftests,
which should get updated with new tests for the new flag, should
quickly point this out. I prefer avoiding comments when possible,
since neither the compiler nor test cases test them, allowing them
to easily go stale.
>
> > +
> > + if (!cpusetsize || !cpus_user)
> > + return -EINVAL;
> > +
> > + if (cpusetsize > cpumask_size())
> > + cpusetsize = cpumask_size();
> > +
> > + ret = copy_from_user(&cpus, cpus_user, cpusetsize);
> > + if (ret)
> > + return -EFAULT;
> > +
> > + cpumask_and(&cpus, &cpus, cpu_online_mask);
> > + if (cpumask_empty(&cpus))
> > + cpumask_copy(&cpus, cpu_online_mask);
>
> I worry this is accident prone. If the caller asks for some set of
> CPUs that don't happen to be online right now, they get an answer for
> a completely different set of CPUs. I'm all for having a shorthand for
> "all online CPUs", but I think it should be more explicit. If you
> moved the cpumask_and() below the if(cpumask_empty()), then IMO it
> would be a clean shorthand: pass in an empty set to get all online
> CPUs.
Good catch. The way it is now is indeed a bug and your suggestion is
the fix. Will do for v2.
>
> > +
> > + cpumask_clear(&one_cpu);
> > +
> > + for (i = 0; i < pair_count; i++) {
> > + struct riscv_hwprobe pair, tmp;
> > + int cpu;
> > +
> > + ret = copy_from_user(&pair, &pairs[i], sizeof(pair));
> > + if (ret)
> > + return -EFAULT;
> > +
> > + if (!riscv_hwprobe_key_is_valid(pair.key)) {
> > + clear_all = true;
> > + pair = (struct riscv_hwprobe){ .key = -1, };
> > + ret = copy_to_user(&pairs[i], &pair, sizeof(pair));
> > + if (ret)
> > + return -EFAULT;
> > + }
> > +
> > + if (clear_all)
> > + continue;
> > +
> > + tmp = (struct riscv_hwprobe){ .key = pair.key, };
> > +
> > + for_each_cpu(cpu, &cpus) {
> > + cpumask_set_cpu(cpu, &one_cpu);
> > +
> > + hwprobe_one_pair(&tmp, &one_cpu);
> > +
> > + if (!riscv_hwprobe_pair_cmp(&tmp, &pair))
> > + cpumask_clear_cpu(cpu, &cpus);
> > +
> > + cpumask_clear_cpu(cpu, &one_cpu);
> > + }
> > + }
> > +
> > + if (clear_all)
> > + cpumask_clear(&cpus);
> > +
> > + ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> > + if (ret)
> > + return -EFAULT;
> > +
> > + return 0;
> > +}
> > +
> > +static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> > + size_t pair_count, size_t cpusetsize,
> > + unsigned long __user *cpus_user,
> > + unsigned int flags)
> > +{
> > + if (flags & RISCV_HWPROBE_WHICH_CPUS)
> > + return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
> > + cpus_user, flags);
> > +
> > + return hwprobe_get_values(pairs, pair_count, cpusetsize,
> > + cpus_user, flags);
> > +}
> > +
> > #ifdef CONFIG_MMU
> >
> > static int __init init_hwprobe_vdso_data(void)
> > diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
> > index 026b7645c5ab..e6c324d64544 100644
> > --- a/arch/riscv/kernel/vdso/hwprobe.c
> > +++ b/arch/riscv/kernel/vdso/hwprobe.c
> > @@ -3,6 +3,7 @@
> > * Copyright 2023 Rivos, Inc
> > */
> >
> > +#include <linux/string.h>
> > #include <linux/types.h>
> > #include <vdso/datapage.h>
> > #include <vdso/helpers.h>
> > @@ -11,14 +12,9 @@ extern int riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > size_t cpusetsize, unsigned long *cpus,
> > unsigned int flags);
> >
> > -/* Add a prototype to avoid -Wmissing-prototypes warning. */
> > -int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > - size_t cpusetsize, unsigned long *cpus,
> > - unsigned int flags);
> > -
> > -int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > - size_t cpusetsize, unsigned long *cpus,
> > - unsigned int flags)
> > +static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count,
> > + size_t cpusetsize, unsigned long *cpus,
> > + unsigned int flags)
> > {
> > const struct vdso_data *vd = __arch_get_vdso_data();
> > const struct arch_vdso_data *avd = &vd->arch_data;
> > @@ -50,3 +46,59 @@ int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> >
> > return 0;
> > }
> > +
> > +static int riscv_vdso_get_cpus(struct riscv_hwprobe *pairs, size_t pair_count,
> > + size_t cpusetsize, unsigned long *cpus,
> > + unsigned int flags)
> > +{
> > + const struct vdso_data *vd = __arch_get_vdso_data();
> > + const struct arch_vdso_data *avd = &vd->arch_data;
> > + struct riscv_hwprobe *p = pairs;
> > + struct riscv_hwprobe *end = pairs + pair_count;
> > + bool clear_all = false;
> > +
> > + if (!cpusetsize || !cpus)
> > + return -EINVAL;
> > +
> > + if (flags != RISCV_HWPROBE_WHICH_CPUS || !avd->homogeneous_cpus)
> > + return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
> > +
>
> Hm, I realize now that doing the shorthand of "empty set == all online
> cpus" leaves the VDSO in a slight lurch, as we now have to figure out
> how to come up with the cpu_online_mask in usermode. It'd be a bummer
> if the shorthand always bounced to the system call,
Huh, I recall thinking about this while working on it, but now, looking at
the code I wrote, I see that I apparently forgot to address it. I had
intended to bounce to the system call to handle the empty set case, but
I'm missing the check for that!
> as I think this
> will be the most common case. Is there a way we could stash the
> cpu_online_mask in the vDSO data without it going stale? Any other
> ideas?
I don't think we can stash the data and keep it synchronized and, while
the empty set case may be useful for platform description utilities, I
actually suspect applications which want to set the affinity of their
threads based on cpu features will want to start by setting their set
to the result of sched_getaffinity(). This is because it wouldn't be
useful for an application to know that cpu A has the extension they need
if their process has been confined to a set excluding cpu A with cgroups.
I'll fix this lack of empty set check for v2.
Thanks,
drew
More information about the linux-riscv
mailing list