[RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag

Thu Oct 5 11:11:25 PDT 2023

On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones at ventanamicro.com> wrote:
> >
> > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
...
> > > So if I write out this algorithm, I get something like:
> > >  * Create an array of every possible key, and dedupe the caller's list
> > > of pairs into this array.
> > >  * For each remaining cpu, go through this array and either confirm
> > > the big array's element matches this cpu's value, or clear the cpu
> > > from the result set.
> > >
> > > But why do we go to all the effort of de-duping the caller's array of
> > > pairs? Can't they do that themselves (or pay a small performance
> > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > For each pair in the user's set, for each remaining cpu in the set,
> > > compare the values, or clear the cpu in the remaining set.
> > >
> > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > O(query_lengh * ncpus).
> >
> > I want to de-dupe for two reasons:
> >  * query_length is unbounded, but keyspace is bounded (and is currently
> >    small)
> 
> Ok, but remember that if we ship this behavior today, we're committed
> to it forever. The keyspace is likely to grow, it would be unfortunate
> if this step started to cause a noticeable performance delay.

Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
than some number should return E2BIG.

> 
> >  * The merge function also provides an opportunity for sanity checks of
> >    value-type duplicates. If the values don't match then the syscall
> >    returns -EINVAL.
> 
> But the behavior if we don't do the deduping is that we get an empty
> set of CPUs (eg the caller has asked "which CPUs have featureX==1 &&
> featureX==2", answer: none). Other than perhaps a diagnostic print,
> can the application take any practical different course of action from
> these two results? In other words, the only logical choices I can see
> an app or library taking to either of these two responses is 1) shrug
> and move on or 2) panic and abort.

For (2), an EINVAL would indicate what went wrong to the developer and
it may be fixed. Also, for EINVAL, there's (3), which is to propagate
the error, possibly all the way to the user, where the inputs could get
corrected and the call may be retried.

> In my opinion the deduping has real costs associated with it, but
> other than more directly pointing out that usermode made a silly
> request, I don't understand the benefit.
>

When pair_count is large, then the cost should be less to loop over the
pairs once, de-duping, than to loop over them once for each CPU. If
we opt to limit pair_count, then I agree there's less need for a de-
dupe loop, but we still need to copy_from_user() each pair, and that
should only be done once. So, we either complicate the body of the main
loop in order to do it on the first time through, or we do it in a
separate loop first (where we might as well also de-dupe :-)

Thanks,
drew