[RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag

Mon Oct 9 09:50:00 PDT 2023

On Mon, Oct 9, 2023 at 8:39 AM Andrew Jones <ajones at ventanamicro.com> wrote:
>
> On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> > On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones at ventanamicro.com> wrote:
> > > >
> > > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > ...
> > > > > So if I write out this algorithm, I get something like:
> > > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > > of pairs into this array.
> > > > >  * For each remaining cpu, go through this array and either confirm
> > > > > the big array's element matches this cpu's value, or clear the cpu
> > > > > from the result set.
> > > > >
> > > > > But why do we go to all the effort of de-duping the caller's array of
> > > > > pairs? Can't they do that themselves (or pay a small performance
> > > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > > compare the values, or clear the cpu in the remaining set.
> > > > >
> > > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > > O(query_lengh * ncpus).
> > > >
> > > > I want to de-dupe for two reasons:
> > > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > > >    small)
> > >
> > > Ok, but remember that if we ship this behavior today, we're committed
> > > to it forever. The keyspace is likely to grow, it would be unfortunate
> > > if this step started to cause a noticeable performance delay.
> >
> > Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> > than some number should return E2BIG.
> >
>
> Scratch this idea. I went looking for precedent for limiting the length of
> input arrays to syscalls, but couldn't find any. To the contrary, I found
> that move_pages() used to return E2BIG when there were "too many pages to
> move", but it hasn't done so since 2.6.29 and, even then, it appears the
> concern was multiplication overflow, not having "too much" work. So, we
> should leave the number of pairs unbounded, but, too me, that means we
> should de-dupe, since we can do the copy_from_user() once for each at that
> time, and any user which decides to provide each bit separately for each
> bitmask key type should get tiny speedup.

I still don't get the usecase where usermode is going to be submitting
this large jumble of keys, replete with many duplicates. And even if
such a case did exist, why should the kernel be the one deduping,
dragging in a runtime penalty for all other callers that didn't submit
duplicates. Usermode could de-dupe on their own I'd think.

-Evan