[PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage
Yury Norov
yury.norov at gmail.com
Sun Nov 28 22:38:39 PST 2021
On Sun, Nov 28, 2021 at 07:03:41PM +0100, mirq-test at rere.qmqm.pl wrote:
> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions like this:
> >
> > if (num_present_cpus() > 1)
> > do_something();
> >
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> >
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count present cpus to any number greater than 1:
> >
> > if (num_present_cpus_gt(1))
> > do_something();
> >
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
>
> Having slept on it I have more structured thoughts:
>
> First, I like substituting bitmap_empty/full where possible - I think
> the change stands on its own, so could be split and sent as is.
Ok, I can do it.
> I don't like the proposed API very much. One problem is that it hides
> the comparison operator and makes call sites less readable:
>
> bitmap_weight(...) > N
>
> becomes:
>
> bitmap_weight_gt(..., N)
>
> and:
> bitmap_weight(...) <= N
>
> becomes:
>
> bitmap_weight_lt(..., N+1)
> or:
> !bitmap_weight_gt(..., N)
>
> I'd rather see something resembling memcmp() API that's known enough
> to be easier to grasp. For above examples:
>
> bitmap_weight_cmp(..., N) > 0
> bitmap_weight_cmp(..., N) <= 0
> ...
bitmap_weight_cmp() cannot be efficient. Consider this example:
bitmap_weight_lt(1000 0000 0000 0000, 1) == false
^
stop here
bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
^
stop here
I agree that '_gt' is less verbose than '>', but the advantage of
'_gt' over '>' is proportional to length of bitmap, and it means
that this API should exist.
> This would also make the implementation easier in not having to
> copy and paste the code three times. Could also use a simple
> optimization reducing code size:
In the next version I'll reduce code duplication like this:
bool bitmap_eq(..., N);
bool bitmap_ge(..., N);
#define bitmap_weight_gt(..., N) bitmap_weight_ge(..., N + 1)
#define bitmap_weight_lt(..., N) !bitmap_weight_ge(..., N)
#define bitmap_weight_le(..., N) !bitmap_weight_gt(..., N)
Thanks,
Yury
More information about the linux-snps-arc
mailing list