[PATCH v4 1/5] lib/bitmap: add bitmap_{set,get}_value()

Yury Norov yury.norov at gmail.com
Mon Jul 24 22:04:34 PDT 2023


On Mon, Jul 24, 2023 at 11:36:36AM +0300, Andy Shevchenko wrote:
> On Sat, Jul 22, 2023 at 06:57:23PM -0700, Yury Norov wrote:
> > On Thu, Jul 20, 2023 at 07:39:52PM +0200, Alexander Potapenko wrote:
> 
> > > +		map[index] &= ~(GENMASK(nbits - 1, 0) << offset);
> > 
> > 'GENMASK(nbits - 1, 0) << offset' looks really silly.
> 
> But you followed the thread to get a clue why it's written in this form, right?

Yes, I did. But I don't expect everyone who reads kernel code to spend
time digging up the discussions that explain why it ended up this way. So,
at the very least, it would be good to add a comment.
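For readers who skip the thread: as long as offset + nbits <= BITS_PER_LONG, GENMASK(nbits - 1, 0) << offset selects exactly the same bits as GENMASK(offset + nbits - 1, offset). The userspace sketch below checks that exhaustively. GENMASK_UL() and BITS_PER_LONG_UL are hypothetical stand-ins modeled on the kernel's GENMASK() and BITS_PER_LONG, and the sketch does not try to reproduce the reasons given in the thread.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace stand-ins modeled on the kernel's BITS_PER_LONG and GENMASK(). */
#define BITS_PER_LONG_UL (CHAR_BIT * sizeof(unsigned long))
#define GENMASK_UL(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG_UL - 1 - (h))))

/* Field mask built the way the patch builds it: low @nbits bits, shifted up. */
static unsigned long field_mask_shifted(unsigned long nbits, unsigned long offset)
{
	return GENMASK_UL(nbits - 1, 0) << offset;
}

/* The same field mask written with GENMASK()'s high/low bit positions. */
static unsigned long field_mask_direct(unsigned long nbits, unsigned long offset)
{
	return GENMASK_UL(offset + nbits - 1, offset);
}

/* Returns 1 if both forms agree for every field that fits in one word. */
static int field_masks_agree(void)
{
	for (unsigned long nbits = 1; nbits <= BITS_PER_LONG_UL; nbits++)
		for (unsigned long off = 0; off + nbits <= BITS_PER_LONG_UL; off++)
			if (field_mask_shifted(nbits, off) != field_mask_direct(nbits, off))
				return 0;
	return 1;
}
```

Either form builds the low mask with GENMASK() rather than (1UL << nbits) - 1; the latter shifts by the full word width when nbits == BITS_PER_LONG, which is undefined in C.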
 
> ...
> 
> > With all that I think the implementation should look something like
> > this:
> 
> I would go this way if and only if the code generation on main architectures
> with both GCC and clang is better.
> 
> And maybe even some performance tests need to be provided.

For the following implementation:

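  void my_bitmap_write(unsigned long *map, unsigned long value,
                       unsigned long start, unsigned long nbits)
  {
          unsigned long w, end;
  
          if (unlikely(nbits == 0))
                  return;
  
          /* Drop any bits of @value above @nbits. */
          value &= GENMASK(nbits - 1, 0);
  
          /* Advance to the word holding @start; make @start and @end word-relative. */
          map += BIT_WORD(start);
          start %= BITS_PER_LONG;
          end = start + nbits - 1;
  
          /* Clear the field's bits in the first word and store the low part of @value. */
          w = *map & (end < BITS_PER_LONG ? ~GENMASK(end, start) : BITMAP_LAST_WORD_MASK(start));
          *map = w | (value << start);
  
          /* The field fits in one word: done. */
          if (end < BITS_PER_LONG)
                  return;
  
          /* Clear (not keep) the low bits of the next word that the field spills into. */
          w = *++map & ~BITMAP_LAST_WORD_MASK(end + 1 - BITS_PER_LONG);
          *map = w | (value >> (BITS_PER_LONG - start));
  }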
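A way to sanity-check any candidate implementation before comparing code size is an exhaustive userspace comparison against a bit-at-a-time reference. The macros below are stand-ins modeled on <linux/bits.h> and <linux/bitmap.h> and are assumed to behave like the kernel ones; ref_bitmap_write() and my_bitmap_write_matches_reference() are hypothetical helpers. Pre-filling the words with a 0xA5 pattern is what exposes a second-word mask that keeps the old low bits instead of clearing them.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-ins for the kernel helpers (assumed equivalent here). */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
#define unlikely(x) __builtin_expect(!!(x), 0)

/* my_bitmap_write() from above; the second-word mask must clear the spilled low bits. */
static void my_bitmap_write(unsigned long *map, unsigned long value,
			    unsigned long start, unsigned long nbits)
{
	unsigned long w, end;

	if (unlikely(nbits == 0))
		return;

	value &= GENMASK(nbits - 1, 0);
	map += BIT_WORD(start);
	start %= BITS_PER_LONG;
	end = start + nbits - 1;

	w = *map & (end < BITS_PER_LONG ? ~GENMASK(end, start)
					: BITMAP_LAST_WORD_MASK(start));
	*map = w | (value << start);

	if (end < BITS_PER_LONG)
		return;

	w = *++map & ~BITMAP_LAST_WORD_MASK(end + 1 - BITS_PER_LONG);
	*map = w | (value >> (BITS_PER_LONG - start));
}

/* Bit-at-a-time reference: copy bit k of @value to bit (start + k) of @map. */
static void ref_bitmap_write(unsigned long *map, unsigned long value,
			     unsigned long start, unsigned long nbits)
{
	for (unsigned long k = 0; k < nbits; k++) {
		unsigned long pos = start + k;
		unsigned long bit = 1UL << (pos % BITS_PER_LONG);

		if ((value >> k) & 1)
			map[BIT_WORD(pos)] |= bit;
		else
			map[BIT_WORD(pos)] &= ~bit;
	}
}

#define NWORDS 4

/* Compare both writers for every (start, nbits) that fits in NWORDS words. */
static int my_bitmap_write_matches_reference(void)
{
	const unsigned long fill = ~0UL / 0xFF * 0xA5;	/* 0xA5 in every byte */

	for (unsigned long nbits = 1; nbits <= BITS_PER_LONG; nbits++) {
		for (unsigned long start = 0; start + nbits <= NWORDS * BITS_PER_LONG; start++) {
			unsigned long a[NWORDS], b[NWORDS];
			unsigned long value = (start * 0x9E3779B9UL) ^ nbits;

			for (int i = 0; i < NWORDS; i++)
				a[i] = b[i] = fill;
			my_bitmap_write(a, value, start, nbits);
			ref_bitmap_write(b, value, start, nbits);
			for (int i = 0; i < NWORDS; i++)
				if (a[i] != b[i])
					return 0;
		}
	}
	return 1;
}
```

The value passed in deliberately carries junk above @nbits, so the check also covers the initial masking step.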

This is the bloat-o-meter output:

$ scripts/bloat-o-meter lib/test_bitmap.o.orig lib/test_bitmap.o
add/remove: 8/0 grow/shrink: 1/0 up/down: 2851/0 (2851)
Function                                     old     new   delta
test_bitmap_init                            3846    5484   +1638
test_bitmap_write_perf                         -     401    +401
bitmap_write                                   -     271    +271
my_bitmap_write                                -     248    +248
bitmap_read                                    -     229    +229
__pfx_test_bitmap_write_perf                   -      16     +16
__pfx_my_bitmap_write                          -      16     +16
__pfx_bitmap_write                             -      16     +16
__pfx_bitmap_read                              -      16     +16
Total: Before=36964, After=39815, chg +7.71%

And this is the performance test:

        for (cnt = 0; cnt < 5; cnt++) {
                time = ktime_get();
                for (nbits = 1; nbits <= BITS_PER_LONG; nbits++) {
                        for (i = 0; i < 1000; i++) {
                                if (i + nbits > 1000)
                                        break;
                                bitmap_write(bmap, val, i, nbits);
                        }
                }
                time = ktime_get() - time;
                pr_err("bitmap_write:\t%llu\t", time);

                time = ktime_get();
                for (nbits = 1; nbits <= BITS_PER_LONG; nbits++) {
                        for (i = 0; i < 1000; i++) {
                                if (i + nbits > 1000)
                                        break;
                                my_bitmap_write(bmap, val, i, nbits);
                        }
                }
                time = ktime_get() - time;
                pr_cont("%llu\n", time);
        }

On x86_64 (a KVM guest), built with GCC, this gives:
                                                Orig    My
[    1.630731] test_bitmap: bitmap_write:	299092	252764
[    1.631584] test_bitmap: bitmap_write:	299522	252554
[    1.632429] test_bitmap: bitmap_write:	299171	258665
[    1.633280] test_bitmap: bitmap_write:	299241	252794
[    1.634133] test_bitmap: bitmap_write:	306716	252934
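To turn these totals into a per-call cost: a timed pass does not make a flat 64 × 1000 calls, because for each width the inner loop stops once i + nbits exceeds 1000. A small counting sketch, assuming BITS_PER_LONG is 64 on x86_64 and reading the ktime_get() deltas as nanoseconds (ktime_t is a nanosecond count); calls_per_pass() and calls_per_pass_closed() are hypothetical helpers, not part of test_bitmap.c.

```c
#include <assert.h>

/* Count the calls one timed pass makes, mirroring the loop structure above. */
static unsigned long calls_per_pass(unsigned long bits_per_long, unsigned long len)
{
	unsigned long calls = 0;

	for (unsigned long nbits = 1; nbits <= bits_per_long; nbits++) {
		for (unsigned long i = 0; i < len; i++) {
			if (i + nbits > len)
				break;
			calls++;
		}
	}
	return calls;
}

/* Closed form of the same count: sum over nbits of (len + 1 - nbits); needs bits_per_long <= len + 1. */
static unsigned long calls_per_pass_closed(unsigned long bits_per_long, unsigned long len)
{
	return bits_per_long * (len + 1) - bits_per_long * (bits_per_long + 1) / 2;
}
```

Both give 61984 calls per pass, so the first run above works out to roughly 4.8 ns per call for bitmap_write() and 4.1 ns for my_bitmap_write().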

So my version is ~15% faster and ~8% smaller (248 vs 271 bytes).
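The percentages can be reproduced from the numbers above: average the five runs in each column, and compare the bitmap_write and my_bitmap_write sizes (271 vs 248 bytes) from the bloat-o-meter output. A minimal sketch; the array names and helpers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Timings (ns) copied from the five test_bitmap runs above: Orig and My columns. */
static const unsigned long orig_ns[] = { 299092, 299522, 299171, 299241, 306716 };
static const unsigned long my_ns[]   = { 252764, 252554, 258665, 252794, 252934 };

/* Arithmetic mean of @n samples. */
static double mean(const unsigned long *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* How much smaller @new_val is than @old_val, in percent of @old_val. */
static double pct_reduction(double old_val, double new_val)
{
	return 100.0 * (old_val - new_val) / old_val;
}
```

With these inputs, pct_reduction(mean(orig_ns, 5), mean(my_ns, 5)) comes to about 15.6%, and pct_reduction(271, 248) to about 8.5%.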

I don't insist on my implementation, but I think we should experiment
more with code generation.

Thanks,
Yury



More information about the linux-arm-kernel mailing list