[RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path

K Prateek Nayak kprateek.nayak at amd.com
Mon Mar 16 22:11:07 PDT 2026


Hello Samuel,

On 3/17/2026 8:36 AM, Samuel Holland wrote:
>> @@ -1913,7 +1909,7 @@ int futex_hash_allocate_default(void)
>>        *   16 <= threads * 4 <= global hash size
>>        */
>>       buckets = roundup_pow_of_two(4 * threads);
>> -     buckets = clamp(buckets, 16, futex_hashmask + 1);
>> +     buckets = clamp(buckets, 16, __futex_mask + 1);
>>
>>       if (current_buckets >= buckets)
>>               return 0;
>> @@ -1983,10 +1979,19 @@ static int __init futex_init(void)
>>       hashsize = max(4, hashsize);
>>       hashsize = roundup_pow_of_two(hashsize);
>>  #endif
>> -     futex_hashshift = ilog2(hashsize);
>> +     __futex_mask = hashsize - 1;
>> +     __futex_shift = ilog2(hashsize);
> 
> __futex_mask is always a power of two minus 1, in other words all low bits set.
> Would it be worth using an n-bit zero extension operation instead of an
> arbitrary 32-bit mask? This would use fewer instructions on some architectures:
> for example a single ubfx on arm64 and slli+srli on riscv.

Sure, that works for __futex_mask, but runtime_const_mask_32() should be
generic enough to handle any mask, no?
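
For reference, here is a plain-C sketch of the fallback semantics I have
in mind for the generic helper (names and shape are my assumption, not
the kernel's actual implementation; arch versions would instead emit a
placeholder immediate and patch it once at boot):

```c
#include <stdint.h>

/*
 * Sketch only -- not the kernel's implementation. This models the
 * generic-C fallback a runtime_const_mask_32() helper would need:
 * architectures with patching support would instead emit a
 * placeholder immediate (e.g. via lui + addi on riscv) and rewrite
 * it in the instruction stream once the final mask is known.
 */
static uint32_t __futex_mask;	/* assigned once at init, constant after */

static inline uint32_t runtime_const_mask_32(uint32_t val, uint32_t mask)
{
	return val & mask;	/* works for any mask, not just GENMASK(N, 0) */
}

/* call site, roughly as in __futex_hash(): */
static inline uint32_t futex_bucket(uint32_t hash)
{
	return runtime_const_mask_32(hash, __futex_mask);
}
```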

Currently, __futex_hash() with futex_hashmask compiles down to:


  # ./include/linux/jhash.h:139:          __jhash_final(a, b, c);
          xor     a4,a4,a3        # tmp350, tmp353, tmp334
  ...
  # kernel/futex/core.c:449:      return &futex_queues[node][hash & futex_hashmask];
          lla     a3,.LANCHOR0    # tmp361,
  # kernel/futex/core.c:449:      return &futex_queues[node][hash & futex_hashmask];
          ld      a5,0(a3)                # __futex_data.hashmask, __futex_data.hashmask
  ...
  # kernel/futex/core.c:449:      return &futex_queues[node][hash & futex_hashmask];
          and     a5,a5,a4        # tmp358, tmp367, __futex_data.hashmask


which isn't too far from what runtime_const_mask_32() implements: the
"lla + ld" sequence gets replaced by a "lui + addi" sequence that loads
the mask as an immediate.

Sure, it can be better here since we know the bitmask is of the form
GENMASK(N, 0), but IMO runtime_const_mask_32() should work for
arbitrary masks in general.
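
To make the zero-extension suggestion concrete (a minimal sketch with a
hypothetical helper name): when the mask is GENMASK(N, 0), "hash & mask"
is exactly a zero-extension of the low N+1 bits, which arm64 can do
with a single ubfx and riscv with slli + srli, needing no separate
immediate load:

```c
#include <stdint.h>

/*
 * Hypothetical helper illustrating the slli+srli / ubfx form:
 * zero-extend the low 'nbits' bits of 'hash'. For a power-of-two
 * table with mask == (1u << nbits) - 1, this equals 'hash & mask'.
 * Assumes 1 <= nbits <= 32 (shifting a uint32_t by 32 would be UB).
 */
static inline uint32_t mask_lower_bits(uint32_t hash, unsigned int nbits)
{
	return (hash << (32 - nbits)) >> (32 - nbits);	/* slli; srli */
}
```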

Now, runtime_const_mask_lower_32(val, nbits) may be a better-suited
API name for that purpose.

If there is enough interest, I'll go back to the drawing board and
go that route for v2 for arm64 and riscv.

-- 
Thanks and Regards,
Prateek
