[PATCH] random: defer use of bootloader randomness to random_init()

Ard Biesheuvel ardb at kernel.org
Tue Jun 7 07:19:26 PDT 2022


On Tue, 7 Jun 2022 at 14:22, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Hi Ard,
>
> On Tue, Jun 07, 2022 at 02:15:27PM +0200, Ard Biesheuvel wrote:
> > Note that jump labels use asm() blocks, which are opaque to the
> > compiler, and so it is not guaranteed that codegen will be better than
>
> I actually spent a lot of time looking at the codegen on a few
> platforms.
>

I did a quick experiment on ThunderX2 with the following program

#include <stdio.h>
#include <stdlib.h>
#include <sys/random.h>

static unsigned char buf[16];

int main(void)
{
  for (int i = 0; i < 1000000; i++) {
    if (getrandom(buf, sizeof(buf),
        GRND_RANDOM | GRND_NONBLOCK) < sizeof(buf)) {
          fprintf(stderr, "getrandom() error!\n");
          exit(-1);
    }
  }
  return 0;
}

both with and without your revert patch of f5bda35fba615ac applied
onto v5.19-rc1, the results are below (Cortex-A57 @ 2 GHz):

############## WITH STATIC BRANCH

ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            906.01 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                40      page-faults               #    0.044 K/sec
     1,812,010,034      cycles                    #    2.000 GHz
     1,944,549,733      instructions              #    1.07  insn per cycle
   <not supported>      branches
         2,014,408      branch-misses

       0.906695576 seconds time elapsed

       0.140334000 seconds user
       0.765871000 seconds sys


ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            918.37 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                37      page-faults               #    0.040 K/sec
     1,836,733,451      cycles                    #    2.000 GHz
     1,944,572,442      instructions              #    1.06  insn per cycle
   <not supported>      branches
         3,012,020      branch-misses

       0.919027080 seconds time elapsed

       0.159721000 seconds user
       0.758718000 seconds sys


ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            902.06 msec task-clock                #    0.999 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.043 K/sec
     1,804,103,600      cycles                    #    2.000 GHz
     1,944,563,889      instructions              #    1.08  insn per cycle
   <not supported>      branches
         1,956,027      branch-misses

       0.902793520 seconds time elapsed

       0.172464000 seconds user
       0.729822000 seconds sys

############## WITHOUT STATIC BRANCH

ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            924.79 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.042 K/sec
     1,849,568,681      cycles                    #    2.000 GHz
     1,950,584,115      instructions              #    1.05  insn per cycle
   <not supported>      branches
         4,012,227      branch-misses

       0.925500832 seconds time elapsed

       0.204227000 seconds user
       0.720739000 seconds sys


ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            933.06 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.042 K/sec
     1,866,097,571      cycles                    #    2.000 GHz
     1,950,574,944      instructions              #    1.05  insn per cycle
   <not supported>      branches
         3,984,008      branch-misses

       0.933798032 seconds time elapsed

       0.323067000 seconds user
       0.610271000 seconds sys


ard at dogfood:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            913.16 msec task-clock                #    0.999 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                37      page-faults               #    0.041 K/sec
     1,826,308,530      cycles                    #    2.000 GHz
     1,950,559,902      instructions              #    1.07  insn per cycle
   <not supported>      branches
         2,231,050      branch-misses

       0.913874672 seconds time elapsed

       0.164228000 seconds user
       0.749157000 seconds sys

So that's a 0.3% reduction (in terms of actual instructions executed)
in a synthetic benchmark that does nothing but call getrandom() in a
tight loop.

In other words, I think the static branch solves a problem that does not exist.

> > > > - Why do we need to enable this static key so early?
> > >
> > > We don't need to enable it especially early. I've now sent three
> > > different approaches for deferring it until later and you suggested one.
> > > The first of mine is kind of ugly (checking static_key_initialized and
> > > such at different points).  Your suggested one after that did the same
> > > but deferred into crng_reseed(), which I'm not a fan of. My second one
> > > is this patch, which is flawed for the reason you pointed out. But
> > > perhaps my third one is the right amount of simple and okay? That's the
> > > one I linked up top, [1]. Let me know what you think of that.
> > >
> > > My motivation for not wanting to defer it is that if the arch solution
> > > winds up being easy and straight forward (as it was for arm64), then it
> > > would be nice to not need to clutter up random.c as a result.
> >
> > If clutter is a concern, how about getting rid of the
> > execute_in_process_context() dance, and just use a late_initcall()
> > instead?
>
> As I already explained in [1], this does not work. If the order is
> (A)(B), then all this will happen *after* the late init call.
>
> [1] https://lore.kernel.org/lkml/Yp8oOH+9V336LrLk@zx2c4.com/
>

Yeah, I guess finding the right spot is tricky. The more reason just
to drop it altogether.



More information about the linux-arm-kernel mailing list