RNDR/SS vs. SMCCC

Fri May 28 19:36:06 PDT 2021

On Fri, 2021-05-28 at 13:56 +0100, Mark Brown wrote:
> On Fri, May 28, 2021 at 09:12:38AM +1000, Benjamin Herrenschmidt
> wrote:
> > 
> > Right, I was thinking about using RNDRSS instead.
> 
> Yeah, I *suspect* that's where we'll end up longer term now that
> the random.c code is less enthusiastic about calling the
> function - it was where the patches where initially until the
> concerns about overloading were raised and it does seem to map
> more naturally onto the API.  I do think we should hold off until
> we've got some concrete information on how real systems perform
> simply to avoid churn but I wouldn't be surprised to see us
> making changes once we have data.  Sounds like you'll be able to
> help out here!

Hehe yup.

> Note that they do both get washed through the PRNG, not that I
> think it makes a huge difference to the argument here.

Right. At this point, I think we can wait until HW is there and we have
enough data to decide what to do with the policy. I wish the ISA was
clearer in defining timing characteristics (and fail behaviour) of
those instructions... as-is, we'll probably have to mess around based
on whatever HW comes out, worse, possibly with quirks.

> > > In practice most of the non-seed arch_get_random usage is either
> > > a fallback if there's no seed variant or mixed in with the CRNG
> > > output anyway.
> > Right but I still don't believe the end result makes a whole lot of
> > sense. In absence of SMCCC we end up using RNDR as a seed which
> > hits
> > the "not a great match" comment. Not a huge deal I suppose, for our
> > (EC2) case, we could just not implement the SMCCC call, and let it
> > use
> > RNDR, it's still going to be better than no HW random source, I
> > just
> > don't like those firmware calls much.
> 
> I do see them as useful for the seeding case, it shouldn't be in
> quite such sensitive fast paths as the regular versions and 
> it means that if a system has a better entropy source than the
> one backing RNDR (especially if the one backing RNDR has some
> actual problem) then we can override in software.  As you say if
> the SMCCC isn't offering anything over the system registers then
> platforms don't need to implement it.

As far as I can tell, the ISA has some pretty strict requirements for
the entropy source backing RNDR, so ideally, if implementations are
compliants, it *should* be a non-issue. Famous last words... :)

> > > The arch_get_random_ interfaces already provide a return code
> > > which the callers can handle as needed, they can do something
> > > appropriate for the scenario rather than the architecture code
> > > having to pick.  Sometimes that's to retry later (random.c does
> > > this when seeding for example), sometimes that's to just carry on
> > > with whatever other entropy they've already got.
> > 
> > Ok. It's unfortunate that the ISA is so vague on the circumstances
> > where the instructions are allowed to fail... it says "a reasonable
> > amount of time", it may as well have said a "random amount of time"
> > for
> > the usefulness of it ;-)
> > The implementation I'm aware of will fail extremely rarely when the
> > HW
> > detects an issues that requires corrective action, but I could
> > imagine
> > some implemetations just failing when there's no entropy at hand
> > (esp.
> > with RNDRSS).
> 
> Yes, I think there being inadequate entropy at hand to reseed is
> the big concern here - some of that's going to be a quality
> tradeoff and it's very hard to actually enforce any constraints
> even if you define them so ultimately it all comes down to
> quality of implementation issues.

Yup. I hate this but I foresee a future where we'll have implementation
quirks. I hope not but ...

> > As long as the callers don't give up permanently, that's fine. I
> > was
> > just a bit concerned by cnrg_init_try_arch{_early}. It would be
> > preferable for these to "try harder".
> 
> Yeah, there is some scope for retries there - unfortunately the
> arch_get_random_ interface can't distinguish between temporary
> and permanent failures, and people won't want to to slow down the
> boot path at all by actually blocking.  Looking again there's
> some scope for improving this process, we will continue to pull
> seed values in crng_reseed() but not in quite the same way.  I'm
> now wondering if it'd make sense to hook retries of the
> architecture init into or alongside try_to_generate_entropy() in
> wait_for_random_bytes() when trust_cpu is set.

Could be. I'm away from the code right now, but I would like to avoid
having the system behaviour change overall based on whether it happened
to hit a failure case at boot or not. For deployments at scale, that
sort of "randomness" isn't the sort we want :)

Cheers,
Ben.