[EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround

Marc Zyngier maz at kernel.org
Mon Jul 26 11:03:06 PDT 2021


Hi Bharat,

On Mon, 26 Jul 2021 05:29:53 +0100,
Bharat Bhushan <bbhushan2 at marvell.com> wrote:
> 
> Sorry for delayed response
> 
> Please see inline
> 
> > -----Original Message-----
> > From: Mark Rutland <mark.rutland at arm.com>
> > Sent: Tuesday, July 13, 2021 9:43 PM
> >
> > 1) A guest can deliberately cause information to be leaked to itself via
> >    the corrupted GPRs. I haven't seen any rationale for why that is not
> >    a problem, nor have I seen a suggested workaround.
> > 
> > 2) A guest *may* be able to trigger this while the host is running. I
> >    haven't seen anything that rules this out so far.
> > 
> > 3) Even in the absence of virtualization, it would be necessary to
> >    work around this for *every* level-triggered interrupt, which includes
> >    at least the timer, PMU, and GIC maintenance interrupts, in addition to
> >    any other configurable PPIs or SPIs.
> > 
> > Without a fix that covers all of those, I don't think the
> > workaround is viable.
> 
> This patch covers workaround for ARM arch timer in non-virtualized
> cases.
> 
> We are still considering the different scenarios which can trigger the
> issue. After discussing with HW folks internally, we have come to the
> conclusion that there is no single workaround which will fix all the
> scenarios. The host timer interrupt workaround is different from the
> ones needed for virtualization and for other interrupt sources.
> 
> While we are working on other workarounds, we want to push the timer
> workaround first, as that's the one customers are encountering right
> now and they want an upstream-accepted patch soon. Other workarounds
> will take time to test and qualify.
> 
> Wrt drivers disabling the interrupt, other than changing the driver, we
> don't see any common place where we can add a workaround. Please let
> me know your take on this.

I don't think a workaround limited to the timer is viable. It is quite
obvious that once you have worked around the most likely cause for a
crash (timer interrupts), you will need to come up with yet another
workaround for another interrupt source.

We need a solution that works for all interrupts, or at the very least
all per-CPU interrupts. For global interrupts, only you can find out
how they can be mitigated. If that means changing drivers, so be it.
I understand that this isn't what you want to read, but I'm not
confident taking this patch with the knowledge that there are still a
million ways to make it fall over.

Evidently, KVM cannot be enabled on such a system. More importantly, I
cannot see how we can support users of such a machine either. How do we
analyse a crash report if there is a remote possibility that the CPU
has decided to ignore a number of instructions?

To sum it up, I'm not prepared to approve such a patch until there is
a compelling story for all the interrupts that may trigger such
behaviour.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


