[PATCH] arm64: traps: clean up arm64_ras_serror_get_severity()

James Morse james.morse at arm.com
Tue Aug 25 10:51:47 EDT 2020


Hi Zhang,

On 12/08/2020 12:09, Liguang Zhang wrote:
> Function arm64_is_fatal_ras_serror() is always called after
> arm64_is_ras_serror(), so we should remove some needless
> arm64_is_ras_serror() call in function arm64_ras_serror_get_severity().

> diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
> index cee5928e1b7d..287b4d64dc67 100644
> --- a/arch/arm64/include/asm/traps.h
> +++ b/arch/arm64/include/asm/traps.h
> @@ -79,13 +79,6 @@ static inline bool arm64_is_ras_serror(u32 esr)
>   */
>  static inline u32 arm64_ras_serror_get_severity(u32 esr)
>  {
> -	u32 aet = esr & ESR_ELx_AET;
> -
> -	if (!arm64_is_ras_serror(esr)) {
> -		/* Not a RAS error, we can't interpret the ESR. */
> -		return ESR_ELx_AET_UC;
> -	}

I agree this can go, it looks like I had it here as a sanity check while the KVM bits were
sorted out.

Please also remove the comment that says it does this:
| * Non-RAS SError's are reported as Uncontained/Uncategorized.

This becomes the callers problem.


>  	/*
>  	 * AET is RES0 if 'the value returned in the DFSC field is not
>  	 * [ESR_ELx_FSC_SERROR]'

> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 13ebd5ca2070..635d4cca0a4b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -913,7 +913,7 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
>  	case ESR_ELx_AET_UC:	/* Uncontainable or Uncategorized error */
>  	default:
>  		/* Error has been silently propagated */
> -		arm64_serror_panic(regs, esr);
> +		return true;

KVM depends on this, please don't remove it.

What does 'fatal' mean?
To the arch code it means panic(), as we don't (yet) have the information to fix the
error. But to KVM 'fatal' means kill-the-guest. KVM does this as without user-space's
involvement, there is very little else it can do.

KVM can only do this if the error is contained. As it must have been contained by stage2,
so the host can keep running. But if the error was reported as uncontained, KVM would need
to panic() the host.

(An example of an Uncontained error is a store that went to the wrong address due to
corruption that wasn't caught in time. We can't trust any value in memory once we've seen
one of these.)


I agree it looks funny, but it was simpler for the arch code helper to do this, instead of
having an 'arm64_is_uncontained_ras_serror(), as now you'd always have to check three things.


>  	}
>  }


Thanks,

James



More information about the linux-arm-kernel mailing list