[PATCH] ARM64: errata: Add workaround for HIP10/HIP10C erratum 162200803

Tue Jul 1 01:14:17 PDT 2025

On Fri, 27 Jun 2025 07:36:31 +0100,
Zhou Wang <wangzhou1 at hisilicon.com> wrote:
> 
> On 2025/6/26 21:27, Marc Zyngier wrote:
> > On Thu, 26 Jun 2025 13:41:42 +0100,
> > Zhou Wang <wangzhou1 at hisilicon.com> wrote:
> >>
> >> The GICv4.0 implementation on Hip10 and Hip10C has a SoC bug in vPE
> >> scheduling: when multiple vPEs issue vPE schedule/deschedule commands
> >> concurrently and repeatedly, some vPE schedule commands may never be
> >> processed, which causes the command to time out.
> >>
> >> In hardware, each CPU die has a single GIC, which handles the vPE
> >> schedule operations from all CPUs of that die one by one. The bug is
> >> that if the number of queued vPE schedule operations exceeds a certain
> >> value, the last vPE schedule operation is lost.
> >>
> >> One possible way to solve this problem is to limit the number of vLPIs,
> >> so that the hardware spends less time scanning the virtual pending table
> >> when it handles vPE schedule operations, and the number of queued vPE
> >> schedule operations never exceeds that value.
> >>
> >> Given the number of CPUs per die, and assuming 100 vPE schedule
> >> operations per second per CPU, it can be calculated that limiting the
> >> number of vLPIs to 4096 per virtual machine avoids the issue.
> >>
> >> Signed-off-by: Zhou Wang <wangzhou1 at hisilicon.com>
> >> ---
> >>  Documentation/arch/arm64/silicon-errata.rst |  2 ++
> >>  arch/arm64/Kconfig                          | 12 ++++++++++++
> >>  arch/arm64/include/asm/cputype.h            |  4 ++++
> >>  arch/arm64/kernel/cpu_errata.c              | 15 +++++++++++++++
> >>  arch/arm64/kvm/vgic/vgic-mmio-v3.c          |  5 +++++
> >>  arch/arm64/tools/cpucaps                    |  1 +
> >>  include/linux/irqchip/arm-gic-v3.h          |  1 +
> >>  7 files changed, 40 insertions(+)
> >>
> > 
> > [...]
> > 
> >> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> index ae4c0593d114..495a56e9dc4b 100644
> >> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> @@ -81,6 +81,11 @@ static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
> >>  		if (vgic_has_its(vcpu->kvm)) {
> >>  			value |= (INTERRUPT_ID_BITS_ITS - 1) << 19;
> >>  			value |= GICD_TYPER_LPIS;
> >> +			/* Limit the number of vlpis to 4096 */
> >> +			if (cpus_have_final_cap(ARM64_WORKAROUND_HISI_162200803) &&
> >> +			    kvm_vgic_global_state.has_gicv4 &&
> >> +			    !kvm_vgic_global_state.has_gicv4_1)
> >> +				value |= 11 << GICD_TYPER_NUM_LPIS_SHIFT;
> > 
> > This really doesn't solve your problem. Yes, the guest *may* honor
> > this limit. But KVM doesn't care and will happily allocate 2^16 vLPIs
> > if the guest asks -- there is no code enforcing this limit.
> 
> Hi Marc,
> 
> I am not sure whether there is any other place where the guest can ask for
> a vLPI beyond the limit, apart from MAPTI/MAPI below?
>
> > And even if we did, what would we do on a MAPTI command that tries to
> > map a vLPI outside of the allowed range? Do we need to tell the guest
> > it has screwed up?
> 
> Thanks for pointing this out. Yes, we are missing the lpi_nr check in
> vgic_its_cmd_handle_mapi. In fact, the fix for this erratum introduces the
> use of GICD.num_LPI, so we need to get the related logic right as well.

Exactly.
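
To make the missing check concrete, here is a minimal userspace sketch of
the arithmetic, not kernel code. It assumes the GICv3 encoding in which a
non-zero GICD_TYPER.num_LPIs field (bits [15:11]) advertises
2^(num_LPIs + 1) LPIs starting at INTID 8192. All EX_/ex_ names, including
the shift and mask values, are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the GICv3 architecture: num_LPIs is GICD_TYPER[15:11]. */
#define EX_TYPER_NUM_LPIS_SHIFT	11
#define EX_TYPER_NUM_LPIS_MASK	0x1fu
#define EX_LPI_BASE		8192u	/* first LPI INTID */

/*
 * Hypothetical helper: number of LPIs advertised by a GICD_TYPER value.
 * Returns 0 when the field is zero, i.e. "derive the limit from IDbits".
 */
static uint64_t ex_typer_nr_lpis(uint32_t typer)
{
	uint32_t field = (typer >> EX_TYPER_NUM_LPIS_SHIFT) &
			 EX_TYPER_NUM_LPIS_MASK;

	return field ? ((uint64_t)1 << (field + 1)) : 0;
}

/*
 * Hypothetical check of the kind a MAPTI/MAPI handler would need:
 * is this INTID inside the window [8192, 8192 + nr_lpis)?
 */
static bool ex_lpi_in_range(uint32_t typer, uint32_t intid)
{
	uint64_t nr = ex_typer_nr_lpis(typer);

	assert(nr != 0);	/* this sketch only covers an explicit limit */
	return intid >= EX_LPI_BASE &&
	       (uint64_t)intid - EX_LPI_BASE < nr;
}
```

With the value from the patch (11 in the num_LPIs field), the helper
reports 4096 LPIs, so INTIDs 8192 to 12287 pass the check and 12288 does
not. What the ITS emulation should do with an out-of-range mapping is
exactly the open question above; the sketch only shows the comparison.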

> 
> I am not sure whether we could add the lpi_nr check in MAPTI/MAPI as part
> of this erratum fix, or whether we should add basic support for
> GICD.num_LPI before adding the erratum workaround?

You definitely need to handle that before allowing such a limit to be
enforced. Which also means allowing the limit to be saved/restored
from userspace in order to support migration.
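
As a rough picture of what the restore side might have to validate, here
is a sketch under stated assumptions: a destination host that must
restrict the number of LPIs cannot accept a restored value that means
"no limit" or a wider limit than it can honour. The structure, function
name, error convention and the policy itself are all hypothetical; this
is not an existing KVM interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_NUM_LPIS_MASK	0x1fu	/* num_LPIs is a 5-bit field */

/* Hypothetical per-VM state holding the advertised num_LPIs field. */
struct ex_vgic_dist {
	uint8_t num_lpis;	/* 0: no explicit limit, use IDbits */
};

/*
 * Hypothetical restore path. host_max is the largest num_LPIs field
 * value this host can honour, or 0 if the host has no restriction.
 */
static int ex_restore_num_lpis(struct ex_vgic_dist *dist, uint8_t host_max,
			       uint32_t val)
{
	assert(dist);

	/* Reject anything that does not fit in the 5-bit field. */
	if (val & ~EX_NUM_LPIS_MASK)
		return -EINVAL;

	/* A restricted host cannot take "no limit" or a wider limit. */
	if (host_max && (val == 0 || val > host_max))
		return -EINVAL;

	dist->num_lpis = (uint8_t)val;
	return 0;
}
```

Under this assumed policy, a guest saved on an affected host (field value
11) can be restored on an unrestricted host and keeps seeing the same
limit, while a guest saved without a limit is refused on an affected host.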

I was really hoping to never have to support this thing (it really is
terrible), but if we have to introduce and honor it for correctness
reasons, then it has to be fully supported.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.