[PATCH] irqchip/gicv3-its: Add workaround for HIP09/HIP10/HIP10C erratum 162100803/162200807/162400807

Marc Zyngier maz at kernel.org
Mon Sep 15 05:23:52 PDT 2025


On Wed, 10 Sep 2025 04:27:15 +0100,
Zhou Wang <wangzhou1 at hisilicon.com> wrote:
> 
> On 2025/9/9 21:57, Marc Zyngier wrote:
> > On Tue, 09 Sep 2025 12:06:15 +0100,
> > Zhou Wang <wangzhou1 at hisilicon.com> wrote:
> >>
> >> HIP09/HIP10/HIP10C ITS have a problem, an ITS RAS will be reported in
> >> some cases when GICv4.1 is enable.
> > 
> > Do you mean a RAS *error*? Why can't the firmware ignore this event instead?
> 
> Not only a RAS error, vSGI will be lost :(  I will add this information.
> 
> > 
> >> A workaround is that set ITT size to max value(0xf) when doing MAPD with V = 1,
> >> and avoid to send MAPD with V = 0 to hardware. Just clear V field in ITS device
> >> table and clear ITS cache to implement MAPD with V = 0 instead.
> >>
> >> Signed-off-by: Zhou Wang <wangzhou1 at hisilicon.com>
> >> Reviewed-by: Nianyao Tang <tangnianyao at huawei.com>
> >> Reviewed-by: Kunkun Jiang <jiangkunkun at huawei.com>
> >> ---
> >>  Documentation/arch/arm64/silicon-errata.rst |   6 ++
> >>  arch/arm64/Kconfig                          |  13 +++
> >>  drivers/irqchip/irq-gic-v3-its.c            | 114 +++++++++++++++++---
> >>  3 files changed, 119 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> >> index b18ef4064bc0..dfafc608dc57 100644
> >> --- a/Documentation/arch/arm64/silicon-errata.rst
> >> +++ b/Documentation/arch/arm64/silicon-errata.rst
> >> @@ -264,6 +264,12 @@ stable kernels.
> >>  +----------------+-----------------+-----------------+-----------------------------+
> >>  | Hisilicon      | Hip09           | #162100801      | HISILICON_ERRATUM_162100801 |
> >>  +----------------+-----------------+-----------------+-----------------------------+
> >> +| Hisilicon      | Hip09           | #162100803      | HISILICON_ERRATUM_162100803 |
> >> ++----------------+-----------------+-----------------+-----------------------------+
> >> +| Hisilicon      | Hip10           | #162200807      | HISILICON_ERRATUM_162100803 |
> >> ++----------------+-----------------+-----------------+-----------------------------+
> >> +| Hisilicon      | Hip10c          | #162400807      | HISILICON_ERRATUM_162100803 |
> >> ++----------------+-----------------+-----------------+-----------------------------+
> >>  +----------------+-----------------+-----------------+-----------------------------+
> >>  | Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
> >>  +----------------+-----------------+-----------------+-----------------------------+
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index e9bbfacc35a6..803df402c9af 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -1270,6 +1270,19 @@ config HISILICON_ERRATUM_162100801
> >>  
> >>  	  If unsure, say Y.
> >>  
> >> +config HISILICON_ERRATUM_162100803
> >> +	bool "Hip09/10/10c 162100803/162200807/162400807 erratum support"
> >> +	default y
> >> +	help
> >> +	  There is a hardware conflict between vSGI and vLPI, fix it by
> >> +	  configure a max ITS ITT size 0xf when doing MAPD with V = 1, clear V
> >> +	  field in ITS device table and clear ITS cache to implement MAPD with
> >> +	  V = 0 instead. Hip09/10/10c have this same problem, just use
> >> +	  HISILICON_ERRATUM_162100803 as the compile macro and
> >> +	  ITS_FLAGS_WORKAROUND_HISILICON_162100803 as ITS flag for convenience.
> >> +
> > 
> > I don't think any of the details belong to Kconfig. You don't even
> > explain *what* the problem is (hardware conflict doesn't mean
> > much). It is also completely unclear what MAPD has to do with vSGI and
> > vLPI.
> >
> 
> The hardware problem is a little tricky, let me try to explain.
> 
> In the case of ITS pipeline back-pressure, ITS hardware will mistake vSGI for vLPI,
> then use a wrong "eventid" to do ITT size RAS checking, if the "eventid" is larger
> than internal cached ITT size, an ITS RAS will be reported and related irq will be
> discarded.
> 
> So one way to fix this problem is to let the internal cache ITT size to be a very
> large value, above problem will not be triggered. Above mistake only happens in
> certain step of ITS pipeline, so after above step, ITS hardware still consider the
> irq as a vSGI...
> 
> Only MAPD can change internal cache ITT size, so we hack MAPD here...
> 
> >> +	  If unsure, say Y.
> >> +
> >>  config QCOM_FALKOR_ERRATUM_1003
> >>  	bool "Falkor E1003: Incorrect translation due to ASID change"
> >>  	default y
> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> >> index 467cb78435a9..647bc70cc2f7 100644
> >> --- a/drivers/irqchip/irq-gic-v3-its.c
> >> +++ b/drivers/irqchip/irq-gic-v3-its.c
> >> @@ -49,6 +49,7 @@
> >>  #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
> >>  #define ITS_FLAGS_FORCE_NON_SHAREABLE		(1ULL << 3)
> >>  #define ITS_FLAGS_WORKAROUND_HISILICON_162100801	(1ULL << 4)
> >> +#define ITS_FLAGS_WORKAROUND_HISILICON_162100803	(1ULL << 5)
> >>  
> >>  #define RD_LOCAL_LPI_ENABLED                    BIT(0)
> >>  #define RD_LOCAL_PENDTABLE_PREALLOCATED         BIT(1)
> >> @@ -716,7 +717,12 @@ static struct its_collection *its_build_mapd_cmd(struct its_node *its,
> >>  
> >>  	its_encode_cmd(cmd, GITS_CMD_MAPD);
> >>  	its_encode_devid(cmd, desc->its_mapd_cmd.dev->device_id);
> >> -	its_encode_size(cmd, size - 1);
> >> +
> >> +	if (its->flags & ITS_FLAGS_WORKAROUND_HISILICON_162100803)
> >> +		its_encode_size(cmd, 0xf);
> > 
> > You are telling the ITS that it is allowed to go and access
> > unspecified data (16 bit worth of translations). That's not
> > acceptable. If you *have* to do that, then override the size in the
> > driver code to actually allocate the corresponding memory.
> 
> Then we have to override the ITT memory to 2 ^ 16 * 8 for one device
> :(

But what's the alternative? Letting the ITS speculate in random
memory, possibly under the control of userspace or a guest? I don't
think that's acceptable.

[...]

> >> +	/*
> >> +	 * Cache invalidate by a private register GITS_FUNC_EN, whose offset
> >> +	 * is 0x20080 of ITS base address. GICv4.1 already maps sgir_base
> >> +	 * (offset is 0x20000), so address of GITS_FUNC_EN can be got by
> >> +	 * sgir_base + 0x80. Bit 16 is used to clear DT cache, the flip of
> >> +	 * bit 19 indicates that DT cache has been cleared.
> >> +	 */
> >> +	while (--i) {
> >> +		tmp = readl_relaxed(its_func_en) & mask;
> >> +		writel_relaxed(tmp | (1 << 16), its_func_en);
> >> +		tmp1 = readl_relaxed(its_func_en) & mask;
> >> +		if (tmp != tmp1)
> >> +			break;
> >> +	}
> > 
> > Please define these bits so that they have names instead of using raw
> > values. Why do you need to write bit 16 every time? Surely once you
> > have written it *once*, the HW remember what it has been told to do?
> 
> HW will not remember it, so we have to write bit 16 every time.
> 
> > 
> > And frankly, using readl_relaxed_poll_timeout_atomic should be the
> > right thing to do.
> 
> The HW logic is that write bit 16 will trigger DT cache clear in one HW cycle
> or not. NOT that waiting bit 19 to flip.

Huh. This sounds crazy... Please document this.

[...]

> >> +#ifdef CONFIG_HISILICON_ERRATUM_162100803
> >> +	{
> >> +		.desc = "ITS: Hip09 erratum 162100803",
> >> +		.iidr = 0x00051736,
> >> +		.mask = 0xffffffff,
> >> +		.init = its_enable_quirk_162100803,
> >> +	},
> >> +	{
> >> +		.desc = "ITS: Hip10 erratum 162200807",
> >> +		.iidr = 0x01051736,
> >> +		.mask = 0xffffffff,
> >> +		.init = its_enable_quirk_162100803,
> >> +	},
> >> +	{
> >> +		.desc = "ITS: Hip10c erratum 162400807",
> >> +		.iidr = 0x00061736,
> >> +		.mask = 0xffffffff,
> >> +		.init = its_enable_quirk_162100803,
> > 
> > Can you try to merge these three entries and adjust the mask
> > accordingly? mask=0xeffcffff should do the trick, assuming that you
> > don't have any other hardware overlapping this.
> 
> The iidrs are different, how to merge these entries?

By using the mask:

	{
		.desc = "ITS: Hip10 erratum, will eat your vSGIs",
		.iidr = 0x01051736,
		.mask = 0xeffcffff,
		.init = its_enable_quirk_162100803,
	},

will match all three platforms, assuming I got the mask correctly.

> 
> > 
> >> +	},
> >> +#endif
> >>  #ifdef CONFIG_ROCKCHIP_ERRATUM_3588001
> >>  	{
> >>  		.desc   = "ITS: Rockchip erratum RK3588001",
> > 
> > If the HW is this bad, and that this only affects virtual interrupts,
> > why don't you simply disable GICv4 support? Messing with the internals
> > of the ITS tables feels like a pretty bad idea. Or as I suggested
> > above, get the firmware to ignore RAS events from the ITS.
> 
> The hack is ugly, but we want to try to save vSGI direct
> injection...

Well, if you think this is worth it...

	M.

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list