[PATCH] irqchip/gicv3-its: Add workaround for HIP09/HIP10/HIP10C erratum 162100803/162200807/162400807

Zhou Wang wangzhou1 at hisilicon.com
Tue Sep 9 20:27:15 PDT 2025


On 2025/9/9 21:57, Marc Zyngier wrote:
> On Tue, 09 Sep 2025 12:06:15 +0100,
> Zhou Wang <wangzhou1 at hisilicon.com> wrote:
>>
>> HIP09/HIP10/HIP10C ITS have a problem, an ITS RAS will be reported in
>> some cases when GICv4.1 is enable.
> 
> Do you mean a RAS *error*? Why can't the firmware ignore this event instead?

Not only a RAS error: the vSGI will also be lost :(  I will add this information.

> 
>> A workaround is that set ITT size to max value(0xf) when doing MAPD with V = 1,
>> and avoid to send MAPD with V = 0 to hardware. Just clear V field in ITS device
>> table and clear ITS cache to implement MAPD with V = 0 instead.
>>
>> Signed-off-by: Zhou Wang <wangzhou1 at hisilicon.com>
>> Reviewed-by: Nianyao Tang <tangnianyao at huawei.com>
>> Reviewed-by: Kunkun Jiang <jiangkunkun at huawei.com>
>> ---
>>  Documentation/arch/arm64/silicon-errata.rst |   6 ++
>>  arch/arm64/Kconfig                          |  13 +++
>>  drivers/irqchip/irq-gic-v3-its.c            | 114 +++++++++++++++++---
>>  3 files changed, 119 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
>> index b18ef4064bc0..dfafc608dc57 100644
>> --- a/Documentation/arch/arm64/silicon-errata.rst
>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>> @@ -264,6 +264,12 @@ stable kernels.
>>  +----------------+-----------------+-----------------+-----------------------------+
>>  | Hisilicon      | Hip09           | #162100801      | HISILICON_ERRATUM_162100801 |
>>  +----------------+-----------------+-----------------+-----------------------------+
>> +| Hisilicon      | Hip09           | #162100803      | HISILICON_ERRATUM_162100803 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> +| Hisilicon      | Hip10           | #162200807      | HISILICON_ERRATUM_162100803 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> +| Hisilicon      | Hip10c          | #162400807      | HISILICON_ERRATUM_162100803 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>>  +----------------+-----------------+-----------------+-----------------------------+
>>  | Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
>>  +----------------+-----------------+-----------------+-----------------------------+
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index e9bbfacc35a6..803df402c9af 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1270,6 +1270,19 @@ config HISILICON_ERRATUM_162100801
>>  
>>  	  If unsure, say Y.
>>  
>> +config HISILICON_ERRATUM_162100803
>> +	bool "Hip09/10/10c 162100803/162200807/162400807 erratum support"
>> +	default y
>> +	help
>> +	  There is a hardware conflict between vSGI and vLPI, fix it by
>> +	  configure a max ITS ITT size 0xf when doing MAPD with V = 1, clear V
>> +	  field in ITS device table and clear ITS cache to implement MAPD with
>> +	  V = 0 instead. Hip09/10/10c have this same problem, just use
>> +	  HISILICON_ERRATUM_162100803 as the compile macro and
>> +	  ITS_FLAGS_WORKAROUND_HISILICON_162100803 as ITS flag for convenience.
>> +
> 
> I don't think any of the details belong to Kconfig. You don't even
> explain *what* the problem is (hardware conflict doesn't mean
> much). It is also completely unclear what MAPD has to do with vSGI and
> vLPI.
>

The hardware problem is a little tricky, let me try to explain.

Under ITS pipeline back-pressure, the ITS hardware can mistake a vSGI for a vLPI
and then use a wrong "eventid" for the ITT size RAS check. If that "eventid" is
larger than the internally cached ITT size, an ITS RAS error is reported and the
related irq is discarded.

So one way to fix this problem is to make the internally cached ITT size very
large, so that the check above never fails. The mistake only happens at a certain
step of the ITS pipeline, so after that step the ITS hardware still treats the
irq as a vSGI...

Only MAPD can change the internally cached ITT size, so we hack MAPD here...
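
To make the check concrete, here is a minimal sketch in C of the range check
described above. It is a software model only, not the real hardware logic: the
names itt_event_limit() and itt_check_passes() are hypothetical, and it assumes
the cached size field encodes (number of EventID bits - 1), as in the MAPD
command.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: size_field is the 5-bit MAPD "size" value, i.e. bits - 1. */
static uint64_t itt_event_limit(uint32_t size_field)
{
	return 1ULL << ((size_field & 0x1f) + 1);
}

/* True when eventid is inside the cached ITT range, so no RAS error is raised. */
static bool itt_check_passes(uint32_t size_field, uint32_t eventid)
{
	return eventid < itt_event_limit(size_field);
}
```

With size_field = 0xf the limit is 65536, so itt_check_passes(0xf, id) holds for
every id below 2^16, while a device mapped with size_field = 4 rejects any id of
32 or more.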

>> +	  If unsure, say Y.
>> +
>>  config QCOM_FALKOR_ERRATUM_1003
>>  	bool "Falkor E1003: Incorrect translation due to ASID change"
>>  	default y
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 467cb78435a9..647bc70cc2f7 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -49,6 +49,7 @@
>>  #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
>>  #define ITS_FLAGS_FORCE_NON_SHAREABLE		(1ULL << 3)
>>  #define ITS_FLAGS_WORKAROUND_HISILICON_162100801	(1ULL << 4)
>> +#define ITS_FLAGS_WORKAROUND_HISILICON_162100803	(1ULL << 5)
>>  
>>  #define RD_LOCAL_LPI_ENABLED                    BIT(0)
>>  #define RD_LOCAL_PENDTABLE_PREALLOCATED         BIT(1)
>> @@ -716,7 +717,12 @@ static struct its_collection *its_build_mapd_cmd(struct its_node *its,
>>  
>>  	its_encode_cmd(cmd, GITS_CMD_MAPD);
>>  	its_encode_devid(cmd, desc->its_mapd_cmd.dev->device_id);
>> -	its_encode_size(cmd, size - 1);
>> +
>> +	if (its->flags & ITS_FLAGS_WORKAROUND_HISILICON_162100803)
>> +		its_encode_size(cmd, 0xf);
> 
> You are telling the ITS that it is allowed to go and access
> unspecified data (16 bit worth of translations). That's not
> acceptable. If you *have* to do that, then override the size in the
> driver code to actually allocate the corresponding memory.

Then we have to enlarge the ITT memory to 2^16 * 8 bytes for one device :(
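
For scale, a rough calculation, assuming an 8-byte ITT entry as in the 2^16 * 8
figure above (itt_bytes() is a hypothetical helper, not a driver function):

```c
#include <stdint.h>

/* Bytes of ITT memory covered by a MAPD size field, for a given entry size. */
static uint64_t itt_bytes(uint32_t size_field, uint64_t entry_size)
{
	/* The size field encodes (number of EventID bits - 1). */
	uint64_t nr_events = 1ULL << ((size_field & 0x1f) + 1);

	return nr_events * entry_size;
}
```

itt_bytes(0xf, 8) is 524288, i.e. 512 KiB for every device mapped this way.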

> 
>> +	else
>> +		its_encode_size(cmd, size - 1);
>> +
>>  	its_encode_itt(cmd, itt_addr);
>>  	its_encode_valid(cmd, desc->its_mapd_cmd.valid);
>>  
>> @@ -725,6 +731,61 @@ static struct its_collection *its_build_mapd_cmd(struct its_node *its,
>>  	return NULL;
>>  }
>>  
>> +static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < GITS_BASER_NR_REGS; i++) {
>> +		if (GITS_BASER_TYPE(its->tables[i].val) == type)
>> +			return &its->tables[i];
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static struct its_collection
>> +*its_build_mapd_cmd_hisi_quirk(struct its_node *its, struct its_cmd_block *cmd,
>> +			       struct its_cmd_desc *desc)
>> +{
>> +	struct its_baser *baser = its_get_baser(its, GITS_BASER_TYPE_DEVICE);
>> +	u32 devid = desc->its_mapd_cmd.dev->device_id;
>> +	void *base = baser->base;
>> +	u32 *dt_entry;
>> +	void __iomem *its_func_en = its->sgir_base + 0x80;
>> +	u32 tmp, tmp1, mask = 1 << 19;
>> +	int i = 100;
>> +
>> +	/*
>> +	 * Modify v to 0 in the dt entry of devid, dt entry format as below:
>> +	 * word0: [0-31] ITT address; word1: [0-7] ITT address, [8-12]: ITT
>> +	 * size, [13]: V. Only support flat device table.
> 
> Do you mean that the HW only supports flat tables? Or that you expect
> the kernel to only produce flat device tables?

Our hardware only supports flat tables, so only the flat table case is
considered here. I will modify the comment to make it clearer.
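
For reference, the flat-table lookup and the V bit update boil down to
something like the sketch below. It is a userspace model over a plain array,
assuming the entry layout from the patch comment (8 bytes per entry, V at bit 13
of word1); DT_ENTRY_WORDS, DT_WORD1_VALID and dt_clear_valid() are names made up
for illustration.

```c
#include <stdint.h>

#define DT_ENTRY_WORDS   2u          /* 8-byte entry = two 32-bit words */
#define DT_WORD1_VALID   (1u << 13)  /* V bit, per the patch comment */

/* Clear V for devid in a flat device table viewed as 32-bit words. */
static void dt_clear_valid(uint32_t *table, uint32_t devid)
{
	uint32_t *word1 = &table[devid * DT_ENTRY_WORDS + 1];

	*word1 &= ~DT_WORD1_VALID;
}
```

The real code additionally needs the barrier and the ITS cache invalidation
that follow in the patch; the model only shows the indexing and the bit update.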

> 
>> +	 */
>> +	dt_entry = (u32 *)(base + devid * 8 + 4);
>> +	*dt_entry &= ~(1 << 13);
>> +
>> +	dsb(ishst);
> 
> Is this always cacheable memory? Does the ITS always snoop the CPU
> caches? What protects against a concurrent access?

Yes, the CPU and the ITS are cache coherent (CC) for the device table. The HW
follows the config in GITS_BASER.InnerCache.

> 
>> +
>> +	/*
>> +	 * Cache invalidate by a private register GITS_FUNC_EN, whose offset
>> +	 * is 0x20080 of ITS base address. GICv4.1 already maps sgir_base
>> +	 * (offset is 0x20000), so address of GITS_FUNC_EN can be got by
>> +	 * sgir_base + 0x80. Bit 16 is used to clear DT cache, the flip of
>> +	 * bit 19 indicates that DT cache has been cleared.
>> +	 */
>> +	while (--i) {
>> +		tmp = readl_relaxed(its_func_en) & mask;
>> +		writel_relaxed(tmp | (1 << 16), its_func_en);
>> +		tmp1 = readl_relaxed(its_func_en) & mask;
>> +		if (tmp != tmp1)
>> +			break;
>> +	}
> 
> Please define these bits so that they have names instead of using raw
> values. Why do you need to write bit 16 every time? Surely once you
> have written it *once*, the HW remember what it has been told to do?

HW will not remember it, so we have to write bit 16 every time.

> 
> And frankly, using readl_relaxed_poll_timeout_atomic should be the
> right thing to do.

The HW logic is that each write to bit 16 either triggers a DT cache clear
within one HW cycle or does not. It is NOT a matter of waiting for bit 19 to
flip.
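
To make the intended sequence explicit, here is a small self-contained model of
the retry loop with named bits. Everything in it is an assumption for
illustration: the macro names, the fake_its_reg struct and the way the fake
register flips bit 19 are not taken from the driver or from hardware
documentation.

```c
#include <stdbool.h>
#include <stdint.h>

#define FUNC_EN_DT_CLEAR      (1u << 16)  /* write 1 to request a DT cache clear */
#define FUNC_EN_DT_CLEAR_DONE (1u << 19)  /* flips when a clear has happened */

/* Fake register: the clear only succeeds once 'busy' write attempts have failed. */
struct fake_its_reg {
	uint32_t val;
	int busy;
};

static uint32_t fake_read(const struct fake_its_reg *r)
{
	return r->val;
}

static void fake_write(struct fake_its_reg *r, uint32_t v)
{
	if (!(v & FUNC_EN_DT_CLEAR))
		return;
	if (r->busy > 0)
		r->busy--;                        /* request dropped this time */
	else
		r->val ^= FUNC_EN_DT_CLEAR_DONE;  /* clear done: status bit flips */
}

/* Re-issue the clear request until the status bit flips or tries run out. */
static bool dt_cache_clear(struct fake_its_reg *r, int max_tries)
{
	while (max_tries-- > 0) {
		uint32_t before = fake_read(r) & FUNC_EN_DT_CLEAR_DONE;

		fake_write(r, before | FUNC_EN_DT_CLEAR);
		if ((fake_read(r) & FUNC_EN_DT_CLEAR_DONE) != before)
			return true;
	}
	return false;
}
```

The point of the model is that the write has to be repeated on every
iteration, which is why a pure poll on bit 19 after a single write would not
match the behaviour described here.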

> 
>> +
>> +	if (i == 0)
>> +		WARN_ON(1);
>> +
>> +	return NULL;
>> +}
>> +
>>  static struct its_collection *its_build_mapc_cmd(struct its_node *its,
>>  						 struct its_cmd_block *cmd,
>>  						 struct its_cmd_desc *desc)
>> @@ -1306,11 +1367,18 @@ static void its_send_inv(struct its_device *dev, u32 event_id)
>>  static void its_send_mapd(struct its_device *dev, int valid)
>>  {
>>  	struct its_cmd_desc desc;
>> +	its_cmd_builder_t fn;
>> +
>> +	if (dev->its->flags & ITS_FLAGS_WORKAROUND_HISILICON_162100803 &&
>> +	    valid == 0)
>> +		fn = its_build_mapd_cmd_hisi_quirk;
> 
> Make the new helper handle both values of the valid bit.

OK.

> 
>> +	else
>> +		fn = its_build_mapd_cmd;
>>  
>>  	desc.its_mapd_cmd.dev = dev;
>>  	desc.its_mapd_cmd.valid = !!valid;
>>  
>> -	its_send_single_command(dev->its, its_build_mapd_cmd, &desc);
>> +	its_send_single_command(dev->its, fn, &desc);
>>  }
>>  
>>  static void its_send_mapc(struct its_node *its, struct its_collection *col,
>> @@ -3354,18 +3422,6 @@ static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
>>  	return its_dev;
>>  }
>>  
>> -static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>> -{
>> -	int i;
>> -
>> -	for (i = 0; i < GITS_BASER_NR_REGS; i++) {
>> -		if (GITS_BASER_TYPE(its->tables[i].val) == type)
>> -			return &its->tables[i];
>> -	}
>> -
>> -	return NULL;
>> -}
>> -
>>  static bool its_alloc_table_entry(struct its_node *its,
>>  				  struct its_baser *baser, u32 id)
>>  {
>> @@ -4902,6 +4958,16 @@ static bool __maybe_unused its_enable_rk3568002(void *data)
>>  	return true;
>>  }
>>  
>> +static bool __maybe_unused its_enable_quirk_162100803(void *data)
>> +{
>> +	struct its_node *its = data;
>> +
>> +	if (is_v4_1(its))
>> +		its->flags |= ITS_FLAGS_WORKAROUND_HISILICON_162100803;
>> +
>> +	return true;
>> +}
>> +
>>  static const struct gic_quirk its_quirks[] = {
>>  #ifdef CONFIG_CAVIUM_ERRATUM_22375
>>  	{
>> @@ -4956,6 +5022,26 @@ static const struct gic_quirk its_quirks[] = {
>>  		.init	= its_enable_quirk_hip09_162100801,
>>  	},
>>  #endif
>> +#ifdef CONFIG_HISILICON_ERRATUM_162100803
>> +	{
>> +		.desc = "ITS: Hip09 erratum 162100803",
>> +		.iidr = 0x00051736,
>> +		.mask = 0xffffffff,
>> +		.init = its_enable_quirk_162100803,
>> +	},
>> +	{
>> +		.desc = "ITS: Hip10 erratum 162200807",
>> +		.iidr = 0x01051736,
>> +		.mask = 0xffffffff,
>> +		.init = its_enable_quirk_162100803,
>> +	},
>> +	{
>> +		.desc = "ITS: Hip10c erratum 162400807",
>> +		.iidr = 0x00061736,
>> +		.mask = 0xffffffff,
>> +		.init = its_enable_quirk_162100803,
> 
> Can you try to merge these three entries and adjust the mask
> accordingly? mask=0xeffcffff should do the trick, assuming that you
> don't have any other hardware overlapping this.

The IIDRs are different, so how can these entries be merged?
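
To make the question concrete, here is a minimal sketch of masked IIDR
matching. It assumes the quirk matcher compares the entry's .iidr against
(mask & iidr) of the running hardware; iidr_masked() and iidr_merges() are
hypothetical helpers, and the second mask in the note below is an assumption,
not a value from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define IIDR_HIP09   0x00051736u
#define IIDR_HIP10   0x01051736u
#define IIDR_HIP10C  0x00061736u

/* Value the matcher would compare against the quirk entry's .iidr. */
static uint32_t iidr_masked(uint32_t iidr, uint32_t mask)
{
	return iidr & mask;
}

/* True when all three IIDRs collapse to one value under mask. */
static bool iidr_merges(uint32_t mask)
{
	uint32_t ref = iidr_masked(IIDR_HIP09, mask);

	return iidr_masked(IIDR_HIP10, mask) == ref &&
	       iidr_masked(IIDR_HIP10C, mask) == ref;
}
```

In this model, mask 0xeffcffff folds Hip09 and Hip10c to 0x00041736 but leaves
Hip10 at 0x01041736, because bit 24 survives the mask. A mask that also clears
bit 24, such as 0xfefcffff, folds all three to 0x00041736. A single merged entry
would then use that masked value as its .iidr, and it would also match any
other IIDR that differs only in the cleared bits.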

> 
>> +	},
>> +#endif
>>  #ifdef CONFIG_ROCKCHIP_ERRATUM_3588001
>>  	{
>>  		.desc   = "ITS: Rockchip erratum RK3588001",
> 
> If the HW is this bad, and that this only affects virtual interrupts,
> why don't you simply disable GICv4 support? Messing with the internals
> of the ITS tables feels like a pretty bad idea. Or as I suggested
> above, get the firmware to ignore RAS events from the ITS.

The hack is ugly, but we want to try to save vSGI direct injection...

Best,
Zhou

> 
> 	M.
> 


