[PATCH v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4

Wed Mar 15 01:34:52 PDT 2023

On Tue, 14 Mar 2023 13:51:28 +0000,
Shanker Donthineni <sdonthineni at nvidia.com> wrote:
> 
> The T241 platform suffers from the T241-FABRIC-4 erratum which causes
> unexpected behavior in the GIC when multiple transactions are received
> simultaneously from different sources. This hardware issue impacts
> NVIDIA server platforms that use more than two T241 chips
> interconnected. Each chip has support for 320 {E}SPIs.
> 
> This issue occurs when multiple packets from different GICs are
> incorrectly interleaved at the target chip. The erratum text below
> specifies exactly what can cause multiple transfer packets susceptible
> to interleaving and GIC state corruption. GIC state corruption can
> lead to a range of problems, including kernel panics, and unexpected
> behavior.
> 
> From the erratum text:
>   "In some cases, inter-socket AXI4 Stream packets with multiple
>   transfers, may be interleaved by the fabric when presented to ARM
>   Generic Interrupt Controller. GIC expects all transfers of a packet
>   to be delivered without any interleaving.
> 
>   The following GICv3 commands may result in multiple transfer packets
>   over inter-socket AXI4 Stream interface:
>    - Register reads from GICD_I* and GICD_N*
>    - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
>    - ITS command MOVALL
> 
>   Multiple commands in GICv4+ utilize multiple transfer packets,
>   including VMOVP, VMOVI and VMAPP.
> 
>   This issue impacts system configurations with more than 2 sockets,
>   that require multi-transfer packets to be sent over inter-socket
>   AXI4 Stream interface between GIC instances on different sockets.
>   GICv4 cannot be supported. GICv3 SW model can only be supported
>   with the workaround. Single and Dual socket configurations are not
>   impacted by this issue and support GICv3 and GICv4."
> 
> Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
> 
> Writing to the chip alias region of the GICD_In{E} registers except
> GICD_ICENABLERn has an equivalent effect as writing to the global
> distributor. The SPI interrupt deactivate path is not impacted by
> the erratum.
> 
> To fix this problem, implement a workaround that ensures read accesses
> to the GICD_In{E} registers are directed to the chip that owns the
> SPI, and disables GICv4.x features for KVM. To simplify code changes,
> the gic_configure_irq() function uses the same alias region for both
> read and write operations to GICD_ICFGR.
> 
> Co-developed-by: Vikram Sethi <vsethi at nvidia.com>
> Signed-off-by: Vikram Sethi <vsethi at nvidia.com>
> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
> ---
> Changes since v1:
>  - Use SMCCC SOC-ID API for detecting the T241 chip
>  - Implement Marc's suggestions
>  - Edit commit text
> 
>  Documentation/arm64/silicon-errata.rst |   2 +
>  drivers/firmware/smccc/smccc.c         |  13 +++
>  drivers/irqchip/irq-gic-v3.c           | 116 +++++++++++++++++++++++--
>  include/linux/arm-smccc.h              |   1 +
>  4 files changed, 123 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index ec5f889d76819..e31f6c0687041 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -172,6 +172,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
> ++----------------+-----------------+-----------------+-----------------------------+
>  +----------------+-----------------+-----------------+-----------------------------+
>  | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
>  +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index 60ccf3e90d7de..cb688121270b8 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -17,9 +17,12 @@ static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
>  
>  bool __ro_after_init smccc_trng_available = false;
>  u64 __ro_after_init smccc_has_sve_hint = false;
> +s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
>  
>  void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
>  {
> +	struct arm_smccc_res res;
> +
>  	smccc_version = version;
>  	smccc_conduit = conduit;
>  
> @@ -27,6 +30,16 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
>  	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
>  	    smccc_version >= ARM_SMCCC_VERSION_1_3)
>  		smccc_has_sve_hint = true;
> +
> +	if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
> +	    (smccc_conduit != SMCCC_CONDUIT_NONE)) {
> +		arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +				     ARM_SMCCC_ARCH_SOC_ID, &res);
> +		if ((s32)res.a0 >= 0) {
> +			arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
> +			smccc_soc_id_version = (s32)res.a0;
> +		}
> +	}

Please don't duplicate existing code. There is already the required
infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
is:

- disassociate the SMCCC probing from the device registration

- probe the SOC_ID early

- add accessors for the relevant data

- select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig

>  }
>  
>  enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 997104d4338e7..02c699ce92b12 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -24,6 +24,7 @@
>  #include <linux/irqchip/arm-gic-common.h>
>  #include <linux/irqchip/arm-gic-v3.h>
>  #include <linux/irqchip/irq-partition-percpu.h>
> +#include <linux/arm-smccc.h>
>  
>  #include <asm/cputype.h>
>  #include <asm/exception.h>
> @@ -57,8 +58,14 @@ struct gic_chip_data {
>  	bool			has_rss;
>  	unsigned int		ppi_nr;
>  	struct partition_desc	**ppi_descs;
> +	phys_addr_t		dist_phys_base;

Place this field next to its mapping.

>  };
>  
> +

Spurious blank line.

> +#define T241_CHIPS_MAX		4
> +static void __iomem *t241_dist_base_alias[T241_CHIPS_MAX];

Make this __read_mostly.

> +static DEFINE_STATIC_KEY_FALSE(gic_nvidia_t241_erratum);
> +
>  static struct gic_chip_data gic_data __read_mostly;
>  static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
>  
> @@ -188,6 +195,34 @@ static inline bool gic_irq_in_rdist(struct irq_data *d)
>  	}
>  }
>  
> +static inline void __iomem *gic_dist_base_alias(irq_hw_number_t intid)

Have this function take a struct irq_data * instead, and only perform
the conversion in the workaround code.

> +{
> +	u32 chip;

Move this into the if ().

> +
> +	if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
> +		/**
> +		 *  {E}SPI mappings for all 4 chips
> +		 *    Chip0 = 32-351
> +		 *    Chip1 = 352-671
> +		 *    Chip2 = 672-991
> +		 *    Chip3 = 4096-4415
> +		 */
> +		switch (__get_intid_range(intid)) {
> +		case SPI_RANGE:
> +			chip = (intid - 32) / 320;
> +			break;
> +		case ESPI_RANGE:
> +			chip = 3;
> +			break;
> +		default:
> +			unreachable();
> +		}
> +		return t241_dist_base_alias[chip];
> +	}
> +
> +	return gic_data.dist_base;
> +}
> +
>  static inline void __iomem *gic_dist_base(struct irq_data *d)
>  {
>  	switch (get_intid_range(d)) {
> @@ -343,10 +378,14 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
>  	offset = convert_offset_index(d, offset, &index);
>  	mask = 1 << (index % 32);
>  
> +	/**
> +	 * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
> +	 * registers are directed to the chip that owns the SPI.
> +	 */

Move this into the workaround code, and drop the ** at the beginning
of the comment,

>  	if (gic_irq_in_rdist(d))
>  		base = gic_data_rdist_sgi_base();
>  	else
> -		base = gic_data.dist_base;
> +		base = gic_dist_base_alias(irqd_to_hwirq(d));
>  
>  	return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
>  }
> @@ -594,13 +633,17 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
>  	    type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING)
>  		return -EINVAL;
>  
> +	/**
> +	 * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
> +	 * registers are directed to the chip that owns the SPI. Use the
> +	 * same alias region for GICD_ICFGR writes to simplify code.
> +	 */

Drop this and merge it with the above comment as required.

>  	if (gic_irq_in_rdist(d))
>  		base = gic_data_rdist_sgi_base();
>  	else
> -		base = gic_data.dist_base;
> +		base = gic_dist_base_alias(irqd_to_hwirq(d));
>  
>  	offset = convert_offset_index(d, GICD_ICFGR, &index);
> -

Spurious blank line change.

>  	ret = gic_configure_irq(index, type, base + offset, NULL);
>  	if (ret && (range == PPI_RANGE || range == EPPI_RANGE)) {
>  		/* Misconfigured PPIs are usually not fatal */
> @@ -1719,6 +1762,45 @@ static bool gic_enable_quirk_hip06_07(void *data)
>  	return false;
>  }
>  
> +#define T241_CHIPN_MASK		GENMASK_ULL(45, 44)
> +#define T241_CHIP_GICDA_OFFSET	0x1580000
> +#define SMCCC_SOC_ID_MASK	0x7f7fffff
> +#define SMCCC_SOC_ID_T241	0x036b0241
> +
> +static bool gic_enable_quirk_nvidia_t241(void *data)
> +{
> +	unsigned long chip_bmask = 0;
> +	phys_addr_t phys;
> +	u32 i;
> +
> +	/* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
> +	if ((smccc_soc_id_version < 0) ||
> +	    ((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
> +		return false;
> +	}
> +
> +	/* Find the chips based on GICR regions PHYS addr */
> +	for (i = 0; i < gic_data.nr_redist_regions; i++) {
> +		chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
> +				  gic_data.redist_regions[i].phys_base));
> +	}
> +
> +	if (hweight32(chip_bmask) < 3)
> +		return false;
> +
> +	/* Setup GICD alias regions */
> +	for (i = 0; i < ARRAY_SIZE(t241_dist_base_alias); i++) {
> +		if (chip_bmask & BIT(i)) {
> +			phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
> +			phys |= FIELD_PREP(T241_CHIPN_MASK, i);
> +			t241_dist_base_alias[i] = ioremap(phys, SZ_64K);
> +			WARN_ON_ONCE(!t241_dist_base_alias[i]);
> +		}
> +	}
> +	static_branch_enable(&gic_nvidia_t241_erratum);
> +	return true;
> +}
> +
>  static const struct gic_quirk gic_quirks[] = {
>  	{
>  		.desc	= "GICv3: Qualcomm MSM8996 broken firmware",
> @@ -1750,6 +1832,12 @@ static const struct gic_quirk gic_quirks[] = {
>  		.mask	= 0xe8f00fff,
>  		.init	= gic_enable_quirk_cavium_38539,
>  	},
> +	{
> +		.desc	= "GICv3: NVIDIA erratum T241-FABRIC-4",
> +		.iidr	= 0x0402043b,
> +		.mask	= 0xffffffff,
> +		.init	= gic_enable_quirk_nvidia_t241,
> +	},
>  	{
>  	}
>  };
> @@ -1821,7 +1909,8 @@ static int __init gic_init_bases(void __iomem *dist_base,
>  				 struct redist_region *rdist_regs,
>  				 u32 nr_redist_regions,
>  				 u64 redist_stride,
> -				 struct fwnode_handle *handle)
> +				 struct fwnode_handle *handle,
> +				 phys_addr_t dist_phys_base)

Pass dist_phys_base as a the first parameter, not the last.

>  {
>  	u32 typer;
>  	int err;
> @@ -1834,6 +1923,7 @@ static int __init gic_init_bases(void __iomem *dist_base,
>  
>  	gic_data.fwnode = handle;
>  	gic_data.dist_base = dist_base;
> +	gic_data.dist_phys_base = dist_phys_base;
>  	gic_data.redist_regions = rdist_regs;
>  	gic_data.nr_redist_regions = nr_redist_regions;
>  	gic_data.redist_stride = redist_stride;
> @@ -2072,11 +2162,12 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
>  	void __iomem *dist_base;
>  	struct redist_region *rdist_regs;
>  	struct resource res;
> +	struct resource dist_res;

Surely you don't need a full struct resource to keep the distributor
base address around...

>  	u64 redist_stride;
>  	u32 nr_redist_regions;
>  	int err, i;
>  
> -	dist_base = gic_of_iomap(node, 0, "GICD", &res);
> +	dist_base = gic_of_iomap(node, 0, "GICD", &dist_res);
>  	if (IS_ERR(dist_base)) {
>  		pr_err("%pOF: unable to map gic dist registers\n", node);
>  		return PTR_ERR(dist_base);
> @@ -2114,7 +2205,7 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
>  	gic_enable_of_quirks(node, gic_quirks, &gic_data);
>  
>  	err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
> -			     redist_stride, &node->fwnode);
> +			     redist_stride, &node->fwnode, dist_res.start);
>  	if (err)
>  		goto out_unmap_rdist;
>  
> @@ -2377,8 +2468,14 @@ static void __init gic_acpi_setup_kvm_info(void)
>  		vcpu->end = vcpu->start + ACPI_GICV2_VCPU_MEM_SIZE - 1;
>  	}
>  
> -	gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> -	gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> +	/* Disable GICv4.x features for the erratum T241-FABRIC-4 */
> +	if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
> +		gic_v3_kvm_info.has_v4 = false;
> +		gic_v3_kvm_info.has_v4_1 = false;
> +	} else {
> +		gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> +		gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> +	}

This is the wrong place to do this. We want to disable *all* the
GICv4.1 features, and not just KVM's view. Specially given that there
are a bunch of fields that are evaluated in the ITS driver.

Something like the change below seems more logical. Maybe you can
salvage direct_lpi from the disaster, but that's about it.

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index fd134e1f481a..a68d33b4523f 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1841,10 +1841,13 @@ static int __init gic_init_bases(void __iomem *dist_base,
 	gic_data.domain = irq_domain_create_tree(handle, &gic_irq_domain_ops,
 						 &gic_data);
 	gic_data.rdists.rdist = alloc_percpu(typeof(*gic_data.rdists.rdist));
-	gic_data.rdists.has_rvpeid = true;
-	gic_data.rdists.has_vlpis = true;
-	gic_data.rdists.has_direct_lpi = true;
-	gic_data.rdists.has_vpend_valid_dirty = true;
+	if (!static_branch_unlikely(&gic_nvidia_t241_erratum)) {
+		/* Disable GICv4.x features for the erratum T241-FABRIC-4 */
+		gic_data.rdists.has_rvpeid = true;
+		gic_data.rdists.has_vlpis = true;
+		gic_data.rdists.has_direct_lpi = true;
+		gic_data.rdists.has_vpend_valid_dirty = true;
+	}
 
 	if (WARN_ON(!gic_data.domain) || WARN_ON(!gic_data.rdists.rdist)) {
 		err = -ENOMEM;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.