[PATCH v2] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
Marc Zyngier
maz at kernel.org
Wed Mar 15 01:34:52 PDT 2023
On Tue, 14 Mar 2023 13:51:28 +0000,
Shanker Donthineni <sdonthineni at nvidia.com> wrote:
>
> The T241 platform suffers from the T241-FABRIC-4 erratum which causes
> unexpected behavior in the GIC when multiple transactions are received
> simultaneously from different sources. This hardware issue impacts
> NVIDIA server platforms that use more than two T241 chips
> interconnected. Each chip has support for 320 {E}SPIs.
>
> This issue occurs when multiple packets from different GICs are
> incorrectly interleaved at the target chip. The erratum text below
> specifies exactly what can cause multiple transfer packets susceptible
> to interleaving and GIC state corruption. GIC state corruption can
> lead to a range of problems, including kernel panics, and unexpected
> behavior.
>
> From the erratum text:
> "In some cases, inter-socket AXI4 Stream packets with multiple
> transfers, may be interleaved by the fabric when presented to ARM
> Generic Interrupt Controller. GIC expects all transfers of a packet
> to be delivered without any interleaving.
>
> The following GICv3 commands may result in multiple transfer packets
> over inter-socket AXI4 Stream interface:
> - Register reads from GICD_I* and GICD_N*
> - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
> - ITS command MOVALL
>
> Multiple commands in GICv4+ utilize multiple transfer packets,
> including VMOVP, VMOVI and VMAPP.
>
> This issue impacts system configurations with more than 2 sockets,
> that require multi-transfer packets to be sent over inter-socket
> AXI4 Stream interface between GIC instances on different sockets.
> GICv4 cannot be supported. GICv3 SW model can only be supported
> with the workaround. Single and Dual socket configurations are not
> impacted by this issue and support GICv3 and GICv4."
>
> Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
>
> Writing to the chip alias region of the GICD_In{E} registers except
> GICD_ICENABLERn has an equivalent effect as writing to the global
> distributor. The SPI interrupt deactivate path is not impacted by
> the erratum.
>
> To fix this problem, implement a workaround that ensures read accesses
> to the GICD_In{E} registers are directed to the chip that owns the
> SPI, and disables GICv4.x features for KVM. To simplify code changes,
> the gic_configure_irq() function uses the same alias region for both
> read and write operations to GICD_ICFGR.
>
> Co-developed-by: Vikram Sethi <vsethi at nvidia.com>
> Signed-off-by: Vikram Sethi <vsethi at nvidia.com>
> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
> ---
> Changes since v1:
> - Use SMCCC SOC-ID API for detecting the T241 chip
> - Implement Marc's suggestions
> - Edit commit text
>
> Documentation/arm64/silicon-errata.rst | 2 +
> drivers/firmware/smccc/smccc.c | 13 +++
> drivers/irqchip/irq-gic-v3.c | 116 +++++++++++++++++++++++--
> include/linux/arm-smccc.h | 1 +
> 4 files changed, 123 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index ec5f889d76819..e31f6c0687041 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -172,6 +172,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index 60ccf3e90d7de..cb688121270b8 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -17,9 +17,12 @@ static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
>
> bool __ro_after_init smccc_trng_available = false;
> u64 __ro_after_init smccc_has_sve_hint = false;
> +s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
>
> void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> {
> + struct arm_smccc_res res;
> +
> smccc_version = version;
> smccc_conduit = conduit;
>
> @@ -27,6 +30,16 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> smccc_version >= ARM_SMCCC_VERSION_1_3)
> smccc_has_sve_hint = true;
> +
> + if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
> + (smccc_conduit != SMCCC_CONDUIT_NONE)) {
> + arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> + ARM_SMCCC_ARCH_SOC_ID, &res);
> + if ((s32)res.a0 >= 0) {
> + arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
> + smccc_soc_id_version = (s32)res.a0;
> + }
> + }
Please don't duplicate existing code. There is already the required
infrastructure in drivers/firmware/smccc/soc_id.c. All you need to do
is:
- disassociate the SMCCC probing from the device registration
- probe the SOC_ID early
- add accessors for the relevant data
- select ARM_SMCCC_SOD_ID/ARM_SMCCC_DISCOVERY from the GICv3 Kconfig
> }
>
> enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 997104d4338e7..02c699ce92b12 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -24,6 +24,7 @@
> #include <linux/irqchip/arm-gic-common.h>
> #include <linux/irqchip/arm-gic-v3.h>
> #include <linux/irqchip/irq-partition-percpu.h>
> +#include <linux/arm-smccc.h>
>
> #include <asm/cputype.h>
> #include <asm/exception.h>
> @@ -57,8 +58,14 @@ struct gic_chip_data {
> bool has_rss;
> unsigned int ppi_nr;
> struct partition_desc **ppi_descs;
> + phys_addr_t dist_phys_base;
Place this field next to its mapping.
> };
>
> +
Spurious blank line.
> +#define T241_CHIPS_MAX 4
> +static void __iomem *t241_dist_base_alias[T241_CHIPS_MAX];
Make this __read_mostly.
> +static DEFINE_STATIC_KEY_FALSE(gic_nvidia_t241_erratum);
> +
> static struct gic_chip_data gic_data __read_mostly;
> static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
>
> @@ -188,6 +195,34 @@ static inline bool gic_irq_in_rdist(struct irq_data *d)
> }
> }
>
> +static inline void __iomem *gic_dist_base_alias(irq_hw_number_t intid)
Have this function take a struct irq_data * instead, and only perform
the conversion in the workaround code.
> +{
> + u32 chip;
Move this into the if ().
> +
> + if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
> + /**
> + * {E}SPI mappings for all 4 chips
> + * Chip0 = 32-351
> + * Chip1 = 352-671
> + * Chip2 = 672-991
> + * Chip3 = 4096-4415
> + */
> + switch (__get_intid_range(intid)) {
> + case SPI_RANGE:
> + chip = (intid - 32) / 320;
> + break;
> + case ESPI_RANGE:
> + chip = 3;
> + break;
> + default:
> + unreachable();
> + }
> + return t241_dist_base_alias[chip];
> + }
> +
> + return gic_data.dist_base;
> +}
> +
> static inline void __iomem *gic_dist_base(struct irq_data *d)
> {
> switch (get_intid_range(d)) {
> @@ -343,10 +378,14 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
> offset = convert_offset_index(d, offset, &index);
> mask = 1 << (index % 32);
>
> + /**
> + * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
> + * registers are directed to the chip that owns the SPI.
> + */
Move this into the workaround code, and drop the ** at the beginning
of the comment,
> if (gic_irq_in_rdist(d))
> base = gic_data_rdist_sgi_base();
> else
> - base = gic_data.dist_base;
> + base = gic_dist_base_alias(irqd_to_hwirq(d));
>
> return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
> }
> @@ -594,13 +633,17 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
> type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING)
> return -EINVAL;
>
> + /**
> + * For the erratum T241-FABRIC-4, read accesses to the GICD_In{E}
> + * registers are directed to the chip that owns the SPI. Use the
> + * same alias region for GICD_ICFGR writes to simplify code.
> + */
Drop this and merge it with the above comment as required.
> if (gic_irq_in_rdist(d))
> base = gic_data_rdist_sgi_base();
> else
> - base = gic_data.dist_base;
> + base = gic_dist_base_alias(irqd_to_hwirq(d));
>
> offset = convert_offset_index(d, GICD_ICFGR, &index);
> -
Spurious blank line change.
> ret = gic_configure_irq(index, type, base + offset, NULL);
> if (ret && (range == PPI_RANGE || range == EPPI_RANGE)) {
> /* Misconfigured PPIs are usually not fatal */
> @@ -1719,6 +1762,45 @@ static bool gic_enable_quirk_hip06_07(void *data)
> return false;
> }
>
> +#define T241_CHIPN_MASK GENMASK_ULL(45, 44)
> +#define T241_CHIP_GICDA_OFFSET 0x1580000
> +#define SMCCC_SOC_ID_MASK 0x7f7fffff
> +#define SMCCC_SOC_ID_T241 0x036b0241
> +
> +static bool gic_enable_quirk_nvidia_t241(void *data)
> +{
> + unsigned long chip_bmask = 0;
> + phys_addr_t phys;
> + u32 i;
> +
> + /* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
> + if ((smccc_soc_id_version < 0) ||
> + ((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
> + return false;
> + }
> +
> + /* Find the chips based on GICR regions PHYS addr */
> + for (i = 0; i < gic_data.nr_redist_regions; i++) {
> + chip_bmask |= BIT(FIELD_GET(T241_CHIPN_MASK,
> + gic_data.redist_regions[i].phys_base));
> + }
> +
> + if (hweight32(chip_bmask) < 3)
> + return false;
> +
> + /* Setup GICD alias regions */
> + for (i = 0; i < ARRAY_SIZE(t241_dist_base_alias); i++) {
> + if (chip_bmask & BIT(i)) {
> + phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
> + phys |= FIELD_PREP(T241_CHIPN_MASK, i);
> + t241_dist_base_alias[i] = ioremap(phys, SZ_64K);
> + WARN_ON_ONCE(!t241_dist_base_alias[i]);
> + }
> + }
> + static_branch_enable(&gic_nvidia_t241_erratum);
> + return true;
> +}
> +
> static const struct gic_quirk gic_quirks[] = {
> {
> .desc = "GICv3: Qualcomm MSM8996 broken firmware",
> @@ -1750,6 +1832,12 @@ static const struct gic_quirk gic_quirks[] = {
> .mask = 0xe8f00fff,
> .init = gic_enable_quirk_cavium_38539,
> },
> + {
> + .desc = "GICv3: NVIDIA erratum T241-FABRIC-4",
> + .iidr = 0x0402043b,
> + .mask = 0xffffffff,
> + .init = gic_enable_quirk_nvidia_t241,
> + },
> {
> }
> };
> @@ -1821,7 +1909,8 @@ static int __init gic_init_bases(void __iomem *dist_base,
> struct redist_region *rdist_regs,
> u32 nr_redist_regions,
> u64 redist_stride,
> - struct fwnode_handle *handle)
> + struct fwnode_handle *handle,
> + phys_addr_t dist_phys_base)
Pass dist_phys_base as a the first parameter, not the last.
> {
> u32 typer;
> int err;
> @@ -1834,6 +1923,7 @@ static int __init gic_init_bases(void __iomem *dist_base,
>
> gic_data.fwnode = handle;
> gic_data.dist_base = dist_base;
> + gic_data.dist_phys_base = dist_phys_base;
> gic_data.redist_regions = rdist_regs;
> gic_data.nr_redist_regions = nr_redist_regions;
> gic_data.redist_stride = redist_stride;
> @@ -2072,11 +2162,12 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
> void __iomem *dist_base;
> struct redist_region *rdist_regs;
> struct resource res;
> + struct resource dist_res;
Surely you don't need a full struct resource to keep the distributor
base address around...
> u64 redist_stride;
> u32 nr_redist_regions;
> int err, i;
>
> - dist_base = gic_of_iomap(node, 0, "GICD", &res);
> + dist_base = gic_of_iomap(node, 0, "GICD", &dist_res);
> if (IS_ERR(dist_base)) {
> pr_err("%pOF: unable to map gic dist registers\n", node);
> return PTR_ERR(dist_base);
> @@ -2114,7 +2205,7 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
> gic_enable_of_quirks(node, gic_quirks, &gic_data);
>
> err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
> - redist_stride, &node->fwnode);
> + redist_stride, &node->fwnode, dist_res.start);
> if (err)
> goto out_unmap_rdist;
>
> @@ -2377,8 +2468,14 @@ static void __init gic_acpi_setup_kvm_info(void)
> vcpu->end = vcpu->start + ACPI_GICV2_VCPU_MEM_SIZE - 1;
> }
>
> - gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> - gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> + /* Disable GICv4.x features for the erratum T241-FABRIC-4 */
> + if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
> + gic_v3_kvm_info.has_v4 = false;
> + gic_v3_kvm_info.has_v4_1 = false;
> + } else {
> + gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> + gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> + }
This is the wrong place to do this. We want to disable *all* the
GICv4.1 features, and not just KVM's view. Specially given that there
are a bunch of fields that are evaluated in the ITS driver.
Something like the change below seems more logical. Maybe you can
salvage direct_lpi from the disaster, but that's about it.
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index fd134e1f481a..a68d33b4523f 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1841,10 +1841,13 @@ static int __init gic_init_bases(void __iomem *dist_base,
gic_data.domain = irq_domain_create_tree(handle, &gic_irq_domain_ops,
&gic_data);
gic_data.rdists.rdist = alloc_percpu(typeof(*gic_data.rdists.rdist));
- gic_data.rdists.has_rvpeid = true;
- gic_data.rdists.has_vlpis = true;
- gic_data.rdists.has_direct_lpi = true;
- gic_data.rdists.has_vpend_valid_dirty = true;
+ if (!static_branch_unlikely(&gic_nvidia_t241_erratum)) {
+ /* Disable GICv4.x features for the erratum T241-FABRIC-4 */
+ gic_data.rdists.has_rvpeid = true;
+ gic_data.rdists.has_vlpis = true;
+ gic_data.rdists.has_direct_lpi = true;
+ gic_data.rdists.has_vpend_valid_dirty = true;
+ }
if (WARN_ON(!gic_data.domain) || WARN_ON(!gic_data.rdists.rdist)) {
err = -ENOMEM;
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list