[PATCH] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
Shanker Donthineni
sdonthineni at nvidia.com
Sun Mar 12 20:47:57 PDT 2023
Thanks Marc for a quick and valuable feedback.
On 3/12/23 08:43, Marc Zyngier wrote:
> External email: Use caution opening links or attachments
>
>
> On Fri, 10 Mar 2023 14:16:34 +0000,
> Shanker Donthineni <sdonthineni at nvidia.com> wrote:
>>
>> Hi Marc,
>>
>> On 3/7/23 02:32, Marc Zyngier wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Mon, 06 Mar 2023 01:31:48 +0000,
>>> Shanker Donthineni <sdonthineni at nvidia.com> wrote:
>>>>
>>>> The purpose of this patch is to address the T241 erratum T241-FABRIC-4,
>>>> which causes unexpected behavior in the GIC when multiple transactions
>>>
>>> nit: "The purpose of this patch" is superfluous. Instead, write
>>> something like:
>>>
>>> "The T241 platform suffers from the T241-FABRIC-4 erratum which
>>> causes..."
>>>
>> I'll fix in v2 patch.
>>
>>>> are received simultaneously from different sources. This hardware issue
>>>> impacts NVIDIA server platforms that use more than two T241 chips
>>>> interconnected. Each chip has support for 320 {E}SPIs.
>>>>
>>>> This issue occurs when multiple packets from different GICs are
>>>> incorrectly interleaved at the target chip. The erratum text below
>>>> specifies exactly what can cause multiple transfer packets susceptible
>>>> to interleaving and GIC state corruption. GIC state corruption can
>>>> lead to a range of problems, including kernel panics, and unexpected
>>>> behavior.
>>>>
>>>> From the erratum text:
>>>> "In some cases, inter-socket AXI4 Stream packets with multiple
>>>> transfers, may be interleaved by the fabric when presented to ARM
>>>> Generic Interrupt Controller. GIC expects all transfers of a packet
>>>> to be delivered without any interleaving.
>>>>
>>>> The following GICv3 commands may result in multiple transfer packets
>>>> over inter-socket AXI4 Stream interface:
>>>> - Register reads from GICD_I* and GICD_N*
>>>> - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
>>>> - ITS command MOVALL
>>>
>>> Does is also affect cross-chip traffic such as SPI deactivation?
>>>
>> No, it is not impacted.
>>
>>>>
>>>> Multiple commands in GICv4+ utilize multiple transfer packets,
>>>> including VMOVP, VMOVI and VMAPP.
>>>>
>>>> This issue impacts system configurations with more than 2 sockets,
>>>> that require multi-transfer packets to be sent over inter-socket
>>>> AXI4 Stream interface between GIC instances on different sockets.
>>>> GICv4 cannot be supported. GICv3 SW model can only be supported
>>>> with the workaround. Single and Dual socket configurations are not
>>>> impacted by this issue and support GICv3 and GICv4."
>>>
>>> Do you have a public link to this erratum? This is really something
>>> that we should be go back to when changing things in the GIC code
>>> (should we ever use MOVALL, for example).
>>>
>> https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
>
> Great. Please add this to the commit message and a comment next to the
> workaround code.
>
I'll do it.
> [...]
>
>>>> +static inline void __iomem *gic_dist_base_read_alias(irq_hw_number_t intid)
>>>> +{
>>>> + struct dist_base_alias *base_alias;
>>>> + int i;
>>>> +
>>>> + if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
>>>> + base_alias = gic_data.base_read_aliases;
>>>> + for (i = 0; i < gic_data.nr_dist_base_aliases; i++) {
>>>> + if (base_alias->base &&
>>>> + (intid >= base_alias->intid_start) &&
>>>> + (intid <= base_alias->intid_end)) {
>>>> + return base_alias->base;
>>>> + }
>>>> + base_alias++;
>>>> + }
>>>> + }
>>>
>>> Each distributor has the exact same number of SPIs. So why isn't this
>>> just a division that gives you a distributor number?
>>>
>>
>> I considered creating a generic function that could potentially be
>> utilized in the future for other Workarounds (WARs).
>>
>> I'll change to this in v2.
>>
>> static inline void __iomem *gic_dist_base_alias(irq_hw_number_t intid)
>> {
>> u32 chip;
>>
>> if (static_branch_unlikely(&gic_nvidia_t241_erratum)) {
>> /**
>> * {E}SPI mappings for all 4 chips
>> * Chip0 = 32-351
>> * Chip1 = 52-671
>
> s/52/352/, right?
Thanks, typo, fix in v2.
>
>> * Chip2 = 672-991
>> * Chip3 = 4096-4415
>> */
>> switch (__get_intid_range(intid)) {
>> case SPI_RANGE:
>> chip = (intid - 32) / 320;
>> break;
>> case ESPI_RANGE:
>> chip = 3;
>> break;
>> default:
>> unreachable();
>> }
>> BUG_ON(!t241_dist_base_alias[chip]);
>
> You can drop this BUG_ON(), and replace it with on at probe time.
>
>> return t241_dist_base_alias[chip];
>> }
>>
>> return gic_data.dist_base;
>> }
>
> Yup, that's much better.
>
>>
>>>> +
>>>> + return gic_data.dist_base;
>>>> +}
>>>> +
>>>> static inline void __iomem *gic_dist_base(struct irq_data *d)
>>>> {
>>>> switch (get_intid_range(d)) {
>>>> @@ -346,7 +377,7 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
>>>> if (gic_irq_in_rdist(d))
>>>> base = gic_data_rdist_sgi_base();
>>>> else
>>>> - base = gic_data.dist_base;
>>>> + base = gic_dist_base_read_alias(irqd_to_hwirq(d));
>>>>
>>>> return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
>>>> }
>>>> @@ -580,6 +611,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
>>>> enum gic_intid_range range;
>>>> unsigned int irq = gic_irq(d);
>>>> void __iomem *base;
>>>> + void __iomem *base_read_alias;
>>>> u32 offset, index;
>>>> int ret;
>>>>
>>>> @@ -594,14 +626,17 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
>>>> type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING)
>>>> return -EINVAL;
>>>>
>>>> - if (gic_irq_in_rdist(d))
>>>> + if (gic_irq_in_rdist(d)) {
>>>> base = gic_data_rdist_sgi_base();
>>>> - else
>>>> + base_read_alias = base;
>>>> + } else {
>>>> base = gic_data.dist_base;
>>>> + base_read_alias = gic_dist_base_read_alias(irqd_to_hwirq(d));
>>>> + }
>>>>
>>>> offset = convert_offset_index(d, GICD_ICFGR, &index);
>>>> -
>>>> - ret = gic_configure_irq(index, type, base + offset, NULL);
>>>> + ret = gic_configure_irq(index, type, base + offset, NULL,
>>>> + base_read_alias + offset);
>>>> if (ret && (range == PPI_RANGE || range == EPPI_RANGE)) {
>>>> /* Misconfigured PPIs are usually not fatal */
>>>> pr_warn("GIC: PPI INTID%d is secure or misconfigured\n", irq);
>>>> @@ -1719,6 +1754,70 @@ static bool gic_enable_quirk_hip06_07(void *data)
>>>> return false;
>>>> }
>>>>
>>>> +static bool gic_enable_quirk_nvidia_t241(void *data)
>>>> +{
>>>> +#ifdef CONFIG_ACPI
>>>> + struct dist_base_alias *base_alias;
>>>> + struct acpi_table_header *madt;
>>>> + int i, intid, nchips = 0;
>>>> + acpi_status status;
>>>> + phys_addr_t phys;
>>>> +
>>>> + status = acpi_get_table(ACPI_SIG_MADT, 0, &madt);
>>>> + if (ACPI_FAILURE(status))
>>>> + return false;
>>>> +
>>>> + /* Check NVIDIA OEM ID */
>>>> + if (memcmp(madt->oem_id, "NVIDIA", 6)) {
>>>
>>> What guarantees do we have that this string will always be present?
>>> "oem_id" is usually updated to reflect the integrator, not the
>>> silicon vendor.
>>>
>>
>> Our company provides UEFI firmware porting guidelines to OEMs that
>> ensure the MADT table generation, along with the ACPI header, remains
>> unaltered. Thanks to your input, we are now looking into alternative
>> approaches for identifying platform types and removing our dependence
>> on ACPI. Specifically, we are interested in utilizing the SMCCC API
>> to detect the CHIP. Determine whether the individual chips are present
>> by referring to the GICR regions described in MADT.
>
> Seems like a reasonable alternative.
>
>>
>>
>>>> + acpi_put_table(madt);
>>>> + return false;
>>>> + }
>>>> +
>>>> + /* Find the number of chips based on OEM_TABLE_ID */
>>>> + if ((!memcmp(madt->oem_table_id, "T241x3", 6)) ||
>>>> + (!memcmp(madt->oem_table_id, "T241c3", 6))) {
>>>> + nchips = 3;
>>>> + } else if ((!memcmp(madt->oem_table_id, "T241x4", 6)) ||
>>>> + (!memcmp(madt->oem_table_id, "T241c4", 6))) {
>>>> + nchips = 4;
>>>> + }
>>>
>>> Same question for these. This seems pretty fragile.
>>>
>> This can be avoid for the SMCCC based platform detection.
>>
>>>> +
>>>> + acpi_put_table(madt);
>>>> + if (nchips < 3)
>>>> + return false;
>>>> +
>>>> + base_alias = kmalloc_array(nchips, sizeof(*base_alias),
>>>> + GFP_KERNEL | __GFP_ZERO);
>>>
>>> You are fully initialising the structures, right? So why the
>>> __GFP_ZERO?
>> Yes, not needed. will use the staic array since size is small after
>> removing INTID_start/end feilds.
>>
>>>
>>>> + if (!base_alias)
>>>> + return false;
>>>> +
>>>> + gic_data.base_read_aliases = base_alias;
>>>> + gic_data.nr_dist_base_aliases = nchips;
>>>> +
>>>> + /**
>>>> + * Setup GICD alias and {E}SPIs range for each chip
>>>> + * {E}SPI blocks mappings:
>>>> + * Chip0 = 00-09
>>>> + * Chip1 = 10-19
>>>> + * Chip2 = 20-29
>>>> + * Chip3 = 30-39
>>>
>>> What are these ranges? From the code below, I can (sort of) guess that
>>> each chip has 10 registers in the SPI/ESPI range, with chips 0-1
>>> dealing with SPIs, and chips 2-3 dealing with ESPIs.
>>>
>>> It would be a lot clearer if you indicated the actual INTID ranges.
>> Agree.
>>
>>>
>>>> + */
>>>> + for (i = 0; i < nchips; i++, base_alias++) {
>>>> + phys = ((1ULL << 44) * i) | 0x23580000;
>>>
>>> Where is this address coming from? Can it be inferred from the MADT?
>>> It would also be a lot more readable if written as:
>>>
>>> #define CHIP_MASK GENMASK_ULL(45, 44)
>>> #define CHIP_ALIAS_BASE 0x23580000
>>>
>> I'll define macros for constants. Use the offset, global GICD-PHYS,
>> and CHIP number to get the alias addressses.
>>
>> #define T241_CHIPN_MASK GENMASK_ULL(45, 44)
>> #define T241_CHIP_GICDA_OFFSET 0x1580000
>>
>> phys = gic_data.dist_phys_base + T241_CHIP_GICDA_OFFSET;
>> phys |= FIELD_PREP(T241_CHIPN_MASK, i);
>>
>>
>>> phys = CHIP_ALIAS_BASE;
>>> phys |= FIELD_PREP(CHIP_MASK, i);
>>>
>>>> + base_alias->base = ioremap(phys, SZ_64K);
>>>> + WARN_ON(!base_alias->base);
>>>> +
>>>> + intid = i < 3 ? 32 + i * 10 * 32 : ESPI_BASE_INTID;
>>>> + base_alias->intid_start = intid;
>>>> + base_alias->intid_end = intid + 10 * 32 - 1;
>>>
>>> This really is obfuscated. And it also shows that we really don't need
>>> the INTID ranges in the data structure. You can easily get to the chip
>>> number with something like:
>> ACK
>>
>>>
>>> switch (__get_intid_range(intid)) {
>>> case SPI_RANGE:
>>> chip = (intid - 32) / 320;
>>> break;
>>> case ESPI_RANGE:
>>> chip = (intid - ESPI_BASE_INTID) / 320;
>>> break;
>>> }
>>>
>>> alias = base_alias[chip];
>>>
>>> Bonus point if you add a #define for the magic numbers.
>>>
>> ACK
>>
>>>> + }
>>>> + static_branch_enable(&gic_nvidia_t241_erratum);
>>>> + return true;
>>>> +#else
>>>> + return false;
>>>> +#endif
>>>> +}
>>>
>>> How about moving the whole function under #ifdef CONFIG_ACPI?
>>>
>>
>> If you're not satisfied with SMCCC-based platform detection, I'll
>> make the necessary changes. We value your input and would appreciate
>> your opinion on whether we should use SMCCC or ACPI-OEM-ID based
>> platform detection. Our preference is to go with SMC if that's
>> agreeable to you.
>
> If you can guarantee that this FW-based discovery will always be
> available, then this is a more robust way of doing it.
>
We can assure you that the FW-based discovery will be available on
T241 platform. Thank you for acknowledging the alternative mechanism.
>>
>>
>> #define SMCCC_JEP106_BANK_ID(v) FIELD_GET(GENMASK(30, 24), (v))
>> #define SMCCC_JEP106_ID_CODE(v) FIELD_GET(GENMASK(22, 16), (v))
>> #define SMCCC_JEP106_SOC_ID(v) FIELD_GET(GENMASK(15, 0), (v))
>>
>> #define JEP106_NVIDIA_BANK_ID 0x3
>> #define JEP106_NVIDIA_ID_CODE 0x6b
>> #define T241_CHIPN_MASK GENMASK_ULL(45, 44)
>> #define T241_CHIP_GICDA_OFFSET 0x1580000
>> #define T241_CHIP_ID 0x241
>>
>> static bool gic_enable_quirk_nvidia_t241(void *data)
>> {
>> unsigned long chip_bmask = 0;
>> struct arm_smccc_res res;
>> phys_addr_t phys;
>> u32 i;
>>
>> if ((arm_smccc_get_version() < ARM_SMCCC_VERSION_1_2) ||
>> (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)) {
>> return false;
>> }
>>
>> arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
>> ARM_SMCCC_ARCH_SOC_ID, &res);
>> if ((s32)res.a0 < 0)
>> return false;
>>
>> arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
>> if ((s32)res.a0 < 0)
>> return false;
>
> Most of this should probably directly come from the soc_id
> infrastructure. It would need to probe early and expose the low-level
> data.
>
Agree, I will move the above code to drivers/firmware/smccc with a new patch.
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -17,9 +17,14 @@ static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
bool __ro_after_init smccc_trng_available = false;
u64 __ro_after_init smccc_has_sve_hint = false;
+bool __ro_after_init smccc_has_soc_id;
+s32 __ro_after_init smccc_soc_id_version;
+s32 __ro_after_init smccc_soc_id_revision;
void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
{
+ struct arm_smccc_res res;
+
smccc_version = version;
smccc_conduit = conduit;
@@ -27,6 +32,21 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
if (IS_ENABLED(CONFIG_ARM64_SVE) &&
smccc_version >= ARM_SMCCC_VERSION_1_3)
smccc_has_sve_hint = true;
+
+ if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
+ (smccc_conduit != SMCCC_CONDUIT_NONE)) {
+ arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+ ARM_SMCCC_ARCH_SOC_ID, &res);
+ if ((s32)res.a0 >= 0) {
+ smccc_has_soc_id = true;
+ arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 0, &res);
+ smccc_soc_id_version = (s32)res.a0;
+ arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_SOC_ID, 1, &res);
+ smccc_soc_id_revision = (s32)res.a0;
+ } else if ((s32)res.a0 != SMCCC_RET_NOT_SUPPORTED) {
+ pr_err("ARCH_SOC_ID returned error: %lx\n", res.a0);
+ }
+ }
}
GICv3:
#define T241_CHIPN_MASK GENMASK_ULL(45, 44)
#define T241_CHIP_GICDA_OFFSET 0x1580000
#define SMCCC_SOC_ID_MASK 0x7f7fffff
#define SMCCC_SOC_ID_T241 0x036b0241
static bool gic_enable_quirk_nvidia_t241(void *data)
{
unsigned long chip_bmask = 0;
phys_addr_t phys;
u32 i;
/* Check JEP106 code for NVIDIA T241 chip (036b:0241) */
if (!smccc_has_soc_id || (smccc_soc_id_version < 0) ||
((smccc_soc_id_version & SMCCC_SOC_ID_MASK) != SMCCC_SOC_ID_T241)) {
return false;
}
More information about the linux-arm-kernel
mailing list