[PATCH v4] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4

Shanker Donthineni sdonthineni at nvidia.com
Sat Mar 18 08:29:01 PDT 2023


Hi Marc,

On 3/18/23 04:44, Marc Zyngier wrote:
> On Sat, 18 Mar 2023 04:58:12 +0000,
> Shanker Donthineni <sdonthineni at nvidia.com> wrote:
>>
>> The T241 platform suffers from the T241-FABRIC-4 erratum, which causes
>> unexpected behavior in the GIC when multiple transactions are received
>> simultaneously from different sources. This hardware issue impacts
>> NVIDIA server platforms that interconnect more than two T241 chips.
>> Each chip has support for 320 {E}SPIs.
>>
>> This issue occurs when multiple packets from different GICs are
>> incorrectly interleaved at the target chip. The erratum text below
>> specifies exactly which accesses generate multi-transfer packets that
>> are susceptible to interleaving and GIC state corruption. GIC state
>> corruption can lead to a range of problems, including kernel panics
>> and unexpected behavior.
>>
>> From the erratum text:
>>    "In some cases, inter-socket AXI4 Stream packets with multiple
>>    transfers, may be interleaved by the fabric when presented to ARM
>>    Generic Interrupt Controller. GIC expects all transfers of a packet
>>    to be delivered without any interleaving.
>>
>>    The following GICv3 commands may result in multiple transfer packets
>>    over inter-socket AXI4 Stream interface:
>>     - Register reads from GICD_I* and GICD_N*
>>     - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
>>     - ITS command MOVALL
>>
>>    Multiple commands in GICv4+ utilize multiple transfer packets,
>>    including VMOVP, VMOVI, VMAPP, and 64-bit register accesses.
>>
>>    This issue impacts system configurations with more than 2 sockets,
>>    that require multi-transfer packets to be sent over inter-socket
>>    AXI4 Stream interface between GIC instances on different sockets.
>>    GICv4 cannot be supported. GICv3 SW model can only be supported
>>    with the workaround. Single and Dual socket configurations are not
>>    impacted by this issue and support GICv3 and GICv4."
>>
>> Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
>>
>> Writing to the chip alias region of any GICD_In{E} register other than
>> GICD_ICENABLERn has the same effect as writing to the global
>> distributor. The SPI interrupt deactivate path is not impacted by
>> the erratum.
>>
>> To fix this problem, implement a workaround that ensures read accesses
>> to the GICD_In{E} registers are directed to the chip that owns the
>> SPI, and disables GICv4.x features for KVM. To simplify code changes,
>> the gic_configure_irq() function uses the same alias region for both
>> read and write operations to GICD_ICFGR.
>>
>> Co-developed-by: Vikram Sethi <vsethi at nvidia.com>
>> Signed-off-by: Vikram Sethi <vsethi at nvidia.com>
>> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
>> ---
>> Changes since v3:
>>   - Fix the build issue for 32-bit architectures
>> Changes since v2:
>>   - Add accessors for the SOC-ID version & revision
>>   - Include "linux/bitfield.h" and "linux/bits.h" in irq-gic-v3.c
>> Changes since v1:
>>   - Use SMCCC SOC-ID API for detecting the T241 chip
>>   - Implement Marc's suggestions
>>   - Edit commit text
> 
> You seem to have ignored most of my comments on v2[1] apart from the
> SOC_ID stuff. I guess I'll wait for v5...
> 
>          M.
> 
> [1] https://lore.kernel.org/all/871qlqif9v.wl-maz@kernel.org/
> 

Sorry, I did not intentionally ignore your input; unfortunately, I lost
this specific email in Outlook. Your feedback is valuable, and we will
make sure that all of your review comments are addressed in v5.

-Shanker




More information about the linux-arm-kernel mailing list