[PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
Samiullah Khawaja
skhawaja at google.com
Wed Mar 18 15:06:48 PDT 2026
Hi Nicolin,
On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
>On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
>> > From: Nicolin Chen <nicolinc at nvidia.com>
>> > Sent: Wednesday, March 18, 2026 3:16 AM
>> >
>> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware
>> > reports the timeout via a GERROR interrupt, the driver thread issuing the
>> > commands has no direct way to tell whether its own batch caused it, since
>> > polling the CMD_SYNC status doesn't return a failure code. This makes it
>> > very difficult to coordinate per-device recovery.
>> >
>> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
>> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
>> > physical CMDQ index of the faulting CMD_SYNC command.
>> >
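As a rough illustration of the mechanism quoted above (not the patch's actual code), the sketch below models the bitmap in userspace C11. There is one bit per physical CMDQ slot. The error handler sets the bit, and the thread that owns the CMD_SYNC in that slot atomically tests and clears it. The queue depth, the struct, and all function names are hypothetical. In the kernel, the same pattern would typically use set_bit() and test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue depth; a real driver sizes this from the hardware. */
#define CMDQ_ENTRIES 256u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((CMDQ_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct cmdq_sketch {
	/* One bit per physical CMDQ slot; set when the CMD_SYNC there saw an ATC timeout. */
	_Atomic unsigned long atc_sync_timeouts[BITMAP_WORDS];
};

/* Error-handler side: mark the slot of the faulting CMD_SYNC. No lock is taken. */
static void mark_atc_timeout(struct cmdq_sketch *q, unsigned int idx)
{
	assert(idx < CMDQ_ENTRIES);
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);

	atomic_fetch_or(&q->atc_sync_timeouts[idx / BITS_PER_WORD], mask);
}

/*
 * Issuer side: after polling its CMD_SYNC at @idx, atomically clear the bit
 * and report whether it had been set, i.e. whether this batch timed out.
 */
static bool test_and_clear_atc_timeout(struct cmdq_sketch *q, unsigned int idx)
{
	assert(idx < CMDQ_ENTRIES);
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);
	unsigned long old = atomic_fetch_and(&q->atc_sync_timeouts[idx / BITS_PER_WORD], ~mask);

	return (old & mask) != 0;
}
```

The waiter consumes each mark with an atomic test-and-clear, so it is seen exactly once. A later command that reuses the same slot after the queue wraps therefore does not inherit a stale timeout.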
>>
>> It's nice to see software being able to identify the faulting sync command
>> upon an ATC timeout! On VT-d that's not feasible when multiple wait descriptors
>> (similar to CMD_SYNC) are in flight... :/
>
>Actually SMMU doesn't know which device is faulting when CMD_SYNC
VT-d can identify the device whose device-TLB invalidation timed out
by using the SID reported in the "Invalidation Queue Error Record
Register" (VT-d spec 11.4.9.9).
>follows ATC_INV commands for multiple devices. The commit message
>in PATCH-7 describes this at the end. So Jason suggested retrying
>those ATC_INV commands by bisecting them per device, which allows
>us to pinpoint the faulting device.
For a software timeout, though, something like this would still be needed.
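The per-device bisection described above can be sketched in the same way. This is a hypothetical simulation, not the driver's code. Each SID stands in for one device's ATC_INV commands. sim_issue_and_sync() stands in for issuing those commands followed by a single CMD_SYNC and reporting whether that sync timed out. Any batch that still times out is split in half until a single device remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulator: a sync times out if any SID in the batch is "faulty". */
struct sim {
	const uint32_t *faulty;
	size_t nfaulty;
	unsigned int syncs; /* CMD_SYNCs issued during the retry, for illustration */
};

static bool sim_issue_and_sync(struct sim *s, const uint32_t *sids, size_t n)
{
	s->syncs++;
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < s->nfaulty; j++)
			if (sids[i] == s->faulty[j])
				return true;
	return false;
}

/*
 * Precondition: issuing sids[0..n) together is known to have timed out.
 * Retries each half separately and recurses into halves that time out again.
 * Writes the faulty SIDs to @out and returns how many were found.
 */
static size_t bisect_faulty(struct sim *s, const uint32_t *sids, size_t n, uint32_t *out)
{
	assert(n > 0);
	if (n == 1) {
		out[0] = sids[0];
		return 1;
	}

	size_t half = n / 2, found = 0;

	if (sim_issue_and_sync(s, sids, half))
		found += bisect_faulty(s, sids, half, out);
	if (sim_issue_and_sync(s, sids + half, n - half))
		found += bisect_faulty(s, sids + half, n - half, out + found);
	return found;
}
```

With one faulty device among n, this costs about 2*log2(n) extra CMD_SYNCs rather than n. A real retry must also handle a device that does not time out again on retry; in that case, this sketch simply reports nothing for that half.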
>
>Could VT-d do the same?
>
>Nicolin
>
Thanks,
Sami
More information about the linux-arm-kernel mailing list