[REGRESSION] nvme: code command_id with a genctr for use-after-free validation crashes apple T2 SSD

Keith Busch kbusch at kernel.org
Sat Sep 25 10:16:18 PDT 2021


On Sat, Sep 25, 2021 at 01:10:42PM +0000, Orlando Chamberlain wrote:
> Commit e7006de6c238 causes the SSD controller on Apple T2 computers to crash
> and prevents linux from booting.
> 
> This commit implemented a counter that is stored within the NVMe command_id,
> however this counter makes the command_id higher than normal, causing a panic
> on the T2 security chip that functions as the SSD controller, which then
> causes the system to power off after a few seconds.

Ah, yet another spec non-compliant quirk from these controllers.

> This was reported on bugzilla here:
> https://bugzilla.kernel.org/show_bug.cgi?id=214509 but it was not originally
> classified as NVMe (when the report was created it was unknown what was
> causing it), so I don't know if it notified the NVMe mailing list when it
> was later reclassified to NVMe. Sorry if you've already seen this issue.

The mailing list was not copied, so thank you for directly notifying
this list. 
 
> The T2 security chip (which is the SSD) has this line in its crash log (the
> rest of this log is in an attachment on the bugzilla report):
> 
> panic(cpu 1 caller 0xfffffff028d884ec): ANS2 Recoverable Panic - assert failed: [7447]:command id out of range error (cid = 4120), status_reg: 0x2000 - Null(2)
> 
> This is the entry in lspci -nn for the ssd:
> 
> 04:00.0 Mass storage controller [0180]: Apple Inc. ANS2 NVMe Controller [106b:2005] (rev 01)
> 
> This commit was included in 5.14.6 and backported to 5.10.67, but does not
> occur in 5.14.5 and 5.10.66. I am on a MacBookPro16,1, the crash has been
> reproduced on a MacBookPro16,2 as well. 

Is the PCI VID:DID the same as in your lspci output for all affected
MacBooks?

> I have been able to reproduce on Arch
> Linux with vanilla kernel 5.10.67 (others have gotten it on 5.14.6) with no
> DKMS modules, and I bisected it to that commit
> (e7006de6c23803799be000a5dcce4d916a36541a).
> 
> I've tried to modify the genctr so that it is in the other side of the
> command_id (which I thought might make the command_id's lower) with the patch
> below, but it did not prevent the crash.

That might mean the h/w is using the command id as an index into
internal structures. That is not spec compliant, so it sounds like
we'll need to introduce another quirk for the macs.
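
To illustrate why the reported cid looks out of range to the controller,
here is a minimal sketch. It assumes the commit keeps a 4-bit generation
counter in the upper bits of the 16-bit command_id and the request tag in
the lower 12 bits. The helper names and the skip_genctr flag are
hypothetical, made up for this example, and are not the driver's actual
identifiers.

```c
#include <stdint.h>

/* Assumed layout: bits 15..12 = generation counter, bits 11..0 = tag. */
#define CID_GENCTR_SHIFT 12
#define CID_GENCTR_MASK  0xfu
#define CID_TAG_MASK     0xfffu

/* Combine a generation counter and a tag into one 16-bit command id. */
static inline uint16_t cid_pack(uint8_t genctr, uint16_t tag)
{
	return (uint16_t)(((genctr & CID_GENCTR_MASK) << CID_GENCTR_SHIFT) |
			  (tag & CID_TAG_MASK));
}

/* Extract the generation counter from a command id. */
static inline uint8_t cid_genctr(uint16_t cid)
{
	return (uint8_t)((cid >> CID_GENCTR_SHIFT) & CID_GENCTR_MASK);
}

/* Extract the tag from a command id. */
static inline uint16_t cid_tag(uint16_t cid)
{
	return (uint16_t)(cid & CID_TAG_MASK);
}

/*
 * Hypothetical quirk handling: when skip_genctr is nonzero, send the bare
 * tag so a controller that indexes by command id never sees the high bits.
 */
static inline uint16_t cid_for_device(uint8_t genctr, uint16_t tag,
				      int skip_genctr)
{
	return skip_genctr ? (uint16_t)(tag & CID_TAG_MASK)
			   : cid_pack(genctr, tag);
}
```

Under that assumed layout, the cid = 4120 (0x1018) from the crash log
splits into generation counter 1 and tag 24, so the controller would have
seen plain 24 before the commit. With the hypothetical skip_genctr flag
set, cid_for_device() returns only the tag, at the cost of losing the
stale-completion check on those devices.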


