NVME controller issue prevents boot on Kernel versions >= 5.8 on MacBook 7,1 (MacBook Air 2015)

B.L. Jones brandon.gustav at googlemail.com
Fri Oct 30 00:31:12 EDT 2020


Hello all,

This is my first time emailing a kernel mailing list, so I will do my
best to furnish any information you may need regarding this.

To begin, I reported a boot issue against the kernel on the
Fedora/RedHat Bugzilla last month here:
https://bugzilla.redhat.com/show_bug.cgi?id=1878596

Essentially, booting any kernel at version 5.8 or later on my 2015
MacBook Air causes a soft freeze, with the following errors:

[   34.890739] f-air kernel: nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[   34.917820] f-air kernel: nvme nvme0: detected Apple NVMe controller, set queue depth=2 to work around controller resets
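
For anyone decoding the register values in that first message, here is
a minimal sketch. It assumes the CSTS bit layout from the NVMe
specification (bit 0 ready, bit 1 controller fatal status, bits 3:2
shutdown status) and the standard PCI status register bits. The macro
and function names are illustrative, not the kernel's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NVMe CSTS register fields (illustrative names, layout per the NVMe spec). */
#define CSTS_RDY        (1u << 0)   /* controller ready */
#define CSTS_CFS        (1u << 1)   /* controller fatal status */
#define CSTS_SHST_MASK  (3u << 2)   /* shutdown status field */

/* PCI status register: capabilities-list bit and the error-reporting bits. */
#define PCI_ST_CAP_LIST 0x0010u
#define PCI_ST_ERRORS   0xF900u     /* bits 8 and 11..15 */

bool csts_ready(uint32_t csts) { return (csts & CSTS_RDY) != 0; }
bool csts_fatal(uint32_t csts) { return (csts & CSTS_CFS) != 0; }
unsigned csts_shutdown_status(uint32_t csts) { return (csts & CSTS_SHST_MASK) >> 2; }

bool pci_status_has_cap_list(uint16_t st) { return (st & PCI_ST_CAP_LIST) != 0; }
uint16_t pci_status_error_bits(uint16_t st) { return st & PCI_ST_ERRORS; }

/* Self-check using the values from the log message above. */
void check_logged_values(void)
{
	assert(csts_ready(0x3));
	assert(csts_fatal(0x3));
	assert(csts_shutdown_status(0x3) == 0);
	assert(pci_status_has_cap_list(0x10));
	assert(pci_status_error_bits(0x10) == 0);
}
```

Under those assumptions, CSTS=0x3 has both the ready and fatal status
bits set, and PCI_STATUS=0x10 has only the capabilities-list bit set.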

Another individual in the bugzilla thread took a look at the error
logs and identified a commit that may have caused the issue:

===
commit 54b2fcee1db041a83b52b51752dade6090cf952f
Author: Keith Busch <kbusch at kernel.org>
Date:   Mon Apr 27 11:54:46 2020 -0700

    nvme-pci: remove last_sq_tail

    The nvme driver does not have enough tags to wrap the queue, and blk-mq
    will no longer call commit_rqs() when there are no new submissions to
    notify.

    Signed-off-by: Keith Busch <kbusch at kernel.org>
    Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
    Signed-off-by: Christoph Hellwig <hch at lst.de>
    Signed-off-by: Jens Axboe <axboe at kernel.dk>
===

The main/interesting hunk in this patch is as follows:

===
@@ -446,24 +445,11 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 	return 0;
 }
 
-/*
- * Write sq tail if we are asked to, or if the next command would wrap.
- */
-static inline void nvme_write_sq_db(struct nvme_queue *nvmeq, bool write_sq)
+static inline void nvme_write_sq_db(struct nvme_queue *nvmeq)
 {
-	if (!write_sq) {
-		u16 next_tail = nvmeq->sq_tail + 1;
-
-		if (next_tail == nvmeq->q_depth)
-			next_tail = 0;
-		if (next_tail != nvmeq->last_sq_tail)
-			return;
-	}
-
 	if (nvme_dbbuf_update_and_check_event(nvmeq->sq_tail,
 			nvmeq->dbbuf_sq_db, nvmeq->dbbuf_sq_ei))
 		writel(nvmeq->sq_tail, nvmeq->q_db);
-	nvmeq->last_sq_tail = nvmeq->sq_tail;
 }
===
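
To illustrate what the removed check did, here is a minimal userspace
sketch of the old and new versions of nvme_write_sq_db() from the hunk
above. It is only a model: struct sq_model, ring_doorbell() and
advance_tail() are hypothetical stand-ins, the shadow doorbell check
(nvme_dbbuf_update_and_check_event) is assumed to always ask for a
write, and it does not reproduce the driver or the failure itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few nvme_queue fields the hunk touches. */
struct sq_model {
	uint16_t sq_tail;         /* next free submission queue slot */
	uint16_t last_sq_tail;    /* tail value at the last doorbell write */
	uint16_t q_depth;         /* number of slots in the queue */
	uint16_t doorbell;        /* last value "written" to the doorbell */
	unsigned doorbell_writes; /* how many times the doorbell was written */
};

/* Stands in for writel(nvmeq->sq_tail, nvmeq->q_db). */
void ring_doorbell(struct sq_model *q)
{
	q->doorbell = q->sq_tail;
	q->doorbell_writes++;
}

/* Advance the tail by one slot with wrap-around, as queuing one command would. */
void advance_tail(struct sq_model *q)
{
	assert(q->q_depth > 0);
	if (++q->sq_tail == q->q_depth)
		q->sq_tail = 0;
}

/* Old logic: write the tail if asked to, or if the next command would wrap. */
void write_sq_db_old(struct sq_model *q, bool write_sq)
{
	if (!write_sq) {
		uint16_t next_tail = q->sq_tail + 1;

		if (next_tail == q->q_depth)
			next_tail = 0;
		if (next_tail != q->last_sq_tail)
			return;
	}
	ring_doorbell(q);
	q->last_sq_tail = q->sq_tail;
}

/* New logic after the commit: always write the tail when called. */
void write_sq_db_new(struct sq_model *q)
{
	ring_doorbell(q);
}
```

In this model, with q_depth set to 2 (the value the Apple quirk in the
log selects), one advance_tail() followed by write_sq_db_old(q, false)
still writes the doorbell, because the next slot would wrap onto
last_sq_tail. With a deeper queue, say q_depth 8, the same call returns
without writing.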

Per the recommendation on the bugzilla thread, I built kernel version
5.8.16 with this commit, 54b2fcee1db0 ("nvme-pci: remove
last_sq_tail"), reverted, and it booted successfully. Before that, no
kernel at version 5.8 or later would boot.

Additionally, another user built the same kernel version with this
commit reverted and packaged it with the Fedora build service. That
build also booted successfully, which seems to confirm that this commit
is causing the issue.

More detailed technical information can be found in M. Vefa Bicakci's
comment on the bugzilla thread:
https://bugzilla.redhat.com/show_bug.cgi?id=1878596#c22

It is my hope that this information can lead to a fix in a newer
kernel update for this hardware. If you require any additional
information, please let me know and I will do my best to provide what
you need (I am not quite a developer, just an enthusiastic user).


All the best,
Brandon Jones


