[PATCH] nvme-pci: 512 byte aligned dma pool segment quirk

Thu Nov 14 03:38:03 PST 2024

Hi all,

I've been tracking down an issue that seems to be related (identical?) to
this one, and I would like to propose a different fix.

I have a device with the aforementioned NVMe-eMMC bridge, and I was
experiencing nvme read timeouts after updating the kernel from 5.15 to
6.6. Doing a kernel bisect, I arrived at the same dma pool commit as
Robert in the original thread.

After trying out some changes in the nvme-pci driver, I came up with the
same fix as in this thread: change the alignment of the small pool to
512. However, I wanted to get a deeper understanding of what's going on.

After a lot of analysis, I found out why the nvme timeouts were happening:
The bridge incorrectly implements PRP list chaining.

When doing a read of exactly 264 sectors, and allocating a PRP list with
offset 0xf00, the last PRP entry in that list lies right before a page
boundary.  The bridge incorrectly (?) assumes that it's a pointer to a
chained PRP list, tries to do a DMA to address 0x0, gets a bus error,
and crashes.

When doing a write of 264 sectors with PRP list offset of 0xf00,
the bridge treats data as a pointer, and writes incorrect data to
the drive. This might be why Robert is experiencing fs corruption.

So if my findings are right, the correct quirk would be "don't make PRP
lists ending on a page boundary".

Changing the small dma pool alignment to 512 happens to fix the issue
because it never allocates a PRP list with offset 0xf00. Theoretically,
the issue could still happen with the page pool, but this bridge has
a max transfer size of 64 pages, which is not enough to fill an entire
page-sized PRP list.

Robert, could you check that the fs corruption happens only after
transfers of size 257-264 and PRP list offset of 0xf00? This would
confirm my theory.