GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9

Keith Busch kbusch at kernel.org
Mon Nov 30 11:40:10 EST 2020


On Sun, Nov 29, 2020 at 04:56:39AM +0100, Marek Marczykowski-Górecki wrote:
> I can reliably hit a kernel panic in nvme_map_data() which looks like the
> one below. It happens on Linux 5.9.9, while 5.4.75 works fine. I haven't
> tried other versions on this hardware. Linux is running as Xen
> PV dom0, on top of nvme there is LUKS and then LVM with thin
> provisioning. The crash happens reliably when starting a Xen domU (which
> uses one of thin provisioned LVM volumes as its disk). But booting dom0
> works fine (even though it is using the same disk setup for its root
> filesystem).
> 
> I did a bit of debugging and found it's about this part:
> 
> drivers/nvme/host/pci.c:
>  800 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  801         struct nvme_command *cmnd)
>  802 {
>  803     struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>  804     blk_status_t ret = BLK_STS_RESOURCE;
>  805     int nr_mapped;
>  806 
>  807     if (blk_rq_nr_phys_segments(req) == 1) {
>  808         struct bio_vec bv = req_bvec(req);
>  809 
>  810         if (!is_pci_p2pdma_page(bv.bv_page)) {
> 
> Here, bv.bv_page->pgmap is LIST_POISON1, while page_zonenum(bv.bv_page)
> says ZONE_DEVICE. So, is_pci_p2pdma_page() crashes on accessing
> bv.bv_page->pgmap->type.

Something sounds off. I thought all ZONE_DEVICE pages require a pgmap
because that's what holds a reference to the device's liveness. What
are you allocating this memory from that makes ZONE_DEVICE true without
a pgmap?
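
For readers following along, here is a rough userspace sketch of the
failure mode in the quoted report. It is not kernel code: every model_*
name is hypothetical and the page layout is heavily simplified. It
assumes a 64-bit build, and it assumes that pgmap shares storage with
the page's list linkage, which would explain how a stale LIST_POISON1
(0xdead000000000100 on x86_64, the address in the subject line) can be
read back as a pgmap pointer. That overlap should be checked against
the actual struct page definition in the tree being debugged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of LIST_POISON1 on x86_64; matches the faulting address reported. */
#define MODEL_LIST_POISON1 UINT64_C(0xdead000000000100)

/* Hypothetical stand-ins for the kernel's zone and pgmap type enums. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_DEVICE };
enum model_memory_type { MODEL_MEMORY_OTHER, MODEL_MEMORY_PCI_P2PDMA };

struct model_pagemap {
    enum model_memory_type type;
};

/* Simplified page: pgmap is ASSUMED to alias the list linkage. */
struct model_page {
    enum model_zone zone;
    union {
        struct { void *next, *prev; } lru;  /* ordinary page list linkage */
        struct model_pagemap *pgmap;        /* meaningful only for device pages */
    };
};

/* Mimics what removing the page from a list leaves behind in lru.next. */
static void model_poison_lru(struct model_page *page)
{
    page->lru.next = (void *)(uintptr_t)MODEL_LIST_POISON1;
}

/* True when page->pgmap holds the poison value rather than a real pointer. */
static bool model_pgmap_is_poison(const struct model_page *page)
{
    return (uint64_t)(uintptr_t)page->pgmap == MODEL_LIST_POISON1;
}

/*
 * Guarded variant of the p2pdma test. The check in the report dereferences
 * pgmap as soon as the page claims to be ZONE_DEVICE; the NULL and poison
 * guard below exists only in this model so that it can run safely.
 */
static bool model_is_p2pdma_page(const struct model_page *page)
{
    assert(page != NULL);
    if (page->zone != MODEL_ZONE_DEVICE)
        return false;
    if (page->pgmap == NULL || model_pgmap_is_poison(page))
        return false;
    return page->pgmap->type == MODEL_MEMORY_PCI_P2PDMA;
}
```

A page marked MODEL_ZONE_DEVICE that has been through
model_poison_lru() reports a poisoned pgmap, which is the state the
report describes. Without the guard, the final dereference is where the
general protection fault would occur.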


