GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9

Marek Marczykowski-Górecki marmarek at invisiblethingslab.com
Tue Dec 1 19:06:42 EST 2020


On Tue, Dec 01, 2020 at 01:40:10AM +0900, Keith Busch wrote:
> On Sun, Nov 29, 2020 at 04:56:39AM +0100, Marek Marczykowski-Górecki wrote:
> > I can reliably hit kernel panic in nvme_map_data() which looks like the
> > one below. It happens on Linux 5.9.9, while 5.4.75 works fine. I haven't
> > tried other versions on this hardware. Linux is running as Xen
> > PV dom0, on top of nvme there is LUKS and then LVM with thin
> > provisioning. The crash happens reliably when starting a Xen domU (which
> > uses one of thin provisioned LVM volumes as its disk). But booting dom0
> > works fine (even though it is using the same disk setup for its root
> > filesystem).
> > 
> > I did a bit of debugging and found it's about this part:
> > 
> > drivers/nvme/host/pci.c:
> >  800 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> >  801         struct nvme_command *cmnd)
> >  802 {
> >  803     struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> >  804     blk_status_t ret = BLK_STS_RESOURCE;
> >  805     int nr_mapped;
> >  806 
> >  807     if (blk_rq_nr_phys_segments(req) == 1) {
> >  808         struct bio_vec bv = req_bvec(req);
> >  809 
> >  810         if (!is_pci_p2pdma_page(bv.bv_page)) {
> > 
> > Here, bv.bv_page->pgmap is LIST_POISON1, while page_zonenum(bv.bv_page)
> > says ZONE_DEVICE. So, is_pci_p2pdma_page() crashes on accessing
> > bv.bv_page->pgmap->type.
> 
> Something sounds off. I thought all ZONE_DEVICE pages require a pgmap
> because that's what holds a reference to the device's liveness. What
> are you allocating this memory from that makes ZONE_DEVICE true without
> a pgmap?

Well, I don't allocate anything myself. I just try to start the system with
unmodified Linux 5.9.9 and an NVMe drive...
I didn't manage to find where this page is allocated, nor where it gets
broken. I _suspect_ it gets allocated as a ZONE_DEVICE page and then gets
released as if it were a ZONE_NORMAL page, which sets another member of the
union to LIST_POISON1. But I have absolutely no data to confirm or deny this
theory.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


More information about the Linux-nvme mailing list