GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9

Jürgen Groß jgross at suse.com
Mon Dec 7 07:00:14 EST 2020


On 07.12.20 12:48, Marek Marczykowski-Górecki wrote:
> On Mon, Dec 07, 2020 at 11:55:01AM +0100, Jürgen Groß wrote:
>> Marek,
>>
>> On 06.12.20 17:47, Jason Andryuk wrote:
>>> On Sat, Dec 5, 2020 at 3:29 AM Roger Pau Monné <roger.pau at citrix.com> wrote:
>>>>
>>>> On Fri, Dec 04, 2020 at 01:20:54PM +0100, Marek Marczykowski-Górecki wrote:
>>>>> On Fri, Dec 04, 2020 at 01:08:03PM +0100, Christoph Hellwig wrote:
>>>>>> On Fri, Dec 04, 2020 at 12:08:47PM +0100, Marek Marczykowski-Górecki wrote:
>>>>>>> culprit:
>>>>>>>
>>>>>>> commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
>>>>>>> Author: Roger Pau Monne <roger.pau at citrix.com>
>>>>>>> Date:   Tue Sep 1 10:33:26 2020 +0200
>>>>>>>
>>>>>>>       xen: add helpers to allocate unpopulated memory
>>>>>>>
>>>>>>> I'm adding relevant people and xen-devel to the thread.
>>>>>>> For completeness, here is the original crash message:
>>>>>>
>>>>>> That commit definitively adds a new ZONE_DEVICE user, so it does look
>>>>>> related.  But you are not running on Xen, are you?
>>>>>
>>>>> I am. It is Xen dom0.
>>>>
>>>> I'm afraid I'm on leave and won't be able to look into this until the
>>>> beginning of January. I would guess it's some kind of bad
>>>> interaction between blkback and NVMe drivers both using ZONE_DEVICE?
>>>>
>>>> Maybe the best is to revert this change and I will look into it when
>>>> I get back, unless someone is willing to debug this further.
>>>
>>> Looking at commit 9e2369c06c8a and xen-blkback put_free_pages(), they
>>> both use page->lru, which is part of the anonymous union shared with
>>> *pgmap.  That matches Marek's suspicion that the ZONE_DEVICE memory is
>>> being treated as ZONE_NORMAL.
>>>
>>> memmap_init_zone_device() says:
>>> * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
>>> * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
>>> * ever freed or placed on a driver-private list.
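For context: the faulting address in the subject, 0xdead000000000100, is
LIST_POISON1 on x86-64 (0x100 + CONFIG_ILLEGAL_POINTER_VALUE), which fits
the ->lru/->pgmap overlap described above: list_del() writes LIST_POISON1
into lru.next, and lru.next shares storage with pgmap. A rough,
self-contained user-space sketch of that overlap (types simplified from
include/linux/mm_types.h; an illustration only, not the kernel code):

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };
struct dev_pagemap;                     /* opaque for this sketch */

struct page_sketch {                    /* heavily trimmed struct page */
        unsigned long flags;
        union {
                struct {                /* page cache / anonymous pages */
                        struct list_head lru;   /* what a private free list links */
                        void *mapping;
                        unsigned long index;
                        unsigned long private;
                };
                struct {                /* ZONE_DEVICE pages */
                        struct dev_pagemap *pgmap;      /* overlays lru.next */
                        void *zone_device_data;         /* overlays lru.prev */
                };
        };
};

/* list_del() poison value from include/linux/poison.h on x86-64 */
#define LIST_POISON1 ((void *)0xdead000000000100UL)

int main(void)
{
        struct page_sketch page = { 0 };

        /* what list_del() does to a page sitting on a private list */
        page.lru.next = (struct list_head *)LIST_POISON1;

        /* the same bytes read back through the ZONE_DEVICE view */
        printf("pgmap is now %p\n", (void *)page.pgmap);
        return 0;
}

Dereferencing such a poisoned pgmap later (for instance on the NVMe I/O
path) would produce exactly the GPF address reported in the subject.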
>>
>> Second try, now even tested to work on a test system (without NVMe).
> 
> It doesn't work for me:
> 
> [  526.023340] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> [  526.030550] xen-blkback: backend/vbd/1/51728: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> [  526.034810] BUG: kernel NULL pointer dereference, address: 0000000000000010

Oh, indeed. Silly bug. My test was with qdisk as backend :-(

3rd try...


Juergen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-xen-add-helpers-for-caching-grant-mapping-pages.patch
Type: text/x-patch
Size: 15906 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20201207/74557c8a/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-xen-don-t-use-page-lru-for-ZONE_DEVICE-memory.patch
Type: text/x-patch
Size: 6111 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20201207/74557c8a/attachment-0004.bin>
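The patch bodies themselves are only reachable via the attachment URLs
above. Purely as a hedged sketch of the general rule the second patch's
title names (not its actual contents), a backend that caches pages on a
driver-private list would have to avoid doing so for ZONE_DEVICE pages,
along these lines:

/*
 * Hypothetical sketch only; NOT the contents of
 * 0002-xen-don-t-use-page-lru-for-ZONE_DEVICE-memory.patch.
 * It only illustrates the rule named in the title: never put a
 * ZONE_DEVICE page on a driver-private list, because its ->lru
 * storage is repurposed as ->pgmap / ->zone_device_data.
 */
#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(free_pages_list);      /* hypothetical private cache */

static void cache_or_drop_page(struct page *page)
{
        if (is_zone_device_page(page)) {
                /* ->lru overlays ->pgmap here, so never list-link it */
                put_page(page);
                return;
        }

        /* ordinary page: safe to keep on the private free list */
        list_add(&page->lru, &free_pages_list);
}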