nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034

Keith Busch kbusch at kernel.org
Wed Mar 15 12:39:04 PDT 2023


On Wed, Mar 15, 2023 at 06:23:32PM +0000, Belanger, Martin wrote:
> > 
> > On Wed, Mar 15, 2023 at 05:48:14PM +0000, Belanger, Martin wrote:
> > > I'm running tests where I connect/disconnect to/from a few I/O controllers
> > > using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the
> > > target. The kernel module crashes (trace below) while trying to connect over
> > > TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel
> > > using the latest nvme-6.4 branch and I'm still seeing the crash.
> > >
> > > I'm not sure how to debug this further. Any suggestions?
> > 
> > Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
> > like that would work for a connect command since there's no bdev at this point,
> > and polling needs a bdev.
> 
> Thanks for pointing me in the right direction.
> I wrote a test program that exercises all the different options available.
> The crash went away once I removed "nr-poll-queues=4". 
> But this begs the question: should a user-space program be given the ability
> to crash the kernel by simply providing the wrong (or weird) arguments?

Right, we certainly don't want to let an easy kernel crash like this exist now
that we know it's there. I'm just considering a couple of different ways to fix it.
We could just reject user polling options for nvme fabrics, or we could make
polling work with just a request_queue instead of needing a bdev.
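
As a rough illustration of the first option only, here is a user-space sketch
of rejecting a poll-queue request when the transport is a fabrics one. This is
not kernel code: the struct, its fields, and the function name are all
hypothetical stand-ins chosen for the example, and the actual fix may look
quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the options a user passes at connect
 * time. None of these names come from the kernel sources.
 */
struct sketch_connect_opts {
	bool is_fabrics;             /* true for transports such as TCP */
	unsigned int nr_io_queues;   /* regular I/O queues requested */
	unsigned int nr_poll_queues; /* models "nr-poll-queues=N" */
};

/*
 * Validate the options before any queue is set up.
 * Returns 0 when the combination is acceptable, or -EINVAL when polling
 * is requested on a fabrics transport (option 1 from the text above).
 */
int sketch_validate_opts(const struct sketch_connect_opts *opts)
{
	if (opts->is_fabrics && opts->nr_poll_queues > 0)
		return -EINVAL;
	return 0;
}
```

With a check of this shape, a request like the one from the test program
(fabrics transport plus nr-poll-queues=4) would fail early with an error
instead of reaching the polling path that needs a bdev.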


