kernel NULL pointer on nvmet with stress rescan_controller/reset_controller test during I/O

Sagi Grimberg sagi at grimberg.me
Mon Mar 6 03:52:13 PST 2017


> Hi

Hi Yi,

> I can always reproduce this issue during a stress test of rescan_controller/reset_controller. Could you help check it? Thanks.
>
> Reproduce steps on Initiator side:
> #fio -filename=/dev/nvme0n1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -bs_unaligned -runtime=1200 -size=-group_reporting -name=mytest -numjobs=60 &
> #num=0
> while [ $num -lt 200 ]
> do
>         echo "-------------------------------$num"
>         echo 1 >/sys/block/nvme0n1/device/rescan_controller || exit 1
>         echo 1 >/sys/block/nvme0n1/device/reset_controller || exit 1
>         ((num++))
> done

nvmet-rdma makes sure that no inflight I/O or completions are pending
when destroying the queue. The trace looks like we got a recv completion
event after we freed all the tasks for the queue (which happens after
ib_drain_qp and rdma_destroy_qp).

So this is definitely weird. Which device are you using? I ran
the exact scenario on my VM and didn't see any NULL deref...
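
If it helps with the device question, a minimal sketch along these lines
could collect the RDMA device details from sysfs on both the initiator and
the target. It assumes the devices show up under /sys/class/infiniband. The
function name and the optional directory argument are made up for
illustration, and not every driver exposes every attribute (hca_type in
particular), so missing files are skipped.

```shell
#!/bin/sh
# Sketch: print each RDMA device with a few identifying sysfs attributes.
# list_rdma_devices and its optional directory argument are hypothetical;
# the argument defaults to the standard /sys/class/infiniband location.
list_rdma_devices() {
    sysfs_ib="${1:-/sys/class/infiniband}"
    found=0
    for dev in "$sysfs_ib"/*; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$dev" ] || continue
        found=1
        line=$(basename "$dev")
        # Append only the attributes this driver actually exposes.
        for attr in hca_type fw_ver node_type; do
            if [ -r "$dev/$attr" ]; then
                line="$line $attr=$(cat "$dev/$attr")"
            fi
        done
        echo "$line"
    done
    if [ "$found" -ne 1 ]; then
        echo "no RDMA devices under $sysfs_ib" >&2
        return 1
    fi
}
```

Calling list_rdma_devices with no argument reads the live sysfs tree and
prints one line per device.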


