nvmf/rdma host crash during heavy load and keep alive recovery
Steve Wise
swise at opengridcomputing.com
Wed Aug 10 08:46:09 PDT 2016
Hey guys, I've rebased the nvmf-4.8-rc branch on top of 4.8-rc2 so I have the
latest/greatest, and continued debugging this crash. Here is the sequence I see:
0) 10 ram disks attached via nvmf/iw_cxgb4, and fio started on all 10 disks.
This node has 8 cores, so that is 80 connections.
1) the cxgb4 interface brought down a few seconds later
2) kato fires on all connections
3) the interface is brought back up 8 seconds after #1
4) 10 seconds after #2 all the qps are destroyed
5) reconnects start happening
6) a blk request is executed and the nvme_rdma_request struct still has a
pointer to one of the qps destroyed in step 4 and whamo...
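
To make the hazard in step 6 concrete, here is a minimal user-space sketch of
the pattern, not the driver code. All names (model_qp, model_queue,
model_request, model_submit, and so on) are hypothetical and are not taken from
nvme_rdma. It assumes the request reaches the qp through its queue, and that
teardown marks the queue dead before the qp goes away, so that a submit arriving
between teardown and reconnect is rejected instead of dereferencing a destroyed
qp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real nvme_rdma structures. */
struct model_qp {
    int id;
};

struct model_queue {
    struct model_qp *qp;   /* NULL once the qp has been destroyed */
    bool live;             /* cleared before teardown touches the qp */
};

struct model_request {
    struct model_queue *queue;  /* the request finds the qp via its queue */
};

/* Teardown (step 4): mark the queue dead first, then drop the qp pointer. */
static void model_queue_teardown(struct model_queue *q)
{
    q->live = false;
    q->qp = NULL;
}

/* Reconnect (step 5): install a fresh qp and mark the queue usable again. */
static void model_queue_reconnect(struct model_queue *q, struct model_qp *new_qp)
{
    q->qp = new_qp;
    q->live = true;
}

/*
 * Submit (step 6): returns the qp id on success, or -1 if the queue is not
 * usable, in which case a caller would requeue or fail the request.
 */
static int model_submit(const struct model_request *rq)
{
    const struct model_queue *q = rq->queue;

    if (!q->live || q->qp == NULL)
        return -1;
    return q->qp->id;
}
```

In this model a request submitted after teardown and before reconnect gets -1
back rather than touching the old qp. The sketch is single-threaded; real code
would also need to order the liveness check against a concurrent teardown.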
I'm digging into the request cancel logic. Any ideas/help is greatly
appreciated...
Thanks,
Steve.