nvme-fabrics: crash at nvme connect-all
Steve Wise
swise at opengridcomputing.com
Thu Jun 9 08:40:32 PDT 2016
> > I don't see this on my 16 core/64GB memory node. I successfully did a
> > discover/connect-all with the target/host on the same node with 7 target devices
> > w/o any errors. Note I'm using the nvmf-all.2 branch Christoph set up
> > yesterday.
> >
> > Marta, I need to learn more about your T5 setup and the "stats" file output.
> > Thanks!
> >
> > Steve.
>
> Steve, it seems to me that there's a PBLMEM exhaustion because my card has fewer
> resources than yours (224 MRs if I repeat your calculations):
> # cat /sys/kernel/debug/iw_cxgb4/0000\:09\:00.4/stats
> Object:     Total  Current  Max  Fail
> PDID:       65536        1    2     0
> QID:         1024        0    0     0
> TPTMEM:     91136        0    0     0
> PBLMEM:    227840        0    0     0
> RQTMEM:    318976        0    0     0
> OCQPMEM:        0        0    0     0
> DB FULL: 0
> DB EMPTY: 0
> DB DROP: 0
> DB State: NORMAL Transitions 0 FC Interruptions 0
> TCAM_FULL: 0
> ACT_OFLD_CONN_FAILS: 0
> PAS_OFLD_CONN_FAILS: 0
> NEG_ADV_RCVD: 0
> AVAILABLE IRD: 1024
>
> For a more exact reference, it's:
> [ 18.651764] cxgb4 0000:09:00.4 eth1: eth1: Chelsio T580-LP-SO (0000:09:00.4)
> 40GBASE-R QSFP
> [ 18.651979] cxgb4 0000:09:00.4 eth2: eth2: Chelsio T580-LP-SO (0000:09:00.4)
> 40GBASE-R QSFP
> [ 18.652025] cxgb4 0000:09:00.4: Chelsio T580-LP-SO rev 0
>
> No config file in the firmware directory.
>
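For anyone comparing adapters, the four-column rows of the "stats" output quoted above can be turned into numbers with a few lines of Python. This is only a minimal sketch: the sample text is copied from the quoted output, the function name parse_stats is made up for illustration, and it assumes the debugfs file keeps the "Name: Total Current Max Fail" layout shown here.

```python
# Sample rows copied from the quoted iw_cxgb4 "stats" output.
STATS_SAMPLE = """\
Object: Total Current Max Fail
PDID: 65536 1 2 0
QID: 1024 0 0 0
TPTMEM: 91136 0 0 0
PBLMEM: 227840 0 0 0
RQTMEM: 318976 0 0 0
OCQPMEM: 0 0 0 0
DB FULL: 0
DB State: NORMAL Transitions 0 FC Interruptions 0
"""


def parse_stats(text):
    """Return {name: {"total", "current", "max", "fail"}} for the resource rows.

    parse_stats is a hypothetical helper, not part of any Chelsio tool.
    """
    rows = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only rows with exactly four numeric columns; this skips the
        # header line and single-value counters such as "DB FULL: 0".
        if sep and len(fields) == 4 and all(f.isdigit() for f in fields):
            total, current, peak, fail = (int(f) for f in fields)
            rows[name.strip()] = {
                "total": total,
                "current": current,
                "max": peak,
                "fail": fail,
            }
    return rows


if __name__ == "__main__":
    for name, row in parse_stats(STATS_SAMPLE).items():
        print(name, row["total"], "failures:", row["fail"])
```

A nonzero "fail" value for a pool would indicate that allocations from that pool have failed since the counters were last reset.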
Thanks Marta. That card has less memory than the T580-CR. I'm checking with Chelsio on the details. The "-SO" might mean a memory-free card.
Also, can you email me the output of 'cat /sys/kernel/debug/cxgb4/blah/meminfo'?
So to make it work given the adapter resources, you need to make the queues shallower and use fewer of them. If I can get you a config file that increases the available RDMA memory, I'll send it to you. But perhaps this card is just a low- or no-memory card tailored more for NIC-only use than for RDMA. (I'll confirm this soon.)
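As a rough illustration of "shallower and fewer", the sketch below builds an nvme connect command line with a reduced queue count and depth. It assumes an nvme-cli version that accepts --nr-io-queues and --queue-size on connect; the address, NQN, port, and the values 2 and 16 are placeholders, not recommendations for this card, and nvme_connect_args is a made-up helper name.

```python
import shlex


def nvme_connect_args(traddr, nqn, nr_io_queues, queue_size, trsvcid="4420"):
    """Build (but do not run) an 'nvme connect' argument list over RDMA.

    Hypothetical helper; every value passed in is a placeholder.
    """
    return [
        "nvme", "connect",
        "--transport=rdma",
        f"--traddr={traddr}",
        f"--trsvcid={trsvcid}",
        f"--nqn={nqn}",
        f"--nr-io-queues={nr_io_queues}",  # fewer I/O queues
        f"--queue-size={queue_size}",      # shallower queues
    ]


if __name__ == "__main__":
    # Placeholder target address and NQN, for illustration only.
    args = nvme_connect_args("192.0.2.10", "testnqn", nr_io_queues=2, queue_size=16)
    print(shlex.join(args))
```

The command is only printed here, so it can be reviewed before being run by hand against a real target.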
Steve