nvme-fabrics: crash at nvme connect-all
Marta Rybczynska
mrybczyn at kalray.eu
Thu Jun 9 08:04:11 PDT 2016
----- On 9 June 2016, at 16:29, Steve Wise swise at opengridcomputing.com wrote:
>> > >
>> > > >>> Steve, did you see this before? I'm wondering if we need some sort
>> > > >>> of logic handling with resource limitation in iWARP (global mrs
>> > > >>> pool...)
>> > > >>
>> > > >> Haven't seen this. Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats'
>> > > >> show anything interesting? Where/why is it crashing?
>> > > >>
>> > > >
>> > > > So this is the failure:
>> > > >
>> > > > [ 703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
>> > > > [ 703.239498] failed to init MR pool ret= -12
>> > > > [ 703.239541] nvmet_rdma: failed to create_qp ret= -12
>> > > > [ 703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>> > > >
>> > > > Not sure why it would fail. I would think my setup would be allocating
>> > > > more given I have 16 cores on the host and target. The debugfs "stats"
>> > > > file I mentioned above should show us something if we're running out
>> > > > of adapter resources for MR or PBL records.
>> > >
>> > > Note that Marta ran both the host and the target on the same machine.
>> > > So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
>> > > MRs...
>> > >
>> > > What is the T5 limitation?
>> >
>> > It varies based on a config file that gets loaded when cxgb4 loads. Note
>> > the error has nothing to do with the low fastreg sg depth limit of T5. If
>> > we were hitting that then we would be seeing EINVAL and not ENOMEM.
>> > Looking at c4iw_alloc_mr(), the ENOMEM paths are either failures from
>> > kzalloc() or dma_alloc_coherent(), or failures to allocate adapter
>> > resources for MR and PBL records. Each MR takes a 32B record in adapter
>> > mem, and the PBL takes whatever based on the max sg depth (roughly
>> > sg_depth * 8 + some rounding up). The debugfs "stats" file will show us
>> > what is being exhausted and how much adapter mem is available for these
>> > resources.
>> >
>> > Also, the amount of available adapter mem depends on the type of T5 adapter.
>> > The T5 adapter info should be in the dmesg log when cxgb4 is loaded.
>> >
>> > Steve
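
If I follow that correctly, each 128-deep FRMR should cost roughly the
following in adapter memory (my own back-of-the-envelope arithmetic, so
please correct me if the rounding works differently):

    TPTMEM:  32 B per MR
    PBLMEM:  128 * 8 = 1024 B per MR, plus whatever rounding c4iw adds

That is about 32 times as much PBL memory as TPT memory per MR, while both
stats outputs further down show only about 2.5 times as much PBLMEM as
TPTMEM in total, so the PBL pool looks like the first resource to run out.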
>>
>> Here is an example of the iw_cxgb4 debugfs "stats" output. This is for a
>> T580-CR with the "default" configuration, which means there is no config file
>> named t5-config.txt in /lib/firmware/cxgb4/.
>>
>> [root@stevo1 linux-2.6]# cat /sys/kernel/debug/iw_cxgb4/0000\:82\:00.4/stats
>>    Object:      Total    Current        Max       Fail
>>      PDID:      65536          0          0          0
>>       QID:      24576          0          0          0
>>    TPTMEM:   36604800          0          0          0
>>    PBLMEM:   91512064          0          0          0
>>    RQTMEM:  128116864          0          0          0
>>   OCQPMEM:          0          0          0          0
>>   DB FULL: 0
>>  DB EMPTY: 0
>>   DB DROP: 0
>>  DB State: NORMAL Transitions 0 FC Interruptions 0
>> TCAM_FULL: 0
>> ACT_OFLD_CONN_FAILS: 0
>> PAS_OFLD_CONN_FAILS: 0
>> NEG_ADV_RCVD: 0
>> AVAILABLE IRD: 589824
>>
>> Note it shows the total, currently allocated, max ever allocated, and failures
>> for each rdma resource, most of which are tied to HW resources. So if we see
>> failures, then we know the adapter resources were exhausted.
>>
>> TPTMEM is the available adapter memory for MR records. Each record is 32B.
>> So a total of 1143900 MRs (TPTMEM / 32) can be created. The PBLMEM resource
>> is for holding the dma addresses for all pages in a MR, so each MR uses
>> some number depending on the sg depth passed in when allocating a FRMR. So
>> if we allocate 128 deep page lists, we should be able to allocate 89367
>> PBLs (PBLMEM / 8 / 128).
>>
>> Seems like we shouldn't be exhausting the adapter resources with 2048 MRs...
>>
>> Steve
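
To double-check those figures, here is a quick stand-alone sketch of the
capacity estimate described above, using the numbers from your stats output;
any extra rounding the driver does per PBL is ignored, so treat the results
as upper bounds:

/* Capacity estimate based on the explanation above:
 *   - each MR consumes a 32B TPT record
 *   - each 128-deep page list consumes 128 * 8 = 1024 B of PBLMEM
 * Extra per-allocation rounding in the driver is ignored here, so these
 * are upper bounds rather than exact figures.
 */
#include <stdio.h>

int main(void)
{
	unsigned long tptmem = 36604800;   /* TPTMEM from the T580-CR stats above */
	unsigned long pblmem = 91512064;   /* PBLMEM from the same output */
	unsigned long sg_depth = 128;      /* page list depth per FRMR */

	unsigned long max_mrs_tpt = tptmem / 32;
	unsigned long max_mrs_pbl = pblmem / (sg_depth * 8);

	/* 8 cores x 128 queue entries x 2 (host + target on one machine) */
	unsigned long needed = 8 * 128 * 2;

	printf("TPT allows %lu MRs, PBL allows %lu MRs, we need about %lu\n",
	       max_mrs_tpt, max_mrs_pbl, needed);
	return 0;
}

That prints 1143900 and 89367 for your card, so you indeed have plenty of
headroom for 2048 MRs; my card (stats below) is a different story.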
>
> I don't see this on my 16 core/64GB memory node. I successfully did a
> discover/connect-all with the target/host on the same node with 7 target
> devices w/o any errors. Note I'm using the nvmf-all.2 branch Christoph set
> up yesterday.
>
> Marta, I need to learn more about your T5 setup and the "stats" file output.
> Thanks!
>
> Steve.
Steve, it seems to me that PBLMEM is being exhausted because my card has fewer
resources than yours (224 MRs if I repeat your calculations):
# cat /sys/kernel/debug/iw_cxgb4/0000\:09\:00.4/stats
   Object:      Total    Current        Max       Fail
     PDID:      65536          1          2          0
      QID:       1024          0          0          0
   TPTMEM:      91136          0          0          0
   PBLMEM:     227840          0          0          0
   RQTMEM:     318976          0          0          0
  OCQPMEM:          0          0          0          0
  DB FULL: 0
 DB EMPTY: 0
  DB DROP: 0
 DB State: NORMAL Transitions 0 FC Interruptions 0
TCAM_FULL: 0
ACT_OFLD_CONN_FAILS: 0
PAS_OFLD_CONN_FAILS: 0
NEG_ADV_RCVD: 0
AVAILABLE IRD: 1024
For a more exact reference, it's:
[ 18.651764] cxgb4 0000:09:00.4 eth1: eth1: Chelsio T580-LP-SO (0000:09:00.4) 40GBASE-R QSFP
[ 18.651979] cxgb4 0000:09:00.4 eth2: eth2: Chelsio T580-LP-SO (0000:09:00.4) 40GBASE-R QSFP
[ 18.652025] cxgb4 0000:09:00.4: Chelsio T580-LP-SO rev 0
No config file in the firmware directory.
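
Plugging my numbers into the same estimate as above (again ignoring any extra
rounding): 91136 / 32 = 2848 MR records would fit in TPTMEM, but
227840 / (128 * 8) gives only a couple of hundred 128-deep page lists (the
~224 MRs mentioned above), far below the ~2048 MRs that running host and
target on one machine needs, so PBLMEM would be exhausted long before TPTMEM.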
--
Marta Rybczynska
Phone : +33 6 71 09 68 03
mrybczyn at kalray.eu