[PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
J Freyensee
james_p_freyensee at linux.intel.com
Thu Aug 11 09:35:29 PDT 2016
On Thu, 2016-08-11 at 10:03 +0300, Sagi Grimberg wrote:
>
> On 11/08/16 07:07, Jay Freyensee wrote:
> >
> > Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as
> > a 0-based value.
> >
> > Also per spec, the RDMA binding's hsqsize shall be set to the
> > value of sqsize, which makes hsqsize a 0-based value as well.
> >
> > Also per spec, though it is not spelled out very clearly, hrqsize
> > is hsqsize+1.
> >
> > Thus, the sqsize during NVMf connect() is now:
> >
> > [root at fedora23-fabrics-host1 for-48]# dmesg
> > [  318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31
> > [  318.720884] nvme nvme0: creating 16 I/O queues.
> > [  318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127
> >
> > Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
> > Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
> > ---
> > drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
> > 1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index 3e3ce2b..8be64f1 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -1284,8 +1284,21 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
> >
> >  	priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
> >  	priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
> > -	priv.hrqsize = cpu_to_le16(queue->queue_size);
> > -	priv.hsqsize = cpu_to_le16(queue->queue_size);
> > +
> > +	/*
> > +	 * On one end, the fabrics spec is pretty clear that
> > +	 * hsqsize variables shall be set to the value of sqsize,
> > +	 * which is a 0-based number. What is confusing is the value
> > +	 * for hrqsize. After clarification from NVMe spec committee
> > +	 * member, the minimum value of hrqsize is hsqsize+1.
> > +	 */
> > +	if (priv.qid == 0) {
> > +		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
> > +		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize+1);
> > +	} else {
> > +		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
> > +		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize+1);
> > +	}
>
> Huh? (scratch...) using priv.hrqsize = priv.hsqsize+1 is pointless.
It may be pointless, but Dave said that is the current interpretation
of the NVMe-over-Fabrics spec (which I don't really understand either).
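
To spell out the interpretation I'm coding to, here's a minimal
user-space sketch (not from the driver; the 32/128 queue depths are
just assumptions matching the dmesg output above) of the 0-based
sqsize mapping and the hrqsize = hsqsize+1 reading:

#include <stdio.h>

/*
 * Illustrative only, not taken from the patch: shows the 0-based
 * sqsize mapping and the hrqsize = hsqsize+1 reading described above.
 * The 32/128 queue depths are assumptions matching the dmesg output.
 */
int main(void)
{
	unsigned int admin_queue_entries = 32;  /* assumed admin queue depth */
	unsigned int io_queue_entries = 128;    /* assumed I/O queue depth */

	/* sqsize/hsqsize are 0-based, i.e. queue depth minus one */
	unsigned int admin_hsqsize = admin_queue_entries - 1;   /* 31 */
	unsigned int io_hsqsize = io_queue_entries - 1;         /* 127 */

	printf("admin: hsqsize=%u hrqsize=%u\n",
	       admin_hsqsize, admin_hsqsize + 1);
	printf("io:    hsqsize=%u hrqsize=%u\n",
	       io_hsqsize, io_hsqsize + 1);
	return 0;
}

That prints hsqsize=31/hrqsize=32 for the admin queue and
hsqsize=127/hrqsize=128 for the I/O queues, which is what the patch
programs into the RDMA CM private data.
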
>
> We expose to the block layer X and we send to the target X-1 and
> the target does X+1 (looks goofy, but ok). We also size our RDMA
> send/recv according to X so why on earth would we want to tell the
> target we have a recv queue of size X+1?
Could be the reason I see KATO timeouts and then the kernel crashing...