[PATCH 2/2] nvmet-tcp: fix connect error when setting param_inline_data_size to zero.

Hou Pu <houpu.main at gmail.com>
Thu May 20 19:57:14 PDT 2021


On Fri, May 21, 2021 at 6:44 AM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> > When inline_data_size is set to zero, connect fails. This can be
> > reproduced with the following steps.
> >
> > Controller side:
> > mkdir /sys/kernel/config/nvmet/ports/1
> > cd /sys/kernel/config/nvmet/ports/1
> > echo 0.0.0.0 > addr_traddr
> > echo 4421 > addr_trsvcid
> > echo ipv4 > addr_adrfam
> > echo tcp > addr_trtype
> > echo 0 > param_inline_data_size
> > ln -s /sys/kernel/config/nvmet/subsystems/mysub /sys/kernel/config/nvmet/ports/1/subsystems/mysub
> >
> > Host side:
> > [  325.145323][  T203] nvme nvme1: Connect command failed, error wo/DNR bit: 22
> > [  325.159481][  T203] nvme nvme1: failed to connect queue: 0 ret=16406
> > Failed to write to /dev/nvme-fabrics: Input/output error
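> >
> > (For reference, the failure above came from a host-side connect along
> > the lines of "nvme connect -t tcp -a <controller-ip> -s 4421 -n mysub";
> > the exact command line is illustrative.)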
> >
> > The kernel log on the controller side is:
> > [  114.567411][   T56] nvmet_tcp: queue 0: failed to map data
> > [  114.568093][   T56] nvmet_tcp: unexpected pdu type 201
> >
> > When the admin-connect command arrives carrying 1024 bytes of inline data,
> > nvmet_tcp_map_data() compares that size with cmd->req.port->inline_data_size
> > (which is 0), so the command is rejected with an error code. But the
> > admin-connect command is always allowed to carry up to 8192 bytes of inline
> > data according to the NVMe over Fabrics specification.
> >
> > The host side decides the inline data size when allocating the queue,
> > based on the queue number: queue 0 uses 8k and the other queues use
> > ioccsz * 16. The target side should do the same.
> >
> > Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
> > Signed-off-by: Hou Pu <houpu.main at gmail.com>
> > ---
> >   drivers/nvme/target/tcp.c | 24 +++++++++++++++++++++---
> >   1 file changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index d8aceef83284..83985ab8c3aa 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -167,6 +167,24 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops;
> >   static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
> >   static void nvmet_tcp_finish_cmd(struct nvmet_tcp_cmd *cmd);
> >
> > +static inline int nvmet_tcp_inline_data_size(struct nvmet_tcp_cmd *cmd)
> > +{
> > +     struct nvmet_tcp_queue *queue = cmd->queue;
> > +     struct nvme_command *nvme_cmd = cmd->req.cmd;
> > +     int inline_data_size = NVME_TCP_ADMIN_CCSZ;
> > +     u16 qid = 0;
> > +
> > +     if (likely(queue->nvme_sq.ctrl)) {
> > +             /* The connect admin/io queue has been executed. */
> > +             qid = queue->nvme_sq.qid;
> > +             if (qid)
> > +                     inline_data_size = cmd->req.port->inline_data_size;
> > +     } else if (nvme_cmd->connect.qid)
> > +             inline_data_size = cmd->req.port->inline_data_size;
>
> How can a connection to an I/O queue arrive without having the ctrl
> reference installed? Is this for the failure case?

Hi Sagi,
AFAIK, after the host finishes setting up the admin queue, it connects to
the io queue and sends the io-connect command. At that point the
nvmet_tcp_queue has just been allocated and does not yet have a valid
queue->nvme_sq.ctrl; that is only assigned after io-connect, in
nvmet_install_queue(). So this function tries to find the correct queue
number both before and after a fabrics connect command.
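
To illustrate what the target is expected to mirror, the host picks its
in-capsule data size per queue roughly like this (paraphrased from memory
from nvme_tcp_alloc_queue() in drivers/nvme/host/tcp.c, so treat it as a
sketch rather than a verbatim quote):

	if (qid > 0)
		/* io queues: capsule size negotiated via ioccsz (16-byte units) */
		queue->cmnd_capsule_len = nctrl->ioccsz * 16;
	else
		/* admin queue (qid 0): SQE plus a fixed 8k of in-capsule data */
		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
						NVME_TCP_ADMIN_CCSZ;

The proposed helper applies the same split on the target: 8k for the admin
queue, port->inline_data_size for the io queues.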

Thanks,
Hou


