[PATCH] nvme-tcp: fix incorrect h2cdata pdu offset accounting

Keith Busch kbusch at kernel.org
Tue Sep 28 13:40:52 PDT 2021


On Tue, Sep 14, 2021 at 06:38:55PM +0300, Sagi Grimberg wrote:
> When the controller sends us multiple r2t PDUs in a single
> request we need to account for it correctly as our send/recv
> context run concurrently (i.e. we get a new r2t with r2t_offset
> before we updated our iterator and req->data_sent marker). This
> can cause wrong offsets to be sent to the controller.
> 
> To fix that, we will first know that this may happen only in
> the send sequence of the last page, hence we will take
> the r2t_offset to the h2c PDU data_offset, and in
> nvme_tcp_try_send_data loop, we make sure to increment
> the request markers also when we completed a PDU but
> we are expecting more r2t PDUs as we still did not send
> the entire data of the request.
> 
> Fixes: 825619b09ad3 ("nvme-tcp: fix possible use-after-completion")
> Reported-by: Nowak, Lukasz <Lukasz.Nowak at Dell.com>
> Tested-by: Nowak, Lukasz <Lukasz.Nowak at Dell.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> - Keith, can you ask the WD team to test this as well?

I finally have some feedback on testing, and it looks like this patch
has created a regression. The testers are observing:

  [Tue Sep 28 02:12:55 2021] nvme nvme6: req 3 r2t len 8192 exceeded data len 8192 (4096 sent)
  [Tue Sep 28 02:12:55 2021] nvme nvme6: receive failed:  -71
  [Tue Sep 28 02:12:55 2021] nvme nvme6: starting error recovery

Which we had seen before, and you fixed with commit:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=825619b09ad351894d2c6fb6705f5b3711d145c7

I only just recently heard about this, so I haven't looked into any
additional possible cause for this regression yet, but just wanted to
let you know earlier.



More information about the Linux-nvme mailing list