[PATCH] nvme-tcp: fix incorrect h2cdata pdu offset accounting

Keith Busch kbusch at kernel.org
Thu Sep 30 13:15:10 PDT 2021


On Wed, Sep 29, 2021 at 12:00:20AM +0300, Sagi Grimberg wrote:
> 
> 
> On 9/28/21 11:40 PM, Keith Busch wrote:
> > On Tue, Sep 14, 2021 at 06:38:55PM +0300, Sagi Grimberg wrote:
> > > When the controller sends us multiple r2t PDUs for a single
> > > request, we need to account for them correctly, as our send
> > > and recv contexts run concurrently (i.e. we can get a new r2t
> > > with r2t_offset before we have updated our iterator and the
> > > req->data_sent marker). This can cause wrong offsets to be
> > > sent to the controller.
> > > 
> > > To fix this, first note that it can only happen in the send
> > > sequence of the last page, so we take r2t_offset directly as
> > > the h2c PDU data_offset. In the nvme_tcp_try_send_data loop,
> > > we also make sure to advance the request markers when we have
> > > completed a PDU but still expect more r2t PDUs, since we have
> > > not yet sent all of the request's data.
> > > 
> > > Fixes: 825619b09ad3 ("nvme-tcp: fix possible use-after-completion")
> > > Reported-by: Nowak, Lukasz <Lukasz.Nowak at Dell.com>
> > > Tested-by: Nowak, Lukasz <Lukasz.Nowak at Dell.com>
> > > Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> > > ---
> > > - Keith, can you ask the WD team to test this as well?
> > 
> > I finally have some feedback on testing, and it looks like this patch
> > has created a regression. The testers are observing:
> > 
> >    [Tue Sep 28 02:12:55 2021] nvme nvme6: req 3 r2t len 8192 exceeded data len 8192 (4096 sent)
> >    [Tue Sep 28 02:12:55 2021] nvme nvme6: receive failed:  -71
> >    [Tue Sep 28 02:12:55 2021] nvme nvme6: starting error recovery
> > 
> > Which we had seen before, and you fixed with commit:
> > 
> >    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=825619b09ad351894d2c6fb6705f5b3711d145c7
> > 
> > I only just recently heard about this, so I haven't looked into any
> > additional possible cause for this regression yet, but just wanted to
> > let you know earlier.
> 
> Yes, I definitely see the bug... Damn it.
> 
> The issue here is that we peek at req->data_sent and req->data_len
> after the last send, but by then the request may have already
> completed, and these members may have been overwritten by a new
> execution of the same request.
> 
> We really should record these values before we perform the network
> send to avoid this race.
> 
> Can you guys check that this restores the correct behavior?
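
If I'm reading the proposal right, the idea is to snapshot the volatile
request fields before the payload hits the wire. A minimal sketch of that
pattern, reusing the helper names from the existing nvme_tcp_try_send_data()
loop (my paraphrase of the idea, not the actual diff):

	while (true) {
		struct page *page = nvme_tcp_req_cur_page(req);
		size_t offset = nvme_tcp_req_cur_offset(req);
		size_t len = nvme_tcp_req_cur_length(req);
		int ret;
		/*
		 * Snapshot before the send: once the last fragment is on
		 * the wire, the RX path may complete the request and a
		 * new execution may overwrite these fields.
		 */
		u32 req_data_sent = req->data_sent;

		ret = kernel_sendpage(queue->sock, page, offset, len,
				      MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST);
		if (ret <= 0)
			return ret;

		/* use the snapshot, never req->data_sent, after the send */
		if (req_data_sent + ret < req->data_len)
			nvme_tcp_advance_req(req, ret);
		/* rest of the loop (digest, last-send handling) unchanged */
	}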

Unfortunately this was unsuccessful; the same issue is still occurring. Please
let me know if you have another patch to try. I'll keep looking for a
solution on my end as well.
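
For reference, the hunk I keep staring at is the iterator accounting this
patch added to nvme_tcp_try_send_data(), roughly (paraphrased from my
reading, not a verbatim quote of the patch):

	/* h2c PDU setup: use the offset the controller asked for */
	data->data_offset = pdu->r2t_offset;

	/*
	 * Send loop: keep advancing the iterator while more r2t PDUs are
	 * expected, i.e. the whole request has not been sent yet. Note
	 * that req->data_sent is read after the send here, which is
	 * exactly the post-send access the snapshot above would remove.
	 */
	if (req->data_sent + ret < req->data_len)
		nvme_tcp_advance_req(req, ret);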


