Data corruption when using multiple devices with NVMEoF TCP
Hao Wang
pkuwangh at gmail.com
Tue Jan 12 03:55:59 EST 2021
Yes, this patch fixes the problem! Thanks!
Tested on top of a0d54b4f5b21.
Hao
On Mon, Jan 11, 2021 at 5:29 PM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> > Hey Hao,
> >
> >> Here is the entire log (and it's a new one, i.e. above snippet not
> >> included):
> >> https://drive.google.com/file/d/16ArIs5-Jw4P2f17A_ftKLm1A4LQUFpmg/view?usp=sharing
> >>
> >>
> >> What I found is that the data corruption does not always happen,
> >> especially when I copy a small directory, so I guess a lot of the log
> >> entries should just look fine.
> >
> > So this seems to be a breakage that has existed for some time now with
> > multipage bvecs, and you are the first one to report it. It appears to
> > be related to bio merges, though it is strange to me that it is only
> > coming up now; perhaps it is the combination with raid0 that triggers
> > it, I'm not sure.
>
> OK, I think I understand what is going on. With multipage bvecs, bios
> can be split in the middle of a bvec entry and then merged back with
> another bio.
>
> The issue is that in that case we are not capping the send length
> calculation for the last bvec entry to the bytes remaining in the
> request's iterator.
>
> I think that just this can also resolve the issue:
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 973d5d683180..c6b0a189a494 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -201,8 +201,9 @@ static inline size_t nvme_tcp_req_cur_offset(struct nvme_tcp_request *req)
>
>  static inline size_t nvme_tcp_req_cur_length(struct nvme_tcp_request *req)
>  {
> -	return min_t(size_t, req->iter.bvec->bv_len - req->iter.iov_offset,
> -			req->pdu_len - req->pdu_sent);
> +	return min_t(size_t, req->iter.count,
> +		     min_t(size_t, req->iter.bvec->bv_len - req->iter.iov_offset,
> +			   req->pdu_len - req->pdu_sent));
>  }
>
>  static inline size_t nvme_tcp_pdu_data_left(struct nvme_tcp_request *req)
> --
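For readability, this is roughly what nvme_tcp_req_cur_length() ends up
looking like with the hunk above applied (reconstructed from the diff, so
the exact whitespace may differ). If I read the explanation correctly, the
new outer min_t() against req->iter.count is what keeps the returned send
length from exceeding the bytes that actually remain in this request's
iterator when a bio was split in the middle of a multipage bvec entry:

--
static inline size_t nvme_tcp_req_cur_length(struct nvme_tcp_request *req)
{
	/*
	 * Never report more than what is left in the iterator
	 * (req->iter.count): after a split/merge the current bvec
	 * entry may be shared with another bio, so its remaining
	 * length can be larger than what belongs to this request.
	 */
	return min_t(size_t, req->iter.count,
		     min_t(size_t, req->iter.bvec->bv_len - req->iter.iov_offset,
			   req->pdu_len - req->pdu_sent));
}
--

With made-up numbers: if this request only owns the first 4096 bytes of a
16384-byte bvec entry (req->iter.count == 4096, iov_offset == 0) and
pdu_len - pdu_sent is 8192, the old code would return 8192 while the capped
version returns 4096, which would explain how bytes belonging to a
neighboring bio could end up on the wire.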