Failure with 8K Write operations

Narayan Ayalasomayajula narayan.ayalasomayajula at kazan-networks.com
Tue Nov 1 18:24:13 PDT 2016


Hi Ming,

We have a HW implementation of the NVMeF target, and our issue was different :) I believe it was related to buffer misuse under heavy workload.

Thanks,
Narayan

-----Original Message-----
From: Ming Lin [mailto:mlin at kernel.org] 
Sent: Tuesday, November 01, 2016 4:28 PM
To: Narayan Ayalasomayajula <narayan.ayalasomayajula at kazan-networks.com>
Cc: J Freyensee <james_p_freyensee at linux.intel.com>; Sagi Grimberg <sagi at grimberg.me>; linux-nvme at lists.infradead.org
Subject: Re: Failure with 8K Write operations

On Tue, Nov 1, 2016 at 4:07 PM, Ming Lin <mlin at kernel.org> wrote:
> On Thu, Sep 15, 2016 at 6:36 AM, Narayan Ayalasomayajula 
> <narayan.ayalasomayajula at kazan-networks.com> wrote:
>> Hi Jay,
>>
>> Thanks for pointing out that I was not running the latest version of the kernel. I updated to 4.8rc6 and my FIO test that had previously failed with the Linux NVMeF target (using null_blk device as the target) is now completing successfully. I am still seeing the same NAK (Remote Access Error) failure when I use our target instead. I will debug this further but updating to 4.8rc6 did improve things.
>
> Hi Narayan,
>
> I also saw a similar error with 8K writes when using my own target implementation.
> Did you fix it already?

Hi Narayan,

With Sagi's great help off-line, I just fixed it.
In my code, when I posted the RDMA_READ, I didn't set rdma_wr.next to NULL.

Shame on me ... example code below:
	
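int rw_ctx_post(NvmetRdmaRsp *rsp)
{
    struct ibv_send_wr *bad_wr;   /* set by ibv_post_send() on failure */

    /* Terminate the WR chain; ibv_post_send() follows next until NULL. */
    rsp->rdma_wr.next = NULL;
    return ibv_post_send(cm_id->qp, &rsp->rdma_wr, &bad_wr);
}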

You probably don't have this kind of stupid bug ...
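
For anyone wondering why the missing terminator matters: ibv_post_send() treats its wr argument as the head of a linked list and keeps following next until it reaches NULL, so a leftover pointer in next presumably makes it post more than the one WR intended. The sketch below only models that walk and does not use libibverbs at all; struct sketch_wr, sketch_wr_init() and sketch_post_send() are hypothetical stand-ins invented for illustration, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ibv_send_wr: only the fields needed here. */
struct sketch_wr {
    unsigned long wr_id;
    struct sketch_wr *next;   /* next WR in the chain; NULL terminates it */
};

/* Zero the whole WR before filling it in, so next starts out as NULL. */
static void sketch_wr_init(struct sketch_wr *wr, unsigned long wr_id)
{
    assert(wr != NULL);
    memset(wr, 0, sizeof(*wr));
    wr->wr_id = wr_id;
}

/* Walk the chain the way a post routine would, counting the WRs consumed. */
static int sketch_post_send(const struct sketch_wr *head)
{
    int posted = 0;

    for (const struct sketch_wr *wr = head; wr != NULL; wr = wr->next)
        posted++;
    return posted;
}
```

With two WRs a and b initialised through sketch_wr_init(), sketch_post_send(&a) counts one WR; if a.next still points at b from an earlier use, the same call counts two. Zeroing the whole structure before each use is one way to avoid depending on a separate next = NULL assignment.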

