NVMEoF oops on reset

Max Gurtovoy maxg at mellanox.com
Wed Feb 7 15:23:00 PST 2018



On 2/7/2018 10:54 PM, Berck Nash wrote:
> On 02/06/2018 06:06 PM, Max Gurtovoy wrote:
>> On 2/7/2018 12:04 AM, Berck Nash wrote:
>>> We're experiencing an oops whenever we issue an "nvme reset" via the
>>> nvme cli on fabric setups.  Appears to be in the nvme_rdma code.  The
>>> problem occurs on mainline 4.15, as well as on 4.16-nvme (commit
>>> ca5554a696dce37852f6d6721520b4f13fc295c3).
>>
>> please try my patches for fixing the state machine (attached).
>> These should apply over nvme-4.16 but still there is a missing commit
>> from Sagi that I mentioned in the cover letter. So with these 4 patches
>> your test should pass...
> 
> Thanks, but that doesn't seem to be any better.  Loaded all 4 patches
> against nvme-4.16, and got a slightly different crash.  Entire log attached.
> 

I suggest you take Linus' master branch, apply my 3 patches, and
re-test. IMO nvme-4.16 is not rebased over 4.15.0.
You might need to fix some straightforward conflicts when applying the patches.

I successfully ran a loop of 100 iterations of "nvme reset /dev/nvme0".
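
For anyone reproducing this, a minimal sketch of such a loop follows. It
assumes nvme-cli is installed and that it is run as root; the function name
reset_loop and the NVME_CMD override are hypothetical, used only for
illustration, and are not part of nvme-cli.

```shell
#!/bin/sh
# Sketch: reset an NVMe controller repeatedly, stopping at the first failure.
# NVME_CMD is a hypothetical override hook (not an nvme-cli feature) so the
# loop can be dry-run without hardware; it defaults to the real "nvme" binary.
reset_loop() {
    dev="$1"            # controller character device, e.g. /dev/nvme0
    count="${2:-100}"   # number of resets; defaults to 100
    i=1
    while [ "$i" -le "$count" ]; do
        if ! ${NVME_CMD:-nvme} reset "$dev"; then
            echo "reset $i of $count failed on $dev" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count resets on $dev"
}
```

Calling "reset_loop /dev/nvme0 100" corresponds to the run described above.
Stopping at the first failed reset keeps the kernel log close to the point
of failure, which should make an oops easier to locate.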

BTW, please add more details regarding your setup (in my test I use
ConnectX-5 adapters connected back-to-back using the IB link layer).


-Max.



