NVMEoF oops on reset
Max Gurtovoy
maxg at mellanox.com
Wed Feb 7 15:23:00 PST 2018
On 2/7/2018 10:54 PM, Berck Nash wrote:
> On 02/06/2018 06:06 PM, Max Gurtovoy wrote:
>> On 2/7/2018 12:04 AM, Berck Nash wrote:
>>> We're experiencing an oops whenever we issue an "nvme reset" via the
>>> nvme cli on fabric setups. Appears to be in the nvme_rdma code. The
>>> problem occurs on mainline 4.15, as well as on 4.16-nvme (commit
>>> ca5554a696dce37852f6d6721520b4f13fc295c3).
>>
>> Please try my patches for fixing the state machine (attached).
>> These should apply over nvme-4.16, but there is still a missing commit
>> from Sagi that I mentioned in the cover letter. So with these 4 patches
>> your test should pass...
>
> Thanks, but that doesn't seem to be any better. Loaded all 4 patches
> against nvme-4.16, and got a slightly different crash. Entire log attached.
>
I suggest you take Linus' master branch, apply my 3 patches, and
re-test. nvme-4.16 is not rebased over 4.15.0 IMO.
You might need to fix some straightforward conflicts when applying the patches.
I successfully ran a loop with 100 iterations of "nvme reset /dev/nvme0".
BTW, please add more details regarding your setup (in my test I use
ConnectX-5 adapters connected back-to-back over the IB link layer).
-Max.
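
For anyone wanting to repeat the reset loop mentioned above, here is a minimal sketch. It is not the script used in the test described in this message; the function name reset_loop and the NVME_BIN override are hypothetical, added only so the loop can be dry-run without hardware. It assumes nvme-cli is installed and that the controller is /dev/nvme0.

```shell
#!/bin/sh
# Sketch: reset an NVMe controller repeatedly, stopping at the first failure.
# NVME_BIN is a hypothetical override (not an nvme-cli feature) for dry runs.
NVME_BIN="${NVME_BIN:-nvme}"

reset_loop() {
    dev="$1"            # controller device, e.g. /dev/nvme0
    count="${2:-100}"   # iterations; 100 matches the loop mentioned above
    i=1
    while [ "$i" -le "$count" ]; do
        # "nvme reset <device>" asks the driver to reset the controller
        if ! "$NVME_BIN" reset "$dev"; then
            echo "reset failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count resets of $dev"
}
```

Run it as root with "reset_loop /dev/nvme0 100" and watch dmesg on the host while it runs; the loop only reports the exit status of each reset, so an oops would show up in the kernel log rather than in this output.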