mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller

Yi Zhang yizhan at redhat.com
Tue Mar 14 06:35:32 PDT 2017



On 03/13/2017 02:16 AM, Max Gurtovoy wrote:
>
>
> On 3/10/2017 6:52 PM, Leon Romanovsky wrote:
>> On Thu, Mar 09, 2017 at 12:20:14PM +0800, Yi Zhang wrote:
>>>
>>>> I'm using a CX5-LX device and have not seen any issues with it.
>>>>
>>>> Would it be possible to retest with kmemleak?
>>>>
>>> Here is the device I used.
>>>
>>> Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>>>
>>> The issue can always be reproduced within about 1000 iterations.
>>>
>>> Another thing: I noticed a strange pattern in the log.
>>>
>>> Before the OOM occurred, most of the messages are about "adding queue";
>>> after the OOM occurred, most of them are about "nvmet_rdma: freeing
>>> queue".
>>>
>>> It seems the release work, "schedule_work(&queue->release_work);", is not
>>> executed in a timely manner; I'm not sure whether the OOM is caused by
>>> this.
>>
>> Sagi,
>> The release work is placed on the global workqueue. I'm not familiar
>> with the NVMe design and I don't know all the details, but maybe the
>> proper way would be to create a dedicated workqueue with the
>> WQ_MEM_RECLAIM flag to ensure forward progress?
>>
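For reference, here is a minimal sketch of the kind of dedicated workqueue Leon is suggesting. The name nvmet_rdma_release_wq and the init helper are illustrative only, not the actual nvmet-rdma code:

#include <linux/workqueue.h>

/* Illustrative only: a dedicated queue for the release work.
 * WQ_MEM_RECLAIM guarantees a rescuer thread, so the work can still
 * make forward progress when the system is under memory pressure. */
static struct workqueue_struct *nvmet_rdma_release_wq;

static int nvmet_rdma_alloc_release_wq(void)
{
	nvmet_rdma_release_wq = alloc_workqueue("nvmet_rdma_release",
						WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	return nvmet_rdma_release_wq ? 0 : -ENOMEM;
}

/* ...and the release path would then queue the work on it instead of
 * the global workqueue: */
	queue_work(nvmet_rdma_release_wq, &queue->release_work);

(As Max notes below, a dedicated queue alone did not make the bug go away in his test.)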
>
> Hi,
>
> I was able to reproduce it in my lab with ConnectX-3. I added a dedicated
> workqueue with high priority, but the bug still happens.
> If I add a "sleep 1" after "echo 1 >
> /sys/block/nvme0n1/device/reset_controller", the test passes. So there is
> no leak IMO, but the allocation path is much faster than the destruction
> of the resources.
> On the initiator we don't wait for the RDMA_CM_EVENT_DISCONNECTED event
> after we call rdma_disconnect(), and we try to connect again immediately.
> Maybe we need to slow down the storm of connect requests from the
> initiator somehow, to give the target time to settle.
>
> Max.
>
>
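For the record, one way to throttle the host-side reconnects along the lines Max describes would be to wait (with a timeout) for the DISCONNECTED event before issuing the next connect. Below is a minimal sketch assuming a per-queue completion; the struct, handler, and function names are illustrative, not the actual nvme-rdma host code:

#include <linux/completion.h>
#include <linux/jiffies.h>
#include <rdma/rdma_cm.h>

/* Illustrative per-queue state; 'disconnected' would be
 * init_completion()'d when the queue is set up. */
struct example_rdma_queue {
	struct rdma_cm_id	*cm_id;
	struct completion	disconnected;
};

static int example_cm_handler(struct rdma_cm_id *cm_id,
			      struct rdma_cm_event *event)
{
	struct example_rdma_queue *queue = cm_id->context;

	switch (event->event) {
	case RDMA_CM_EVENT_DISCONNECTED:
	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
		/* The peer has processed (or timed out on) the disconnect. */
		complete(&queue->disconnected);
		break;
	default:
		break;
	}
	return 0;
}

static void example_teardown_queue(struct example_rdma_queue *queue)
{
	rdma_disconnect(queue->cm_id);
	/*
	 * Instead of reconnecting immediately, give the target up to a
	 * second to see the disconnect and free the old queue before the
	 * next connect request is sent.
	 */
	wait_for_completion_timeout(&queue->disconnected,
				    msecs_to_jiffies(1000));
}

That, or simply rate-limiting the connect attempts, would give the target the settling time Max describes.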
Hi Sagi
Let's use this mail thread to track the OOM issue. :)

Thanks
Yi
>>>
>>> Here is the log before/after OOM
>>> http://pastebin.com/Zb6w4nEv
>>>



