target crash / host hang with nvme-all.3 branch of nvme-fabrics
Steve Wise
swise at opengridcomputing.com
Tue Jun 21 07:18:43 PDT 2016
> On 17/06/16 20:20, Ming Lin wrote:
> > On Fri, Jun 17, 2016 at 7:16 AM, Steve Wise <swise at opengridcomputing.com> wrote:
> >>>
> >>>>
> >>>> Steve, is this something that started happening recently? does the
> >>>> 4.6-rc3 tag suffer from the same phenomenon?
> >>>
> >>> Where is this tag?
> >>
> >> Never mind. I found it (needed 'git pull -t' to pull the tags from the
> >> gitlab nvmef repo).
> >
> > I ran this overnight:
> >
> > #!/bin/bash
> >
> > while [ 1 ] ; do
> > ifconfig eth5 down ; sleep 15; ifconfig eth5 up; sleep 15
> > done
> >
> > Although the crash is not reproduced, it triggers another OOM bug.
>
> Which code-base is this? It looks like this code is just leaking queues.
> Obviously something changed...
>
Yoichi has hit what is apparently the same OOM bug:
Kernel panic - not syncing: Out of memory and no killable processes...
He hit it with NVMF on the host, _and_ with NVME/PCI with a local SSD. This is
all using nvmf-all.3 + Christoph's fix for the target queue deletion crash
(http://lists.infradead.org/pipermail/linux-nvme/2016-June/005075.html).
Ming, any luck on isolating this? I'm going to enable kernel memory leak
detection and see if I can figure this out.
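For reference, a rough sketch of what a kmemleak check around a reproducer
might look like. It assumes a kernel built with CONFIG_DEBUG_KMEMLEAK=y and
debugfs mounted at /sys/kernel/debug; the KMEMLEAK variable and the
kmemleak_scan function name are made up for illustration and are not part of
any existing tool.

```shell
#!/bin/bash
# Sketch only: assumes CONFIG_DEBUG_KMEMLEAK=y and debugfs mounted at
# /sys/kernel/debug. KMEMLEAK and kmemleak_scan are illustrative names.
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

# Usage: kmemleak_scan <command> [args...]
# Clears earlier reports, runs the given workload, then scans and prints
# whatever kmemleak currently suspects to be leaked.
kmemleak_scan() {
    if [ ! -w "$KMEMLEAK" ]; then
        echo "kmemleak not available at $KMEMLEAK" >&2
        return 1
    fi
    echo clear > "$KMEMLEAK"   # forget objects reported so far
    "$@"                       # run the workload under test
    echo scan > "$KMEMLEAK"    # trigger an immediate scan
    cat "$KMEMLEAK"            # suspected leaks, with backtraces
}
```

Run as root, something like `kmemleak_scan sleep 600` while the interface
down/up loop is going in another shell would report objects that became
unreferenced during that window. kmemleak can report false positives, so
repeated scans are usually needed before trusting a report.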
Steve.