Hang when running LLVM+clang test suite
David Zarzycki
dave at znu.io
Sun Jan 21 05:49:26 PST 2018
> On Jan 20, 2018, at 21:50, Keith Busch <keith.busch at intel.com> wrote:
>
> On Sat, Jan 20, 2018 at 05:47:06AM -0500, David Zarzycki wrote:
>> Hello NVMe developers,
>>
>> The LLVM+clang test suite regularly (but not reliably) hangs the kernel (version 4.14.13-300.fc27.x86_64). I don’t see this hang when running the test suite in /tmp (tmpfs) or on a SATA SSD.
>>
>> Here are photos of the console debug info, with the NVMe driver in the backtrace:
>>
>> http://znu.io/dual8168hang.tar
>>
>> Here is another instance of the hang, again with NVMe in the backtrace:
>>
>> http://znu.io/IMG_0362.jpg
>
> It looks like the scheduler is stuck or a task struct is corrupt. I can't
> think, off the top of my head, of what nvme has to do with that,
> though. It just invokes the callback associated with a command and
> doesn't directly manipulate any scheduler structs.

Hi Keith,

Thanks for looking at the backtraces. What other subsystems should I be looking at then?

The LLVM+clang test suite runs reliably when built and run in tmpfs, which implies that most of the kernel is sound. I’ve also run the test suite reliably on an ext4 filesystem on a SATA SSD.

I’ve tried both xfs and ext4 on NVMe and both hang, which implies that the individual filesystem isn't the problem. Please note that the NVMe setup is simple: one partition and no LVM, RAID, bcache, etc.
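
To make the elimination above explicit, here is a minimal Python sketch that tabulates the runs reported in this thread and looks for a factor shared by every hanging run and absent from every passing one. The table layout and the helper name `suspects` are made up for illustration, and the result only narrows things down to the NVMe I/O path (device, driver, PCIe, interrupts, and so on); it does not show that the driver itself is at fault.

```python
# Runs reported in this thread: (filesystem, device, outcome).
# tmpfs has no backing block device, so its device is recorded as None.
observations = [
    ("tmpfs", None, "ok"),
    ("ext4", "SATA SSD", "ok"),
    ("ext4", "NVMe", "hang"),
    ("xfs", "NVMe", "hang"),
]


def suspects(rows, index):
    """Return the value in column `index` that every hanging run shares
    and that no passing run has, or an empty set if there is none."""
    failing = {row[index] for row in rows if row[2] == "hang"}
    passing = {row[index] for row in rows if row[2] == "ok"}
    # More than one distinct value among the hangs means no single
    # value in this column is common to all of them.
    if len(failing) != 1:
        return set()
    return failing - passing


print("filesystem suspects:", suspects(observations, 0))
print("device suspects:", suspects(observations, 1))
```

With the four runs above, no single filesystem is common to the hangs, while the device column singles out NVMe.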

What’s left at this point? What other combinations or debug parameters should I test?

Thanks for any help you can give,
Dave