completion timeouts with pin-based interrupts in QEMU hw/nvme
Keith Busch
kbusch at kernel.org
Wed Jan 18 08:33:05 PST 2023
On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote:
> On Tue, 17 Jan 2023 at 19:21, Guenter Roeck <linux at roeck-us.net> wrote:
> > Anyway - any idea what to do to help figuring out what is happening ?
> > Add tracing support to pci interrupt handling, maybe ?
>
> For intermittent bugs, I like recording the QEMU session under
> rr (using its chaos mode to provoke the failure if necessary) to
> get a recording that I can debug and re-debug at leisure. Usually
> you want to turn on/add tracing to help with this, and if the
> failure doesn't hit early in bootup then you might need to
> do a QEMU snapshot just before point-of-failure so you can
> run rr only on the short snapshot-to-failure segment.
>
> https://translatedcode.wordpress.com/2015/05/30/tricks-for-debugging-qemu-rr/
> https://translatedcode.wordpress.com/2015/07/06/tricks-for-debugging-qemu-savevm-snapshots/
>
> This gives you a debugging session from the QEMU side's perspective,
> of course -- assuming you know what the hardware is supposed to do
> you hopefully wind up with either "the guest software did X,Y,Z
> and we incorrectly did A" or else "the guest software did X,Y,Z,
> the spec says A is the right/a permitted thing but the guest got confused".
> If it's the latter then you have to look at the guest as a separate
> code analysis/debug problem.
Here's what I got, though I'm way out of my depth here.
It looks like Linux kernel's fasteoi for RISC-V's PLIC claims the
interrupt after its first handling, which I think is expected. After
claiming, QEMU masks the pending interrupt, lowering the level, though
the device that raised it never deasserted.
More information about the Linux-nvme
mailing list