spinlock recursion in aio_complete()
Russell King (Oracle)
linux at armlinux.org.uk
Tue May 23 03:51:45 PDT 2023
On Tue, May 23, 2023 at 12:24:04PM +0200, Helge Deller wrote:
> On 5/22/23 23:22, Helge Deller wrote:
> > > > It hangs in fs/aio.c:1128, function aio_complete(), in this call:
> > > > spin_lock_irqsave(&ctx->completion_lock, flags);
> > >
> > > All code that I found and that obtains ctx->completion_lock disables IRQs.
> > > It is not clear to me how this spinlock can be locked recursively? Is it
> > > sure that the "spinlock recursion" report is correct?
> >
> > Yes, it seems correct.
> > [...]
>
> Bart, thanks to your suggestions I was able to narrow down the problem!
>
> I got LOCKDEP working on parisc, which then reports:
> raw_local_irq_restore() called with IRQs enabled
> for the spin_unlock_irqrestore() in function aio_complete(), which shouldn't happen.
>
> Finally, I found that parisc's flush_dcache_page() re-enables the IRQs
> which leads to the spinlock hang in aio_complete().
>
> So, this is NOT a bug in aio or scsi, but we need a fix in the arch code.

You can find some of the background to this at:

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=16ceff2d5dc9f0347ab5a08abff3f4647c2fee04

which introduced flush_dcache_mmap_lock(). It looks like Hugh had
questions over whether this should be _irqsave() rather than _irq()
but I guess at the time all callers had interrupts enabled, and
it's only recently that someone came up with the idea of calling
flush_dcache_page() with interrupts disabled.

Adding another arg to flush_dcache_mmap_lock() to save the flags
may be doable, but requires a patch that touches not only architectures
that have a private implementation, but also various code in mm/.

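
To illustrate why the _irq() versus _irqsave() distinction matters here, below is a minimal userspace sketch, not kernel code. It models the interrupt-enable state as a single boolean; every model_* name and irqs_on_inside_caller_lock() are hypothetical stand-ins invented for this illustration. The assumption is only the usual semantics: an _irq() unlock re-enables interrupts unconditionally, while an _irqrestore() unlock puts back whatever state was saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the CPU interrupt-enable bit (assumption: one CPU, no real locks). */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts without remembering the old state. */
static void model_lock_irq(void) { irqs_enabled = false; }

/* Models spin_unlock_irq(): re-enables interrupts unconditionally. */
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): returns the previous state, then disables. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): restores the state saved at lock time. */
static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Hypothetical flush routine whose inner lock uses the _irq() variants. */
void model_flush_irq(void)
{
    model_lock_irq();
    model_unlock_irq();
}

/* Hypothetical flush routine whose inner lock uses the _irqsave() variants. */
void model_flush_irqsave(void)
{
    bool flags = model_lock_irqsave();
    model_unlock_irqrestore(flags);
}

/*
 * Models a caller that, like aio_complete(), holds a lock taken with
 * irqsave while calling the flush routine. Returns the interrupt state
 * the caller observes after the flush, while it still holds its lock.
 */
bool irqs_on_inside_caller_lock(void (*flush)(void))
{
    bool flags, seen;

    irqs_enabled = true;            /* caller starts with interrupts on */
    flags = model_lock_irqsave();   /* caller's outer lock: interrupts off */
    flush();
    seen = irqs_enabled;            /* should still be off at this point */
    model_unlock_irqrestore(flags);
    assert(irqs_enabled);           /* outer restore always turns them back on */
    return seen;
}
```

In this model, irqs_on_inside_caller_lock(model_flush_irq) reports that interrupts are on again while the caller still holds its lock, which is the situation the LOCKDEP message above points at. With model_flush_irqsave they stay off until the caller's own restore.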
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!