spinlock recursion in aio_complete()

Helge Deller deller at gmx.de
Tue May 23 03:24:04 PDT 2023


On 5/22/23 23:22, Helge Deller wrote:
>>> It hangs in fs/aio.c:1128, function aio_complete(), in this call:
>>>      spin_lock_irqsave(&ctx->completion_lock, flags);
>>
>> All code that I found and that obtains ctx->completion_lock disables IRQs.
>> It is not clear to me how this spinlock can be locked recursively. Is it
>> certain that the "spinlock recursion" report is correct?
>
> Yes, it seems correct.
> [...]

Bart, thanks to your suggestions I was able to narrow down the problem!

I got LOCKDEP working on parisc, and it then reported:
	raw_local_irq_restore() called with IRQs enabled
for the spin_unlock_irqrestore() in aio_complete(). That shouldn't happen:
IRQs should still be disabled at that point, because the matching
spin_lock_irqsave() disabled them.

Finally, I found that parisc's flush_dcache_page() re-enables IRQs,
which leads to the spinlock hang in aio_complete().

So this is NOT a bug in aio or SCSI; we need a fix in the arch code.


While checking why flush_dcache_page() re-enables IRQs, I see the same pattern on parisc and ARM (32-bit):
flush_dcache_page() calls:
   -> flush_dcache_mmap_lock() / flush_dcache_mmap_unlock()
which use: xa_lock_irq() / xa_unlock_irq()

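So the call to xa_unlock_irq() re-enables IRQs unconditionally, even though
aio_complete() still holds ctx->completion_lock with IRQs disabled. An
interrupt arriving in that window can then re-enter aio_complete() on the
same CPU and try to take the same lock, which matches the "spinlock
recursion" report and triggers the hang.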

I temporarily #defined flush_dcache_mmap_lock() as a no-op, and the kernel booted fine.

Not sure yet what the best fix is...

Helge



More information about the linux-arm-kernel mailing list