Issue while oops and panic message logging to MTD partition

Boris Brezillon boris.brezillon at bootlin.com
Tue May 15 05:07:40 PDT 2018


Hi,

On Tue, 15 May 2018 10:49:15 +0000
Jagdish Gediya <jagdish.gediya at nxp.com> wrote:

> Hi,
> 
> Setup details:
> Board - Freescale ls1046ardb(ARM64)
> MTD device - nand(IFC)
> 
> CONFIG_MTD_OOPS is enabled to collect oops and panic logs. 
> Added bootargs to collect logs : mtdoops.mtddev=3 mtdoops.record_size=16384
> 
> Issue:
> Kernel hangs during oops log collection in function "fsl_ifc_run_command".
> Below is the exact code location where it hangs:
> 
> /*
>  * execute IFC NAND command and wait for it to complete
>  */
> static void fsl_ifc_run_command(struct mtd_info *mtd)
> {		.
> 		.
> 		.
> 		.
> 		.
> 
>         /* wait for command complete flag or timeout */
>         wait_event_timeout(ctrl->nand_wait, ctrl->nand_stat,
>                            msecs_to_jiffies(IFC_TIMEOUT_MSECS));
> 
> 		.
> 		.
> 		.
> 		.
> }
> 
> "wait_event_timeout" is the exact culprit where the kernel hangs. As panic(...) disables local interrupts by calling local_irq_disable(),
> this behavior looks expected: timer interrupts are disabled, so "wait_event_timeout" hangs forever.
> 
> The odd behaviour is that sometimes "wait_event_timeout" does not hang. The reason could be that, on a multicore processor, some other core receives the
> timer interrupt and as a result "wait_event_timeout" gets unblocked.
> 
> How do other drivers accomplish timer-related work, if any, in the panic path or in general when local interrupts are disabled?

MTD_OOPS is just a mess, and I'm sure most drivers simply don't support
it properly. If you still want to use the feature, you'll probably have
to fall back to status polling instead of using wait_event_timeout().
See what the core does here [1].
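
To make the polling idea concrete, here is a minimal user-space sketch of
a bounded status-polling loop. It is only a model, not the IFC driver or
the core code: struct fake_nand, fake_read_status(), fake_delay() and
FAKE_STATUS_READY are all hypothetical names invented for illustration.
In a real driver the status read would go through the controller and the
delay would be a busy-wait such as mdelay(), so nothing depends on
interrupts or the scheduler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ready bit; not the real IFC register layout. */
#define FAKE_STATUS_READY 0x40u

/* Hypothetical chip model: stays busy for a fixed number of status reads. */
struct fake_nand {
	unsigned int reads_until_ready;
};

/* Stand-in for reading the NAND status byte from the controller. */
static uint8_t fake_read_status(struct fake_nand *chip)
{
	if (chip->reads_until_ready > 0) {
		chip->reads_until_ready--;
		return 0;
	}
	return FAKE_STATUS_READY;
}

/* Stand-in for a busy-wait delay; spins without sleeping. */
static void fake_delay(void)
{
	for (volatile unsigned int i = 0; i < 1000; i++)
		;
}

/*
 * Poll the status until the chip reports ready or max_polls attempts
 * have been made. Returns true when ready, false on timeout.
 * The loop never sleeps and never waits for an interrupt.
 */
static bool poll_until_ready(struct fake_nand *chip, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_status(chip) & FAKE_STATUS_READY)
			return true;
		fake_delay();
	}
	return false;
}
```

The bound on the number of polls replaces the timer-based timeout, so the
loop terminates even if the chip never becomes ready.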

Still, I'd recommend not using MTD_OOPS if possible, because I fear
that's not the only problem you'll face. One problem I see is that the
locking is completely bypassed when ->panic_write() is called, and your
->cmdfunc() might be called while another operation is still in
progress (PROGRAM, ERASE, READ...) in order to get the NAND status.
Looking at the ifc code, it seems the driver is not ready to cope with
that.
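
As a rough illustration of that second problem, the sketch below models a
panic path that first polls until any in-flight operation has finished
and only then issues its own command. This is an assumption about one way
a driver could cope, not what the ifc driver does today; struct
fake_ctrl, fake_ctrl_poll_idle() and fake_panic_write() are hypothetical
names.

```c
#include <stdbool.h>

/* Hypothetical operation types the normal path may have started. */
enum fake_op { FAKE_OP_NONE, FAKE_OP_PROGRAM, FAKE_OP_ERASE, FAKE_OP_READ };

/* Hypothetical controller model. */
struct fake_ctrl {
	enum fake_op in_flight;   /* operation currently in progress */
	unsigned int busy_polls;  /* polls remaining until it completes */
};

/* One status poll: returns true once the controller is idle. */
static bool fake_ctrl_poll_idle(struct fake_ctrl *c)
{
	if (c->in_flight == FAKE_OP_NONE)
		return true;
	if (c->busy_polls > 0) {
		c->busy_polls--;
		return false;
	}
	c->in_flight = FAKE_OP_NONE;
	return true;
}

/*
 * Panic path without locking: drain whatever was in flight by polling,
 * then start the panic write. Returns 0 on success, -1 if the
 * controller did not go idle within max_polls polls.
 */
static int fake_panic_write(struct fake_ctrl *c, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_ctrl_poll_idle(c)) {
			c->in_flight = FAKE_OP_PROGRAM;
			return 0;
		}
	}
	return -1;
}
```

If the drain step is skipped, the panic write's command would be issued
on top of the PROGRAM, ERASE or READ that was interrupted, which is the
situation described above.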

Regards,

Boris

[1] https://elixir.bootlin.com/linux/v4.17-rc5/source/drivers/mtd/nand/raw/nand_base.c#L648


