[BUG RESEND] unsuspend failure under load

Sean Nyekjaer sean at geanix.com
Tue Aug 24 00:53:18 PDT 2021


On Mon, Aug 23, 2021 at 10:18:02AM +0200, Miquel Raynal wrote:
> Hello,
> 
> I think we should involve Richard as well in the discussion.
> 
> Sean Nyekjaer <sean at geanix.com> wrote on Tue, 6 Jul 2021 11:35:41 +0200:
> 
> > On Mon, Jul 05, 2021 at 08:58:38AM +0200, Sean Nyekjaer wrote:
> > > Hi Miquel and Sascha,  
> > + Richard + linux-pm
> > > 
> > > We are having some trouble when our i.MX6 unsuspends while writes to
> > > ubifs are in progress. In the log it looks like it is syncing the
> > > filesystem before suspend.
> > > 
> > > The SoC is an i.MX6UL/ULL; the issue is (luckily for us) quite easy to reproduce.
> > > The reproduce script: [0]
> > > Kernel log when it happens: [1]
> > > 
> > > I have bisected the bug to: ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")
> > > 
> > > Any idea where I should start looking, or what is happening?
> > > 
> > > Esben has posted two patches that relate to suspend/unsuspend, but they
> > > don't seem to be related to this issue.
> > > 5bc6bb603b4d ("mtd: rawnand: gpmi: Fix suspend/resume problem")
> > > d70486668cdf ("mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume")
> > > 
> > > /Sean  
> > 
> > nand_resume() is called some time after ubi_io_write() tries to write. That's
> > why mtd_write() is returning -EBUSY.
> 
> Just to be sure:
> - platform resumes
> - your app started a write before being suspended
> - the write gets refused because the suspended state has not been
>   cleared yet
> Am I understanding this issue correctly?

Yes :)
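
To make the ordering above concrete, here is a minimal userspace sketch of it.
It is only a model under stated assumptions: struct model_chip and the model_*
functions are hypothetical names, not the rawnand or MTD API, and the single
"suspended" flag stands in for whatever state the driver really keeps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a NAND chip; not the kernel's struct. */
struct model_chip {
	bool suspended;	/* set on suspend, cleared by the resume callback */
};

void model_suspend(struct model_chip *chip)
{
	chip->suspended = true;
}

void model_resume(struct model_chip *chip)
{
	chip->suspended = false;
}

/* Refuse I/O while the suspended state is still set, as error -16 in [1] suggests. */
int model_write(struct model_chip *chip, size_t len, size_t *written)
{
	assert(chip && written);
	*written = 0;
	if (chip->suspended)
		return -EBUSY;
	*written = len;
	return 0;
}

/* Writeback issues a write after suspend but before the resume callback ran. */
int model_early_write_result(void)
{
	struct model_chip chip = { .suspended = false };
	size_t written;

	model_suspend(&chip);
	return model_write(&chip, 4096, &written);
}

/* The same write, issued once the resume callback has cleared the flag. */
int model_late_write_result(void)
{
	struct model_chip chip = { .suspended = false };
	size_t written;

	model_suspend(&chip);
	model_resume(&chip);
	return model_write(&chip, 4096, &written);
}
```

In this model the early write fails with -EBUSY and the late one succeeds, so
the failure depends only on whether the write arrives before or after the
resume callback.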

> 
> But I would expect "Filesystems sync" to actually give the lower layers
> the time to flush the data to the storage devices; suspending without
> waiting for this to happen looks strange to me.

Yes, that's why [4] also fixes the issue: with it, userspace is frozen first
and the disks are synced afterwards, before suspend.
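
As a rough illustration of why the order matters, here is a toy userspace
model. It assumes the problem is a writer that keeps dirtying pages in the
window between the sync and the freeze; struct model_sys and all model_*
names are made up for this sketch and do not exist in kernel/power.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct model_sys {
	bool user_frozen;	/* have userspace tasks been frozen? */
	int dirty_pages;	/* pages still waiting for writeback */
};

/* A userspace writer (the "cp" in [0]) dirties pages unless it is frozen. */
void model_user_writes(struct model_sys *sys, int pages)
{
	assert(pages >= 0);
	if (!sys->user_frozen)
		sys->dirty_pages += pages;
}

/* Flush everything that is dirty right now. */
void model_sync(struct model_sys *sys)
{
	sys->dirty_pages = 0;
}

void model_freeze(struct model_sys *sys)
{
	sys->user_frozen = true;
}

/* Order seen in log [1]: sync first, then freeze. Returns pages left dirty. */
int model_sync_then_freeze(int pages)
{
	struct model_sys sys = { .user_frozen = false, .dirty_pages = pages };

	model_sync(&sys);
	model_user_writes(&sys, pages);	/* writer is still running here */
	model_freeze(&sys);
	return sys.dirty_pages;
}

/* Order with patch [4]: freeze first, then sync. Returns pages left dirty. */
int model_freeze_then_sync(int pages)
{
	struct model_sys sys = { .user_frozen = false, .dirty_pages = pages };

	model_freeze(&sys);
	model_user_writes(&sys, pages);	/* no effect, the writer is frozen */
	model_sync(&sys);
	return sys.dirty_pages;
}
```

With sync first, the model still has dirty pages at suspend time, which
writeback would then try to flush around resume. With freeze first, nothing
is left to write.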

> 
> Sascha, Richard, what's your input?
> 
> > I have tried patch [3], and it seems to fix it.
> > I think it would be okay to add the retry option, but the mdelay is
> > obviously a no-go.
> > 
> > Any idea how we could wait here for nand_resume() to be called?
> > 
> > @linux-pm:
> > I have noticed that "Filesystems sync" happens before "Freezing user space
> > processes".
> > If I apply patch [4] (without [3]), it also fixes our issue. But I
> > don't have any insight into what impact the change might have.
> > 
> > /Sean
> > 
> > > 
> > > [0]
> > > #!/bin/sh
> > > dd if=/dev/urandom of=/tmp/test50M bs=1M count=50
> > > cp /tmp/test50M /data/ &
> > > echo mem > /sys/power/state
> > > 
> > > [1]
> > > root@iwg26-v2:/data/root# ./ubicrash.sh
> > > 50+0 records in
> > > 50+0 records out
> > > PM: suspend entry (deep)
> > > Filesystems sync: 33.642 seconds
> > > Freezing user space processes ... (elapsed 0.004 seconds) done.
> > > OOM killer disabled.
> > > Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
> > > printk: Suspending console(s) (use no_console_suspend to debug)
> > > <SUSPEND/WAKE>
> > > PM: suspend devices took 0.040 seconds
> > > Disabling non-boot CPUs ...
> > > ubi0 error: ubi_io_write: error -16 while writing 4096 bytes to PEB 544:53248, written 0 bytes
> > > CPU: 0 PID: 69 Comm: kworker/u2:2 Not tainted 5.13.0 #3
> > > Hardware name: Freescale i.MX6 Ultralite (Device Tree)
> > > Workqueue: writeback wb_workfn (flush-ubifs_0_8)
> > > [<c010d9b0>] (unwind_backtrace) from [<c010a28c>] (show_stack+0x10/0x14)
> > > [<c010a28c>] (show_stack) from [<c0970798>] (dump_stack+0xc0/0xdc)
> > > [<c0970798>] (dump_stack) from [<c05dfe10>] (ubi_io_write+0x510/0x6b0)
> > > [<c05dfe10>] (ubi_io_write) from [<c05dcd90>] (ubi_eba_write_leb+0x388/0x910)
> > > [<c05dcd90>] (ubi_eba_write_leb) from [<c05daf34>] (ubi_leb_write+0xd0/0xe8)
> > > [<c05daf34>] (ubi_leb_write) from [<c03cfeb4>] (ubifs_leb_write+0x68/0x104)  
> > 
> > [ ... ]
> > 
> > > UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
> > > UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 10962 of inode 821, error -30
> > > UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
> > > UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 10963 of inode 821, error -30
> > > UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 696 bytes in jhead 2, error -30
> > > UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 0 of inode 819, error -30
> > > UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30  
> > 
> > [3]:
> > diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
> > index 14d890b00d2c..b24c571fa022 100644
> > --- a/drivers/mtd/ubi/io.c
> > +++ b/drivers/mtd/ubi/io.c
> > @@ -268,8 +269,18 @@ int ubi_io_write(struct ubi_device *ubi, const void *buf, int pnum, int offset,
> >  	}
> > 
> >  	addr = (loff_t)pnum * ubi->peb_size + offset;
> > +retry:
> >  	err = mtd_write(ubi->mtd, addr, len, &written, buf);
> >  	if (err) {
> > +		if (retries++ < UBI_IO_RETRIES) {
> > +			ubi_warn(ubi, "error %d while writing %d bytes to PEB %d:%d, written %zd bytes",
> > +				 err, len, pnum, offset, written);
> > +			mdelay(25); yield();
> > +			goto retry;
> > +		}
> > +
> >  		ubi_err(ubi, "error %d while writing %d bytes to PEB %d:%d, written %zd bytes",
> >  			err, len, pnum, offset, written);
> >  		dump_stack();
> > 
> > [4]:
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 32391acc806b..61a213ad5a13 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -563,18 +563,18 @@ static int enter_state(suspend_state_t state)
> >  	if (state == PM_SUSPEND_TO_IDLE)
> >  		s2idle_begin();
> >  
> > -	if (sync_on_suspend_enabled) {
> > -		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> > -		ksys_sync_helper();
> > -		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
> > -	}
> > -
> >  	pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
> >  	pm_suspend_clear_flags();
> >  	error = suspend_prepare(state);
> >  	if (error)
> >  		goto Unlock;
> >  
> > +	if (sync_on_suspend_enabled) {
> > +		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> > +		ksys_sync_helper();
> > +		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
> > +	}
> > +
> >  	if (suspend_test(TEST_FREEZER))
> >  		goto Finish;
> >  
> 
> Thanks,
> Miquèl


