[PATCH v5 3/4] mtd: core: protect access to MTD devices while in suspend

Tue Nov 30 05:37:05 PST 2021

On Tue, 30 Nov 2021 14:29:12 +0100
Sean Nyekjaer <sean at geanix.com> wrote:

> On Tue, Nov 30, 2021 at 02:15:51PM +0100, Boris Brezillon wrote:
> > On Tue, 30 Nov 2021 13:41:31 +0100
> > Sean Nyekjaer <sean at geanix.com> wrote:
> >   
> > > > On 29.11.2021 10:41, Sean Nyekjaer wrote:    
> > > > > On Mon, Nov 29, 2021 at 10:19:08AM +0100, Miquel Raynal wrote:    
> > > > >> boris.brezillon at collabora.com wrote on Tue, 23 Nov 2021 14:07:15 +0100:    
> > > > >>> On Tue, 23 Nov 2021 13:50:12 +0100
> > > > >>> Sean Nyekjaer <sean at geanix.com> wrote:    
> > > > >>>> @Boris do we need to do something similar here to what we did with the
> > > > >>>> mtdconcat stuff?    
> > > > >>> Absolutely, physmasp subdevices are never initialized/registered, so
> > > > >>> you can't call any of the mtd helpers taking the suspend lock on those.
> > > > >>> I guess it'd be better to call mtd_suspend/resume() on the concat device
> > > > >>> in though:    
> > > > >> Any chance that you will come up with a fix or v6 of the series anytime
> > > > >> soon?
> > > > >>    
> > > > >>> static void physmap_flash_shutdown(struct platform_device *dev)
> > > > >>> {
> > > > >>> 	struct physmap_flash_info *info = platform_get_drvdata(dev);
> > > > >>>
> > > > >>> 	mtd_suspend(info->cmtd);    
> > > 
> > > There is one more issue when using concat during boot, I think we should
> > > start here :)
> > > It uses uninitialized rwsm.
> > > 
> > > 
> > > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd070]
> > > [    0.000000] Linux version 5.16.0-rc1-00013-g67bcbe202b48 (sean at zen) (aarch64-linux-gnu-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #207 SMP PREEMPT Tue Nov 30 13:22:48 CET 2021
> > > [    0.000000] Machine model: linux,dummy-virt
> > > [    0.000000] efi: UEFI not found.
> > > [    0.000000] NUMA: No NUMA configuration found
> > > 
> > > [ ... ]
> > > 
> > > [    1.848327] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
> > > [    1.848567] Intel/Sharp Extended Query Table at 0x0031
> > > [    1.848999] Using buffer write method
> > > [    1.849237] Concatenating MTD devices:
> > > [    1.849347] (0): "0.flash"
> > > [    1.849637] (1): "0.flash"
> > > [    1.849726] into device "0.flash"
> > > [    1.904985] libphy: Fixed MDIO Bus: probed
> > > [    1.915812] tun: Universal TUN/TAP device driver, 1.6
> > > 
> > > [ ... ]
> > > 
> > > [    6.064628] The code is fine but needs lockdep annotation, or maybe
> > > [    6.064756] you didn't initialize this object before use?
> > > [    6.064857] turning off the locking correctness validator.
> > > [    6.065111] CPU: 0 PID: 49 Comm: kworker/0:1H Not tainted 5.16.0-rc1-00013-g67bcbe202b48 #207
> > > [    6.065273] Hardware name: linux,dummy-virt (DT)
> > > [    6.065572] Workqueue: kblockd blk_mq_run_work_fn
> > > [    6.065879] Call trace:
> > > [    6.065943]  dump_backtrace+0x0/0x1a0
> > > [    6.066057]  show_stack+0x1c/0x30
> > > [    6.066130]  dump_stack_lvl+0x8c/0xb8
> > > [    6.066206]  dump_stack+0x1c/0x38
> > > [    6.066270]  register_lock_class+0x49c/0x4c0
> > > [    6.066352]  __lock_acquire+0x78/0x20cc
> > > [    6.066422]  lock_acquire.part.0+0xe0/0x230
> > > [    6.066497]  lock_acquire+0x6c/0x90
> > > [    6.066562]  down_read+0x58/0x7c
> > > [    6.066626]  mtd_read_oob_std+0x98/0x170
> > > [    6.066703]  mtd_read_oob+0x80/0x13c
> > > [    6.066770]  mtd_read+0x64/0xa0
> > > [    6.066834]  concat_read+0xd4/0x1b0
> > > [    6.066900]  mtd_read_oob_std+0x158/0x170
> > > [    6.066972]  mtd_read_oob+0x80/0x13c
> > > [    6.067038]  mtd_read+0x64/0xa0
> > > [    6.067099]  mtdblock_readsect+0x6c/0x160
> > > [    6.067169]  mtd_queue_rq+0x240/0x460
> > > [    6.067234]  blk_mq_dispatch_rq_list+0x19c/0x830
> > > [    6.067313]  __blk_mq_do_dispatch_sched+0x248/0x2d4
> > > [    6.067397]  __blk_mq_sched_dispatch_requests+0x110/0x170
> > > [    6.067487]  blk_mq_sched_dispatch_requests+0x3c/0x80
> > > [    6.067573]  __blk_mq_run_hw_queue+0x60/0xa0
> > > [    6.067647]  blk_mq_run_work_fn+0x24/0x30
> > > [    6.067718]  process_one_work+0x28c/0x6e4
> > > [    6.067790]  worker_thread+0x78/0x464
> > > [    6.067856]  kthread+0x180/0x190
> > > [    6.067919]  ret_from_fork+0x10/0x20
> > > [    6.068868] ------------[ cut here ]------------
> > > [    6.068993] DEBUG_RWSEMS_WARN_ON(sem->magic != sem): count = 0x100, magic = 0x0, owner = 0xffff528f84c617c1, curr 0xffff528f84c617c0, list not empty
> > > [    6.069592] WARNING: CPU: 0 PID: 49 at kernel/locking/rwsem.c:1302 __up_read+0x1e8/0x270
> > > [    6.069790] Modules linked in:
> > > [    6.069963] CPU: 0 PID: 49 Comm: kworker/0:1H Not tainted 5.16.0-rc1-00013-g67bcbe202b48 #207
> > > [    6.070099] Hardware name: linux,dummy-virt (DT)
> > > [    6.070179] Workqueue: kblockd blk_mq_run_work_fn
> > > [    6.070347] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [    6.070487] pc : __up_read+0x1e8/0x270
> > > [    6.070577] lr : __up_read+0x1e8/0x270
> > > [    6.070660] sp : ffff80001026b740
> > > [    6.070728] x29: ffff80001026b740 x28: 000000000003ff88 x27: 0000000000040000
> > > [    6.070912] x26: ffff80001026ba28 x25: ffff528f84e92000 x24: ffff528f8576d000
> > > [    6.071040] x23: 0000000000000000 x22: 0000000003ff0000 x21: ffff80001026b878
> > > [    6.071165] x20: ffffadee32213000 x19: ffff528f84e918b0 x18: ffffffffffffffff
> > > [    6.071292] x17: 672d33313030302d x16: 3163722d302e3631 x15: ffff80009026b457
> > > [    6.071418] x14: 0000000000000007 x13: ffffadee32236658 x12: 0000000000000384
> > > [    6.071543] x11: 000000000000012c x10: ffffadee32236658 x9 : ffffadee32236658
> > > [    6.071692] x8 : 00000000ffffefff x7 : ffffadee3228e658 x6 : ffffadee3228e658
> > > [    6.071817] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
> > > [    6.071942] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff528f84c617c0
> > > [    6.072094] Call trace:
> > > [    6.072163]  __up_read+0x1e8/0x270
> > > [    6.072238]  up_read+0x44/0x300
> > > [    6.072301]  mtd_read_oob_std+0xd0/0x170
> > > [    6.072374]  mtd_read_oob+0x80/0x13c
> > > [    6.072441]  mtd_read+0x64/0xa0
> > > [    6.072502]  concat_read+0xd4/0x1b0
> > > [    6.072567]  mtd_read_oob_std+0x158/0x170
> > > [    6.072640]  mtd_read_oob+0x80/0x13c
> > > [    6.072706]  mtd_read+0x64/0xa0
> > > [    6.072766]  mtdblock_readsect+0x6c/0x160
> > > [    6.072843]  mtd_queue_rq+0x240/0x460
> > > [    6.072909]  blk_mq_dispatch_rq_list+0x19c/0x830
> > > [    6.072990]  __blk_mq_do_dispatch_sched+0x248/0x2d4
> > > [    6.073073]  __blk_mq_sched_dispatch_requests+0x110/0x170
> > > [    6.073163]  blk_mq_sched_dispatch_requests+0x3c/0x80
> > > [    6.073249]  __blk_mq_run_hw_queue+0x60/0xa0
> > > [    6.073325]  blk_mq_run_work_fn+0x24/0x30
> > > [    6.073421]  process_one_work+0x28c/0x6e4
> > > [    6.073568]  worker_thread+0x78/0x464
> > > [    6.073643]  kthread+0x180/0x190
> > > [    6.073710]  ret_from_fork+0x10/0x20
> > > [    6.073820] irq event stamp: 1529
> > > [    6.073891] hardirqs last  enabled at (1529): [<ffffadee3114576c>] _raw_spin_unlock_irq+0x48/0x9c
> > > [    6.074067] hardirqs last disabled at (1528): [<ffffadee31145b70>] _raw_spin_lock_irq+0xac/0xb0
> > > [    6.074211] softirqs last  enabled at (1444): [<ffffadee30010528>] __do_softirq+0x478/0x5f0
> > > [    6.074350] softirqs last disabled at (1429): [<ffffadee3009f064>] __irq_exit_rcu+0x184/0x1b0
> > > [    6.074506] ---[ end trace 738f7ffe764b66d5 ]---
> > > 
> > > I can solve this with this hack from mtd_set_dev_defaults():
> > > 
> > > diff --git a/drivers/mtd/mtdconcat.c b/drivers/mtd/mtdconcat.c
> > > index f4a4274489b4..a4b69b9558c9 100644
> > > --- a/drivers/mtd/mtdconcat.c
> > > +++ b/drivers/mtd/mtdconcat.c
> > > @@ -693,6 +693,8 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[],	/* subdevices to c
> > >  	concat->mtd.oobavail = subdev[0]->oobavail;
> > >  
> > >  	subdev_master = mtd_get_master(subdev[0]);
> > > +	init_waitqueue_head(&subdev_master->master.resume_wq);
> > > +	init_rwsem(&subdev_master->master.suspend_lock);
> > >  	if (subdev_master->_writev)
> > >  		concat->mtd._writev = concat_writev;
> > >  	if (subdev_master->_read_oob)
> > > @@ -740,6 +742,8 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[],	/* subdevices to c
> > >  		}
> > >  
> > >  		subdev_master = mtd_get_master(subdev[i]);
> > > +		init_waitqueue_head(&subdev_master->master.resume_wq);
> > > +		init_rwsem(&subdev_master->master.suspend_lock);
> > >  		concat->mtd.size += subdev[i]->size;
> > >  		concat->mtd.ecc_stats.badblocks +=
> > >  			subdev[i]->ecc_stats.badblocks;
> > > 
> > > I do not have a great overview of what is happing here with all these
> > > master devices :/
> > > 
> > > Any ideas Boris/Miquel?  
> > 
> > It's the same problem, over and over again: the master device uses MTD
> > helpers on its unregistered/unitialized subdevices. The following diff
> > should fix the issue, but let's be honest, it's just papering over the
> > real design issue, that his, MTD implementations use the MTD helpers
> > assuming nothing needs to be initialized. So, I'm tempted to just drop
> > the series and work an a more robust solution. In the meantime, I guess
> > getting back to a raw-NAND-only fix would make sense.
> >   
> 
> Fine by me, lets drop this series.
> 
> We have +10.000 devices that runs with this patch:
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 1f0d542d5923..58d48c3070fa 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -4337,7 +4337,6 @@ static int nand_suspend(struct mtd_info *mtd)
>  		ret = chip->ops.suspend(chip);
>  	if (!ret)
>  		chip->suspended = 1;
> -	mutex_unlock(&chip->lock);
>  
>  	return ret;
>  }
> @@ -4350,7 +4349,6 @@ static void nand_resume(struct mtd_info *mtd)
>  {
>  	struct nand_chip *chip = mtd_to_nand(mtd);
>  
> -	mutex_lock(&chip->lock);
>  	if (chip->suspended) {
>  		if (chip->ops.resume)
>  			chip->ops.resume(chip);
> 

But it's abusing the chip lock... Use a wait queue as we did at the MTD
level, and make nand_get_device() wait on this wait queue when the
device is suspended.