[Patch] fix MTD CFI/LPDDR flash driver huge latency bug

Stefani Seibold stefani at seibold.net
Sat Mar 13 12:00:44 EST 2010


Am Samstag, den 13.03.2010, 06:25 -0500 schrieb Andrew Morton:
> On Sat, 13 Mar 2010 13:31:30 +0100 Stefani Seibold <stefani at seibold.net> wrote:
> 
> > Am Freitag, den 12.03.2010, 14:23 -0800 schrieb Andrew Morton:
> > > On Sat, 06 Mar 2010 17:48:57 +0100
> > > Stefani Seibold <stefani at seibold.net> wrote:
> > > 
> > > The patch change all the use of spin_lock operations for xxxx->mutex
> > > > into mutex operations, which is exact what the name says and means. 
> > > > 
> > > > There is no performance regression since the mutex is normally not
> > > > acquired.
> > > 
> > > hm, big scary patch.  Are you sure this mutex is never taken from
> > > atomic or irq contexts?  Is it ully tested with all relevant debug options
> > > and lockdep enabled?
> > > 
> > > 
> > 
> > I have analyzed this drivers and IMHO i don't think there will be used
> > from irq or atomic contexts. There is no request interrupt and there are
> > a lot msleep and add_wait_queues/schedule calls during holding the
> > mutex, which are not very useful in a irq or atomic context. But i don't
> > know the whole mtd stack. 
> > 
> > I tested the patch with the following kernel debug options:
> > 
> > CONFIG_DEBUG_KERNEL=y
> > CONFIG_DETECT_SOFTLOCKUP=y
> > CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
> > CONFIG_SCHED_DEBUG=y
> > CONFIG_SCHEDSTATS=y
> > CONFIG_TIMER_STATS=y
> > CONFIG_DEBUG_MUTEXES=y
> > CONFIG_DEBUG_SPINLOCK_SLEEP=y
> > 
> 
> Neato.  As was mentioned, one thing to check is the mtdoops path. 
> oopses can happen with locks held, from IRQ context, etc.
> 

Okay, i didn't checked that case. But the old code has also a dead lock,
if the oops occurred during the spinlock(xxx->mutex) was held. With the
new mutex solution the change is bigger to run into that deadlock due
the possible preemption. 

But i did a "grep" at the whole mtd code and there is no panic_write
function assigned to mtd_info struct for the CFI flash chips. So this
problem will currently never occure.

> If we're trying to take that mutex in oops context then I guess that's
> fixable by just not taking it and hoping for the best.  Or, better,
> mutex_trylock() and conditional mutex_unlock() to try to be nice to
> possible concurrent activity on other CPUs.
> 

Concurrent access are dangerous and in most cases are not possible,
that's why the spinlock(xxxx->mutex) was for.

I also did some concurrency checks like:

cat /dev/zero >/flash/aa & cat /dev/zero >/flash/bb

without and side effects.





More information about the linux-mtd mailing list