Numonyx NOR and chip->mutex bug?

Michael Cashwell mboards at prograde.net
Tue Jan 25 17:03:21 EST 2011


On Jan 25, 2011, at 1:56 PM, Joakim Tjernlund wrote:

> On Jan 25, 2011, at 1:14 PM, Michael Cashwell wrote:
> 
>> With this new part I'm seeing MTD errors that I think I've traced to cfi_cmdset_0001.c that I'd like to ask about.
>> 
>> The error manifests when I write heavily to a UBIFS file system on this NOR flash. What I see is a "NOR Flash: buffer write error" and then either "(block locked)" or "(Bad VPP)".
> 
> I think chip hw error(s). These chips have some strange errors, so you had better check the errata for your chip. We have seen similar problems with these newer 65nm Numonyx chips.

The latest spec update for these is dated Nov 2009. It lists an issue with block lock/unlock (which I've handled separately) but nothing related to buffered programming.

>> Interestingly, this new FLASH part has a write buffer of 512 words while the previous part was 32 words. Thus the write times (and timeouts) have also increased by a similar x16 factor. I think this is why this has not been seen before.
> 
> Should not the write time be about the same? What is the point with a bigger buffer otherwise?

It seems the answer is somewhere between "the same" and "linear". The 32-word part had a max buffered write time of 880us; the 512-word part's is 4096us. So the timeout increases by a factor of about 4.65, not 16. That yields faster writes per word but also the potential for different code paths in inval_cache_and_wait_for_operation().
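
To make that arithmetic easy to re-check, here is a small standalone sketch. It uses only the four numbers quoted above (32 words / 880us and 512 words / 4096us); the macro and function names are made up for illustration and do not come from the driver.

```c
#include <stdio.h>

/* Maximum buffered-write figures quoted in this thread. */
#define OLD_BUF_WORDS 32u
#define OLD_MAX_US    880u
#define NEW_BUF_WORDS 512u
#define NEW_MAX_US    4096u

/* How much longer the worst-case wait becomes (hypothetical helper). */
double timeout_ratio(unsigned old_us, unsigned new_us)
{
    return (double)new_us / (double)old_us;
}

/* Worst-case time spent per buffered word (hypothetical helper). */
double us_per_word(unsigned max_us, unsigned words)
{
    return (double)max_us / (double)words;
}

/* Print the comparison between the two parts. */
void print_buffer_write_scaling(void)
{
    printf("buffer size ratio : x%u\n", NEW_BUF_WORDS / OLD_BUF_WORDS);
    printf("timeout ratio     : x%.2f\n",
           timeout_ratio(OLD_MAX_US, NEW_MAX_US));
    printf("old part, us/word : %.2f\n",
           us_per_word(OLD_MAX_US, OLD_BUF_WORDS));
    printf("new part, us/word : %.2f\n",
           us_per_word(NEW_MAX_US, NEW_BUF_WORDS));
}
```

Calling print_buffer_write_scaling() from a main() shows the point: the buffer grew 16x but the worst-case wait grew far less, so each word is cheaper while each individual wait is longer.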

>> Am I wildly confused in all this? When is dropping the chip->mutex while waiting for lengthy commands needed?
> 
> When you want to suspend an erase to do a read for example. You don't want to be without erase suspend, trust me :)

I believe that. But in that case there's an active command sent to suspend the in-process erase via writing 0xB0 at line 785.

For non-XIP there is no suspension of buffered writes. Yet somehow, in the middle of one, the part goes back to array mode unexpectedly.

>> Input welcome.
> 
> It is unlikely there is a locking problem I think. You only need to lock when testing/changing the chip->state.

Quite possible. This could just be a hardware bug in the chip. But I'm suspicious of that easy answer. We know these parts have longer write times and we know that makes the wait function more likely to schedule than with the older chips.

The fact that the errors stop if I comment out the chip->mutex calls while waiting suggests to me that there's a reentrancy problem. It doesn't mean the locks are wrong, or that removing them is a real fix.
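
To illustrate the kind of reentrancy I have in mind, here is a single-threaded toy model. It is not the cfi_cmdset_0001.c code and does not claim to show the actual bug; every name in it (toy_chip, toy_buffered_write, the two reader functions) is hypothetical. The "other context" is simply a callback run at the point where the real wait function could sleep.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_READY, TOY_WRITING };

struct toy_chip {
    bool locked;            /* stands in for chip->mutex */
    enum toy_state state;   /* stands in for chip->state */
    bool hw_in_array_mode;  /* what the simulated part is really doing */
};

typedef void (*toy_other_fn)(struct toy_chip *chip);

/* Another context that takes the lock but ignores the state. */
void toy_careless_reader(struct toy_chip *chip)
{
    if (chip->locked)
        return;                      /* lock busy: nothing happens */
    chip->locked = true;
    chip->hw_in_array_mode = true;   /* read command sent mid-write */
    chip->locked = false;
}

/* Another context that takes the lock and honours the state. */
void toy_careful_reader(struct toy_chip *chip)
{
    if (chip->locked)
        return;
    chip->locked = true;
    if (chip->state == TOY_READY)    /* only touch an idle part */
        chip->hw_in_array_mode = true;
    chip->locked = false;
}

/*
 * Returns 0 if the write finishes undisturbed, -1 if the part was
 * found back in array mode in the middle of the write.
 */
int toy_buffered_write(struct toy_chip *chip, bool drop_lock,
                       toy_other_fn other)
{
    int ret;

    chip->locked = true;
    chip->state = TOY_WRITING;
    chip->hw_in_array_mode = false;  /* part is busy programming */

    if (drop_lock)
        chip->locked = false;        /* window where we would sleep */
    if (other != NULL)
        other(chip);                 /* some other context runs here */
    if (drop_lock)
        chip->locked = true;

    ret = chip->hw_in_array_mode ? -1 : 0;

    chip->state = TOY_READY;
    chip->locked = false;
    return ret;
}
```

In this model, toy_buffered_write(&chip, true, toy_careless_reader) fails, while the same call with drop_lock set to false succeeds. That mirrors what I see when I comment out the mutex calls. Dropping the lock with toy_careful_reader also succeeds, which is why I suspect a missing state check somewhere rather than the locking itself.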

I'm going to explore this on Wed. If I find a problem I'll report back.

-Mike




More information about the linux-mtd mailing list