Numonyx NOR and chip->mutex bug?

Stefan Bigler stefan.bigler at keymile.com
Thu Feb 3 11:38:05 EST 2011


Hi

I did some more anaysis of the write probem when creating a ubivolume.
I come to the conclusion that there must be a problem in the fuction
inval_cache_and_wait_for_operation().

Why I think this. In the log below you see that in the first part the
do_erase_oneblock() start the erasing. In the function
inval_cache_and_wait_for_operation() this thread is interrupted.

A do_write_buffer() suspends the erase by chip_ready() and runs until
cmd 0xe8 and waits for completion in WAIT_TIMEOUT.

Here again the do_erase_oneblock() continues to run and runs a
erase resume sequence to the end of erase the block.
--> THIS IS WRONG.

Now the do_write_buffer() continue but it ends in a write error.


When reading the inval_cache_and_wait_for_operation() there are 2
places were the mutex is unlocked

1) while INVALIDATE_CACHED_RANGE()
2) if status is not status_OK

My measurement shows that the mutex is taken in the first place.

Now I do expect that this method should make sure that the erasing
thread get "suspended" and the writing task can continue.
But this is not the case.
There is code were we compare the chip_state and chip->state.
But we are not hitting this code.

This is the summary of my analysis for today.
I was asking me why this code is working on other chips?

Regards
Stefan


do_erase_oneblock() begin
[1507][0209] do_erase_oneblock start adr=0x00000000 len=0x20000
[1507][0209] map_write 0x50 to 0x00000000
[1507][0209] map_write 0x20 to 0x00000000
[1507][0209] map_write 0xd0 to 0x00000000
[1507][0209] do_erase_oneblock before INVAL_CACHE_AND_WAIT 
adr=0x00000000 len=0x20000
[1507][0209] inval_cache_and_wait_for_operation short unlock

do_write_buffer() begin
[1507][0465] map_write 0xb0 to 0x03fe4c00
[1515][0465] erase suspend 1         adr=0x03fe4c00
[1515][0465] map_write 0x70 to 0x03fe4c00
[1515][0465] map_write 0xe8 to 0x03fe4c00
[1515][0465] buffer write before 0xe8 WAIT_TIMEOUT
[1515][0465] inval_cache_and_wait_for_operation short unlock

do_erase_oneblock() continue & finish
[1515][0209] inval_cache_and_wait_for_operation short lock
[1515][0209] do_erase_oneblock after  INVAL_CACHE_AND_WAIT 
adr=0x00000000 len=0x20000
[1515][0209] map_write 0x70 to 0x00000000
[1515][0209] map_write 0x50 to 0x00000000
[1515][0209] map_write 0xd0 to 0x00000000
[1515][0209] map_write 0x70 to 0x00000000
[1523][0209] erase resumed 2b        adr=0x00000000
[1527][0209] do_erase_oneblock end   adr=0x00000000 len=0x20000

do_write_buffer() continue & error
[1527][0465] inval_cache_and_wait_for_operation short lock
[1531][0465] buffer write after  0xe8 WAIT_TIMEOUT
[1531][0465] buffer write data to buf words=511
[1531][0465] buffer write data buf done
[1531][0465] map_write 0xd0 to 0x03fe4c00
[1531][0465] 50000000.flash: buffer write before INVAL_CACHE_AND_WAIT 
adr=0x03fe5000 len=0x400
[1531][0465] inval_cache_and_wait_for_operation short unlock
[1531][0465] inval_cache_and_wait_for_operation short lock
[1531][0465] inval_cache_and_wait_for_operation long unlock
[1543][0465] inval_cache_and_wait_for_operation long lock

[2503][0465] inval_cache_and_wait_for_operation long unlock
[2511][0465] inval_cache_and_wait_for_operation long lock
[2511][0465] 50000000.flash: buffer write after  INVAL_CACHE_AND_WAIT 
adr=0x03fe5000 len=0x400
[2511][0465] map_write 0x50 to 0x03fe4c00
[2511][0465] map_write 0x70 to 0x03fe4c00
[2519][0465] 50000000.flash: buffer write error status 0xb0 
adr=0x03fe5000 len=0x400

Am 02/03/2011 04:24 PM, schrieb Michael Cashwell:
> On Feb 3, 2011, at 4:50 AM, Joakim Tjernlund wrote:
>
>>>> My failures were caused by the erase-resume status read "seeing" a non-busy WSM. Adding a 20µs delay before that read (actually after the command write, but it amounts to the same thing) prevents this from happening. By the time awakened "wait-for-completion" code does its first status read it sees the WSM as busy as it should until the erase actually finishes.
>> Perhaps you can test for ESS(SR.6) too?
>> something like:
>>   if DWS&&  !ESS
> Hmmm. So instead of a blind delay it would loop (with a timeout) while DWS&&  ESS are both set (meaning the erase resume command just written hasn't yet taken effect). It does make sense that if the resume takes time to drop SR.7 (ready) that it would also take time to drop SR.6 (ESS).
>
> I like that. I'll add that to the set of tests today and report back.
>
> -Mike
>




More information about the linux-mtd mailing list