Numonyx NOR and chip->mutex bug?

Thu Feb 3 18:18:44 EST 2011

Hi

Now I have a patch that fix my problem. I did not yet do a lot of tests 
and I did
not yet analyzed correctness in all cases.
I check in the inval_cache_and_wait_for_operation() of the status is 
status_OK
the if the chip->state and the chip_state (the current chip state and 
the chip state
when the function is called) are equal or not. If equal break and exit 
(set the
chip->state= FL_STATUS) and if not equal add the thread in the wait queue.

I think we are not finished when not equal so we have do not have to 
take the mutex.

Now why this code worked with the older Flash chips. I don't know, but 
eventually
they are slower after the write buffer command 0xe8 and they are still 
busy when
inval_cache_and_wait_for_operation() checks if status_OK.
Tomorrow I'll check if this is the case. Now I do not have access to the HW.

Regards Stefan

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c 
b/drivers/mtd/chips/cfi_cmdset_0001.c
index 28fc5c2..723985c 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c

@@ static int inval_cache_and_wait_for_operation(
      for (;;) {
          status = map_read(map, cmd_adr);
-        if (map_word_andequal(map, status, status_OK, status_OK))
+        if (map_word_andequal(map, status, status_OK, status_OK)) {
+            if (chip->state != chip_state) {
+                goto unequal_states;
+            }
              break;
+        }

          if (!timeo) {

@@  static int inval_cache_and_wait_for_operation(
              cond_resched();
              timeo--;
          }
          mutex_lock(&chip->mutex);

+unequal_states:
          while (chip->state != chip_state) {


Am 02/03/2011 05:38 PM, schrieb Stefan Bigler:
> Hi
>
> I did some more anaysis of the write probem when creating a ubivolume.
> I come to the conclusion that there must be a problem in the fuction
> inval_cache_and_wait_for_operation().
>
> Why I think this. In the log below you see that in the first part the
> do_erase_oneblock() start the erasing. In the function
> inval_cache_and_wait_for_operation() this thread is interrupted.
>
> A do_write_buffer() suspends the erase by chip_ready() and runs until
> cmd 0xe8 and waits for completion in WAIT_TIMEOUT.
>
> Here again the do_erase_oneblock() continues to run and runs a
> erase resume sequence to the end of erase the block.
> --> THIS IS WRONG.
>
> Now the do_write_buffer() continue but it ends in a write error.
>
>
> When reading the inval_cache_and_wait_for_operation() there are 2
> places were the mutex is unlocked
>
> 1) while INVALIDATE_CACHED_RANGE()
> 2) if status is not status_OK
>
> My measurement shows that the mutex is taken in the first place.
>
> Now I do expect that this method should make sure that the erasing
> thread get "suspended" and the writing task can continue.
> But this is not the case.
> There is code were we compare the chip_state and chip->state.
> But we are not hitting this code.
>
> This is the summary of my analysis for today.
> I was asking me why this code is working on other chips?
>
> Regards
> Stefan
>
>
> do_erase_oneblock() begin
> [1507][0209] do_erase_oneblock start adr=0x00000000 len=0x20000
> [1507][0209] map_write 0x50 to 0x00000000
> [1507][0209] map_write 0x20 to 0x00000000
> [1507][0209] map_write 0xd0 to 0x00000000
> [1507][0209] do_erase_oneblock before INVAL_CACHE_AND_WAIT 
> adr=0x00000000 len=0x20000
> [1507][0209] inval_cache_and_wait_for_operation short unlock
>
> do_write_buffer() begin
> [1507][0465] map_write 0xb0 to 0x03fe4c00
> [1515][0465] erase suspend 1         adr=0x03fe4c00
> [1515][0465] map_write 0x70 to 0x03fe4c00
> [1515][0465] map_write 0xe8 to 0x03fe4c00
> [1515][0465] buffer write before 0xe8 WAIT_TIMEOUT
> [1515][0465] inval_cache_and_wait_for_operation short unlock
>
> do_erase_oneblock() continue & finish
> [1515][0209] inval_cache_and_wait_for_operation short lock
> [1515][0209] do_erase_oneblock after  INVAL_CACHE_AND_WAIT 
> adr=0x00000000 len=0x20000
> [1515][0209] map_write 0x70 to 0x00000000
> [1515][0209] map_write 0x50 to 0x00000000
> [1515][0209] map_write 0xd0 to 0x00000000
> [1515][0209] map_write 0x70 to 0x00000000
> [1523][0209] erase resumed 2b        adr=0x00000000
> [1527][0209] do_erase_oneblock end   adr=0x00000000 len=0x20000
>
> do_write_buffer() continue & error
> [1527][0465] inval_cache_and_wait_for_operation short lock
> [1531][0465] buffer write after  0xe8 WAIT_TIMEOUT
> [1531][0465] buffer write data to buf words=511
> [1531][0465] buffer write data buf done
> [1531][0465] map_write 0xd0 to 0x03fe4c00
> [1531][0465] 50000000.flash: buffer write before INVAL_CACHE_AND_WAIT 
> adr=0x03fe5000 len=0x400
> [1531][0465] inval_cache_and_wait_for_operation short unlock
> [1531][0465] inval_cache_and_wait_for_operation short lock
> [1531][0465] inval_cache_and_wait_for_operation long unlock
> [1543][0465] inval_cache_and_wait_for_operation long lock
>
> [2503][0465] inval_cache_and_wait_for_operation long unlock
> [2511][0465] inval_cache_and_wait_for_operation long lock
> [2511][0465] 50000000.flash: buffer write after  INVAL_CACHE_AND_WAIT 
> adr=0x03fe5000 len=0x400
> [2511][0465] map_write 0x50 to 0x03fe4c00
> [2511][0465] map_write 0x70 to 0x03fe4c00
> [2519][0465] 50000000.flash: buffer write error status 0xb0 
> adr=0x03fe5000 len=0x400
>
> Am 02/03/2011 04:24 PM, schrieb Michael Cashwell:
>> On Feb 3, 2011, at 4:50 AM, Joakim Tjernlund wrote:
>>
>>>>> My failures were caused by the erase-resume status read "seeing" a 
>>>>> non-busy WSM. Adding a 20µs delay before that read (actually after 
>>>>> the command write, but it amounts to the same thing) prevents this 
>>>>> from happening. By the time awakened "wait-for-completion" code 
>>>>> does its first status read it sees the WSM as busy as it should 
>>>>> until the erase actually finishes.
>>> Perhaps you can test for ESS(SR.6) too?
>>> something like:
>>>   if DWS&&  !ESS
>> Hmmm. So instead of a blind delay it would loop (with a timeout) 
>> while DWS&&  ESS are both set (meaning the erase resume command just 
>> written hasn't yet taken effect). It does make sense that if the 
>> resume takes time to drop SR.7 (ready) that it would also take time 
>> to drop SR.6 (ESS).
>>
>> I like that. I'll add that to the set of tests today and report back.
>>
>> -Mike
>>
>