Probs with port to 2.4.18

Thu Dec 4 09:55:09 EST 2003

On Wed, 2003-12-03 at 12:23, Joan Dyer wrote:

> I had first asked about write failures (mtd/chips/cfi_cmdset_0002)

I can't find where you asked about cfi_cmdset_0002.c - only this thread:

http://lists.infradead.org/pipermail/linux-mtd/2003-November/008843.html

It looks like it might be a reply to another thread, but it isn't
connected.  Is your mailer breaking In-Reply-To?

>  and 
> took the current, updated, version of this file.

Which version is "current"?

>   I get a timeout on the 
> write (timeout vals increased) with the "Wacky" message (unable to decode 
> failure state). 
> 
> However what seems to be the underlying problem is a non-coordinated 
> write, while/during erase.

Does the device you are using support erase suspend?

>    With my port, wbuf.c (which references the 
> erase_completion lock) is not used. 
> 
> Should I protect the write in writev.c with a call on the lock?

I can't comment on this, unfortunately.  I do not have applications that
do concurrent erase/writes and so I can't test this.  There was a rather
large restructuring of cfi_cmdset_0002.c in version 1.92.  It is
possible that some of the erase suspend/write logic isn't sound.

> Is there any output or information that I can provide?   Here is a portion 
> of dmesg:
> 
> inocache for ino #29631 is all gone now. Freeing
> Removed nodes in range 0x00190000-0x001a0000 from ino #29632
> inocache for ino #29632 is all gone now. Freeing
> Removed nodes in range 0x00190000-0x001a0000 from ino #29633
> inocache for ino #29633 is all gone now. Freeing
> Removed nodes in range 0x00190000-0x001a0000 from ino #29634
> inocache for ino #29634 is all gone now. Freeing
> Removed nodes in range 0x00190000-0x001a0000 from ino #29635
> Erase completed successfully at 0x00190000
> MTD do_write_oneword(): Wacky!  Unable to decode failure status 
> uWriteTimeout=1001  GUESSDELAYT=100
> MTD do_write_oneword(): 0x00186524(0x00001985): 0x0000ffff 0x0000ffff 
> 0x0000ffff  0x0000ffff

0x1985 was supposed to be written to 0x186524.  do_write_oneword()
likely decided the operation was complete because it no longer saw the
status bit toggle.  The failure is detected because the value read back,
0xffff, does not match 0x1985.  In fact the last four reads of that
location all returned 0xffff: that is the erased value, and four
identical reads indicate that the status was not toggling.
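
To make that reasoning concrete, here is a rough user-space sketch of how
a series of reads can be classified.  It is not the code in
cfi_cmdset_0002.c; classify_write() and enum write_result are names made
up for this illustration, and comparing whole 16-bit words (rather than
masking a specific toggle bit) is a simplifying assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not the driver's own types. */
enum write_result { WRITE_OK, WRITE_BUSY, WRITE_MISMATCH, WRITE_UNKNOWN };

/*
 * Classify a word write from successive reads of the target location.
 *
 * reads[]  - values read back from the location, oldest first
 * n        - number of reads
 * expected - the word that was supposed to be programmed
 *
 * If the last two reads differ, the status is still toggling, so the
 * chip is busy.  If they are identical, the operation has ended and the
 * stable value is compared against the expected data.
 */
enum write_result classify_write(const uint16_t *reads, size_t n,
                                 uint16_t expected)
{
    if (n < 2)
        return WRITE_UNKNOWN;      /* need two reads to see a toggle */

    if (reads[n - 1] != reads[n - 2])
        return WRITE_BUSY;         /* still toggling */

    return reads[n - 1] == expected ? WRITE_OK : WRITE_MISMATCH;
}
```

Fed the values from the dmesg output above (0xffff four times, expected
0x1985), this returns WRITE_MISMATCH: nothing is toggling, yet the data
is not what was written, which is the state the driver could not decode.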

> Write3 of 48 bytes at 0x00186524 failed. returned -5, retlen 0
> jffs2_write_dirent returning node at c3803970
> Write4: Not marking the space at 0x00186524 as dirty because the flash 
> driver returned retlen zero
> raw: next_in_ino=c3803980, nextphys=00000000, flashoff=186524, totlen=30
> Verifying erase at 0x00190000
> Writing erased marker to block at 0x00190000
> jffs2_erase_pending_blocks completed
> 
> In order to produce this I've run an app that keeps writing to flash, very 
> small files, writing a temp then renaming it.

More information would be helpful (a patch even better =).  Right now
the best I can do is read through the code and think real hard about
where it might not work - something that doesn't always give good
results.  I just don't have a test case that I can examine with the
hardware/applications I have at my disposal 8-(.

-- 
Thayne Harbaugh
Linux Networx