Atmel Nand PMECC UBI ECC issue
Olivier Schonken
olivier.schonken at gmail.com
Tue Mar 27 01:28:52 PDT 2018
Hi Richard
The bytes in question are at offset 0x36c90.
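As a rough cross-check, offset 36c90 (hex) is relative to the start of the PEB and matches the LEB offset 216208 that ubifs_check_node reports. The sketch below assumes the usual UBI layout on a device without sub-pages: EC header in the first 4096-byte page, VID header in the second, so LEB data starts at byte 8192. The helper name is only illustrative.

```python
# Assumption: the UBI data offset is two min-I/O units (EC header page plus
# VID header page), because the sub-page size equals the 4096-byte page size.
MIN_IO_SIZE = 4096
UBI_DATA_OFFSET = 2 * MIN_IO_SIZE


def leb_to_peb_offset(leb_offset, data_offset=UBI_DATA_OFFSET):
    """Translate an offset inside a LEB to an offset inside its PEB."""
    return leb_offset + data_offset


leb_offset = 216208  # from "bad node at LEB 325:216208"
peb_offset = leb_to_peb_offset(leb_offset)
print(hex(peb_offset))  # 0x36c90
```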
Attached is the dump without the OOB.
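To pin down the flipped bit in the dump, something like the following could be used. It is only a sketch: the function name is made up, and it assumes the node magic is stored little-endian, as the byte sequence 31 18 10 06 in the good image suggests.

```python
UBIFS_NODE_MAGIC = 0x06101831  # expected value from the ubifs_check_node message


def flipped_bits(block, offset, expected=UBIFS_NODE_MAGIC):
    """Return the bit positions (0 = LSB) where the 32-bit little-endian
    word at `offset` differs from `expected`."""
    found = int.from_bytes(block[offset:offset + 4], "little")
    diff = found ^ expected
    return [bit for bit in range(32) if diff & (1 << bit)]


# Synthetic stand-in for the dump: the first bytes printed in the kernel log.
sample = bytes.fromhex("30181006 00fea274")
print(flipped_bits(sample, 0))  # [0]
```

With the real file, read nandblock.ubi in binary mode and pass the offset given above instead of 0.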
Regards
Olivier
On Mon, Mar 26, 2018 at 10:07 PM, Richard Weinberger <richard at nod.at> wrote:
> Olivier,
>
> On Monday, 26 March 2018, 16:56:17 CEST, Olivier Schonken wrote:
>> Sorry for the resend; my Gmail editor was in HTML mode, so the mailing
>> list rejected the message. Humble apologies.
>>
>> I have run into an issue with the Atmel NAND controller on the
>> SAMA5D36 that I am struggling to debug.
>>
>> We are using custom hardware based on the SAMA5D36 with Micron
>> MT29F8G08ABBCAH4 NAND flash. The kernel is 4.14.29, mainline from
>> kernel.org. ECC strength is 24 bits with a 1024-byte sector size.
>> The PMECC settings were calculated as per
>> https://www.at91.com/linux4sam/bin/view/Linux4SAM/PmeccConfigure, with
>> a NAND HEADER value of 0xc0e18e05.
>>
>> The system works, and only some units show the error. The baffling
>> part is that a unit can work properly for a long while and then
>> suddenly start showing it. (I once traced it to a glibc library file,
>> which means it isn't even due to heavy writing on the filesystem.)
>> Most of the time the error occurs in the same PEB, even after
>> reprogramming the device via ubiformat or SAM-BA.
>>
>> In the attached log output, you will see a UBIFS error caused by a
>> bitflip, which I confirmed by comparing the binary sequence to the
>> Buildroot-generated UBI file.
>>
>> Reading back the NAND flash with Atmel's SAM-BA yields the correct
>> contents for the page causing the ECC error:
>>
>> 31 18 10 06 00 FE A2 74 FB CF 00 00 00 00 00 00 C5 05 00 00 01 00 00
>> 00 AB 0C 00 00
>
> At which offset is this?
>
>> Starting up linux again results in the same issue.
>> This extract shows the ubifs magic number with the bitflip. The rest
>> of the binary sequence matches a unique part of the ubi image.
>>
>> [ 75.140000] 7fe0: b6f8f8e4 becf7a40 b6be5788 b6e9c000 60000010 ffffffff
>> [ 75.150000] UBIFS error (ubi0:0 pid 1): ubifs_check_node: bad magic 0x6101830, expected 0x6101831
>> [ 75.160000] UBIFS error (ubi0:0 pid 1): ubifs_check_node: bad node at LEB 325:216208
>> [ 75.160000] Not a node, first 24 bytes:
>> [ 75.160000] 00000000: 30 18 10 06 00 fe a2 74 fb cf 00 00 00 00 00 00 c5 05 00 00 01 00 00 00  0......t................
>> [ 75.180000] CPU: 0 PID: 1 Comm: systemd Not tainted 4.14.29+ #706
>>
>> mtdinfo for the partition in question
>> Type: nand
>> Eraseblock size: 262144 bytes, 256.0 KiB
>> Amount of eraseblocks: 2048 (536870912 bytes, 512.0 MiB)
>> Minimum input/output unit size: 4096 bytes
>> Sub-page size: 4096 bytes
>> OOB size: 224 bytes
>> Character device major/minor: 90:10
>> Bad blocks are allowed: true
>> Device is writable: true
>>
>> Device tree entry:
>> nand_controller: nand-controller {
>>         status = "okay";
>>
>>         nand@3 {
>>                 reg = <0x3 0x0 0x800000>;
>>                 atmel,rb = <0>;
>>                 nand-bus-width = <8>;
>>                 nand-ecc-mode = "hw";
>>                 nand-ecc-strength = <24>;
>>                 nand-ecc-step-size = <1024>;
>>                 nand-on-flash-bbt;
>>                 label = "atmel_nand";
>>         };
>> };
>>
>> Attached are the dmesg traces with the ECC issue and a nanddump of the
>> block with the ECC error, including OOB contents, taken with "nanddump -f
>> nandblock-withoob.ubi /dev/mtd5 -s 0x51c0000 -o -l 262144 &>
>> nanddump-cmdline-output.txt".
>
> Can you please share the dump without OOB?
> UBI does not use OOB, so we don't need it and can use offsets as seen by UBI
> and UBIFS as-is. :)
>
> Thanks,
> //richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nandblock.ubi
Type: application/octet-stream
Size: 262144 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-mtd/attachments/20180327/bca9357c/attachment-0001.obj>