[PATCH v6] mtd: gpmi: Deal with bitflips in erased regions

Elie De Brauwer eliedebrauwer at gmail.com
Fri Jan 3 05:34:54 EST 2014


On Fri, Jan 3, 2014 at 10:44 AM, Huang Shijie <b32955 at freescale.com> wrote:
> On Fri, Jan 03, 2014 at 10:43:41AM +0100, Elie De Brauwer wrote:
>> On Thu, Dec 19, 2013 at 6:10 AM, Huang Shijie <b32955 at freescale.com> wrote:
>> > On Wed, Dec 18, 2013 at 08:07:12PM +0100, Elie De Brauwer wrote:
>> >> The BCH block typically used with a GPMI block on an i.MX28/i.MX6 is only
>> >> able to correct bitflips on data actually streamed through the block.
>> >> When erasing a block the data does not stream through the BCH block
>> >> and therefore no ECC data is written to the NAND chip. This causes
>> >> gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
>> >> found in an erased page. Typically causing problems at higher levels
>> >> (ubifs corrupted empty space warnings). This problem was also observed
>> >> when using SLC NAND devices.
>> >>
>> >> This patch configures the BCH block to mark a block as 'erased' if
>> >> not too many bitflips are found. Next HW_BCH_STATUS0:ALLONES
>> >> is used to check if the data read were all ones, indicating a read of a
>> >> properly erased chunk was performed. If this was not the case a slow path
>> >> is entered where bitflips are counted and corrected in software,
>> >> allowing the upper layers to take proper actions.
>> >>
>> >> Signed-off-by: Elie De Brauwer <eliedebrauwer at gmail.com>
>> >> Acked-by: Peter Korsgaard <peter at korsgaard.com>
>> >
>> > thanks a lot!
>> >
>> > Acked-by: Huang Shijie <b32955 at freescale.com>
>> >
>>
>> Hello Huang, all,
>>
>> Huang, you suggested earlier, after checking with your hw guys, to make
>> use of the ALLONES bit in the HW_BCH_STATUS0 register. This looked like
>> a nice solution since it could introduce a fast path. However, the
>> documentation does not mention how the ALLONES bit is implemented,
>> including whether or not it takes the erase threshold into account.
>> Obviously V6 of my patch will only function if ALLONES is set only when
>> _all bits are physically one_, and will fail to function if ALLONES is
>> something like the AND of all statuses of the chunk.
>>
>> I've been given a board with another bitflip (jay). It propagates as
>> the following error:
>> "UBIFS error (pid 36): ubifs_recover_master_node: failed to recover master node"
>>
>> And if I look at the data I see the LEB consists of 55 valid
>> master nodes, but two pages after the last valid master node there is
>> a bitflip which causes get_master_node() in ubifs/recovery.c to
>> trigger:
>>                 if (!is_empty(buf, len))
>>                         goto out_err;
>>
>> When dumping the data read I get to see a bitflip:
>> [    4.094719] c8a32e30: ff ff ff fd ff ff ff ff ff ff ff ff ff ff ff ff  ................
>>
>> This leads me to believe that the ALLONES bit is actually not behaving
>> the way I thought it was, implying that it actually is taking the
>> ERASE_THRESHOLD into account. (I verified this by dumping the status
>> register, which was 0xff10, and counting the bitflips, which showed
>> ALLONES to be set while a bitflip was found.) Making the allones
>
> I confirmed with the hardware guy just now.
> the hardware guy ate his word!
>
> If we do not apply this patch, does it mean the UBIFS can not work?
>
>

Thanks for double-checking! I'll provide a new version of this patch
over the weekend.

If this patch does not get applied you will get bug reports related to
"corrupt empty space" issues when using UBIFS, for the simple reason
that ubifs only claims to be tolerant to corruption due to power cuts,
and relies on the mtd layer to detect and correct bitflips. In my
situation I have already seen two distinct units (out of a population of
20 units, over a timespan of about two months) refuse to boot because of
this issue; both were fixed by this patch. So ubifs might seem to work,
but people using it will likely bump into nasty field issues.
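
To make the idea concrete, here is a rough user-space sketch of the kind
of software slow path the commit message describes: count the zero bits
in a chunk that is expected to be erased and, if there are few enough,
hand clean 0xff data to the upper layers. This is not the code from the
patch; the name fix_erased_chunk() and the threshold parameter are made
up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the code from the patch.
 *
 * Counts the zero bits in a chunk that is expected to be erased. If at
 * most `threshold` bits are flipped, the buffer is rewritten to all 0xff
 * and the number of corrected bitflips is returned. Otherwise the buffer
 * is left untouched and -1 is returned.
 */
static int fix_erased_chunk(uint8_t *buf, size_t len, unsigned int threshold)
{
	unsigned int flips = 0;
	size_t i;

	assert(buf != NULL);

	for (i = 0; i < len; i++) {
		/* In erased flash every bit should be 1, so set bits here are flips. */
		uint8_t zeros = (uint8_t)~buf[i];

		while (zeros) {
			zeros &= (uint8_t)(zeros - 1);	/* clear the lowest set bit */
			flips++;
		}
		if (flips > threshold)
			return -1;	/* too many flips, do not treat as erased */
	}

	memset(buf, 0xff, len);
	return (int)flips;
}
```

Applied to the 16 bytes from the dump above (one 0xfd among 0xff) with a
threshold of, say, 4, such a helper would report one bitflip and return
an all-0xff buffer, so the is_empty() check in ubifs would pass.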

my 2 cents
E.
-- 
Elie De Brauwer


