[PATCH] mtd: nand: default bitflip-reporting threshold to 75% of correction strength

Tue Jan 13 10:51:40 PST 2015

Brian,

On 13.01.2015 at 19:48, Brian Norris wrote:
> Hi Richard,
> 
> On Tue, Jan 13, 2015 at 02:25:30PM +0100, Richard Weinberger wrote:
>> On 12.01.2015 at 21:51, Brian Norris wrote:
>>> The MTD API reports -EUCLEAN only if the maximum number of bitflips
>>> found in any ECC block exceeds a certain threshold. This is done to
>>> avoid excessive -EUCLEAN reports to MTD users, which may induce
>>> additional scrubbing of data, even when the ECC algorithm in use is
>>> perfectly capable of handling the bitflips.
>>>
>>> This threshold can be controlled by user-space (via sysfs), to allow
>>> users to determine what they are willing to tolerate in their
>>> application. But it still helps to have sane defaults.
>>>
>>> In recent discussion [1], it was pointed out that our default threshold
>>> is equal to the correction strength. That means that we won't actually
>>> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
>>> are almost too many to handle. It was determined that 3/4 of the
>>> correction strength is probably a better default.
>>>
>>> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
>>
>> I like this change but I have one question.
>>
>> UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
>> after erasure.
> 
> Can you elaborate? When "after erasure"? The closest I see is that UBI
> will mark a block bad if it sees an -EIO failure from sync_erase() in
> erase_worker(). If you have extra debug checks on, then
> ubi_self_check_all_ff() could potentially give you bitflip problems
> after the erase, but that's an odd corner case anyway, which many
> drivers have been handling in hacked together ad-hoc ways anyway (search
> for "bitflips in erase pages").
> 
> So I can't pinpoint what you're talking about, exactly.

See torture_peb(), which treats any bitflip report on a freshly erased PEB as fatal:
out:
        mutex_unlock(&ubi->buf_mutex);
        if (err == UBI_IO_BITFLIPS || mtd_is_eccerr(err)) {
                /*
                 * If a bit-flip or data integrity error was detected, the test
                 * has not passed because it happened on a freshly erased
                 * physical eraseblock which means something is wrong with it.
                 */
                ubi_err(ubi, "read problems on freshly erased PEB %d, must be bad",
                        pnum);
                err = -EIO;
        }

>> For SLC NAND this works very well.
>> Does this also hold for MLC NAND? If one or two bit flips are okay
>> even for a freshly erased MLC NAND this change could cause UBI to
>> mark good blocks as bad depending on the ECC strength.
> 
> I would typically assume that MLC NAND users must be using significantly
> stronger ECC (e.g., 12-bit / 512-byte, at least), so "one or two
> bitflips" would still fall well under 75% of 12 bits.

Same here. I just want to make sure that UBI does not assume a perfect NAND world. :)
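
To put rough numbers on it, here is a minimal sketch of the arithmetic, not the actual patch. The function names are made up for illustration, and two things are assumptions on my side: that the 75% value is rounded up, and that a read is reported once the bitflip count reaches the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: default threshold as 3/4 of the ECC correction
 * strength. Rounding up is an assumption, not something stated in this thread.
 */
unsigned int default_bitflip_threshold(unsigned int ecc_strength)
{
	return (ecc_strength * 3 + 3) / 4;
}

/*
 * Hypothetical helper: would a read with max_bitflips corrected bits be
 * reported as -EUCLEAN? Assumes "report once the count reaches the
 * threshold" and that devices without ECC (strength 0) never report.
 */
bool read_reports_euclean(unsigned int max_bitflips, unsigned int ecc_strength)
{
	if (ecc_strength == 0)
		return false;
	return max_bitflips >= default_bitflip_threshold(ecc_strength);
}

/* Sanity checks for the cases discussed in this thread. */
void check_examples(void)
{
	assert(default_bitflip_threshold(12) == 9); /* 12-bit / 512-byte ECC */
	assert(!read_reports_euclean(1, 12));       /* one flip: silent */
	assert(!read_reports_euclean(2, 12));       /* two flips: silent */
	assert(default_bitflip_threshold(4) == 3);  /* weaker 4-bit ECC */
	assert(read_reports_euclean(3, 4));         /* now reported */
}
```

Under these assumptions, 12-bit ECC gives a threshold of 9, so one or two flips on a freshly erased PEB would never reach torture_peb() as UBI_IO_BITFLIPS. With 4-bit ECC the threshold drops from 4 to 3, which is where the concern above would start to matter.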

Thanks,
//richard