Regarding latest EUCLEAN/bitflip_threshold patchset

Brian Norris computersforpeace at gmail.com
Wed May 16 02:49:17 EDT 2012


Hi,

On Sat, May 12, 2012 at 1:13 PM, Shmulik Ladkani
<shmulik.ladkani at gmail.com> wrote:
> On Sat, 12 May 2012 11:37:37 -0700 Mike Dunn <mikedunn at newsguy.com> wrote:
>> On 05/11/2012 04:51 AM, Artem Bityutskiy wrote:
>> > From nand_base.c:
>> >
>> >     if (mtd->ecc_stats.failed - stats.failed)
>> >             return -EBADMSG;
>> >
>> >     return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
>> >
>> > - May drivers increment mtd->ecc_stats.{corrected,failed} during their
>> >   ecc.read_oob() call?
>>
>> Currently no nand drivers increment stats.corrected for oob-only reads.  Since
>> nand_do_read_oob() does not read page data, stats never increment and -EUCLEAN
>> is never returned.  To avoid complicating the issue, I ignored the case of
>> reading oob-only.

My out-of-tree driver increments ecc_stats.corrected.

>> > - If so, can we (should we?) report EUCLEAN according to the
>> >   bitflip_threshold in this case?
>>
>> I guess it depends on how widespread is the desire or capability of performing
>> ecc on oob-only reads.  The new diskonchip devices (docg3, docg4) are capable of
>> performing ecc on oob-only data.  These can do one bit corrections over 15 (of
>> the 16 total) oob bytes using the hamming algorithm (though neither driver
>> supports it currently).  But since in this case only one bitflip can be
>> corrected, it will always be below bitflip_threshold.  Then there's the question
>> of how do you interpret uncorrectable bitflips vis-a-vis eraseblock health when
>> using a weaker ecc algorithm for oob-only.
>
> I see.
> So the current bitflip_threshold scheme is probably not applicable to
> 'nand_do_read_oob' - because the strength over the OOB would probably
> differ from the page's ECC strength.
>
>> These questions are currently all theoretical.  I think the threshold test
>> should be removed, and replaced with 'return 0', at least for now.
>
> Well, I was also surprised to see that 'nand_do_read_oob' may return
> EUCLEAN or EBADMSG at all.
>
> Digging further, I found out it was a relatively recent addition:
> [041e4575 mtd: nand: handle ECC errors in OOB] by Brian Norris.
>
> Brian, care to elaborate regarding 041e4575, and comment how do you
> think it should be ported to the new bitflip_threshold mechanism, if at
> all?

Hmm, well 041e4575 was designed without much of a window into how
others really needed it, as I didn't know of others who had the same
features. My hardware has its own threshold features that can be used
to mask bitflips; it has ECC that covers OOB at the same time as the
page data; when reading OOB only, it actually reads the page data as
well, in order to perform ECC properly. So when I report bitflips from
read_oob, I'm reporting the bitflips for the entire page+OOB sector.
But because of my hardware-based threshold, bitflips are reported only
when their count is high.
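
To make the two reporting policies concrete, here is a minimal userspace
sketch, not the actual nand_base.c code. The struct and function names,
and the max_bitflips parameter, are hypothetical; the ">=" comparison
against bitflip_threshold is my assumption about how a threshold test
would look. The first function mirrors the snippet quoted above (any
correction yields -EUCLEAN); the second reports -EUCLEAN only at or
above a threshold.

```c
#include <errno.h>

/* EUCLEAN is Linux-specific; fall back to the Linux value elsewhere. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Hypothetical stand-in for the corrected/failed counters in ecc_stats. */
struct ecc_stats_sketch {
	unsigned int corrected;
	unsigned int failed;
};

/* Policy of the quoted snippet: any new correction reports -EUCLEAN. */
static int oob_status_from_stats(const struct ecc_stats_sketch *before,
				 const struct ecc_stats_sketch *after)
{
	if (after->failed - before->failed)
		return -EBADMSG;

	return after->corrected - before->corrected ? -EUCLEAN : 0;
}

/*
 * Threshold policy (assumed form): max_bitflips is the largest number of
 * bitflips corrected in any single ECC step of this read. Corrections
 * below the threshold are reported as a clean read.
 */
static int oob_status_with_threshold(const struct ecc_stats_sketch *before,
				     const struct ecc_stats_sketch *after,
				     unsigned int max_bitflips,
				     unsigned int bitflip_threshold)
{
	if (after->failed - before->failed)
		return -EBADMSG;

	return max_bitflips >= bitflip_threshold ? -EUCLEAN : 0;
}
```

With hardware like mine, max_bitflips would cover the whole page+OOB
sector, so the second form would apply the page's ECC strength even to
an OOB-only read.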

So, I'm not sure how to properly reconcile the new threshold code, the
EUCLEAN and EBADMSG returns from nand_do_read_oob(), and the various
schemes for OOB-only ECC (or the common case of no ECC for OOB-only). I'll try to
give this some more thought and get back to you. But please comment if
my feedback so far stirs any ideas with you guys. Perhaps 041e4575 was
not as clean as I thought in the first place.

Brian


