bad block markers + ONFI

Thu Nov 10 19:52:37 EST 2011

Hello,

I've wondered for a while what the MTD community expects from ONFI
NAND with respect to bad block markers and bad block scanning. I'll
try to enumerate some observations, followed by some recommendations
and rationale. Please comment, and maybe I'll code one of these
options shortly. I'd like to settle the high-level points first,
though.

Observations:

(A) ONFI spec says the host should scan the 1st OOB byte of the 1st
and last pages of each block. See reference [1] for exact text.
(B) There are some ONFI parts whose data sheets do not list bad block
scanning specifications. Presumably these inherit the ONFI definition
as stated in (A). See reference [2] for an example.
(C) There are many ONFI parts whose data sheets list their own bad
block scanning specifications that do not match (A) exactly. See
reference [3] for examples.
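
To make (A) concrete, here is a rough sketch of the check as I read
the spec text in [1], for 8-bit access only. It is not nand_base code:
struct block_oob and both function names are made up for illustration,
and the OOB contents are modeled as a plain in-memory buffer rather
than read through MTD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical in-memory model of one erase block's OOB (spare) areas.
 * This is NOT an MTD structure; it only exists for this sketch.
 */
struct block_oob {
	const uint8_t *oob;      /* pages_per_block * oob_size bytes */
	size_t pages_per_block;  /* number of pages in the block */
	size_t oob_size;         /* OOB bytes per page */
};

/* First byte of the defect area of the given page (8-bit access). */
static uint8_t first_oob_byte(const struct block_oob *b, size_t page)
{
	return b->oob[page * b->oob_size];
}

/*
 * Observation (A): the block is defective if the first OOB byte of
 * either the first page or the last page reads 00h.
 */
static bool onfi_block_is_bad(const struct block_oob *b)
{
	return first_oob_byte(b, 0) == 0x00 ||
	       first_oob_byte(b, b->pages_per_block - 1) == 0x00;
}
```

Note that this compares against 00h exactly, as the quoted text does;
whether any non-FFh value should also count as a marker is a separate
question that the sketch does not try to answer.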

Currently, we don't follow (A) for NAND that reports ONFI
compatibility. In fact, we do not even have a flag that selects
scanning the 1st and last pages of each block (though adding one would
be easy). Instead, after ONFI detection, nand_base proceeds to its
regular BBM code, so each manufacturer's chips are scanned according
to that manufacturer's non-ONFI rules.

Now, I was considering implementing (A) more strictly, so that if the
chip reports ONFI compatibility, we scan the 1st and last pages of
each block. This would define the otherwise ambiguous behavior for
parts from (B), which would otherwise default to whatever rules
nand_base.c already applies. On the other hand, it would change the
current "established" behavior and would contradict the data sheets
that give their own definitions (as in (C)).

So I came up with a few options:
(a) Implement (A) for all ONFI-capable NAND
(b) Implement a flag for (A) without enforcing it for all ONFI NAND
(allow driver to specify, perhaps?)
(c) Make no change
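
To show how the three options differ in practice, here is a small
sketch of the decision each one implies. Every identifier in it is
hypothetical (none of these enums or flags exist in nand_base.c), and
"legacy_pages" simply stands for whatever page set the existing
per-manufacturer rules would have picked.

```c
#include <stdbool.h>

/* Hypothetical bitmask of which pages of a block get scanned. */
enum scan_pages {
	SCAN_FIRST  = 1 << 0,
	SCAN_SECOND = 1 << 1,
	SCAN_LAST   = 1 << 2,
};

/* Hypothetical labels for options (a), (b) and (c). */
enum onfi_bbm_option {
	OPT_A_ALWAYS_ONFI,  /* (a) ONFI rule for every ONFI-capable chip */
	OPT_B_DRIVER_FLAG,  /* (b) ONFI rule only when the driver asks */
	OPT_C_NO_CHANGE,    /* (c) keep the existing rules */
};

/*
 * Return the set of pages to scan. The ONFI rule is first + last page;
 * otherwise fall back to the page set the existing code would use.
 */
static unsigned int pick_scan_pages(enum onfi_bbm_option opt,
				    bool chip_is_onfi,
				    bool driver_wants_onfi,
				    unsigned int legacy_pages)
{
	bool use_onfi = false;

	switch (opt) {
	case OPT_A_ALWAYS_ONFI:
		use_onfi = chip_is_onfi;
		break;
	case OPT_B_DRIVER_FLAG:
		use_onfi = chip_is_onfi && driver_wants_onfi;
		break;
	case OPT_C_NO_CHANGE:
		break;
	}

	return use_onfi ? (SCAN_FIRST | SCAN_LAST) : legacy_pages;
}
```

Under this model, (b) behaves exactly like (c) for any driver that
does not opt in, which is the "no breakage" property discussed below.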

We can rationalize (a) by the ONFI standard and claim that it causes
little breakage, since:
* all of the exceptions (in reference [3]) allow at least the 1st
page, 1st OOB byte scans. Last page is not a big addition
* the "1st and 2nd page" scans are only intended for when it wasn't
possible to scan the first page
* the "1st or 6th byte" scans can be safely treated with 1st-byte-only
scans (discussed in another thread recently)

Rationale for (b) is to totally prevent breakage while allowing
deterministic behavior for drivers that want to use the exact ONFI
specification.

Rationale for (c) is laziness or "selective effort" (whichever you
prefer). It seems that there are very few chips that actually follow
ONFI's BBM guidelines properly, so it may not really be worth it to
try to implement them and deal with the breakage. However, this leaves
no deterministic solution for chips that fall under (B).

FWIW, the motivating example for these questions (an instance of (B):
Hynix H27U4G8F2DTR-BC, reference [2]) defaults to scanning the last
page of each block in the current nand_base.c. This may not be
significantly different from scanning the 1st and last pages.

Comments are appreciated. If you've read this far, you probably have
something to say :)

Brian

[1] ONFI 1.0 specification, section 3.2.2
A defective block is indicated by a byte value equal to 00h for 8-bit
access or a word value equal to 0000h for 16-bit access being present
at the first byte/word location in the defect area of either the first
page or last page of the block. The host shall check the first
byte/word of the defect area of both the first and last page of
each block to verify the block is valid prior to any erase or program
operations on that block.

[2] Example of (B): Hynix H27U4G8F2DTR-BC

[3] http://www.linux-mtd.infradead.org/nand-data/nanddata.html
There is a range of data sheets that say to scan:
1st and 2nd page (1st byte in OOB)
1st page (1st byte in OOB)
1st page (1st or 6th byte in OOB)