[PATCH v3 0/6] NAND BBM + BBT updates

Fri Jan 13 17:12:11 EST 2012

On Thu, 2012-01-12 at 09:58 +0200, Shmulik Ladkani wrote:
> On Thu, 12 Jan 2012 00:28:45 +0200 Artem Bityutskiy <dedekind1 at gmail.com> wrote:
> > In my view, OOB BB markers are the primary, reliable, and simple
> > mechanism. And the BBT is just an additional optimization to speed up
> > system startup.
> > 
> > So in general I support Brian's efforts
> 
> I'm in favor of this approach as well.
> However IMO it should (1) be 'bbt_options' configurable;

Why does it have to be configurable? Do you have some example in mind?

>  (2) should
> properly address OOB vs BBT out-of-sync issues.

This is reasonable.

> 
> > Indeed, when do we mark a block as bad?
> > 
> > 1. When we get an erase error. Well, if SW erases a block, it does not
> > care about the contents. This means that after the reboot SW will re-try
> > erasing it. And if the block is bad, and previously the erasure failed,
> > it will fail again, and SW will mark it as bad again.
> > 
> > 2. When we get a write error. The SW recovers useful data from the
> > eraseblock, then tries to mark it bad. Well, UBI will first try to
> > torture it, but this is not an essential detail. Anyway, if we get a
> > power cut - the situation is the same - SW will try to erase this block
> > and write to it, will get errors again and will mark it as bad.
> 
> So your new scheme for 'nand_default_block_markbad' is as follows:
>   (1) mark BBM in OOB
>   (2) update on-flash BBT.
> Where existing scheme (for NAND_BBT_USE_FLASH devices) is:
>   update on-flash BBT.
> 
> And hence, if a power cut occurs between (1) and (2) in the new scheme,
> it is equivalent to a power cut that occurred just an instant prior to
> actually performing the BBT update in the old scheme.
> 
> Meaning: the system, being NAND_BBT_USE_FLASH based, will simply not
> be aware of the bad block (although it is already OOB marked).
> Is that right?

Yes. And the idea is that it will discover it when it starts doing I/O on
this eraseblock. Indeed, if it found out that the block is bad before the
power cut (it exhibited I/O errors), it should discover it again by getting
I/O errors.
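
To make the argument concrete, here is a minimal sketch of the scenario in
plain C. It is a toy model, not the MTD code: 'struct toy_nand' and the
'toy_*' functions are hypothetical names invented for illustration, and
"worn_out" stands in for a block that physically fails erase/write.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBLOCKS 8

/* Toy model of a NAND chip; every name here is hypothetical. */
struct toy_nand {
	bool worn_out[TOY_NBLOCKS];  /* physical truth: erase/write fails */
	bool oob_bbm[TOY_NBLOCKS];   /* bad block marker stored in OOB */
	bool flash_bbt[TOY_NBLOCKS]; /* on-flash bad block table */
	bool ram_bbt[TOY_NBLOCKS];   /* in-RAM copy, loaded at boot */
};

/* Boot: the in-RAM BBT comes from the on-flash BBT; OOB is not scanned. */
void toy_boot(struct toy_nand *c)
{
	memcpy(c->ram_bbt, c->flash_bbt, sizeof(c->ram_bbt));
}

/*
 * Mark a block bad: (1) write the OOB marker, (2) update the BBT.
 * 'power_cut_after_oob' simulates losing power between the two steps.
 * Returns false if the simulated power cut interrupted the update.
 */
bool toy_markbad(struct toy_nand *c, int blk, bool power_cut_after_oob)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	c->oob_bbm[blk] = true;         /* step (1) */
	if (power_cut_after_oob)
		return false;
	c->flash_bbt[blk] = true;       /* step (2) */
	c->ram_bbt[blk] = true;
	return true;
}

/*
 * Erase as seen by the upper layer: skip blocks the BBT knows are bad;
 * on an erase failure, mark the block bad (both steps).
 * Returns true if the block was erased and is usable.
 */
bool toy_erase(struct toy_nand *c, int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	if (c->ram_bbt[blk])
		return false;
	if (c->worn_out[blk]) {
		toy_markbad(c, blk, false);
		return false;
	}
	return true;
}
```

With a worn-out block 3, calling toy_markbad(&c, 3, true) leaves the OOB
marker set and the BBT stale. After toy_boot(&c) the block looks good in
the BBT, but the next toy_erase(&c, 3) fails and marks it again, this
time in both places.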

> > I guess we also need to read oob before writing it when we are marking a
> > block as bad - just in case it is already marked as bad in OOB.
> 
> I assume you mean using 'chip->block_bad' within the new implementation
> of 'nand_default_block_markbad' prior to executing (1). Is that right?

Probably yes.
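
A rough sketch of that ordering, again as a toy model: 'sk_block_bad'
only stands in for the role 'chip->block_bad' would play, and
'struct sk_chip', 'sk_markbad' and the 'oob_writes' counter are
hypothetical names that exist only in this example.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NBLOCKS 4

/* Hypothetical toy chip state, not the kernel's struct nand_chip. */
struct sk_chip {
	bool oob_bbm[SK_NBLOCKS]; /* bad block marker in OOB */
	bool bbt[SK_NBLOCKS];     /* bad block table entry */
	int oob_writes;           /* how many times OOB was programmed */
};

/* Read the OOB marker; stand-in for the chip's block_bad check. */
bool sk_block_bad(const struct sk_chip *c, int blk)
{
	assert(blk >= 0 && blk < SK_NBLOCKS);
	return c->oob_bbm[blk];
}

/* Mark bad: read OOB first, program it only if it is not marked yet. */
void sk_markbad(struct sk_chip *c, int blk)
{
	if (!sk_block_bad(c, blk)) {
		c->oob_bbm[blk] = true;  /* step (1), skipped if marked */
		c->oob_writes++;
	}
	c->bbt[blk] = true;              /* step (2) */
}
```

If a previous attempt was interrupted after step (1), the retry finds the
marker already present, skips the OOB write, and only updates the BBT.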

> 
> > Comments? If this does not make sense - I have a good excuse - it is
> > late and I am very sleepy :-)
> 
> I guess it's reasonable :)
> 
> The only argument I have is that this scheme, although working,
> contradicts your view of "OOB BB markers being the primary mechanism".
> That's because 'nand_block_checkbad' prefers the info from the BBT
> (for NAND_BBT_USE_FLASH devices).

My point is that in case of a power cut between (1) and (2) the upper
layers will detect the bad block again and mark it as bad again, both in
OOB and BBT. So OOB and BBT will be in sync.

The other approach would be to have an additional bit per eraseblock in
the in-RAM BBT for lazy checking: on the first erase or write operation,
actually compare the OOB bad block marker with the BBT and bring OOB and
BBT in sync.
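
A minimal sketch of what such lazy checking could look like, as a toy
model in plain C. The names ('struct lz_chip', 'lz_check_before_io', the
LZ_* bits) are hypothetical. The reconciliation rule is also an
assumption made for this example: if either the OOB marker or the BBT
says "bad", the block is treated as bad and the other side is updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LZ_NBLOCKS 8
#define LZ_BAD     0x1u /* block is bad according to the in-RAM BBT */
#define LZ_CHECKED 0x2u /* OOB marker was already compared with BBT */

/* Hypothetical toy chip state. */
struct lz_chip {
	bool oob_bbm[LZ_NBLOCKS];    /* bad block marker in OOB */
	uint8_t ram_bbt[LZ_NBLOCKS]; /* in-RAM BBT: LZ_BAD | LZ_CHECKED */
	int oob_reads;               /* number of OOB marker reads */
};

/*
 * Call before the first erase or write to a block. On the first call
 * for a given block, compare OOB with the BBT and sync them (assumed
 * rule: bad on either side wins). Later calls only consult the BBT.
 * Returns true if the block is bad.
 */
bool lz_check_before_io(struct lz_chip *c, int blk)
{
	assert(blk >= 0 && blk < LZ_NBLOCKS);
	if (!(c->ram_bbt[blk] & LZ_CHECKED)) {
		bool oob_bad = c->oob_bbm[blk];
		bool bbt_bad = (c->ram_bbt[blk] & LZ_BAD) != 0;

		c->oob_reads++;
		if (oob_bad || bbt_bad) {
			c->oob_bbm[blk] = true;
			c->ram_bbt[blk] |= LZ_BAD;
		}
		c->ram_bbt[blk] |= LZ_CHECKED;
	}
	return (c->ram_bbt[blk] & LZ_BAD) != 0;
}
```

The extra bit keeps the cost at one OOB read per eraseblock per boot,
paid only for eraseblocks that are actually erased or written.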

Artem.