Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?

Miquel Raynal miquel.raynal at bootlin.com
Mon Mar 14 08:45:05 PDT 2022


Hi Daniel,

Sorry for the delay.

dg at emlix.com wrote on Thu, 24 Feb 2022 19:17:43 +0100:

> Hi Miquel,
> 
> On 24.02.22 at 17:03, Miquel Raynal wrote:
> > dg at emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:  
> >> On 24.02.22 at 16:29, Miquel Raynal wrote:
> >>> dg at emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:    
> >>>> On 22.02.22 at 23:02, Han Xu wrote:
> >>>>> Could you please describe more details about what kind of error, how to
> >>>>> reproduce it and on which kernel version?
> >>>>
> >>>> You need a flash that has one bad block where programming the BBM sets
> >>>> NAND_STATUS_FAIL in its status register. The latest kernels should still
> >>>> have problems when this happens in a UBI.    
> >>>
> >>> I believe we should try to understand "why" this happens rather than
> >>> work around its consequences. Can you give more details about why we
> >>> get this status?
> >>
> >> Uhm, the block is bad, broken. It shows the same behavior even after
> >> power cycling. The other blocks are ok. I don't think it is our fault
> >> that it died so early.  
> > 
> > But why after a power cycle are we trying to write the BBM?  
> 
> I did not want to imply that Linux tries to write the block after every
> power cycle. UBI notices that the block is broken once and manages to
> mark it as bad in the BBT, so after power cycle it will not try to write
> to that block again. What I wanted to say is that manual testing of the
> block after power cycling shows that the block remains unusable.
> 
> The problem is that UBI switches to read-only mode after it marked the
> block as bad in the BBT because the redundant BBM in the OOB of the
> block could not be written.

I think I understand your situation better now.

So here is our problem: why can't we write the OOB? If there is a good
reason why this write cannot succeed, then we can provide the
NAND_BBT_NO_OOB_BBM flag. Otherwise we should find the root cause.

> And we don't want to get into a situation
> where we have to reboot the system, especially if it is because of
> something we don't need.
> 
> We could change nand_block_markbad_lowlevel to return success as long
> as updating the BBT succeeds, if you think that this is the correct
> approach.

That is not the correct approach unless we explicitly asked to bypass
writing BBMs.
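
To make the distinction concrete, here is a rough, self-contained model
of the decision being discussed. It is only a sketch and not the kernel
code: struct sketch_chip, SKETCH_BBT_NO_OOB_BBM and sketch_markbad() are
hypothetical stand-ins, and the two "result" fields simply simulate
whether the OOB marker write and the BBT update succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real option flag (sketch only). */
#define SKETCH_BBT_NO_OOB_BBM 0x1u

/* Simplified model of a chip; none of these fields are kernel names. */
struct sketch_chip {
	unsigned int bbt_options; /* option flags chosen by the driver */
	bool has_bbt;             /* a bad block table is in use */
	int oob_bbm_result;       /* simulated result of the OOB marker write */
	int bbt_result;           /* simulated result of the BBT update */
	bool oob_write_attempted; /* set when the OOB write is tried */
};

/*
 * Mark a block bad. Returns 0 on success, or the first error seen.
 * The OOB marker write is skipped only when the flag asks for it.
 */
static int sketch_markbad(struct sketch_chip *chip)
{
	int ret = 0;

	assert(chip != NULL);

	if (!(chip->bbt_options & SKETCH_BBT_NO_OOB_BBM)) {
		chip->oob_write_attempted = true;
		ret = chip->oob_bbm_result;
	}

	if (chip->has_bbt) {
		int res = chip->bbt_result;

		/* Keep the earlier OOB error, if there was one. */
		if (!ret)
			ret = res;
	}

	return ret;
}
```

In this model, with the flag clear and a failing OOB write, the function
reports the OOB error even though the BBT update succeeded. With the flag
set, the OOB write is never attempted and only the BBT result counts.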

> > Is it that there are too many ECC errors and so when reading the block it
> > is declared bad and the system tries to set the BBM/BBT bit? Or is it
> > already marked bad somewhere and something silly happens which at
> > some point tries to re-write the BBM?  
> 
> I guess when programming the BBM fails with an error in the status
> register

Why would a program operation (without ECC) fail? I guess this is what
we should understand first.

> the same probably happened when UBI tried to write data to the
> block.
> 
> > Are you using fastmap? Do you use a BBT?
> 
> Yes and yes. The fact that we use a BBT is why we want to set
> NAND_BBT_NO_OOB_BBM.
> 
> Best regards,
> 
>   Daniel
> 

Thanks,
Miquèl


