[BUG] pxa3xx: wait time out when scanning for bb

Greg Cook greg at morpheus.ws
Mon Dec 11 23:30:03 PST 2017


On 12 December 2017 at 15:09, Ezequiel Garcia
<ezequiel at vanguardiasur.com.ar> wrote:
> On 12 December 2017 at 03:01, Greg Cook <greg at morpheus.ws> wrote:
>> On 12 December 2017 at 05:16, Boris Brezillon
>> <boris.brezillon at free-electrons.com> wrote:
>>> On Mon, 11 Dec 2017 13:24:56 -0300
>>> Ezequiel Garcia <ezequiel at vanguardiasur.com.ar> wrote:
>>>
>>>> Greg,
>>>>
>>>> On 11 December 2017 at 10:18, Greg Cook <greg at morpheus.ws> wrote:
>>>> > Sean,
>>>> >
>>>> > I am not completely up-to-date on this, but everything in your traces reads
>>>> > like the same issue I was having on bringup for Armada 385 nand (under 4.9).
>>>> > I've been stuck on another project, so I haven't had time to follow up
>>>> > further, but I just diffed against linux-stable v4.12 pxa3xx_nand.c and it
>>>> > looks like the problem is still there.
>>>> >
>>>> > As far as I can see, the driver is broken for OOB reads when BCH is enabled
>>>> > because the setup in prepare_set_command() results in drain_fifo() not
>>>> > reading enough words from the read fifo in the nfc2 IP block.
>>>
>>> May I ask how you you get to this conclusion? What makes you think
>>> there's still unread data in the FIFO?
>>
>> Because I added some debug lines to count the number of bytes read and
>> I could  see handle_data_pio() was reading 2048 bytes, followed by 32
>> bytes of spare. If you have Marvell MV-S109094-00 REV C handy, you can
>> see in Table 36 that this is not correct. The correct number of spare
>> bytes for the NFCv2 IP block is
>> - 64 bytes when SPARE_EN==1 and ECC_EN==0
>> - 32 bytes when SPARE_EN==1 and ECC_EN==1
>>
>>>> >
>>>>
>>>> Yes, this sounds just like the bug I was expecting for OOB reads.
>>>>
>>>> > The patch we are using is below. I have the following in my DTS.
>>>> > nand-keep-config is commented out because I was having some issues with
>>>> > u-boot at the time and it may no longer be relevant:
>>>>
>>>> Probably.
>>>>
>>>> > flash at d0000 {
>>>> > status = "okay";
>>>> > num-cs = <1>;
>>>> > //marvell,nand-keep-config;
>>>> > marvell,nand-enable-arbiter;
>>>> > nand-on-flash-bbt;
>>>> > nand-ecc-strength = <4>;
>>>> > nand-ecc-step-size = <512>;
>>>> > };
>>>> >
>>>> > --- /home/user/build/linux-stable/drivers/mtd/nand/pxa3xx_nand.c
>>>> > +++
>>>> > /home/user/build/beam/openwrt/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/linux-mvebu/linux-4.9.34/drivers/mtd/nand/pxa3xx_nand.c
>>>> > @@ -668,7 +669,7 @@
>>>> >
>>>> >  static void drain_fifo(struct pxa3xx_nand_info *info, void *data, int len)
>>>> >  {
>>>> > - if (info->ecc_bch) {
>>>> > + if (info->use_ecc && info->ecc_bch) {
>>>
>>> This one might explain timeouts occurring when you drain the FIFO for
>>> an operation that does not enable the ECC engine (like
>>> READ_PARAM_PAGE).
>>>
>>> So this fix is indeed valid, but I'm almost sure it won't fix Sean's
>>> problem.
>>
>> From memory, I don't *think* specific line wasn't causing me any
>> issues, but I noticed it was not correct because of the way use_ecc
>> and ecc_bch flags are initialised and then used. As you say, without
>> this patch, you can get timeouts on non-ECC operations when you have
>> BCH ECC enabled generally.
>>
>>>
>>>> >   u32 val;
>>>> >   int ret;
>>>> >
>>>> > @@ -1012,7 +1014,11 @@
>>>> >
>>>> >   if (info->cur_chunk < info->nfullchunks) {
>>>> >   info->step_chunk_size = info->chunk_size;
>>>> > - info->step_spare_size = info->spare_size;
>>>> > + if (info->use_ecc) {
>>>> > + info->step_spare_size = info->spare_size;
>>>> > + } else {
>>>> > + info->step_spare_size = info->spare_size + info->ecc_size;
>>>> > + }
>>>
>>> The only case this change would fix is when you try to read/write pages
>>> in raw mode, and I'm pretty sure this driver does not support raw
>>> accesses.
>>
>> Why do you say it would only affect raw mode? This code is not
>> specific to raw mode, nor
>> is the related code in handle_data_pio(), which uses the value of
>> step_spare_size to drain
>> the hardware FIFO.
>>
>
> AFAICS, this patch affects READOOB commands, which are
> issued when there is no BBT to detect bad blocks.

Agree that this only really affects READOOB, because READ0 will always
use ECC. I guess this is why it's only catching people at bringup
because we are booting into Linux with a blank device.

>
> It "fixes" the timeout, but I wouldn't go as far as assuring
> it makes the on-flash bad block scheme work, given
> OOB writes will still be done with ECC enabled.
>
> It seems this drivers mandates a BBT to work properly,
> so perhaps a better patch would be to (a) force BBT usage,
> and avoid OOB operations altogether, avoiding nasty
> timeouts and unhappy board-bringupers.
>
> Or (b) fix the READOOB and then
> WARN/pr_warning about OOB I/O being broken.
>
> The new driver under discussion, apparently makes AOOB
> actually usable (arguably, without much utility as OOB
> is mostly used for ECC anyway).
>
> Ideas?

Does mandating BBT usage avoid the bug? Aren't you always going to hit
it on a blank NAND device straight off the production line?

> --
> Ezequiel García, VanguardiaSur
> www.vanguardiasur.com.ar



More information about the linux-mtd mailing list