[BUG] pxa3xx: wait time out when scanning for bb

Mon Dec 11 23:09:45 PST 2017

On 12 December 2017 at 03:01, Greg Cook <greg at morpheus.ws> wrote:
> On 12 December 2017 at 05:16, Boris Brezillon
> <boris.brezillon at free-electrons.com> wrote:
>> On Mon, 11 Dec 2017 13:24:56 -0300
>> Ezequiel Garcia <ezequiel at vanguardiasur.com.ar> wrote:
>>
>>> Greg,
>>>
>>> On 11 December 2017 at 10:18, Greg Cook <greg at morpheus.ws> wrote:
>>> > Sean,
>>> >
>>> > I am not completely up-to-date on this, but everything in your traces reads
>>> > like the same issue I was having on bringup for Armada 385 nand (under 4.9).
>>> > I've been stuck on another project, so I haven't had time to follow up
>>> > further, but I just diffed against linux-stable v4.12 pxa3xx_nand.c and it
>>> > looks like the problem is still there.
>>> >
>>> > As far as I can see, the driver is broken for OOB reads when BCH is enabled
>>> > because the setup in prepare_set_command() results in drain_fifo() not
>>> > reading enough words from the read fifo in the nfc2 IP block.
>>
>> May I ask how you you get to this conclusion? What makes you think
>> there's still unread data in the FIFO?
>
> Because I added some debug lines to count the number of bytes read and
> I could  see handle_data_pio() was reading 2048 bytes, followed by 32
> bytes of spare. If you have Marvell MV-S109094-00 REV C handy, you can
> see in Table 36 that this is not correct. The correct number of spare
> bytes for the NFCv2 IP block is
> - 64 bytes when SPARE_EN==1 and ECC_EN==0
> - 32 bytes when SPARE_EN==1 and ECC_EN==1
>
>>> >
>>>
>>> Yes, this sounds just like the bug I was expecting for OOB reads.
>>>
>>> > The patch we are using is below. I have the following in my DTS.
>>> > nand-keep-config is commented out because I was having some issues with
>>> > u-boot at the time and it may no longer be relevant:
>>>
>>> Probably.
>>>
>>> > flash at d0000 {
>>> > status = "okay";
>>> > num-cs = <1>;
>>> > //marvell,nand-keep-config;
>>> > marvell,nand-enable-arbiter;
>>> > nand-on-flash-bbt;
>>> > nand-ecc-strength = <4>;
>>> > nand-ecc-step-size = <512>;
>>> > };
>>> >
>>> > --- /home/user/build/linux-stable/drivers/mtd/nand/pxa3xx_nand.c
>>> > +++
>>> > /home/user/build/beam/openwrt/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/linux-mvebu/linux-4.9.34/drivers/mtd/nand/pxa3xx_nand.c
>>> > @@ -668,7 +669,7 @@
>>> >
>>> >  static void drain_fifo(struct pxa3xx_nand_info *info, void *data, int len)
>>> >  {
>>> > - if (info->ecc_bch) {
>>> > + if (info->use_ecc && info->ecc_bch) {
>>
>> This one might explain timeouts occurring when you drain the FIFO for
>> an operation that does not enable the ECC engine (like
>> READ_PARAM_PAGE).
>>
>> So this fix is indeed valid, but I'm almost sure it won't fix Sean's
>> problem.
>
> From memory, I don't *think* specific line wasn't causing me any
> issues, but I noticed it was not correct because of the way use_ecc
> and ecc_bch flags are initialised and then used. As you say, without
> this patch, you can get timeouts on non-ECC operations when you have
> BCH ECC enabled generally.
>
>>
>>> >   u32 val;
>>> >   int ret;
>>> >
>>> > @@ -1012,7 +1014,11 @@
>>> >
>>> >   if (info->cur_chunk < info->nfullchunks) {
>>> >   info->step_chunk_size = info->chunk_size;
>>> > - info->step_spare_size = info->spare_size;
>>> > + if (info->use_ecc) {
>>> > + info->step_spare_size = info->spare_size;
>>> > + } else {
>>> > + info->step_spare_size = info->spare_size + info->ecc_size;
>>> > + }
>>
>> The only case this change would fix is when you try to read/write pages
>> in raw mode, and I'm pretty sure this driver does not support raw
>> accesses.
>
> Why do you say it would only affect raw mode? This code is not
> specific to raw mode, nor
> is the related code in handle_data_pio(), which uses the value of
> step_spare_size to drain
> the hardware FIFO.
>

AFAICS, this patch affects READOOB commands, which are
issued when there is no BBT to detect bad blocks.

It "fixes" the timeout, but I wouldn't go as far as assuring
it makes the on-flash bad block scheme work, given
OOB writes will still be done with ECC enabled.

It seems this drivers mandates a BBT to work properly,
so perhaps a better patch would be to (a) force BBT usage,
and avoid OOB operations altogether, avoiding nasty
timeouts and unhappy board-bringupers.

Or (b) fix the READOOB and then
WARN/pr_warning about OOB I/O being broken.

The new driver under discussion, apparently makes AOOB
actually usable (arguably, without much utility as OOB
is mostly used for ECC anyway).

Ideas?
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar