[PATCH next 2/2] spi: spi-qpic-snand: add support for 8 bits ECC strength

Miquel Raynal miquel.raynal at bootlin.com
Wed May 21 00:52:31 PDT 2025


On 21/05/2025 at 11:08:02 +0530, Md Sadre Alam <quic_mdalam at quicinc.com> wrote:

> Hi,
>
> On 5/16/2025 7:44 PM, Miquel Raynal wrote:
>> 
>>>>> Interestingly enough, it reports the correct number of bit errors now.
>>>>> For me it seems, that the hardware reports the number of the corrected
>>>>> *bytes* instead of the corrected *bits*.
>>>> I doubt that, nobody counts bytes of errors.
>>>> Your results are surprising. I initially thought in favour of a software
>>>> bug, but then it looks even weirder than that. Alam?
>>> I have checked with the HW team: the QPIC ECC HW engine reports bit
>>> errors byte-wise, not bit-wise.
>>>
>>> e.g.,
>>>      Byte0 --> 2-bitflips --> QPIC ECC counts 1 only
>>>      Byte1 --> 3-bitflips --> QPIC ECC counts 1 only
>>>      Byte2 --> 1-bitflips --> QPIC ECC counts 1 only
>>>      Byte3 --> 4-bitflips --> QPIC ECC counts 1 only (in 8-bit ecc)
>>>      Byte4 --> 6-bitflips --> QPIC ECC counts 1 only (in 8-bit ecc)
>>>
>>> Hope this clarifies things now.
>> o_O ????
>> How is that even useful? This basically means UBI will never refresh the
>> data because we will constantly underestimate the number of bitflips! We
>> need to know the actual number, this averaging does not make any sense
>> for Linux. Is there another way to get the raw number of bitflips?
> I have re-checked with the HW team. Unfortunately, there are currently
> no register fields available to get the raw number of bit flips,
> although this has been fixed on newer chipsets. For now, bit 8 of the
> QPIC_NANDC_BUFFER_STATUS | 0x79B0018 register gets set if
> uncorrectable bitflips occurred.
>
> For 4-bit ECC, if 5 raw bit flips occur, bit 8 gets set in
> QPIC_NANDC_BUFFER_STATUS.
>
> Similarly, for 8-bit ECC, if 9 raw bit flips occur, bit 8 gets set in
> QPIC_NANDC_BUFFER_STATUS.
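
For reference, the check described above boils down to testing a single
bit of the status word. A minimal sketch follows; the macro and function
names are hypothetical (they are not taken from the driver), and only the
bit position comes from the description above:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical name: bit 8 of QPIC_NANDC_BUFFER_STATUS, which per the
 * description above is set when the ECC engine could not correct the data.
 */
#define QPIC_BUF_STATUS_UNCORRECTABLE	(UINT32_C(1) << 8)

/* Return true if the status word flags an uncorrectable ECC step. */
static bool qpic_buf_status_uncorrectable(uint32_t buffer_status)
{
	return (buffer_status & QPIC_BUF_STATUS_UNCORRECTABLE) != 0;
}
```

This only says whether the step failed; it carries no information about
how close a still-correctable step is to failing.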

I believe the unrecoverable situation is handled correctly. What is not
handled is the correctable case: we care about the number of bitflips
before a failure happens, because once it reaches a certain threshold
(typically 2/3 of the strength) the upper layer is responsible for
moving the data around to avoid losing it.
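
To make the concern concrete, here is a minimal standalone sketch (plain
C, not driver code) comparing a bit-wise count with the byte-wise count
the hardware is said to report, and feeding both into a threshold check.
All helper names are hypothetical, and the 2/3 ratio is only the typical
value mentioned above, not necessarily what a given setup uses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits that differ between a reference buffer and the data read. */
static unsigned int count_bit_flips(const uint8_t *ref, const uint8_t *rd,
				    size_t len)
{
	unsigned int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t diff = ref[i] ^ rd[i];

		/* Clear the lowest set bit until none are left. */
		while (diff) {
			diff &= diff - 1;
			flips++;
		}
	}
	return flips;
}

/* Number of bytes containing at least one flipped bit (byte-wise count). */
static unsigned int count_byte_flips(const uint8_t *ref, const uint8_t *rd,
				     size_t len)
{
	unsigned int bytes = 0;

	for (size_t i = 0; i < len; i++)
		if (ref[i] != rd[i])
			bytes++;
	return bytes;
}

/* Assumed policy: refresh once the count reaches 2/3 of the ECC strength. */
static bool needs_refresh(unsigned int reported, unsigned int strength)
{
	return reported * 3 >= strength * 2;
}
```

With 8-bit strength and two bytes carrying three flipped bits each, the
bit-wise count is 6 and crosses the assumed threshold, while the
byte-wise count is 2 and does not, so the data would not be moved.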

You need to identify the hardware revision that fixed it and provide a
warning otherwise, or at least a comment in the code...

Thanks,
Miquèl



More information about the linux-mtd mailing list