[PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read
Arseniy Krasnov
avkrasnov at sberdevices.ru
Thu May 4 05:31:52 PDT 2023
On 04.05.2023 15:17, Miquel Raynal wrote:
> Hi Arseniy,
>
> avkrasnov at sberdevices.ru wrote on Thu, 4 May 2023 14:37:45 +0300:
>
>> On 03.05.2023 13:23, Arseniy Krasnov wrote:
>>>
>>>
>>> On 03.05.2023 11:03, Miquel Raynal wrote:
>>>> Hi Arseniy,
>>>>
>>>> avkrasnov at sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300:
>>>>
>>>>> On 02.05.2023 16:05, Miquel Raynal wrote:
>>>>>> Hi Arseniy,
>>>>>>
>>>>>> avkrasnov at sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300:
>>>>>>
>>>>>>> On 02.05.2023 15:17, Miquel Raynal wrote:
>>>>>>>> Hi Arseniy,
>>>>>>>>
>>>>>>>> Richard, your input is welcome below :-)
>>>>>>>>
>>>>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see:
>>>>>>>>>>>>>>>> 0) First attempt to mount JFFS2.
>>>>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done.
>>>>>>>>>>>>>>>> 2) Umount JFFS2. Done.
>>>>>>>>>>>>>>>> 3) Second attempt to mount JFFS2.
>>>>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done.
>>>>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get:
>>>>>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error
>>>>>>>>>>>>>>>> 6) Mount failed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is
>>>>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error.
>>>>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then
>>>>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2
>>>>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page.
>>>>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken
>>>>>>>>>>>>> ECC codes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts
>>>>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without
>>>>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work
>>>>>>>>>>>>> separately.
>>>>>>>>>>>>
>>>>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just
>>>>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a
>>>>>>>>>>>> single time, no matter what portion you write. In some cases, it is
>>>>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may
>>>>>>>>>>>> be split into several areas which cover a partial in-band area *and* a
>>>>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas
>>>>>>>>>>>> of a given subpage, you *cannot* write the other part later without
>>>>>>>>>>>
>>>>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries
>>>>>>>>>>> to write page after writing clean markers to it before? In the old vendor's
>>>>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works
>>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks
>>>>>>>>>> strange to you and the old vendor hack)
>>>>>>>>>
>>>>>>>>> Here is version of the old vendor's driver:
>>>>>>>>>
>>>>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260
>>>>>>>>>
>>>>>>>>> In my version there is no BUG() there, but it is same driver for the same chip.
>>>>>>>>>
>>>>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first
>>>>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so
>>>>>>>>> may be it is unexpected behaviour of JFFS2?
>>>>>>>>
>>>>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here.
>>>>>>>>
>>>>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device?
>>>>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because
>>>>>>>> cleanmarker seem to be discarded when using a NAND device, and
>>>>>>>> recognizing the device as a NAND device requires the above option to be
>>>>>>>> set apparently.
>>>>>>>
>>>>>>> Yes, I have
>>>>>>>
>>>>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y
>>>>>>>
>>>>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This
>>>>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing:
>>>>>>> cleanmarkers are not discarded with NAND device.
>>>>>>
>>>>>> Excellent. So when cleanmarker_size == 0, it means there is no
>>>>>> cleanmarker. But if it is a NAND device, we write the marker anyway.
>>>>>>
>>>>>> Well I guess it used to work on old controllers using a Hamming ECC
>>>>>> engine not protecting any user OOB bytes, so writing the clean markers
>>>>>> would simply not lead to ECC bytes being produced/written. Or it might
>>>>>> have worked as well on controller drivers not enabling the ECC engine
>>>>>> when performing OOB-only writes. It also requires the chip to be old
>>>>>> enough to support multiple writes on the same (sub)page as long as the
>>>>>> written bits do not overlap?
>>>>>
>>>>> Yes, with controller which supports such modes there will be no problem here!
>>>>> What i see, is that this controller doesn't support multiple writes to the
>>>>> same page in ECC mode(e.g. it can't update ECC correctly).
>>>>
>>>> I don't think this is a controller limitation. The NAND chip cannot
>>>> write ECC bytes a first time and then overwrite other ECC bytes, that
>>>> cannot work. The fact that we write ECC bytes in the first place is
>>>> because the ECC engine covers the free OOB bytes used by JFFS2 to write
>>>> its cleanmarkers.
>>>>
>>>>> So in v2 i've added
>>>>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly.
>>>>
>>>> I am sorry but the above sentence is not clear to me. I believe you
>>>> meant the free OOB bytes are moved outside of the area protected by the
>>>> ECC engine. In this case I guess it should be fine.
>>>
>>> Exactly, free bytes which are reported by OOB layout callbacks were moved out of
>>> ECC area.
>>>
>>>>
>>>>> So for me main question here is:
>>>>>
>>>>> How JFFS2 should work with controllers where we can't update data and OOB
>>>>> independently? Driver of this filesystem knows nothing about this features of
>>>>> the controller.
>>>>>
>>>>> Or JFFS2 works incorrectly in my case when it tries to call write page callback
>>>>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above).
>>>>>
>>>>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this
>>>>> driver as in vendor's driver?
>>>>
>>>> I would assume using the unprotected free OOB bytes to store the
>>>> cleanmarkers should work. But that's a bit fragile and very filesystem
>>>> oriented. I don't like this much. But on the other side JFFS2 is
>>>> legacy, you should use UBI (which does not play with OOB areas) :-)
>>>
>>> Problem here is that we can't use UBI in this case, because it does not support
>>> small fs images. So the only way to make JFFS2 work is to move free OOB bytes to
>>> non protected area. Otherwise i think we have strange situation that JFFS2 can't
>>> work correctly on specific type on NAND controller. We already had same problem
>>> on another NAND controller, and solution was to move OOB free bytes no non-protected
>>> area:
>>>
>>> https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/
>>>
>>> Thanks, Arseniy
>>
>> Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip.
>> This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is
>> ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0.
>> What do You think?
>>
>> Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled
>> by default.
>
> I prefer having a single ooblayout where we expose unprotected user OOB
> bytes only. As of today, the only upstream user of user OOB bytes is
> JFFS2 anyway.
Ok, so i'll prepare v3 on the next week with this update (it is already in v2, but i'm going
to modify another patch with updates from @Liang).
Thanks, Arseniy
>
> Thanks,
> Miquèl
More information about the linux-amlogic
mailing list