[PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read
Arseniy Krasnov
avkrasnov at sberdevices.ru
Tue May 2 09:13:38 PDT 2023
On 02.05.2023 16:05, Miquel Raynal wrote:
> Hi Arseniy,
>
> avkrasnov at sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300:
>
>> On 02.05.2023 15:17, Miquel Raynal wrote:
>>> Hi Arseniy,
>>>
>>> Richard, your input is welcome below :-)
>>>
>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see:
>>>>>>>>>>> 0) First attempt to mount JFFS2.
>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done.
>>>>>>>>>>> 2) Umount JFFS2. Done.
>>>>>>>>>>> 3) Second attempt to mount JFFS2.
>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done.
>>>>>>>>>>> 5) It reads page N in ECC mode, and i get:
>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error
>>>>>>>>>>> 6) Mount failed.
>>>>>>>>>>>
>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is
>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers.
>>>>>>>>>
>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error.
>>>>>>>>> does it have to use raw write/read on step 1) and 4)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then
>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2
>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page.
>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken
>>>>>>>> ECC codes.
>>>>>>>>
>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts
>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without
>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work
>>>>>>>> separately.
>>>>>>>
>>>>>>> The fact that there might be helpers just for writing OOB areas or just
>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a
>>>>>>> single time, no matter what portion you write. In some cases, it is
>>>>>>> possible to perform subpage writes if the chip supports it. Pages may
>>>>>>> be split into several areas which cover a partial in-band area *and* a
>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas
>>>>>>> of a given subpage, you *cannot* write the other part later without
>>>>>>
>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries
>>>>>> to write page after writing clean markers to it before? In the old vendor's
>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works
>>>>>> correctly.
>>>>>
>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks
>>>>> strange to you and the old vendor hack)
>>>>
>>>> Here is version of the old vendor's driver:
>>>>
>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260
>>>>
>>>> In my version there is no BUG() there, but it is same driver for the same chip.
>>>>
>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first
>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so
>>>> may be it is unexpected behaviour of JFFS2?
>>>
>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here.
>>>
>>> Are you sure you flash is recognized by JFFS2 as being a NAND device?
>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because
>>> cleanmarker seem to be discarded when using a NAND device, and
>>> recognizing the device as a NAND device requires the above option to be
>>> set apparently.
>>
>> Yes, I have
>>
>> CONFIG_JFFS2_FS_WRITEBUFFER=y
>>
>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This
>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing:
>> cleanmarkers are not discarded with NAND device.
>
> Excellent. So when cleanmarker_size == 0, it means there is no
> cleanmarker. But if it is a NAND device, we write the marker anyway.
>
> Well I guess it used to work on old controllers using a Hamming ECC
> engine not protecting any user OOB bytes, so writing the clean markers
> would simply not lead to ECC bytes being produced/written. Or it might
> have worked as well on controller drivers not enabling the ECC engine
> when performing OOB-only writes. It also requires the chip to be old
> enough to support multiple writes on the same (sub)page as long as the
> written bits do not overlap?
Yes, with controller which supports such modes there will be no problem here!
What i see, is that this controller doesn't support multiple writes to the
same page in ECC mode(e.g. it can't update ECC correctly). So in v2 i've added
patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly.
So for me main question here is:
How JFFS2 should work with controllers where we can't update data and OOB
independently? Driver of this filesystem knows nothing about this features of
the controller.
Or JFFS2 works incorrectly in my case when it tries to call write page callback
after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above).
Or may be it is better to suppress OOB write callback (or set it to NULL) in this
driver as in vendor's driver?
Thanks, Arseniy
>
> Perhaps that's what the hack in the old driver is for. But that's
> IMHO broken in case of unexpected reboot :-)
>
> Miquèl
More information about the linux-amlogic
mailing list