[PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read
Miquel Raynal
miquel.raynal at bootlin.com
Thu May 4 05:17:50 PDT 2023
Hi Arseniy,
avkrasnov at sberdevices.ru wrote on Thu, 4 May 2023 14:37:45 +0300:
> On 03.05.2023 13:23, Arseniy Krasnov wrote:
> >
> >
> > On 03.05.2023 11:03, Miquel Raynal wrote:
> >> Hi Arseniy,
> >>
> >> avkrasnov at sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300:
> >>
> >>> On 02.05.2023 16:05, Miquel Raynal wrote:
> >>>> Hi Arseniy,
> >>>>
> >>>> avkrasnov at sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300:
> >>>>
> >>>>> On 02.05.2023 15:17, Miquel Raynal wrote:
> >>>>>> Hi Arseniy,
> >>>>>>
> >>>>>> Richard, your input is welcome below :-)
> >>>>>>
> >>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see:
> >>>>>>>>>>>>>> 0) First attempt to mount JFFS2.
> >>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done.
> >>>>>>>>>>>>>> 2) Umount JFFS2. Done.
> >>>>>>>>>>>>>> 3) Second attempt to mount JFFS2.
> >>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done.
> >>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get:
> >>>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error
> >>>>>>>>>>>>>> 6) Mount failed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is
> >>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error.
> >>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)?
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then
> >>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2
> >>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page.
> >>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken
> >>>>>>>>>>> ECC codes.
> >>>>>>>>>>>
> >>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts
> >>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without
> >>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work
> >>>>>>>>>>> separately.
> >>>>>>>>>>
> >>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just
> >>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a
> >>>>>>>>>> single time, no matter what portion you write. In some cases, it is
> >>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may
> >>>>>>>>>> be split into several areas which cover a partial in-band area *and* a
> >>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas
> >>>>>>>>>> of a given subpage, you *cannot* write the other part later without
> >>>>>>>>>
> >>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries
> >>>>>>>>> to write page after writing clean markers to it before? In the old vendor's
> >>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works
> >>>>>>>>> correctly.
> >>>>>>>>
> >>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks
> >>>>>>>> strange to you and the old vendor hack)
> >>>>>>>
> >>>>>>> Here is version of the old vendor's driver:
> >>>>>>>
> >>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260
> >>>>>>>
> >>>>>>> In my version there is no BUG() there, but it is same driver for the same chip.
> >>>>>>>
> >>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first
> >>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so
> >>>>>>> may be it is unexpected behaviour of JFFS2?
> >>>>>>
> >>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here.
> >>>>>>
> >>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device?
> >>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because
> >>>>>> cleanmarker seem to be discarded when using a NAND device, and
> >>>>>> recognizing the device as a NAND device requires the above option to be
> >>>>>> set apparently.
> >>>>>
> >>>>> Yes, I have
> >>>>>
> >>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y
> >>>>>
> >>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This
> >>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing:
> >>>>> cleanmarkers are not discarded with NAND device.
> >>>>
> >>>> Excellent. So when cleanmarker_size == 0, it means there is no
> >>>> cleanmarker. But if it is a NAND device, we write the marker anyway.
> >>>>
> >>>> Well I guess it used to work on old controllers using a Hamming ECC
> >>>> engine not protecting any user OOB bytes, so writing the clean markers
> >>>> would simply not lead to ECC bytes being produced/written. Or it might
> >>>> have worked as well on controller drivers not enabling the ECC engine
> >>>> when performing OOB-only writes. It also requires the chip to be old
> >>>> enough to support multiple writes on the same (sub)page as long as the
> >>>> written bits do not overlap?
> >>>
> >>> Yes, with controller which supports such modes there will be no problem here!
> >>> What i see, is that this controller doesn't support multiple writes to the
> >>> same page in ECC mode(e.g. it can't update ECC correctly).
> >>
> >> I don't think this is a controller limitation. The NAND chip cannot
> >> write ECC bytes a first time and then overwrite other ECC bytes, that
> >> cannot work. The fact that we write ECC bytes in the first place is
> >> because the ECC engine covers the free OOB bytes used by JFFS2 to write
> >> its cleanmarkers.
> >>
> >>> So in v2 i've added
> >>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly.
> >>
> >> I am sorry but the above sentence is not clear to me. I believe you
> >> meant the free OOB bytes are moved outside of the area protected by the
> >> ECC engine. In this case I guess it should be fine.
> >
> > Exactly, free bytes which are reported by OOB layout callbacks were moved out of
> > ECC area.
> >
> >>
> >>> So for me main question here is:
> >>>
> >>> How JFFS2 should work with controllers where we can't update data and OOB
> >>> independently? Driver of this filesystem knows nothing about this features of
> >>> the controller.
> >>>
> >>> Or JFFS2 works incorrectly in my case when it tries to call write page callback
> >>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above).
> >>>
> >>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this
> >>> driver as in vendor's driver?
> >>
> >> I would assume using the unprotected free OOB bytes to store the
> >> cleanmarkers should work. But that's a bit fragile and very filesystem
> >> oriented. I don't like this much. But on the other side JFFS2 is
> >> legacy, you should use UBI (which does not play with OOB areas) :-)
> >
> > Problem here is that we can't use UBI in this case, because it does not support
> > small fs images. So the only way to make JFFS2 work is to move free OOB bytes to
> > non protected area. Otherwise i think we have strange situation that JFFS2 can't
> > work correctly on specific type on NAND controller. We already had same problem
> > on another NAND controller, and solution was to move OOB free bytes no non-protected
> > area:
> >
> > https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/
> >
> > Thanks, Arseniy
>
> Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip.
> This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is
> ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0.
> What do You think?
>
> Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled
> by default.
I prefer having a single ooblayout where we expose unprotected user OOB
bytes only. As of today, the only upstream user of user OOB bytes is
JFFS2 anyway.
Thanks,
Miquèl
More information about the linux-amlogic
mailing list