UBIFS errors when file-system is full

Wed Aug 12 00:27:36 PDT 2015

Stefan,

On 12.08.2015 at 09:01, Stefan Agner wrote:
> Hi Richard,
> 
> [also added Brian to the discussion, since he had a look into that
> driver before]

Good idea, maybe Brian has an idea.

> On 2015-08-07 14:37, Richard Weinberger wrote:
>> Hi!
>>
>> On 06.08.2015 at 12:31, Bhuvanchandra DV wrote:
>>>>> The tests ran on the UBI partition after isolating it completely from U-Boot.
>>>>> We formatted the UBI partition and then booted from SD card (4.1.2 kernel, fastmap enabled/disabled, fm_debug enabled).
>>>>> Please find the ubi-tests log below:
>>>>>
>>>>> [io_paral] write_thread():222: written and read data are different
>>>> *blink*
>>>
>>> We ran the io_paral test separately several times with a few debug prints added to see the exact
>>> differences between the read and write buffers. So far we can see that one complete page is read twice
>>> even though it is written once. I'm not sure whether the issue happens while reading or while writing.
>>> Can you give us some pointers so that we can narrow down the cause of this failure?
>>
>> The test verifies that the data has been written correctly to the block.
>> (Maybe a buffer problem in your MTD driver?)
>>
>> You can also enable UBI's IO checks.
>> i.e. echo 1 > /sys/kernel/debug/ubi/ubi0/chk_io
>>
>> It will also verify its writes. Maybe it can give you a clue.
> 
> According to Bhuvan's test, it really seems that we have an issue on
> the write path (this error is reproducible):
> root@colibri-vf:~/ubi-tests-bin# ./io_paral /dev/ubi0 2>&1 | tee ~/io-parl4.log
> [ 6451.223087] ubi0 error: self_check_write: self-check failed for PEB 843:4096, len 126976
> [ 6451.231650] ubi0: data differ at position 61440
> [ 6451.236325] ubi0: hex dump of the original buffer from 61440 to 126976
> [ 6451.331045] ubi0: hex dump of the read buffer from 61440 to 126976
> [ 6451.426703] CPU: 0 PID: 1182 Comm: io_paral Not tainted 4.1.4-00704-g2631972 #21
> [ 6451.434506] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)

Thanks for letting me know. :)
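
Since the self-check dumps both the original and the read buffer, it may help
to compare them page by page and see whether the bad page is simply a copy of
another written page (as observed earlier in this thread). A minimal sketch,
assuming you have both buffers in memory; find_bad_page() is a hypothetical
helper, not part of ubi-tests or the kernel, and the page size has to be
supplied by the caller:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from ubi-tests or the kernel).
 * Returns the index of the first page in `rd` that differs from `wr`,
 * or -1 if the buffers match. If that page is byte-identical to some
 * other written page, *dup_of is set to that page's index, else to -1.
 */
static long find_bad_page(const unsigned char *wr, const unsigned char *rd,
                          size_t len, size_t page_size, long *dup_of)
{
    size_t pages = len / page_size;

    *dup_of = -1;
    for (size_t p = 0; p < pages; p++) {
        const unsigned char *got = rd + p * page_size;

        /* Page read back as written: nothing to report. */
        if (memcmp(wr + p * page_size, got, page_size) == 0)
            continue;

        /* Mismatch: check whether it duplicates another written page. */
        for (size_t q = 0; q < pages; q++) {
            if (q != p && memcmp(wr + q * page_size, got, page_size) == 0) {
                *dup_of = (long)q;
                break;
            }
        }
        return (long)p;
    }
    return -1;
}
```

If *dup_of comes back as the page directly before the bad one, that would fit
the theory that the previous buffer got programmed a second time.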

> This is 4.1.4 with v10 of the driver applied:
> http://thread.gmane.org/gmane.linux.drivers.devicetree/130300
> 
> 
> I have been working on the driver for quite some time; v10 is currently
> in review. With this issue in mind, I went through the driver again, but
> I currently can't see a problem.
> 
> The error position is always page aligned, but at different pages. We
> printed the reread buffers once: It seems that one page lands on flash
> twice. My guess is that the second page doesn't get transmitted
> properly, while the new column/row gets transmitted and
> NAND_CMD_PAGEPROG executed... Hence the same buffer would be written to
> the device again.
> 
> The NFC IP in Vybrid (vf610) has a higher level programming model which
> takes care of the command sequencing. Therefore some callbacks do not
> actually send a command to the device (e.g. NAND_CMD_SEQIN), since
> this will be done one command later, in NAND_CMD_PAGEPROG. Now, of
> course, the driver relies heavily on not being interrupted by other
> requests in between (not even reads!), but I thought that this is taken
> care of by the MTD subsystem? So for me it is a bit hard to spot the
> error since I'm always unsure whether the assumptions regarding
> locking/exclusiveness between the calls are really guaranteed...

NAND access is serialized using nand_get_device() and nand_release_device()
in nand_base.c, so serialization should be fine.
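
To illustrate what that guarantee looks like from the driver's point of view,
here is a loose userspace analogy using pthreads. It is not the kernel code;
all toy_* names are made up for this sketch. The point is only that a
multi-step operation runs between "get" and "release", and a second caller
sleeps until the first one has released the chip:

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for the chip state; all names here are hypothetical. */
struct toy_chip {
    pthread_mutex_t lock;     /* protects `busy` */
    pthread_cond_t  wq;       /* waiters sleep here while the chip is busy */
    bool            busy;
    int             ops_done;
};

static void toy_chip_init(struct toy_chip *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wq, NULL);
    c->busy = false;
    c->ops_done = 0;
}

/* Block until the chip is free, then claim it. */
static void toy_get_device(struct toy_chip *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->busy)
        pthread_cond_wait(&c->wq, &c->lock);
    c->busy = true;
    pthread_mutex_unlock(&c->lock);
}

/* Give the chip back and wake one waiter. */
static void toy_release_device(struct toy_chip *c)
{
    pthread_mutex_lock(&c->lock);
    c->busy = false;
    pthread_cond_signal(&c->wq);
    pthread_mutex_unlock(&c->lock);
}

/* A multi-step operation (think SEQIN, data, PAGEPROG) done as one unit. */
static void toy_write_page(struct toy_chip *c)
{
    toy_get_device(c);
    c->ops_done++;  /* no other caller runs between get and release */
    toy_release_device(c);
}
```

Calling toy_write_page() from several threads created with pthread_create()
should never interleave two operations on the same toy_chip.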

HTH,
//richard