gpmc-nand broken since v4.12
Roger Quadros
rogerq at ti.com
Mon Oct 16 05:12:38 PDT 2017
Hi Boris,
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 16/10/17 14:34, Boris Brezillon wrote:
> Hi Roger,
>
> On Mon, 16 Oct 2017 13:22:04 +0300
> Roger Quadros <rogerq at ti.com> wrote:
>
>>
>> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>>
>> On 13/10/17 23:29, Boris Brezillon wrote:
>>> On Fri, 13 Oct 2017 22:24:58 +0200
>>> Boris Brezillon <boris.brezillon at free-electrons.com> wrote:
>>>
>>>> On Fri, 13 Oct 2017 14:50:33 +0200
>>>> Boris Brezillon <boris.brezillon at free-electrons.com> wrote:
>>>>
>>>>> On Fri, 13 Oct 2017 14:57:29 +0300
>>>>> Roger Quadros <rogerq at ti.com> wrote:
>>>>>
>>>>>> Hi Boris,
>>>>>>
>>>>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
>>>>>>
>>>>>> Behaviour follows through in v4.13 and v4.14-rc as well.
>>>>>>
>>>>>> Do you have any idea about this?
>>>>
>>>> More info on what has changed in 4.12: we no longer allocate the
>>>> nand_buffer struct + its internal buffer using a single big kmalloc
>>>> call, which means the nand_buffer struct is now likely to be allocated
>>>> in a small object slab (sizeof(nand_buffers) = 12). If you have a
>>>> use-after-free bug somewhere in the kernel, it might corrupt the
>>
>> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
>> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
>>
>>>
>>> I meant buffer-overflow, not use-after-free.
>>
>> Your analysis seems correct.
>>
>>>
>>>> nand_buffers object, which might explain the bug you see here.
>>>>
>>>>>
>>>>> Can you try with this patch [1] applied and paste me the values printed
>>>>> just before the crash?
>>>>>
>>>>> [1]http://code.bulix.org/lc8xk0-209746
>>
>> == unmounting volume
>> [ 36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc = (null) oob_poi = ed096800
>> [ 36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src = (null)
>> [ 36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>
>>
>> Running the following patch
>> https://hastebin.com/ulogurutuz.php
>> shows
>>
>> [ 37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
>> [ 37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
>> [ 37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
>> [ 37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
>> [ 37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
>> [ 37.260846] omap_calculate_ecc_bch
>> [ 37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc = (null)
>> [ 37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc = (null)
>> [ 37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc = (null)
>> [ 37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc = (null)
>> [ 37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc = (null)
>> [ 37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc = (null) oob_poi = ed117800
>> [ 37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src = (null)
>>
>> which means omap_calculate_ecc_bch() it the culprit.
>>
>> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
>> for just one sector (sub-page).
>
> Yes, looks like you find the root cause.
>
>>
>> Is my understanding correct
>> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
>> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
>> OR
>> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).
>
> Second solution sounds simpler because the ECC sector information is
> not passed to ecc->calculate() hook, which means you'd have to extract
> it from the ecc_calc pointer:
>
> (uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size
I don't think we need ECC sector number at all if we're always going to do a transfer
of one sector of data after calling ecc.hwctl(NAND_ECC_READ/WRITE) and before calling ecc.calculate.
My understanding is that the ECC calculators sector 0 registers will always contain
the right content irrespective of which sector we transfer as long as we do,
ecc.hwctl(mtd, NAND_ECC_READ/WRITE);
transfer one sector;
ecc.calculate();
Why isn't there a nand_read_subpage_hwecc()? In the current form a subpage read won't work if
if ecc->mode is NAND_ECC_HW and controllers expect a ecc.hwctl(NAND_ECC_READ) before
calling ecc.calculate().
For the OMAP case I can override both subpage functions.
Is there a good way to test if subpage read/writes are working as they should?
>
> and doing arithmetic on pointers is usually not a good idea.
>
> Anyway, I'd be fine with both solutions, so pick the one you prefer.
>
> Regards,
>
> Boris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
cheers,
-roger
More information about the linux-mtd
mailing list