Race-free NAND device removal

Richard Weinberger richard at nod.at
Mon Jul 4 02:44:03 PDT 2016



On 04.07.2016 at 11:16, Boris Brezillon wrote:
> On Sun, 3 Jul 2016 15:38:42 +0200
> Richard Weinberger <richard at nod.at> wrote:
> 
>> Hi!
>>
>> While working on nandsim I realized that nand_release() ignores the return
>> value from mtd_device_unregister().
>>
>> That means NAND devices cannot be removed in a race-free manner.
>> Consider a NAND driver that registers ->_get_device() and ->_put_device()
>> callbacks for refcounting. In its removal function it will return -EBUSY
>> whenever the refcount is > 0.
>> But when the device is claimed while it is being removed, the refcount can
>> increment after the check.
>> MTD can deal with that and mtd_device_unregister() will return -EBUSY.
>> But nand_release() won't notice and the NAND driver continues with the
>> teardown process.
> 
> Yes, I already noticed that, and apparently all NAND controller drivers
> seem to assume that nand_release() always succeeds. It's definitely a
> bug, since the MTD device will still be exposed, but the underlying
> NAND structure (and the associated data + implementation) will be
> gone :-/.

Well, in most cases it will work since the module refcounting kicks in.
And no NAND driver creates or removes MTDs at runtime.

>>
>> Would a change like the following one be acceptable, or is a NAND driver
>> allowed to call mtd_device_unregister() itself?
>> AFAICT the additional call to mtd_device_unregister() in nand_release() would
>> be a no-op then.
> 
> This patch looks good, but NAND controller drivers will keep ignoring
> the nand_release() return code and release their own private data, so
> implementations are still buggy ;).
> 
> This whole NAND dev registration/deregistration is unsafe, and I plan
> to rework it when moving to a controller <-> chips infrastructure.
> 
> Are you fixing a real bug or just a potential one? Cause I'm not sure
> doing that is any safer if we don't patch all the NAND controller
> drivers...

I'm facing a real issue on nandsim.
Currently I'm heavily reworking nandsim.
One of the new features is that you can add/remove NAND MTDs during runtime
using a userspace tool. It works like losetup.

$ nandsimctl --backend file /home/rw/work/XXX/broken_mtd.raw --id-bytes 0x....

While making this race-free I found that issue.
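
To make the failure mode concrete, here is a minimal userspace sketch of it.
It is not kernel code and not the patch under discussion: every fake_* name
is hypothetical, and fake_mtd_unregister() only models the "return -EBUSY
while the usecount is > 0" behaviour described above. It contrasts a release
helper that drops the return value with one that propagates it, and shows a
remove function that keeps its private data when the unregister fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MTD device with a usage count. */
struct fake_mtd {
	int usecount;     /* raised by a get_device-style hook, dropped by put */
	bool registered;  /* true while the device is still exposed to users */
};

/* Hypothetical driver state: the MTD plus private data it must not free early. */
struct fake_driver {
	struct fake_mtd mtd;
	void *priv;
};

/* Models the unregister step: refuses while somebody holds a reference. */
static int fake_mtd_unregister(struct fake_mtd *mtd)
{
	if (mtd->usecount > 0)
		return -EBUSY;
	mtd->registered = false;
	return 0;
}

/* Models the behaviour complained about: the result is silently dropped. */
static void fake_nand_release_ignoring(struct fake_mtd *mtd)
{
	(void)fake_mtd_unregister(mtd);
}

/* Models the proposed behaviour: the result is handed back to the caller. */
static int fake_nand_release_checked(struct fake_mtd *mtd)
{
	return fake_mtd_unregister(mtd);
}

/* A remove path that only tears down private data after a successful release. */
static int fake_driver_remove(struct fake_driver *drv)
{
	int ret;

	assert(drv != NULL);

	ret = fake_nand_release_checked(&drv->mtd);
	if (ret)
		return ret;  /* device still exposed, so keep priv alive */

	free(drv->priv);
	drv->priv = NULL;
	return 0;
}
```

With fake_nand_release_ignoring() a caller has no way to learn that the
device is still registered, so it would go on and free its data. The checked
variant only helps if the remove function actually looks at the result,
which is Boris' point about the existing controller drivers.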

Thanks,
//richard


