MTD RAID

Fri Aug 19 01:20:16 PDT 2016

On Fri, 19 Aug 2016 15:08:35 +0800
Dongsheng Yang <dongsheng.yang at easystack.cn> wrote:

> On 08/19/2016 02:49 PM, Boris Brezillon wrote:
> > Hi Dongsheng,
> >
> > On Fri, 19 Aug 2016 14:34:54 +0800
> > Dongsheng Yang <dongsheng081251 at gmail.com> wrote:
> >  
> >> Hi guys,
> >>      This is an email about MTD RAID.
> >>
> >> *Code:*
> >>      kernel:
> >> https://github.com/yangdongsheng/linux/tree/mtd_raid_v2-for-4.7  
> > Just had a quick look at the code, and I see at least one major problem
> > in your RAID-1 implementation: you're ignoring the fact that NAND blocks
> > can be or become bad. What's the plan for that?  
> 
> Hi Boris,
>      Thanks for your quick reply.
> 
>      When you are using RAID-1, erasing a block erases all of its mirrored
> blocks. If there is a bad block among them, mtd_raid_erase() will return
> an error and the userspace tool or UBI will mark this block as bad. That
> means mtd_raid_block_markbad() will mark all the mirrored blocks as bad,
> although some of them are good.
> 
> In addition, suppose you have data on flash with RAID-1 and one block
> becomes bad. For example, mtd0 and mtd1 are used to build a RAID-1 device
> mtd2. If, while using mtd2, you find that a block has become bad, don't
> worry about losing data: it is still saved in the good mirror, and you can
> replace the bad device with another new MTD device.

Okay, good to see you were aware of this problem.
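The erase and bad-block behaviour described above can be sketched as a small in-memory simulation. This is not the code from the mtd_raid_v2 branch: SimMtd, Raid1 and their methods are hypothetical stand-ins, written in Python for brevity, that only model what the thread says happens. An erase error on one mirror fails the whole erase, marking the block bad marks it on every mirror, and a read can still be served from the surviving mirror.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # hypothetical, tiny for illustration


@dataclass
class SimMtd:
    """Hypothetical in-memory stand-in for one MTD member device."""
    bad: set = field(default_factory=set)
    data: dict = field(default_factory=dict)

    def erase(self, blk: int) -> None:
        if blk in self.bad:
            raise OSError(f"erase failed on bad block {blk}")
        self.data[blk] = b"\xff" * BLOCK_SIZE


class Raid1:
    """Hypothetical RAID-1 device mirroring every block on all members."""

    def __init__(self, members):
        self.members = members

    def erase(self, blk: int) -> None:
        # Erase the block on every mirror; one failure fails the whole erase.
        errors = []
        for m in self.members:
            try:
                m.erase(blk)
            except OSError as e:
                errors.append(e)
        if errors:
            raise errors[0]

    def markbad(self, blk: int) -> None:
        # Mark the block bad on every mirror, even the ones that are still good.
        for m in self.members:
            m.bad.add(blk)

    def read(self, blk: int) -> bytes:
        # Serve the read from the first mirror whose copy is not bad.
        for m in self.members:
            if blk not in m.bad:
                return m.data.get(blk, b"\xff" * BLOCK_SIZE)
        raise OSError(f"block {blk} is bad on every mirror")


# Usage: mtd0 and mtd1 form a RAID-1 device; block 2 is bad on mtd1 only.
mtd0, mtd1 = SimMtd(), SimMtd()
mtd1.bad.add(2)
raid = Raid1([mtd0, mtd1])
try:
    raid.erase(2)
except OSError:
    raid.markbad(2)  # what UBI or a userspace tool would do after the error
print(sorted(mtd0.bad), sorted(mtd1.bad))  # [2] [2]: the good block on mtd0 is lost too

# Block 1 holds data on both mirrors, then goes bad on mtd0: mtd1 still serves it.
mtd0.data[1] = mtd1.data[1] = b"A" * BLOCK_SIZE
mtd0.bad.add(1)
print(raid.read(1) == b"A" * BLOCK_SIZE)  # True
```

The first case is the cost Dongsheng mentions: a single bad block on one member takes the good block on the other member out of service as well.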

> 
> My plan for this feature is to handle it all in the userspace tool:
> (1) mtd_raid scan mtd2    <---- shows the status of the RAID device and
>     each of its members.
> (2) mtd_raid replace mtd2 --old mtd1 --new mtd3    <---- replaces the bad
>     mtd1 with mtd3.
> 
> What about this idea?

Not sure I follow you on #2. And, IMO, you should not depend on a
userspace tool to detect and address this kind of problem.

Okay, a few more questions.

1/ What about data retention issues? Say you read from the main MTD, and
it does not show uncorrectable errors, so you keep reading on it, but,
since you're never reading from the mirror, you can't detect if there
are some uncorrectable errors or if the number of bitflips exceeds the
threshold used to trigger a data move. If suddenly a page in your main
MTD becomes unreadable, you're not guaranteed that the mirror page will
be valid :-/.
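A minimal sketch of the scrubbing this concern implies, assuming a RAID-1 layer that periodically reads every mirror copy of a page instead of only the main one. MTD reports a read with too many bitflips as -EUCLEAN and uncorrectable data as -EBADMSG; here those outcomes are modelled by a simulated per-copy bitflip count, and ECC_STRENGTH, BITFLIP_THRESHOLD, SimPage and scrub() are all hypothetical. Rewriting a copy in place is a simplification: on NAND the data would have to be moved to a freshly erased block.

```python
from dataclasses import dataclass

ECC_STRENGTH = 8       # hypothetical: max bitflips the ECC can correct per page
BITFLIP_THRESHOLD = 6  # hypothetical: at or above this, data should be moved


@dataclass
class SimPage:
    data: bytes
    bitflips: int = 0  # simulated number of flipped bits in this copy


def read_status(page: SimPage) -> str:
    """Classify a read roughly the way MTD does."""
    if page.bitflips > ECC_STRENGTH:
        return "uncorrectable"  # mtd_read() would return -EBADMSG
    if page.bitflips >= BITFLIP_THRESHOLD:
        return "refresh"        # mtd_read() would return -EUCLEAN
    return "ok"


def scrub(mirrors):
    """Read every copy of one page, not just the first, and repair weak copies."""
    statuses = [read_status(p) for p in mirrors]
    good = [p for p, s in zip(mirrors, statuses) if s != "uncorrectable"]
    if not good:
        raise OSError("page lost on every mirror")
    source = min(good, key=lambda p: p.bitflips)
    for p, s in zip(mirrors, statuses):
        if s != "ok":
            # Rewrite the weak or unreadable copy from the healthiest one.
            p.data, p.bitflips = source.data, 0
    return statuses


# Usage: the main copy looks healthy, the mirror is silently degrading.
main_copy = SimPage(b"hello", bitflips=1)
mirror_copy = SimPage(b"hello", bitflips=7)
print(read_status(main_copy))            # ok: reading only the main copy hides the problem
print(scrub([main_copy, mirror_copy]))   # ['ok', 'refresh']
```

Reading only the main copy would keep returning "ok" while the mirror drifts toward "uncorrectable", which is exactly the failure mode described above.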

2/ How do you handle write atomicity in RAID1? I don't know exactly
how RAID1 works, but I guess there's a mechanism (a journal?) to detect
that data has been written on the main MTD but not on the mirror, so
that you can replay the operation after a power-cut. Do you handle this
case correctly?
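One way to detect half-done mirrored writes, assumed here and loosely modelled on the write-intent bitmap used by Linux md RAID1, is to persist a "dirty" mark before writing and clear it once every mirror has been updated; after a power cut, blocks still marked dirty are resynced from the main copy. IntentLogRaid1 is hypothetical, the intent log is just a Python set standing in for on-flash metadata, and NAND's erase-before-write constraint is ignored.

```python
class IntentLogRaid1:
    """Hypothetical two-way mirror with a write-intent log."""

    def __init__(self):
        self.mirrors = [{}, {}]  # block -> data, one dict per member
        self.dirty = set()       # stands in for an on-flash intent log

    def write(self, blk, data, crash_after_first=False):
        self.dirty.add(blk)           # 1. persist intent before touching data
        self.mirrors[0][blk] = data   # 2. write the main copy
        if crash_after_first:
            return                    # simulated power cut: mirror not written
        self.mirrors[1][blk] = data   # 3. write the mirror copy
        self.dirty.discard(blk)       # 4. both copies done, clear intent

    def recover(self):
        """After a power cut, resync every block still marked dirty."""
        for blk in sorted(self.dirty):
            data = self.mirrors[0].get(blk)
            if data is None:
                self.mirrors[1].pop(blk, None)
            else:
                self.mirrors[1][blk] = data
        self.dirty.clear()


# Usage: a power cut between the two writes leaves the mirrors disagreeing.
r = IntentLogRaid1()
r.write(0, b"old")
r.write(0, b"new", crash_after_first=True)
print(r.mirrors[0][0], r.mirrors[1][0])  # b'new' b'old'
r.recover()
print(r.mirrors[0][0] == r.mirrors[1][0])  # True
```

Without the dirty mark, nothing tells the RAID layer after reboot that block 0 needs a resync, and later reads could return either version depending on which mirror is used.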

On a general note, I don't think it's wise to place the RAID layer at
the MTD level. How about placing it at the UBI level (pick 2 ubi
volumes to create one UBI-RAID element)? This way you don't have to
bother about bad block handling (you're manipulating logical blocks
which can be anywhere on the NAND).
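The UBI-level alternative can be sketched as follows, assuming a toy model of UBI in which each volume maps logical erase blocks (LEBs) onto any free good physical erase block (PEB). SimUbiVolume and UbiRaid1 are hypothetical and ignore wear-leveling, erase counters and on-flash metadata; the point is only that a bad PEB under one volume never forces the matching block of the other volume to be marked bad.

```python
class SimUbiVolume:
    """Hypothetical UBI-like volume: LEBs are mapped onto free good PEBs."""

    def __init__(self, nr_pebs, bad_pebs=()):
        bad = set(bad_pebs)
        # Bad PEBs never enter the free pool, so users of the volume never see them.
        self.free = [p for p in range(nr_pebs) if p not in bad]
        self.leb_to_peb = {}
        self.pebs = {}

    def leb_write(self, leb, data):
        peb = self.free.pop(0)  # pick any free good PEB
        old = self.leb_to_peb.get(leb)
        self.leb_to_peb[leb] = peb
        self.pebs[peb] = data
        if old is not None:
            del self.pebs[old]
            self.free.append(old)  # recycle the old PEB (real UBI schedules it for erasure)

    def leb_read(self, leb):
        return self.pebs[self.leb_to_peb[leb]]


class UbiRaid1:
    """Hypothetical RAID-1 element built from two UBI volumes."""

    def __init__(self, vol_a, vol_b):
        self.vols = (vol_a, vol_b)

    def leb_write(self, leb, data):
        # Same LEB number on both volumes; each volume picks its own PEB.
        for vol in self.vols:
            vol.leb_write(leb, data)

    def leb_read(self, leb):
        return self.vols[0].leb_read(leb)


# Usage: PEB 0 is bad under volume A, PEB 1 is bad under volume B.
vol_a = SimUbiVolume(4, bad_pebs=[0])
vol_b = SimUbiVolume(4, bad_pebs=[1])
raid = UbiRaid1(vol_a, vol_b)
raid.leb_write(0, b"x")
print(vol_a.leb_to_peb[0], vol_b.leb_to_peb[0])  # 1 0
```

LEB 0 lands on different physical blocks in each volume, and neither volume loses a good block because of the other's bad one.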

One last question: what's the real goal of this MTD-RAID layer? If
that's about addressing the MLC/TLC NAND reliability problems, I don't
think it's such a good idea.

Regards,

Boris