MTD RAID

Mon Aug 22 07:54:06 PDT 2016

On Mon, 2016-08-22 at 18:55 +0800, Dongsheng Yang wrote:
> 
> 
> On Mon, Aug 22, 2016 at 3:27 PM, Artem Bityutskiy <dedekind1 at gmail.com>
> wrote:
> > On Mon, 2016-08-22 at 11:22 +0800, Dongsheng Yang wrote:
> > > As I explained above, MTD RAID is not just a solution for the
> > > reliability problem of MLC/TLC.
> > 
> > Could you please answer these questions.
> > 
> > 1. Does MTD raid work on MLC or is it SLC-only?
> 
> Good question. No, it is not SLC-only. It is based on the MTD layer,
> so in theory it should work with any MTD device, although we are using
> NAND flash in production.
> > 
> > 2. If I am building RAID-0, I have 2 flash chips, one has every even
> > block bad, the other has every odd block bad. What happens?
> 
> All blocks would be marked as bad, because we combine the blocks that
> are striped together into one larger block, so if one of those blocks
> is bad, we mark the whole virtual large block as bad. Take the RAID0
> example from my first email: I build a RAID0 device from 4 devices with
> a block size of 16K, so the block size of the new RAID0 device is 64K.
> I am combining these 4 devices and striping across the new, larger
> block for better performance.
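
The RAID-0 rule described above can be sketched in a few lines of Python.
This is only an illustration, not the MTD RAID code: the function names are
hypothetical, and it assumes that virtual block i is striped over block i of
every member device, which the thread does not state explicitly.

```python
def raid0_bad_blocks(members):
    """Return the bad-block map of the striped (virtual) device.

    members: one list per member device; entry i is True if block i
    of that device is bad. (Hypothetical representation.)
    """
    # Assumption: virtual block i spans block i of every member, so it
    # is bad as soon as any one of those member blocks is bad.
    return [any(column) for column in zip(*members)]


def raid0_block_size(member_block_size, n_members):
    """Virtual block size: the member blocks are combined into one."""
    return member_block_size * n_members


# Question 2: two chips, one with every even block bad, the other with
# every odd block bad.
chip0 = [i % 2 == 0 for i in range(8)]
chip1 = [i % 2 == 1 for i in range(8)]
virtual_bad = raid0_bad_blocks([chip0, chip1])

# The example from the first email: 4 devices with 16K blocks.
virtual_size = raid0_block_size(16 * 1024, 4)
```

With these inputs every virtual block ends up bad, and the virtual block
size is 64K, matching the answer above.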
> > 
> > 3. Same question, but for RAID-1.
> 
> Same result, but for a different reason. In RAID1 we don't need to
> enlarge the block size, but we do have to keep the blocks mirrored. If
> the block on the master or on the mirror is bad, then the block of the
> RAID1 device must be marked as bad, because we can't guarantee the
> requested number of copies of the data in this case.
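
The RAID-1 rule described above can be sketched the same way. Again this is
a minimal Python illustration under stated assumptions, not the MTD RAID
implementation: the names are hypothetical, and block i of the mirrored
device is assumed to map to block i of the master and of each mirror.

```python
def raid1_bad_blocks(copies, required_copies=None):
    """Return the bad-block map of the mirrored device.

    copies: one list per device (master and mirrors); entry i is True
    if block i of that device is bad. (Hypothetical representation.)
    required_copies: number of good copies that must exist; defaults
    to the number of devices, i.e. every copy must be good.
    """
    if required_copies is None:
        required_copies = len(copies)
    result = []
    for column in zip(*copies):
        good = sum(1 for bad in column if not bad)
        # The block is unusable if the requested number of copies
        # cannot be stored.
        result.append(good < required_copies)
    return result


# Question 3: master has every even block bad, mirror every odd block.
master = [i % 2 == 0 for i in range(8)]
mirror = [i % 2 == 1 for i in range(8)]
mirrored_bad = raid1_bad_blocks([master, mirror])
```

Unlike the striped case, the block size of the mirrored device stays the
same as that of its members; only the bad-block map is merged.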
> > 
> > 4. Suppose I have RAID-1 like in this picture:
> > 
> > https://en.wikipedia.org/wiki/Standard_RAID_levels#/media/File:RAID_1.svg
> > 
> > Just assume we have flash chips, not disks, and eraseblocks, not
> > sectors.
> > 
> > Suppose eraseblock A1 goes bad. What happens next?
> 
> If A1 in disk0 goes bad, the data will be read from disk1, but there
> will be a warning about the problem. We should then notice that our
> data in this section is no longer safe enough. From there, there are
> different scenarios.

> (1) You are using UBI on this device. As I mentioned last week, we
> can enhance UBI to notice this kind of problem and then do a data
> migration.

So to make my data become mirrored again, MTD RAID needs help from
upper layers.

In the case of RAID, you do not need this kind of help. All you need is
to change the disk when it starts getting bad sectors.

I mean, this is a bit like: here is our MTD RAID which is not a RAID
unless it works with special SW like UBI on top of it.

Are you sure you want to call this MTD RAID?

If this were on top of UBI, RAID would probably be a closer match,
because then you could assume you are on top of a "reliable" medium.

Note, I am not insisting, just asking.

> (2) You are using some other application on this device. Then you can
> check the status of this MTD RAID device with "mtd_raid scan <dev>".
> If this command reports that some blocks are bad, you can do a
> "mtd_raid replace <>" to replace the bad one.

How will the replacement work? What gets replaced - the entire flash
chip? Probably not, because bad blocks are natural for raw NAND. Then
how do you replace the blocks if there is an FS on top?

Artem.