[PATCH 1/2] ubi: wl: Put source PEB into correct list if trying locking LEB failed

Zhihao Cheng chengzhihao1 at huawei.com
Mon Aug 26 21:54:09 PDT 2024


在 2024/8/27 11:51, Ryder Wang 写道:
> Hi Zhihao,
> 
>> During wear-leveing work, the source PEB will be moved into scrub list
>> when source LEB cannot be locked in ubi_eba_copy_leb(), which is wrong
>> for non-scrub type source PEB.
> 
> My understanding it's just the expected design of UBI wear leveling. When ubi_eba_copy_leb() is called for non-scrub type source PEB, it also indicates that this non-scrub PEB's ease counter has be over UBI_WL_THRESHOLD, so the WL should be performed on it. If ubi_eba_copy_leb() fails with MOVE_RETRY, this PEB is moved to scrub tree for further WL operation. This should also be sane, right? 

Now, e1 is the source PEB with smaller erase counter, it will be picked 
as source PEB again in next WL if it not becomes free yet.

We don't put it into scrub tree, because the scrub type PEB means that 
biflips ever happen during reading, the data is not stable anymore in 
that PEB, so UBI moves data from scrub type PEB just take the 
opportunity of WL process.

IOW, in this case, the WL won't lost keeping ec balanced whether we keep 
e1 in used tree or move it into scrub tree. It's the same thing as the 
MOVE_TARGET_WR_ERR/MOVE_TARGET_RD_ERR case.

> Or you have any statistical data or evidence where the current design behavior has any real bad effect to the life time of flash?

Yes, see patch 0:
Before applying patches (PEB erase counters Histogram):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:     1024      646      684      988
1000     ..     9999:        0        0        0        0
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     1024      646      684      988

After applying patches (PEB erase counters Histogram):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:     1024      629      666      960
1000     ..     9999:        0        0        0        0
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     1024      629      666      960 (0~64)

> 
> ________________________________________
> From: linux-mtd <linux-mtd-bounces at lists.infradead.org> on behalf of Zhihao Cheng <chengzhihao1 at huawei.com>
> Sent: Monday, August 19, 2024 11:26
> To: richard at nod.at; miquel.raynal at bootlin.com; daniel at makrotopia.org
> Cc: linux-mtd at lists.infradead.org
> Subject: [PATCH 1/2] ubi: wl: Put source PEB into correct list if trying locking LEB failed
> 
> During wear-leveing work, the source PEB will be moved into scrub list
> when source LEB cannot be locked in ubi_eba_copy_leb(), which is wrong
> for non-scrub type source PEB. The problem could bring extra and
> ineffective wear-leveing jobs, which makes more or less negative effects
> for the life time of flash. Specifically, the process is divided 2 steps:
> 1. wear_leveling_worker // generate false scrub type PEB
>       ubi_eba_copy_leb // MOVE_RETRY is returned
>         leb_write_trylock // trylock failed
>       scrubbing = 1;
>       e1 is put into ubi->scrub
> 2. wear_leveling_worker // schedule false scrub type PEB for wl
>       scrubbing = 1
>       e1 = rb_entry(rb_first(&ubi->scrub))
> 
> The problem can be reproduced easily by running fsstress on a small
> UBIFS partition(<64M, simulated by nandsim) for 5~10mins
> (CONFIG_MTD_UBI_FASTMAP=y,CONFIG_MTD_UBI_WL_THRESHOLD=50). Following
> message is shown:
>   ubi0: scrubbed PEB 66 (LEB 0:10), data moved to PEB 165
> 
> Since scrub type source PEB has set variable scrubbing as '1', and
> variable scrubbing is checked before variable keep, so the problem can
> be fixed by setting keep variable as 1 directly if the source LEB cannot
> be locked.
> 
> Fixes: e801e128b220 ("UBI: fix missing scrub when there is a bit-flip")
> CC: stable at vger.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1 at huawei.com>
> ---
>   drivers/mtd/ubi/wl.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
> index a357f3d27f2f..8a26968aba11 100644
> --- a/drivers/mtd/ubi/wl.c
> +++ b/drivers/mtd/ubi/wl.c
> @@ -846,7 +846,14 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
>                          goto out_not_moved;
>                  }
>                  if (err == MOVE_RETRY) {
> -                       scrubbing = 1;
> +                       /*
> +                        * For source PEB:
> +                        * 1. The scrubbing is set for scrub type PEB, it will
> +                        *    be put back into ubi->scrub list.
> +                        * 2. Non-scrub type PEB will be put back into ubi->used
> +                        *    list.
> +                        */
> +                       keep = 1;
>                          dst_leb_clean = 1;
>                          goto out_not_moved;
>                  }
> --
> 2.39.2
> 
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 
> .
> 




More information about the linux-mtd mailing list