Lots of fastmap writes
Zhihao Cheng
chengzhihao1 at huawei.com
Mon Jun 17 06:21:24 PDT 2024
On 2024/6/17 19:20, Rickard x Andersson wrote:
> On 6/14/24 14:28, Zhihao Cheng wrote:
>> On 2024/6/14 19:42, Rickard X Andersson wrote:
>>> On 6/4/24 03:52, Zhihao Cheng wrote:
>>
>> [...]
>>>>
>>>> BTW, after applying the patches, the kernel should be run on fresh
>>>> flash; the improved wear-leveling algorithm cannot rescue an image
>>>> that is already worn out.
>>>>
>>>
>>> Thanks for the patches!
>>>
>>> I have backported the patches to Linux kernel 6.1. Do you think the
>>> patches are safe to apply to Linux kernel 6.1?
>>
>> Yes, it's okay. I have backported the patches to our product (kernel
>> v5.10) and it works fine.
>
> Thanks! I backported the patches to Linux 6.1 and ran my own stress
> test for a few days (on another device with fresh flash memory). It
> seems that the wear of the fastmap physical blocks (0-63) is much lower
> now with the patches applied, which is good.
>
> However, I hit this problem after almost 3 days of stress testing (the
> file system was switched to read-only mode):
>
>
> [ 7885.036577][ T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data moved to PEB 627
> [83721.724621][ T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data moved to PEB 7
> [83721.832521][ T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data moved to PEB 5
> [83784.750714][ T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved to PEB 2
> [165812.657934][ T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data moved to PEB 18
> [166748.055242][ T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved to PEB 837
> [166834.742451][ T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved to PEB 43
It looks like some of the PEBs have hit bit-flip errors (that is why UBI
scrubbed them).
> [239986.496840][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: corrupt empty space at LEB 3519:101376
> [239986.506809][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scanned_corruption: corruption at LEB 3519:101376
> [239986.519742][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scanned_corruption: first 8192 bytes from LEB 3519:101376
> [239986.532052][T31387] 00000000: fffffffe ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
The data content (0xfffffffe) is weird; shouldn't it be 0xffffffff? One
bit has flipped, and there are no ECC error messages!
> [239986.532230][T31387] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532450][T31387] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532607][T31387] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532732][T31387] 00000080: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
>
> ...
>
> [239986.603283][T31387] 00001000: fffffffe ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
Same here.
> [239986.603667][T31387] 00001020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
>
> ...
>
> [239986.707743][T31387] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................
>
>
> [239986.707894][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: LEB 3519 scanning failed
> [239986.724625][T31387] UBIFS error (ubi2:0 pid 31387): do_commit: commit failed, error -117
> [239986.734335][T31387] UBIFS warning (ubi2:0 pid 31387): ubifs_ro_mode.part.0: switched to read-only mode, error -117
> [239986.748276][T31387] CPU: 0 PID: 31387 Comm: sync Kdump: loaded Not tainted 6.1.55-axis9-devel #1
> [239986.757327][T31387] Hardware name: Freescale i.MX6 SoloX (Device Tree)
> [239986.764095][T31387] unwind_backtrace from show_stack+0x18/0x1c
> [239986.770208][T31387] show_stack from dump_stack_lvl+0x24/0x2c
> [239986.776215][T31387] dump_stack_lvl from do_commit+0xc0/0x528
> [239986.782167][T31387] do_commit from ubifs_sync_fs+0x84/0x98
> [239986.787991][T31387] ubifs_sync_fs from iterate_supers+0x9c/0x118
> [239986.794268][T31387] iterate_supers from ksys_sync+0x54/0x8c
> [239986.800175][T31387] ksys_sync from sys_sync+0x10/0x18
> [239986.805492][T31387] sys_sync from ret_fast_syscall+0x0/0x64
> [239986.811394][T31387] Exception stack(0xc81b5fa8 to 0xc81b5ff0)
> [239986.817314][T31387] 5fa0: 00000072 be8b5d44 00000001 be8b5d44 00000000 004e5299
> [239986.826423][T31387] 5fc0: 00000072 be8b5d44 00000000 00000024 004a12cd b6f74ce8 00000000 004f806c
> [239986.835530][T31387] 5fe0: 004f8f14 be8b5bac 004e529f b6ef4e58
>
> Is the above error something you have seen before?
I have met this kind of error (corrupt empty space) several times (on
both v4.4 and v5.10). To be honest, I have no idea how it happens. It
looks like something goes wrong on the flash (e.g. uncorrected bit flips).
>
>>>
>>> Another thing, would it not be possible to rescue that particular
>>> worn out device by simply turning fastmap off on that device?
>>>
>>
>> Can I take "rescuing" to mean making the erase counters normal again
>> (max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid not all PEBs can be
>> rescued, given how get_peb_for_wl() works.
>> For example, PB and PC cannot be rescued unless PA is taken for writing
>> and wear-leveling happens to be scheduled right after that.
>>
>> ubi->free tree:
>>           29600(PB)
>>          /         \
>>      1(PA)       29600(PC)
>
> What I mean is that I think the badly worn device could be made usable
> again by turning off fastmap. Would it not work properly? I do
> understand, however, that the first 64 physical erase blocks would not be
> used in practice, since their erase counts are very high.
> But wouldn't the file system work OK? Or am I missing something?
>
I think the first 64 PEBs could still be used when the number of free PEBs
falls below 64, for example when UBI runs out of space, or when many
pending erase works have not been executed yet at the time a free PEB is
needed.