UBI: Race between fastmap_write and wear_leveling_worker
Anders Olofsson
pingu at mazeda.se
Wed Aug 24 04:37:05 PDT 2016
After enabling fastmap I sometimes get the following warning at boot:
[ 2.612424] ubi1: default fastmap pool size: 8
[ 2.616802] ubi1: default fastmap WL pool size: 4
[ 2.621412] ubi1: attaching mtd3
[ 2.739113] ------------[ cut here ]------------
[ 2.743623] WARNING: CPU: 0 PID: 102 at drivers/mtd/ubi/fastmap.c:832
ubi_scan_fastmap+0x145c/0x1464()
[ 2.752929] Modules linked in:
[ 2.755932] CPU: 0 PID: 102 Comm: ubiattach Not tainted 4.4.19 #24
[ 2.762077] Stack : 80916aaa 00000036 80910000 00000000 00000000
00000000 80095c08 8081fca7
8fef19d8 00000066 00000000 80904f60 8f49c400 8f592000
00000000 80095c38
00000000 803b0078 00000002 00000000 807659f8 8f5a7c14
8f5a7bfc 8012b6c0
00000009 80044808 8f49c400 8f592000 8f5a7c14 00000082
00000200 8075ccd8
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
...
[ 2.797492] Call Trace:
[ 2.799939] [<80020894>] show_stack+0x94/0xb0
[ 2.804277] [<803a17d8>] dump_stack+0x88/0xc0
[ 2.808633] [<8004497c>] warn_slowpath_common+0xa4/0xe0
[ 2.813822] [<800449dc>] warn_slowpath_null+0x24/0x30
[ 2.818860] [<80454e98>] ubi_scan_fastmap+0x145c/0x1464
[ 2.824077] [<80451754>] ubi_attach+0x2e8/0x4b4
[ 2.828585] [<80441400>] ubi_attach_mtd_dev+0x9f4/0x10fc
[ 2.833872] [<80441e10>] ctrl_cdev_ioctl+0x100/0x204
[ 2.838837] [<8018d010>] do_vfs_ioctl+0xa4/0x6e4
[ 2.843420] [<8018d6b0>] SyS_ioctl+0x60/0xa0
[ 2.847680] [<800293c8>] syscall_common+0x30/0x54
[ 2.852353]
[ 2.853906] ---[ end trace 02b9cbc0208a022b ]---
[ 2.858486] ubi1 error: ubi_scan_fastmap: Attach by fastmap failed,
doing a full scan!
After some analysis I've found that the fault is not a lost PEB as the
comment above the warning seems to indicate but instead a duplicate
entry. One of the PEBs is present both in the free list and the WL pool.
The PEB is free so scan_pool() puts it in the free list again with the
result that count_fastmap_pebs() returns a higher count than expected
due to the now duplicated entry in the free list.
I've done some debugging of how the fastmap is written and have found
the following chain of events causing the duplicate entry:
- ubi_update_fastmap() calls ubi_refill_pools() resulting in the pools
having only unused PEBs.
- ubi_update_fastmap() continues to run without wl_lock while it
allocates a PEB for the fastmap.
- wear_leveling_worker() wakes up and tries to move a PEB but fails.
I've seen both "PEB x has no VID header" and ubi_eba_copy_leb()
returning MOVE_RETRY, but both result in a goto to out_not_moved which
will call do_sync_erase() that results in the PEB being added to the
free list.
- ubi_write_fastmap() takes wl_lock and creates a fastmap which includes
a pool with used entries and the used PEB also being in the free list.
That's how far I've followed this. I guess either the worker thread
needs to stay away from the pools until ubi_update_fastmap() has
completed or this error needs to checked for and handled either in
ubi_write_fastmap() or ubi_scan_fastmap().
I do not know why I'm seeing this so often (happens roughly 1/5 to 1/10
of every boot on my device) or why it only happens with one of the two
UBI partitions, but it seems to always include the same PEB being
duplicated. According to my debug prints, the wear leveling worker is
very fond of trying and failing to move things to PEB 85 and thus
erasing it again at every boot even though no data is ever written to it
(except the EC header).
ubinfo for the partition which I'm seeing the problem on:
# ubinfo /dev/ubi1
ubi1
Volumes count: 2
Logical eraseblock size: 129024 bytes, 126.0 KiB
Total amount of logical eraseblocks: 160 (20643840 bytes, 19.7 MiB)
Amount of available logical eraseblocks: 48 (6193152 bytes, 5.9 MiB)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 40
Current maximum erase counter value: 55
Minimum input/output unit size: 2048 bytes
Character device major/minor: 250:0
Present volumes: 15, 24
# ubinfo /dev/ubi1_15
Volume ID: 15 (on ubi1)
Type: dynamic
Alignment: 1
Size: 1 LEBs (129024 bytes, 126.0 KiB)
State: OK
Name: bootblockpar
Character device major/minor: 250:16
# ubinfo /dev/ubi1_24
Volume ID: 24 (on ubi1)
Type: dynamic
Alignment: 1
Size: 65 LEBs (8386560 bytes, 8.0 MiB)
State: OK
Name: persistent
Character device major/minor: 250:25
I'm seeing this problem on multiple systems (all with same partition
layout) running both 32bit MIPS and 32bit ARM, most of them with muliple
cores but fault has also been seen at least once on a single core system.
Currently using Linux 4.4.19 but haven't seen any changes in 4.8-rc3
that should make any difference for this problem.
/Anders
More information about the linux-mtd
mailing list