a UBIFS image makes task pdflush blocked > 120 seconds
Artem Bityutskiy
dedekind1 at gmail.com
Fri Nov 13 10:52:10 EST 2009
On Fri, 2009-11-13 at 16:09 +0100, Norbert van Bolhuis wrote:
> OK, so now I understand commit_sem is a RW semaphore which can be acquired multiple
> times by readers but only once for writers. I failed to see that initially.
> I think this problem is not related to the commit_sem though.
OK.
> I changed the logging a bit and added the PID info.
> The problem is indeed caused by a third task holding the "MTD chip lock". This
> blocks the application which is blocking pdflush.
OK.
> This 3rd task is the UBI background thread. It starts to erase many NOR PEBs very soon.
The only reason I see why it would do this is because you attach an
empty flash to UBI. In this case, UBI has to format it. And it is doing
the formatting asynchronously, in the background thread.
The idea was that we can allow using the UBI volume even if it is not
fully formatted, and format in parallel. For NAND with its fast erase it
works fine. With NOR it appears to be not so goot.
Please, attach UBI, then wait until the UBI background thread is done,
and then mount and start using UBIFS.
The other option is to format the flash _properly_ before attaching it
to UBI. In this case the background thread will not have to do that job.
Use ubiformat for this:
http://www.linux-mtd.infradead.org/faq/ubi.html#L_format_mtd
Please, confirm that you do not have this problem in that case.
> To be absolutely certain I also added log entries for the do_erase_oneblock
> function.
> To better reproduce this problem I decreased the warning threshold from 120 sec to 60 secs.
>
> Here's the detailed info.
>
> log entries (LE:xyz is the log entry number):
> ---------------------------------------------
> LE:0235, TIME=3984.244, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0236, TIME=3984.836, pid=1236, M(xy=04)=commit_sem, make_reservation, Req DOWN 1 <- application creating a file
> LE:0237, TIME=3984.836, pid=1236, M(xy=05)=commit_sem, make_reservation, DOWN 1
> LE:0238, TIME=3984.836, pid=1236, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
> LE:0239, TIME=3984.836, pid=1236, M(xy=03)=io_mutex, reserve_space, LOCK 1
> LE:0240, TIME=3985.100, pid=0918, M(xy=01)=do_erase_oneblock, END <- ubi_bgt0d continuing erasing LEBs
> LE:0241, TIME=3985.100, pid=0918, M(xy=00)=do_erase_oneblock, START
> <omitted many do_erase_oneblock END/START entries>
> LE:0275, TIME=3998.608, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0276, TIME=3999.256, pid=0108, M(xy=04)=commit_sem, make_reservation, Req DOWN 1
> LE:0277, TIME=3999.256, pid=0108, M(xy=05)=commit_sem, make_reservation, DOWN 1
> LE:0278, TIME=3999.256, pid=0108, M(xy=02)=io_mutex, reserve_space, Req LOCK 1 <- pdflush blocks here
> LE:0279, TIME=3999.444, pid=0918, M(xy=01)=do_erase_oneblock, END
> <omitted many do_erase_oneblock END/START entries>
> LE:0433, TIME=4062.400, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0434, TIME=4062.400, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0435, TIME=4063.124, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0436, TIME=4063.124, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0437, TIME=4063.308, pid=0004, M(xy=0a)=pdflush error report
> LE:0438, TIME=4066.120, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0439, TIME=4066.120, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0440, TIME=4066.876, pid=1236, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0441, TIME=4066.876, pid=1236, M(xy=07)=io_mutex, release_head, UNLOCK <- application unlocks io_mutex
> LE:0442, TIME=4066.876, pid=1236, M(xy=08)=commit_sem, finish_reservation, Req UP
> LE:0443, TIME=4066.876, pid=1236, M(xy=09)=commit_sem, finish_reservation, UP
> LE:0444, TIME=4066.880, pid=0108, M(xy=03)=io_mutex, reserve_space, LOCK 1 <- pdflush can continue
> LE:0445, TIME=4066.880, pid=0108, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0446, TIME=4066.880, pid=0108, M(xy=07)=io_mutex, release_head, UNLOCK
> LE:0447, TIME=4066.880, pid=0108, M(xy=08)=commit_sem, finish_reservation, Req UP
> LE:0448, TIME=4066.880, pid=0108, M(xy=09)=commit_sem, finish_reservation, UP
>
>
>
> relevant processes
> ------------------
> pdflush D 00000000 0 108 2
> Call Trace:
> [ceb51b50] [7f64db78] 0x7f64db78 (unreliable)
> [ceb51c10] [c0008b24] 0xc0008b24 __switch_to
> [ceb51c30] [c0291e40] 0xc0291e40 schedule
> [ceb51c80] [c0292d60] 0xc0292d60 __mutex_lock_slowpath
> [ceb51cb0] [c013d5fc] 0xc013d5fc make_reservation
> [ceb51d20] [c013ddf8] 0xc013ddf8 ubifs_jnl_write_inode
> [ceb51d60] [c014531c] 0xc014531c ubifs_write_inode
> [ceb51d80] [c01410ec] 0xc01410ec ubifs_writepage
> [ceb51db0] [c005b124] 0xc005b124 __writepage
> [ceb51dc0] [c005b948] 0xc005b948 write_cache_pages
> [ceb51e60] [c005babc] 0xc005babc do_writepages
> [ceb51e70] [c00a3638] 0xc00a3638 __writeback_single_inode
> [ceb51ec0] [c00a3bfc] 0xc00a3bfc generic_sync_sb_inodes
> [ceb51f00] [c00a439c] 0xc00a439c writeback_inodes
> [ceb51f20] [c005c7e0] 0xc005c7e0 wb_kupdate
> [ceb51f80] [c005cfdc] 0xc005cfdc pdflush
> [ceb51fd0] [c003bd88] 0xc003bd88 kthread
> [ceb51ff0] [c00114a0] 0xc00114a0 kernel_thread
>
>
> ubi_bgt0d R running 0 918 2
> Call Trace:
> [ce6e1b80] [00001032] 0x001032 (unreliable)
> [ce6e1c40] [c0008b24] 0xc0008b24 __switch_to
> [ce6e1c60] [c0291e40] 0xc0291e40 schedule
> [ce6e1cb0] [c02926d4] 0xc02926d4 schedule_timeout
> [ce6e1cf0] [c002f684] 0xc002f684 msleep
> [ce6e1d00] [c01ee900] 0xc01ee900 do_erase_oneblock
> [ce6e1d60] [c01e98a0] 0xc01e98a0 cfi_varsize_frob
> [ce6e1da0] [c01ecdc4] 0xc01ecdc4 cfi_amdstd_erase_varsize
> [ce6e1db0] [c01e30b8] 0xc01e30b8 part_erase
> [ce6e1dc0] [c01e5800] 0xc01e5800 concat_erase
> [ce6e1e20] [c01e30b8] 0xc01e30b8 part_erase
> [ce6e1e30] [c01f8134] 0xc01f8134 do_sync_erase
> [ce6e1ec0] [c01f83f0] 0xc01f83f0 ubi_io_sync_erase
> [ce6e1f40] [c01f9658] 0xc01f9658 erase_worker
> [ce6e1f70] [c01f8cc0] 0xc01f8cc0 do_work
> [ce6e1f90] [c01fa078] 0xc01fa078 ubi_thread
> [ce6e1fd0] [c003bd88] 0xc003bd88 kthread
> [ce6e1ff0] [c00114a0] 0xc00114a0 kernel_thread
>
> mainctrl D 0fbd3f00 0 1236 1215
> Call Trace:
> [ce7fd9f0] [00000001] 0x000001 (unreliable)
> [ce7fdab0] [c0008b24] 0xc0008b24 __switch_to
> [ce7fdad0] [c0291e40] 0xc0291e40 schedule
> [ce7fdb20] [c01ea090] 0xc01ea090 get_chip
> [ce7fdb90] [c01eb8dc] 0xc01eb8dc cfi_amdstd_write_buffers
> [ce7fdc20] [c01e2da8] 0xc01e2da8 part_write
> [ce7fdc30] [c01e4128] 0xc01e4128 concat_write
> [ce7fdc70] [c01e2da8] 0xc01e2da8 part_write
> [ce7fdc80] [c01f7854] 0xc01f7854 ubi_io_write
> [ce7fdcb0] [c01f63d4] 0xc01f63d4 ubi_eba_write_leb
> [ce7fdd10] [c01f546c] 0xc01f546c ubi_leb_write
> [ce7fdd30] [c0148c6c] 0xc0148c6c ubifs_wbuf_write_nolock
> [ce7fdd70] [c013dd0c] 0xc013dd0c write_head
> [ce7fdd90] [c013e9e0] 0xc013e9e0 ubifs_jnl_update
> [ce7fde00] [c0143030] 0xc0143030 ubifs_create
> [ce7fde50] [c008caf8] 0xc008caf8 vfs_create
> [ce7fde70] [c0090a20] 0xc0090a20 do_filp_open
> [ce7fdf20] [c008155c] 0xc008155c do_sys_open
> [ce7fdf40] [c001167c] 0xc001167c ret_from_syscall
>
>
> relevant process (priorities)
> -----------------------------
> F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
> 1 S root 108 108 2 0 80 0 - 0 5cf64 22:16 ? 00:00:00 [pdflush]
> 1 S root 918 918 2 0 75 -5 - 0 1fa2dc 22:16 ? 00:00:00 [ubi_bgt0d]
> 0 S root 1236 1236 1221 0 90 10 - 12617 49354 22:17 ttyS1 00:00:00 mainctrl
>
>
>
> So the problem is that the application holding the io_mutex is blocked by the ubi bg task
> because of the "MTD chip lock".
Right, and the task is formatting your flash. Formatting means that for
each eraseblock it
1. erases (very slow on NOR!)
2. writes the UBI headers.
Your write task (pid=1236) is then blocked in CFI. If it is unlucky, it
may even wait for several block erasures, and then be able to take the
lock.
> I'm happy I understand the problem now, I guess it's not *that* serious.
> It does cause unnecessary delays however which I'd like to get rid of.
Try the formatting.
> I guess it wouldn't make sense to somehow release the per-journalhead io_mutex whenever
> blocked by the "MTD chip lock" ?
> How can I prevent the UBI background thread to erase so many NOR LEBs so soon ?
> (I hope the only option is not by creating a 156MB UBIFS image (preprogrammed by manufacturing) for
> our 156 MB NOR partition).
No. The option is to use ubiformat. You can use this utility for
formatting, and also for flashing images, in which case it will flash
data, and the rest of the eraseblocks will be formatted.
> Would it make sense to lower the UBI background thread task prio ?
That would be a hack. You can even add a 'yield()' there, that would
also be a hack.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
More information about the linux-mtd
mailing list