Hi Thomas:
OK. I'll redo the patch and post it to linux-mtd and linux-mm in a few days.
Then I'll wait for comments.
Thanks
Dengchao
From: Thomas.Betker at rohde-schwarz.com
Date: 2015-11-12 17:47
To: deng.chao1 at zte.com.cn
Cc: chao.deng at linaro.org, Joakim Tjernlund <joakim.tjernlund at transmode.se>, "'Li Jiaxin'" <lijiaxin at top-vision.cn>, linux-mtd <linux-mtd at lists.infradead.org>, linux-mtd <linux-mtd-bounces at lists.infradead.org>, Ming Liu <liu.ming50 at gmail.com>, lizhenwei <lizhenwei at top-vision.cn>, wangzaiwei <wangzaiwei at top-vision.cn>
Subject: Re: Another JFFS2 deadlock, kernel 3.4.11
Hello Deng:
> My patch makes jffs2_garbage_collect_pass return 0, not an error, when
> it cannot get the page lock. This means "try again".
> No matter where jffs2_garbage_collect_pass is called from, the caller
> will keep looping until it reaches its goal.
> However, wangzaiwei's doubt is reasonable.
> jffs2_garbage_collect_pass is called not only by the GC thread but also
> by jffs2_reserve_space, and this can introduce a livelock in a rare
> situation.
> Consider this:
> The disk is almost full, which means jffs2_reserve_space may call
> jffs2_garbage_collect_pass to get free space when performing a write
> operation.
> Thread A has low priority and has acquired the page lock while it is
> writing.
> Thread B is an RT thread with a higher priority than A, and it is
> also writing. If B preempts A while A is holding the page lock, and
> at the same time B calls
> jffs2_reserve_space->jffs2_garbage_collect_pass to acquire the page
> lock, a livelock occurs: B loops forever waiting for A to release the
> page lock, but A cannot run because B itself has preempted it.
>
> To solve this, I make jffs2_reserve_space sleep for a while when it
> finds that jffs2_garbage_collect_pass cannot get the page lock.
>
> Still, I agree with Thomas that my patch is too heavy. It would be
> much better if we found a way to modify just
> jffs2_garbage_collect_pass to avoid the original deadlock.
> But that fix is too tricky for me; I do not have any idea yet.
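The livelock argument quoted above can be sketched as a toy single-CPU model in plain C. This is only an illustration, not JFFS2 code and not the patch itself: struct sim, run_low_prio_task, gc_pass_trylock and reserve_space are hypothetical names, and the "scheduler" simply assumes that low-priority task A runs only while high-priority task B sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): one CPU, strict priority scheduling. */
struct sim {
    bool page_locked;  /* page lock is held by low-priority task A */
    int  a_work_left;  /* CPU ticks A still needs before it unlocks */
};

/* A makes progress only when B gives up the CPU for some ticks. */
static void run_low_prio_task(struct sim *s, int ticks)
{
    while (ticks-- > 0 && s->a_work_left > 0)
        s->a_work_left--;
    if (s->a_work_left == 0)
        s->page_locked = false;
}

/* Stand-in for a GC pass that only tries the page lock:
 * true = lock obtained, false = "try again". */
static bool gc_pass_trylock(const struct sim *s)
{
    return !s->page_locked;
}

/* Stand-in for the retry loop in high-priority task B.
 * Returns the attempt on which the lock was obtained,
 * or -1 if B is still looping after max_attempts. */
static int reserve_space(struct sim *s, bool sleep_on_contention,
                         int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (gc_pass_trylock(s))
            return attempt;
        if (sleep_on_contention)
            run_low_prio_task(s, 1);  /* B sleeps one tick, so A runs */
        /* otherwise B retries at once and A never gets the CPU */
    }
    return -1;
}
```

In this model, with sleep_on_contention set to false, B never obtains the lock no matter how large max_attempts is, because A is never scheduled. With it set to true, A finishes its remaining work during B's sleeps and B's next attempt succeeds.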
I still think it would be a good idea to send your current patch to
linux-mtd and linux-mm. At the moment, it's the only solution we have, and
perhaps somebody on linux-mm will find a better way. Your (old) patch
has been running happily on our devices for almost two years now, so I can
provide a Tested-by:.
@wangzaiwei: Your test scripts have actually run for 24 hours on my device
without any problems (I had to stop them this morning).
Best regards,
Thomas Betker