JFFS2 deadlock, kernel 3.4.11

Tue Oct 2 14:04:36 EDT 2012

> >
> > Hello all,
> >
> > I have repeatedly encountered a deadlock between two JFFS2 threads:
>
> [SNIP]
>
> >
> > The target system is an SoC with a dual-core ARMv7 (Cortex-A9), and we are
> > running the long-term 3.4.11 kernel (whose fs/jffs2/ seems to be pretty
> > close to the latest mainline kernel). The deadlock occurred when using scp
> > to copy files from a host system to the target system.
> >
> > The GC thread hangs in lock_page(page), and the write thread hangs in the
> > first mutex_lock(&f->sem). The cause seems to be an AB-BA deadlock:
> >
> > jffs2_garbage_collect_live
> >     mutex_lock(&f->sem)                         (A)
> >     jffs2_garbage_collect_dnode [inlined]
> >         jffs2_gc_fetch_page
> >             read_cache_page_async
> >                 do_read_cache_page
> >                     lock_page(page) [inlined]
> >                         __lock_page             (B) ***
> >
> > jffs2_write_begin
> >     grab_cache_page_write_begin
> >         find_lock_page
> >             lock_page(page)                     (B)
> >     mutex_lock(&f->sem)                         (A) ***
> >
> > I have manually analyzed the stacks and confirmed that both threads sit on
> > the same 'struct page'.
> >
>
> Hmm, not something I have seen, but your analysis seems spot on. With any luck
> you only need to move the mutex_lock(&f->sem) in jffs2_write_begin() so that
> it is taken before the page lock. I am only guessing now, though.
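
To make the ordering argument concrete, here is a toy userspace model of the
two paths. It is not kernel code and not a tested patch: the names
interleaving_deadlocks(), abba_order and fixed_order are made up for
illustration, and "fixed_order" only encodes the guess above (take f->sem
before the page lock in the write path). Each task takes two locks in a
fixed order, and the tasks are stepped in lockstep to force the worst-case
interleaving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for f->sem (A) and the page lock (B). */
enum lock_id { LOCK_F_SEM, LOCK_PAGE, NLOCKS };

#define NO_OWNER (-1)
#define NTASKS 2

/* Task 0 models the GC path, task 1 the write_begin path. */
const enum lock_id abba_order[NTASKS][2] = {
	{ LOCK_F_SEM, LOCK_PAGE },	/* GC: A then B */
	{ LOCK_PAGE, LOCK_F_SEM },	/* write_begin as reported: B then A */
};

const enum lock_id fixed_order[NTASKS][2] = {
	{ LOCK_F_SEM, LOCK_PAGE },	/* GC: A then B */
	{ LOCK_F_SEM, LOCK_PAGE },	/* write_begin reordered: A then B */
};

/*
 * Step both tasks in lockstep. In every round each unfinished task tries
 * to take its next lock; a task that holds both locks releases them and
 * is done. Returns true if a round passes with no progress (deadlock).
 */
bool interleaving_deadlocks(const enum lock_id order[NTASKS][2])
{
	int owner[NLOCKS] = { NO_OWNER, NO_OWNER };
	int held[NTASKS] = { 0, 0 };
	bool done[NTASKS] = { false, false };

	for (;;) {
		bool progress = false;

		for (int t = 0; t < NTASKS; t++) {
			if (done[t])
				continue;
			assert(held[t] < 2);
			enum lock_id want = order[t][held[t]];
			if (owner[want] != NO_OWNER)
				continue;	/* lock is taken: task would block */
			owner[want] = t;
			held[t]++;
			progress = true;
			if (held[t] == 2) {
				/* Critical section finished: drop both locks. */
				owner[order[t][0]] = NO_OWNER;
				owner[order[t][1]] = NO_OWNER;
				done[t] = true;
			}
		}
		if (done[0] && done[1])
			return false;
		if (!progress)
			return true;
	}
}
```

In this model interleaving_deadlocks(abba_order) reports a deadlock and
interleaving_deadlocks(fixed_order) does not. That only shows that a
consistent lock order removes the cycle; whether the reordering is safe in
the real jffs2_write_begin() still has to be checked against the code.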

I had a look at jffs2_write_begin() and it looks fishy:
It can write a hole frag successfully but still fail afterwards in:
	if (!PageUptodate(pg)) {
		mutex_lock(&f->sem);
		ret = jffs2_do_readpage_nolock(inode, pg);
		mutex_unlock(&f->sem);
		if (ret)
			goto out_page;
	}
which seems a bit strange.

Further up we have this:
		ri.isize = cpu_to_je32(max((uint32_t)inode->i_size, pageofs));
		...
		ri.dsize = cpu_to_je32(pageofs - inode->i_size);
Why use max(..) when pageofs must be > inode->i_size for ri.dsize
to make sense?
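
A small userspace sketch of the arithmetic, to show what I mean. The struct
and the helper compute_hole_sizes() are hypothetical names, and the
cpu_to_je32() endianness conversion is left out; only the two expressions
are taken from the code above, with i_size truncated to 32 bits as the cast
does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two fields discussed above. */
struct hole_sizes {
	uint32_t isize;
	uint32_t dsize;
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Mirrors the two assignments, minus the endianness conversion. */
struct hole_sizes compute_hole_sizes(uint32_t i_size, uint32_t pageofs)
{
	struct hole_sizes s;

	s.isize = max_u32(i_size, pageofs);
	/* Unsigned subtraction: wraps around if pageofs < i_size. */
	s.dsize = pageofs - i_size;
	return s;
}
```

With i_size = 1000 and pageofs = 4096 this gives isize = 4096 and
dsize = 3096, so max() just returns pageofs. With i_size = 8192 and
pageofs = 4096, max() picks i_size but dsize wraps to 4294963200, which
cannot be a valid hole length. So max() only changes the result in a case
where dsize is already meaningless.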

 Jocke