[PATCH 1/2] UBIFS: fix free log space calculation

Mon Sep 8 02:43:17 PDT 2014

On Mon, 2014-08-11 at 11:21 +0800, hujianyang wrote:
> > ---
> >  fs/ubifs/log.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
> > index ed24422..7e818ec 100644
> > --- a/fs/ubifs/log.c
> > +++ b/fs/ubifs/log.c
> > @@ -106,10 +106,14 @@ static inline long long empty_log_bytes(const struct ubifs_info *c)
> >  	h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
> >  	t = (long long)c->ltail_lnum * c->leb_size;
> >  
> > -	if (h >= t)
> > +	if (h > t)
> >  		return c->log_bytes - h + t;
> > -	else
> > +	else if (h != t)
> >  		return t - h;
> > +	else if (c->lhead_lnum != c->ltail_lnum)
> > +		return 0;
> > +	else
> > +		return c->log_bytes;
> >  }
> >  
> >  /**
> > 
> 
> Hi Artem,
> 
> After almost one month test, the error I've reported
> 
> UBIFS: background thread "ubifs_bgt0_0" started, PID 2222
> UBIFS: recovery needed
> UBIFS error (pid 2220): replay_log_leb: first log node at LEB 4:0 is not CS node
> UBIFS error (pid 2220): replay_log_leb: log error detected while replaying the log at LEB 4:0
>  magic          0x6101831
>  crc            0x81e22cd5
>  node_type      8 (reference node)
>  group_type     0 (no node group)
>  sqnum          2375337
>  len            64
>  lnum           5273
>  offs           122880
>  jhead          2
> UBIFS: background thread "ubifs_bgt0_0" stops
> 
> never came out again.
> 
> I think this patch really solve that problem. Now I want to show my
> analysis and wish you add "Cc stable" for this patch.
> 
> With the help of ubidump tool, I got the MST NODEs and LOG NODEs of
> the image which is broken.
> 
> master node:
> 
> look at LEB 1:20480 (106496 bytes left)
> scanning master node at LEB 1:20480
> 	magic          0x6101831
> 	crc            0x38ff5c67
> 	node_type      7 (master node)
> 	group_type     0 (no node group)
> 	sqnum          15799802
> 	len            512
> 	highest_inum   161
> 	commit number  7954
> 	flags          0x3
> 	log_lnum       4
> 	root_lnum      2258
> 	root_offs      7048
> 	root_len       148
> 	gc_lnum        2324
> 	ihead_lnum     2258
> 	ihead_offs     8192
> 	index_size     2511408
> 	lpt_lnum       15
> 	lpt_offs       113571
> 	nhead_lnum     15
> 	nhead_offs     114688
> 	ltab_lnum      15
> 	ltab_offs      110592
> 	lsave_lnum     0
> 	lsave_offs     0
> 	lscan_lnum     2930
> 	leb_cnt        6844
> 	empty_lebs     3362
> 	idx_lebs       23
> 	total_free     430188544
> 	total_dirty    56069504
> 	total_used     377968720
> 	total_dead     7032
> 	total_dark     31196320
> look at LEB 1:20992 (105984 bytes left)
> scanning padding node at LEB 1:20992
> 1508 bytes padded at LEB 1:20992, offset now 22528
> look at LEB 1:22528 (104448 bytes left)
> hit empty space at LEB 1:22528
> stop scanning LEB 1 at offset 126976
> 
> log nodes:
> 
> leb number     first sqnum   first node type    full/not full
>    3            15801163         ref node           full
>    4            15801288         ref node           full
>    5            15801433         ref node           full
>    6            15801598         ref node           full
>    7            15801966         ref node         not full
>    8            15800414         ref node           full
>    9            15800538         ref node           full
>   10            15800664         ref node           full
>   11            15800789         ref node           full
>   12            15800915         ref node           full
>   13            15801039         ref node           full
> 
> The min_io_size of this device is 2048 and the leb_size of it is
> 126976. Each of leb is full except 7 means no frequently power
> cutting happen because log leb will change into next one if the
> volume is remount.
> 
> We know cs node must in leb 4 as @log_lnum is 4 in mst node. But
> it is a ref node as we see it in leb 4. What happened?
> 
> Leb 7 has the highest sqnum and it is not full, that means it is
> the last log leb before an unclean power off. So we know leb 8 is
> the oldest log leb exist. But it has a higher sqnum than mst node,
> 15799802, that means mst node is written before the oldest node
> in current log area.
> 
> Each time we write a cs node, we write a new mst node. The sqnum
> of this mst node should higher than cs node. So the cs node in
> leb 4 is overlapped.
> 
> We keep an empty leb in log area by @min_log_bytes to start-up a
> new commit. When commit start, we may use this leb and set
> @min_log_bytes to zero.
> 
> Suppose old ltail_lnum is 5 and lhead_lnum is 3, leb 3 is full and
> a new commit start. We write a new cs node at the beginning of leb
> 4 and set @min_log_bytes to zero. Now, as empty_log_bytes() is wrong,
> ubifs_add_bud_to_log() may wrap leb 5, leb 6... At last, overlap
> the leb 4 which has the newest cs node and mst_node would point to,
> why not? we don't have any assertion or any protection to void this
> except the check of empty_log_bytes() with an error shows above.
> 
> Now an unfortunately power cut happens, this volume is broken.
> 
> I think this patch fixed this problem by correctly protect the log
> area. So maybe you can add "Cc stable" before you send it to mainline.
> 
> By the way, commit 8989cd69b9d2 "UBIFS: fix a race condition" fix
> a real potential race and useful for avoiding errors in log area.
> Could you please add "Cc stable" for it too?

OK, I am adding these tags:

    Cc: stable at vger.kernel.org
    Tested-by: hujianyang <hujianyang at huawei.com>

And putting the patches to the linux-next queue.

-- 
Best Regards,
Artem Bityutskiy