UBIFS, Memory fragmentation problem

Tomasz Stanislawski t.stanislaws at samsung.com
Mon Apr 26 09:39:15 EDT 2010


Dear Mr. Artem Bityutskiy,
Recently I was developing a platform that uses UBIFS and ran into an
interesting problem. During boot, UBIFS prints error messages and switches
the file system to read-only mode (see Appendix A). I have investigated the
problem, and according to my analysis it is caused by severe memory
fragmentation. The platform is based on kernel 2.6.29.

Several kernel errors were found in the system logs (see Appendix A). The
failure appears to be caused by memory fragmentation. Look at the
following two lines:

DMA: 186*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
   0*2048kB 0*4096kB 0*8192kB = 760kB
DMA: 998*4kB 146*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
   0*2048kB 0*4096kB 0*8192kB = 5160kB

Neither list contains a free contiguous memory block of 16 kB or larger.
Please consider the following scenario:

  Assume that the program has reached line 916 in file.c (see Appendix B),
inside do_writepage, and calls ubifs_jnl_write_data. That function calls
kmalloc (Appendix C, line 697) to allocate slightly more than 8 KiB. Because
the page allocator hands out blocks in power-of-two sizes, such a request
needs a 16 KiB (order-2) block, which matches "order:2" in Appendix A.
Unfortunately, the allocator finds no contiguous memory block that is large
enough, so it wakes up the kswapd daemon. The daemon tries to write page
cache back to storage. Writing data to a UBIFS partition calls
ubifs_jnl_write_data again, which once again tries to allocate slightly
more than 8 KiB. The allocator detects that the request comes from the
procedure that was called to reclaim memory. To avoid an endless loop, it
dumps the stack and kmalloc fails. The failure of ubifs_jnl_write_data makes
the kswapd action fail, and the UBIFS driver executes lines 926-929,
switching the file system to read-only mode. From then on the system is
unstable, because all write operations to the root file system are denied.

  A simple workaround for this problem is to change all kmalloc/kfree calls
to vmalloc/vfree. These functions create a virtual memory mapping, so there
is no need to find physically contiguous memory blocks. Such a patch is
attached in the file 'kmalloc2vmalloc.patch'. kmalloc is used often in the
UBIFS driver, so the problem might appear somewhere else as well. A static
buffer cannot be used because ubifs_jnl_write_data is both reentrant and, in
some sense, 'recursive': it calls kmalloc, which calls
__alloc_pages_internal, which wakes up the kswapd daemon. In order to drop
page cache or buffers, the daemon initiates UBIFS operations, which include
calling ubifs_jnl_write_data.

  IMHO the proposed solution is not sufficient. A kmalloc failure may occur
sooner or later, causing a system crash or malfunction, and there are
numerous calls to kmalloc inside the UBIFS code. I wanted to ask you whether
it makes sense to change all of them to the vmalloc interface. Have you run
into such problems with memory fragmentation?

I hope you find this information useful.

Yours sincerely,
Tomasz Stanislawski


  * Appendix A *

<4>[ 1018.882720] <4>kswapd0: page allocation failure. order:2, mode:0x4050
<4>[ 1018.887611] [<c047fec4>] (dump_stack+0x0/0x14) from [<c01a1b30>] (__alloc_pages_internal+0x3a8/0x3d4)
<4>[ 1018.896906] [<c01a1788>] (__alloc_pages_internal+0x0/0x3d4) from [<c01a1bdc>] (__get_free_pages+0x20/0x68)
<4>[ 1018.906296] [<c01a1bbc>] (__get_free_pages+0x0/0x68) from [<c0259b8c>] (ubifs_jnl_write_data+0x30/0x1a4)
<4>[ 1018.915751] [<c0259b5c>] (ubifs_jnl_write_data+0x0/0x1a4) from [<c025b3a4>] (do_writepage+0x9c/0x188)
<4>[ 1018.924943] [<c025b308>] (do_writepage+0x0/0x188) from [<c025b5fc>] (ubifs_writepage+0x16c/0x190)
<4>[ 1018.933838]  r7:c4258000 r6:000009e3 r5:00000000 r4:c0730fa0
<4>[ 1018.939426] [<c025b490>] (ubifs_writepage+0x0/0x190) from [<c01a6f70>] (shrink_page_list+0x3e4/0x7c4)
<4>[ 1018.948680] [<c01a6b8c>] (shrink_page_list+0x0/0x7c4) from [<c01a75ec>] (shrink_list+0x29c/0x5ac)
<4>[ 1018.957513] [<c01a7350>] (shrink_list+0x0/0x5ac) from [<c01a7b8c>] (shrink_zone+0x290/0x344)
<4>[ 1018.965909] [<c01a78fc>] (shrink_zone+0x0/0x344) from [<c01a817c>] (kswapd+0x3c4/0x560)
<4>[ 1018.973907] [<c01a7db8>] (kswapd+0x0/0x560) from [<c0171318>] (kthread+0x54/0x80)
<4>[ 1018.981504] [<c01712c4>] (kthread+0x0/0x80) from [<c015fdbc>] (do_exit+0x0/0x640)
<4>[ 1018.988826]  r5:00000000 r4:00000000
<4>[ 1018.992331] Mem-info:
<4>[ 1018.994590] DMA per-cpu:
<4>[ 1018.997173] CPU    0: hi:   18, btch:   3 usd:  16
<4>[ 1019.001891] DMA per-cpu:
<4>[ 1019.004440] CPU    0: hi:   90, btch:  15 usd:  76
<4>[ 1019.009249] <4>[ 1019.009281] <4>[ 1019.009315] Active_anon:10728 active_file:3486 inactive_anon:10797
 inactive_file:6444 unevictable:3641 dirty:1 writeback:1 unstable:0
 free:1480 slab:2573 mapped:12344 pagetables:1118 bounce:0
<4>[ 1019.029203] DMA free:760kB min:548kB low:684kB high:820kB active_anon:608kB inactive_anon:688kB active_file:52kB inactive_file:0kB unevictable:516kB present:80264kB pages_scanned:2 all_unreclaimable? no
<4>[ 1019.047162] lowmem_reserve[]: 0 0 0
<4>[ 1019.050559] DMA free:5160kB min:1780kB low:2224kB high:2668kB active_anon:42304kB inactive_anon:42500kB active_file:13892kB inactive_file:25776kB unevictable:14048kB present:260096kB pages_scanned:0 all_unreclaimable? no
<4>[ 1019.070274] lowmem_reserve[]: 0 0 0
<4>[ 1019.073514] DMA: 186*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 760kB
<4>[ 1019.084267] DMA: 998*4kB 146*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 5160kB
<4>[ 1019.095285] 19144 total pagecache pages
<4>[ 1019.099144] 4569 pages in swap cache
<4>[ 1019.102719] Swap cache stats: add 66771, delete 62202, find 8700/13664
<4>[ 1019.109204] Free swap  = 13540kB
<4>[ 1019.112412] Total swap = 99992kB
<4>[ 1019.133396] 85760 pages of RAM
<4>[ 1019.135006] 1943 free pages
<4>[ 1019.137860] 32559 reserved pages
<4>[ 1019.140990] 2129 slab pages
<4>[ 1019.143814] 41254 pages shared
<4>[ 1019.146801] 4569 pages swap cached
<4>[ 1019.150247] <3>UBIFS error (pid 395): do_writepage: cannot write page 1 of inode 13633, error -12
<4>[ 1019.159087] <4>UBIFS warning (pid 395): ubifs_ro_mode: switched to read-only mode, error -12
<4>[ 1019.306567] <3>UBIFS error (pid 395): make_reservation: cannot reserve 529 bytes in jhead 2, error -30
<4>[ 1019.314216] <3>UBIFS error (pid 395): do_writepage: cannot write page 0 of inode 13633, error -30


  * Appendix B *

File fs/ubifs/file.c:910
  910         addr = kmap(page);
  911         block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT;
  912         i = 0;
  913         while (len) {
  914                 blen = min_t(int, len, UBIFS_BLOCK_SIZE);
  915                 data_key_init(c, &key, inode->i_ino, block);
  916                 err = ubifs_jnl_write_data(c, inode, &key, addr, blen);
  917                 if (err)
  918                         break;
  919                 if (++i >= UBIFS_BLOCKS_PER_PAGE)
  920                         break;
  921                 block += 1;
  922                 addr += blen;
  923                 len -= blen;
  924         }
  925         if (err) {
  926                 SetPageError(page);
  927                 ubifs_err("cannot write page %lu of inode %lu, error %d",
  928                           page->index, inode->i_ino, err);
  929                 ubifs_ro_mode(c, err);
  930         }

  * Appendix C *

File fs/ubifs/journal.c:684
  684 int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
  685                          const union ubifs_key *key, const void *buf, int len)
  686 {
  687         struct ubifs_data_node *data;
  688         int err, lnum, offs, compr_type, out_len;
  689         int dlen = UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR;
  690         struct ubifs_inode *ui = ubifs_inode(inode);
  691
  692         dbg_jnl("ino %lu, blk %u, len %d, key %s",
  693                 (unsigned long)key_inum(c, key), key_block(c, key), len,
  694                 DBGKEY(key));
  695         ubifs_assert(len <= UBIFS_BLOCK_SIZE);
  696
  697         data = kmalloc(dlen, GFP_NOFS);
  698         if (!data)
  699                 return -ENOMEM;
  700
  701         data->ch.node_type = UBIFS_DATA_NODE;


