[RFC] Removing FIXMEs in mtdchar.c: mtd_{read,write} (was Re: Recommendations on System Tuning for JFFS2/MTD 128 KiB Block Page Allocation Failures)
Grant Erickson
marathon96 at gmail.com
Fri Apr 1 11:26:41 EDT 2011
On 3/16/11 10:28 AM, Grant Erickson wrote:
> I've an OMAP3 ARM-based embedded system with 256 MiB of NAND flash and 64 MiB
> of RAM on Linux 2.6.32 in which both sys_mount (via mount) and sys_read (via
> fw_setenv) occasionally fail with "page allocation failure. order:5,
> mode:0xd0".
>
> In the analysis I've done so far, sys_mount funnels down to jffs2_scan_medium
> which eventually calls kmalloc with a size of 128 KiB and flag GFP_KERNEL:
>
> sw/tps/linux/linux/fs/jffs2/scan.c:
> ...
> /* Respect kmalloc limitations */
> if (buf_size > 128*1024)
>         buf_size = 128*1024;
>
> D1(printk(KERN_DEBUG "Allocating readbuf of %d bytes\n", buf_size));
> flashbuf = kmalloc(buf_size, GFP_KERNEL);
> if (!flashbuf)
>         return -ENOMEM;
> }
> ...
>
> The sys_read case winds down to mtd_read which eventually calls kmalloc with a
> size of 128 KiB (to cover a single NAND erase block) and flag GFP_KERNEL:
>
> sw/tps/linux/linux/drivers/mtd/mtdchar:
> ...
> if (count > MAX_KMALLOC_SIZE)
>         kbuf = kmalloc(MAX_KMALLOC_SIZE, GFP_KERNEL);
> else
>         kbuf = kmalloc(count, GFP_KERNEL);
> ...
>
> Ostensibly this occurs because of memory fragmentation: whatever lower-order
> blocks are available are not contiguous.
>
> ...
>
> The system is currently configured with the SLAB allocator. Has anyone found
> better fragmentation and low-memory performance with the default SLUB or
> embedded SLOB allocators? How about tweaking:
>
> vm.min_free_kbytes
>
> Anyone met with success there?
For anyone following this thread, FWIW, I was able to reduce, but not
eliminate, the likelihood of this issue occurring by setting the following
in sysctl.conf:
vm.min_free_kbytes = 2048
However, this problem is all about fragmentation, so a solution such as this
will never really guarantee that this problem goes away entirely.
In the meantime, I've been contemplating a more permanent solution in
mtdchar.c:mtd_{read,write} and, before pressing ahead with either, wanted to
get some feedback on which might have upstream integration support.
1) Simpler but Uses More Memory
This approach keeps the code, more or less, as is; however, rather than
failing outright, it continues to attempt to allocate smaller and smaller
blocks (dividing by two each time) until either it succeeds or hits the
minimum (either count or PAGE_SIZE).
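A minimal user-space sketch of the back-off described in option 1, not actual mtdchar.c code. Everything prefixed sketch_ or SKETCH_ is hypothetical: sketch_alloc() stands in for kmalloc(size, GFP_KERNEL), sketch_fail_above simulates fragmentation by refusing large requests, and the 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE   4096u            /* assumed page size */
#define SKETCH_MAX_KMALLOC (128u * 1024u)   /* mirrors the 128 KiB cap in the quoted code */

/* Hypothetical fragmentation model: requests larger than this fail. */
static size_t sketch_fail_above = (size_t)-1;

/* Hypothetical stand-in for kmalloc(size, GFP_KERNEL). */
static void *sketch_alloc(size_t size)
{
    if (size > sketch_fail_above)
        return NULL;
    return malloc(size);
}

/*
 * Allocate a bounce buffer for a request of `count` bytes.
 * Start at min(count, SKETCH_MAX_KMALLOC) and halve on each failure,
 * giving up only once min(count, SKETCH_PAGE_SIZE) has also failed.
 * On success *size holds the size actually obtained; on failure it is 0.
 */
static void *sketch_alloc_shrinking(size_t count, size_t *size)
{
    size_t try_size = count < SKETCH_MAX_KMALLOC ? count : SKETCH_MAX_KMALLOC;
    size_t min_size = count < SKETCH_PAGE_SIZE ? count : SKETCH_PAGE_SIZE;
    void *buf;

    assert(count > 0 && size != NULL);

    for (;;) {
        buf = sketch_alloc(try_size);
        if (buf != NULL) {
            *size = try_size;
            return buf;
        }
        if (try_size <= min_size)
            break;                      /* even the minimum failed */
        try_size /= 2;
        if (try_size < min_size)
            try_size = min_size;        /* never go below the floor */
    }

    *size = 0;
    return NULL;
}
```

The caller would then loop over the request in chunks of at most *size bytes, which is presumably what the existing read/write loops already do for requests above the cap.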
2) More Complex but Uses Less Memory
This approach seems to be what was intimated in the "FIXME" comments from
2005 found in this code and maps in and pins user pages for the read or
write request using get_user_pages. Where possible, adjacent pages are
grouped together into iovec extents and then mtd_{read,write} are called in
a loop for each iovec covering that extent of mapped pages in a manner
similar to {read,write}v having been called from user space.
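The grouping step in option 2 might look roughly like the following user-space sketch. It is an illustration under assumptions, not kernel code: the pinned pages are modelled as an array of page frame numbers (in the kernel these would come from get_user_pages), "adjacent" is taken to mean consecutive frame numbers, the page size is assumed to be 4 KiB, and struct sketch_extent and sketch_build_extents() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical extent: a run of adjacent pages covering `len` bytes. */
struct sketch_extent {
    unsigned long first_pfn;     /* frame number of the first page in the run */
    size_t offset;               /* byte offset into that first page */
    size_t len;                  /* total bytes covered by the run */
};

/*
 * Coalesce `npages` pinned pages into extents of adjacent pages.
 * `first_offset` is where the user buffer starts within the first page,
 * `total_len` is the length of the read or write request.
 * `out` must have room for `npages` entries (the worst case, where no
 * two pages are adjacent). Returns the number of extents written.
 */
static size_t sketch_build_extents(const unsigned long *pfns, size_t npages,
                                   size_t first_offset, size_t total_len,
                                   struct sketch_extent *out, size_t max_out)
{
    size_t n = 0;
    size_t remaining = total_len;
    size_t offset = first_offset;
    size_t i;

    assert(first_offset < SKETCH_PAGE_SIZE);
    assert(max_out >= npages);

    for (i = 0; i < npages && remaining > 0; i++) {
        /* Bytes of the request that fall within this page. */
        size_t chunk = SKETCH_PAGE_SIZE - offset;
        if (chunk > remaining)
            chunk = remaining;

        if (i > 0 && pfns[i] == pfns[i - 1] + 1) {
            out[n - 1].len += chunk;        /* extend the current run */
        } else {
            out[n].first_pfn = pfns[i];     /* start a new run */
            out[n].offset = offset;
            out[n].len = chunk;
            n++;
        }

        remaining -= chunk;
        offset = 0;                         /* only the first page is offset */
    }

    return n;
}
```

Each resulting extent would then be handed to the underlying MTD read or write in turn, so no bounce buffer larger than a page ever has to be allocated.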
Comments welcomed.
Regards,
Grant