[PATCH 0/3] ARM 4Kstacks: introduction

Tim Bird tim.bird at am.sony.com
Sun Oct 23 15:25:10 EDT 2011

On 10/22/2011 6:36 AM, Russell King - ARM Linux wrote:
> On Sat, Oct 22, 2011 at 04:50:15PM +0800, Ming Lei wrote:
>> On Wed, Oct 19, 2011 at 6:51 PM, Arnd Bergmann<arnd at arndb.de>  wrote:
>>> On Tuesday 18 October 2011 17:26:44 Tim Bird wrote:
>>>> Even inside Sony, usage of 4K stacks is limited
>>>> to some very special cases, where memory is exceedingly
>>>> tight (we have one system with 4M of RAM).  And we
>>>> don't mind lopping off features or coding around
>>>> problem areas to support our special case.
>>> I would imagine that in those cases, you can gain more by reducing the
>>> number of threads in the system. What is the highest number of
>>> concurrent threads that you expect in a limited use case with no
>>> networking or block devices?
We have about 50 hard real-time threads, that are part of a
software stack for digital cameras that was ported over from
micro-itron.  It has taken a _LONG_ time (on the order of a
few years) to tune the Linux system using RT-preempt to run
these threads as is.  It would be very painful to re-architect this
part of the system.
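
As a rough sketch of what is at stake for those threads, the arithmetic
can be written out as below. It assumes one kernel stack per thread, an
8 KiB default stack (two 4 KiB pages), and that nothing else in the
system changes; the function names are illustrative only.

```python
# Rough estimate of kernel-stack memory for a fixed set of threads.
# Assumptions (not stated in the thread): one kernel stack per thread,
# an 8 KiB default stack, and no other memory use changes.

KIB = 1024


def stack_total(threads: int, stack_kib: int) -> int:
    """Total bytes spent on kernel stacks for `threads` threads."""
    return threads * stack_kib * KIB


def savings(threads: int, big_kib: int = 8, small_kib: int = 4) -> int:
    """Bytes saved by moving every thread from big_kib to small_kib stacks."""
    return stack_total(threads, big_kib) - stack_total(threads, small_kib)


if __name__ == "__main__":
    threads = 50  # the hard real-time threads mentioned above
    saved = savings(threads)
    # 4 MiB and 10 MiB are the RAM size and memory budget quoted in this thread.
    for ram_mib in (4, 10):
        share = 100.0 * saved / (ram_mib * KIB * KIB)
        print(f"{threads} threads: {saved // KIB} KiB saved, "
              f"{share:.1f}% of {ram_mib} MiB")
```

For 50 threads this comes to 200 KiB. Other threads in the system would
add to that figure, so treat it as a lower bound under the assumptions
above.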

Note that these threads don't have any of the issues that
people have raised about filesystem stack depth or printk
recursion, since they avoid a whole range of Linux syscalls
to avoid real-time issues (and stack size issues).

I'm looking at possibly implementing a mixed stack
size system, but I don't know if that will work, or whether
it would be acceptable upstream.

>> If system run for some time, sometimes it may be difficult for
>> memory allocator to allocate 2 continuous page frames even there are
>> many spare page frames in system because of
>> fragment issue, so the patch does make sense.
> If memory fragmentation is an issue for this, it probably means that we
> need to switch to a software page size of 8K (or maybe 16K) rather than
> stick with the hardware 4K size.  That would be a much more reliable
> solution, especially as the L1 page table is 16K (if you're suffering
> from memory fragmentation, the first thing which'd get you is the L1
> page table allocation, not the kernel stack allocation.)
>> Anyway, it provides one option for user to apply 4k stack to avoid
>> such kind of process creation failure.
> I refer you to the comments made by people who've tried running with 4K
> stacks on x86, and their _vast_ experience of doing this.  If they say
> that it causes stack overflows, then it's a problem.
I really don't think anyone on x86 has any experience whatsoever
with anything like this.  This is on a digital camera, with a flash
filesystem, with a 10M memory budget, with just about every
in and out-of-tree patch from Linux-tiny applied and configured.
Many of the things that bloat up the kernel just aren't there at
all.

> The possibility of a kernel stack overflow is not something that should
> be taken lightly

This kind of development is done with extensive in-house testing,
and an absolutely fixed user space. This may sound like an
isolated case, but I know Sony is not alone and that lots of
embedded products are developed like this.  I'm pretty sure
others would benefit from this patch.

We've already shipped tens of thousands of cameras
with this, with no problems, so it's certainly possible to get it
right.

Whether to include this comes down to a question of whether
the ability of someone to get it wrong should preclude
allowing the *option* into the kernel.
   -- Tim
