[PATCH v2 0/5] minitty: a minimal TTY layer alternative for embedded systems

Tue Apr 4 11:31:36 PDT 2017

On Tue, 4 Apr 2017, Andy Shevchenko wrote:

> On Tue, Apr 4, 2017 at 8:59 PM, Tom Zanussi <tom.zanussi at linux.intel.com> wrote:
> > On Tue, 2017-04-04 at 20:08 +0300, Andy Shevchenko wrote:
> >> On Tue, Apr 4, 2017 at 7:59 PM, Tom Zanussi <tom.zanussi at linux.intel.com> wrote:
> >> > On Tue, 2017-04-04 at 00:05 +0300, Andy Shevchenko wrote:
> 
> >> > I was focused at that point mainly on the kernel static size, and using
> >> > a combination of Josh Triplett's tinification tree, Andi Kleen's LTO and
> >> > net-diet patches, and my own miscellaneous patches that I was planning
> >> > on eventually upstreaming, I ended up with a system that I could boot to
> >> > shell with a 455k text size:
> >> >
> >> > Memory: 235636K/245176K available (455K kernel code, 61K rwdata,
> >> > 64K rodata, 132K init, 56K bss, 3056K reserved, 0K cma-reserved)
> 
> >> Thanks for sharing your experience. The question closer to this
> >> discussion: what did you do about the TTY/UART (and related) layers?
> >>
> >
> > I'd have to go back and take a look, but nothing special AFAIR.
> >
> > No patches or hacks along those lines, and the only related thing I see
> > as far as config is:
> >
> >         cfg/pty-disable.scc \
> >
> > which maps to:
> >
> >         # CONFIG_UNIX98_PTYS is not set
> 
> But by your guesstimate, how much can we squeeze the TTY/UART layer if
> we do some compile-time configuration?
> Does that even make sense, or is it better to introduce something like
> a special minitty layer instead?

For the record, I more or less went down the same path as Tom, playing
with LTO, gc-sections, syscall removal, module_param() removal, etc. At
the end of the day you still have that 45K of TTY code just to send
debug output, 100K of VFS even if using only ramfs, 54K of timer code
even if there's only one simple timer available, 28K of IRQ support code
even if there is only one type of interrupt used, etc. LTO and
gc-sections cannot automatically get rid of those unused functions
because they're callbacks selected at run time through function
pointers, so the optimization tools can no longer do their magic.
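
To illustrate the point about runtime-selected callbacks, here is a
minimal userspace sketch. It is not kernel code: struct demo_ops and
every demo_* name are hypothetical and only loosely mimic the ops-table
style used by the TTY and VFS layers. Once the address of the table is
taken, every function it references stays reachable as far as the
linker is concerned, even if only write() is ever called. The direct
variant is what a purpose-built layer can do instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table; illustrative only, not a kernel structure. */
struct demo_ops {
	size_t (*write)(char *dst, size_t cap, const char *src);
	int (*set_speed)(unsigned int baud);
	int (*hangup)(void);
};

/* Copy src into dst (capacity cap), always NUL-terminating.
 * Returns the number of characters copied, excluding the NUL. */
static size_t demo_write(char *dst, size_t cap, const char *src)
{
	size_t n = strlen(src);

	if (cap == 0)
		return 0;
	if (n > cap - 1)
		n = cap - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}

/* Never called by the "debug output only" path below, yet still
 * referenced from the table, so section GC has to keep them. */
static int demo_set_speed(unsigned int baud)
{
	return baud ? 0 : -1;
}

static int demo_hangup(void)
{
	return 0;
}

/* The table holds a reference to all three functions. */
const struct demo_ops demo_uart_ops = {
	.write = demo_write,
	.set_speed = demo_set_speed,
	.hangup = demo_hangup,
};

/* Generic path: the callee is picked at run time through a pointer. */
size_t demo_puts_indirect(const struct demo_ops *ops, char *dst,
			  size_t cap, const char *msg)
{
	assert(ops != NULL && ops->write != NULL);
	return ops->write(dst, cap, msg);
}

/* Purpose-built path: a direct call, no table. If this were the only
 * user, demo_set_speed() and demo_hangup() would have no references
 * left and could be discarded. */
size_t demo_puts_direct(char *dst, size_t cap, const char *msg)
{
	return demo_write(dst, cap, msg);
}
```

Both paths produce the same output; the difference is only in what the
toolchain is able to prove dead.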

At some point there is no option other than a parallel implementation
written specifically for a limited scope, to reduce both code footprint
and runtime RAM consumption. Who needs a multicore-scalable VFS cache
when there's only 256K of RAM and a single user space process running?

Nicolas