[PATCH v2 0/5] minitty: a minimal TTY layer alternative for embedded systems

Tom Zanussi tom.zanussi at linux.intel.com
Tue Apr 4 12:58:44 PDT 2017


On Tue, 2017-04-04 at 21:04 +0300, Andy Shevchenko wrote:
> On Tue, Apr 4, 2017 at 8:59 PM, Tom Zanussi <tom.zanussi at linux.intel.com> wrote:
> > On Tue, 2017-04-04 at 20:08 +0300, Andy Shevchenko wrote:
> >> On Tue, Apr 4, 2017 at 7:59 PM, Tom Zanussi <tom.zanussi at linux.intel.com> wrote:
> >> > On Tue, 2017-04-04 at 00:05 +0300, Andy Shevchenko wrote:
> 
> >> > I was focused at that point mainly on the kernel static size, and using
> >> > a combination of Josh Triplett's tinification tree, Andi Kleen's LTO and
> >> > net-diet patches, and my own miscellaneous patches that I was planning
> >> > on eventually upstreaming, I ended up with a system that I could boot to
> >> > shell with a 455k text size:
> >> >
> >> > Memory: 235636K/245176K available (455K kernel code, 61K rwdata,
> >> > 64K rodata, 132K init, 56K bss, 3056K reserved, 0K cma-reserved)
> 
> >> Thanks for sharing your experience. The question closer to this
> >> discussion: what did you do about the TTY/UART/(related) layer(s)?
> >>
> >
> > I'd have to go back and take a look, but nothing special AFAIR.
> >
> > No patches or hacks along those lines, and the only related thing I see
> > as far as config is:
> >
> >         cfg/pty-disable.scc \
> >
> > which maps to:
> >
> >         # CONFIG_UNIX98_PTYS is not set
> 
> But by your guesstimate, how much can we squeeze the TTY/UART layer if we
> do some compile-time configuration?
> Does that even make sense, or is it better to introduce something like a
> special minitty layer instead?
> 
> I believe you did some research during time of that project…
> 

Yes, as a matter of fact I did, and I just found some notes I took at the
time.  I didn't dive into the code in detail (that level of analysis
was supposed to come later), but I did have these notes mentioning that I
thought it would show the largest savings for a single item (outside of
networking) 'if we could do it':

"- Largest is still drivers

- drivers/tty and serial is the biggest obvious win if we can do it
  - break down into granular config options
    - leave simplest possible tty/serial functionality
    - allow tailoring to specific hardware
  - also helps in effort to get rid of char devices
  - 65740/815190"

Basically 65k out of an 800k text size could be partially or mostly
saved by addressing that one item, which looks like it pretty much
matches Nicolas' numbers...

So no doubt it would be worthwhile to address one way or the other.
Whether to do that by refactoring the tty layer or partial refactoring
and creation of a parallel minimal version would best be left up to
someone who actually understands it I would think...

BTW, since I'm quoting my own notes on the subject, I thought I'd just
include the whole thing, which covers a bunch of other areas possibly
ripe for tinification, in case anyone might be interested (some of it
should be taken with a grain of salt though ;-)

Tom

--------

galileo SMALLEST_SIZE

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
 699668	 186432	2271592	3157692	 302ebc	vmlinux

Not using this, because
 $ size xxx.o shows all 0s with LTO

----

Using this:

galileo SMALLEST_SIZE with LTO off

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
 815190	 165696	2272760	3253646	 31a58e	vmlinux

This corresponds to LTO size:

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
 677183	 179528	1207280	2063991	 1f7e77	vmlinux

$ ls -al arch/x86/boot/bzImage 
-rw-r--r--. 1 427264 Mar 12 22:34 arch/x86/boot/bzImage

And booted size:

Memory: 235388K/245240K available (534K kernel code, 100K rwdata, 52K rodata,
148K init, 64K bss, 3172K reserved, 0K cma-reserved)
virtual kernel memory layout:
    fixmap  : 0xfffa4000 - 0xfffff000   ( 364 kB)
    vmalloc : 0xd05f0000 - 0xfffa2000   ( 761 MB)
    lowmem  : 0xc0000000 - 0xcfdf0000   ( 253 MB)
      .init : 0xc10af000 - 0xc10d4000   ( 148 kB)
      .data : 0xc1085b9c - 0xc10ad120   ( 157 kB)
      .text : 0xc1000000 - 0xc1085b9c   ( 534 kB)

------
Totals - details below
------

- make ptrace configurable - this should help the hw breakpoints and x86 perf disable patches upstream
  - 5k
- remove things not needed for CONFIG_SMP
  - 5k
- support configuring out kswapd
  - about 5k in vmscan
- support configuring out vmstat
  - 0
- kernel capabilities
  - 1k
- exec domains
  - 1k
- tsc
   3030	    284	     40	   3354	    d1a	./arch/x86/kernel/tsc.o
    332	      0	      0	    332	    14c	./arch/x86/kernel/tsc_msr.o
- support configuring out signals
  11852	     36	      4	  11892	   2e74	./kernel/signal.o
   3188	      1	      0	   3189	    c75	./arch/x86/kernel/signal.o
  - about 15k
- kernel/pid.o simplification - more for dynamic memory - simpler pidhash
  1868	    160	      4	   2032	    7f0	./kernel/pid.o
  - about 2k
- remove kernel/exit.o
  - assume processes never exit
- remove lib/kfifo
  - about 2k
- remove kernel/irq/spurious
  - about 1k
- make sys configurable
  - about 7k
- remove xattr
  - about 4k
- /drivers total possible savings, some percentage of:
  - 136000/815190
- /kernel savings
  - say 30000/815190 savings
- /fs savings
  - 30000/815190 savings
- /arch/x86 savings
  - 20000/815190
- /mm
  - 5000/815190
- /lib
  - 10000/815190

Totals without nommu savings (MMU kept):
  146k + (2/3)*136k = ~235k

  235k/815190 = ~30% savings

- x86 nommu
  - about 50k

Totals with nommu savings:

  285k/815190 = 35% savings


Applied to the 534k boot figure, we end up with text size of:

  374k mmu
  347k nommu

We could probably go lower with more fine-grained analysis, but we may
also need to add drivers, etc.
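
For anyone who wants to redo the arithmetic above, here is a minimal sketch in Python. The variable names are mine, and it assumes (as the notes do) that the percentage saved from the 815190-byte static text carries over unchanged to the 534k boot figure. It does not round intermediate values, so it lands at roughly 379k and 346k rather than the 374k and 347k quoted above, which come from rounding to 30% and 35% first.

```python
# Figures taken from the notes above (k = thousands of bytes of text).
TOTAL_TEXT_K = 815.190   # `size vmlinux` text, LTO off
BOOT_TEXT_K = 534        # "kernel code" reported at boot
ITEMIZED_K = 146         # subtotal used in the notes
DRIVERS_K = 136          # possible /drivers savings
DRIVERS_SHARE = 2 / 3    # fraction of DRIVERS_K assumed achievable
NOMMU_K = 50             # extra savings from x86 nommu


def projected(savings_k):
    """Return (fraction saved, projected boot text in k) for a savings estimate."""
    fraction = savings_k / TOTAL_TEXT_K
    return fraction, BOOT_TEXT_K * (1 - fraction)


mmu_savings_k = ITEMIZED_K + DRIVERS_SHARE * DRIVERS_K
nommu_savings_k = mmu_savings_k + NOMMU_K

for label, savings in (("mmu", mmu_savings_k), ("nommu", nommu_savings_k)):
    fraction, boot_k = projected(savings)
    print(f"{label:6s} savings {savings:6.1f}k  {100 * fraction:4.1f}%  "
          f"boot text {boot_k:5.1f}k")
```

Changing DRIVERS_SHARE is the quickest way to see how sensitive the result is to the "some percentage of" guess for drivers.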

-----
NONET details
-----

- Largest is still drivers

- drivers/tty and serial is the biggest obvious win if we can do it
  - break down into granular config options
    - leave simplest possible tty/serial functionality
    - allow tailoring to specific hardware
  - also helps in effort to get rid of char devices
  - 65740/815190 

- pci is next largest
  - assume we can break down into granular config options
    - leave simplest possible pci functionality
    - allow tailoring to specific hardware e.g. no discovery
  - 47144/815190

- drivers/base
  - simplify driver core for a small set of drivers
    - simple_char: New infrastructure to simplify chardev management
  - 25389/815190

- total possible savings, some percentage of:
  - 136000/815190

 206992	  29331	   6556	 242879	  3b4bf	./drivers/built-in.o

  65740	  16888	   3132	  85760	  14f00	./drivers/tty/built-in.o
  32077	  16680	   2688	  51445	   c8f5	./drivers/tty/serial/built-in.o
  21628	  15892	   2644	  40164	   9ce4	./drivers/tty/serial/8250/built-in.o
  47144	   1172	   2100	  50416	   c4f0	./drivers/pci/built-in.o
  25389	   1324	    112	  26825	   68c9	./drivers/base/built-in.o
  15733	    636	     20	  16389	   4005	./drivers/spi/built-in.o
  11504	    136	     28	  11668	   2d94	./drivers/clk/built-in.o
   9605	    460	     72	  10137	   2799	./drivers/thermal/built-in.o
   5066	    624	    912	   6602	   19ca	./drivers/char/built-in.o
   8531	    480	     36	   9047	   2357	./drivers/i2c/built-in.o
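
The three tty rows above are nested, so the 65740 figure can be split further. A rough sketch in Python, assuming each built-in.o already contains the built-in.o of its subdirectories (the identical stmmac/stmicro/ethernet rows in the NET table suggest this, but I have not re-checked the build); the labels are mine:

```python
# Text-segment sizes in bytes, copied from the `size` rows above.
TOTAL_TEXT = 815190   # vmlinux, LTO off
TTY_ALL = 65740       # drivers/tty/built-in.o
SERIAL_ALL = 32077    # drivers/tty/serial/built-in.o
SERIAL_8250 = 21628   # drivers/tty/serial/8250/built-in.o

# Assumption: a parent built-in.o includes its children, so subtracting
# the child leaves the code that lives directly in the parent directory.
breakdown = {
    "drivers/tty outside serial/": TTY_ALL - SERIAL_ALL,
    "serial/ outside 8250/": SERIAL_ALL - SERIAL_8250,
    "serial/8250/": SERIAL_8250,
}

for name, size in breakdown.items():
    print(f"{name:30s} {size:6d} bytes  {100 * size / TOTAL_TEXT:4.1f}% of text")
print(f"{'drivers/tty total':30s} {TTY_ALL:6d} bytes  "
      f"{100 * TTY_ALL / TOTAL_TEXT:4.1f}% of text")
```

Under that assumption roughly half of the tty text sits outside serial/, which is the part a minimal tty layer would be aiming at.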

- 2nd largest is kernel

  - should be able to cut *something* from time and sched
    - we have a handful of processes at most
    - we have very simple time needs
  - say 30000/815190 savings

 150742	   6376	   8209	 165327	  285cf	./kernel/built-in.o

  40951	   1105	   4720	  46776	   b6b8	./kernel/time/built-in.o
  21760	   1318	    112	  23190	   5a96	./kernel/sched/built-in.o
   9800	    388	   1328	  11516	   2cfc	./kernel/irq/built-in.o
   4956	      4	      4	   4964	   1364	./kernel/locking/built-in.o
   1847	     88	    184	   2119	    847	./kernel/printk/built-in.o
   1757	     33	      0	   1790	    6fe	./kernel/rcu/built-in.o
   1408	    356	     44	   1808	    710	./kernel/power/built-in.o

- next is fs

  - completely turn off proc
    - requires userspace changes to cope with it
    - 22046/815190, 100% of this

  - simplify/featurize some core vfs?
    - e.g. namei, small set of file names, no need for complexity

  - disable vfs completely?
    - init reads executables directly from storage
    - all state in memory, no need to save anything

 133526	   1506	   1552	 136584	  21588	./fs/built-in.o
  22046	    140	     40	  22226	   56d2	./fs/proc/built-in.o

- next is arch/x86, mostly in arch/x86/kernel
  - not much to save here, maybe 10 here and there
  - maybe 3k in boot: video*
  - maybe 5k in cpu: amd, transmeta, cacheinfo, etc.
  - cut about 10k in arch/x86/mm for nommu

 120755	  50209	  52712	 223676	  369bc	./arch/x86/built-in.o

 100201	  29261	  19828	 149290	  2472a	./arch/x86/kernel/built-in.o

  21713	   8693	    720	  31126	   7996	./arch/x86/kernel/cpu/built-in.o
  17480	   5486	   6324	  29290	   726a	./arch/x86/kernel/apic/built-in.o
  10385	   4365	    532	  15282	   3bb2	./arch/x86/kernel/cpu/mcheck/built-in.o

  18237	    208	  30776	  49221	   c045	./arch/x86/mm/built-in.o
  14276	    412	    256	  14944	   3a60	./arch/x86/pci/built-in.o
   1345	      8	     28	   1381	    565	./arch/x86/platform/intel-quark/built-in.o
   1345	      8	     28	   1381	    565	./arch/x86/platform/built-in.o
    590	   8228	     16	   8834	   2282	./arch/x86/vdso/built-in.o
    379	  12500	      8	  12887	   3257	./arch/x86/realmode/built-in.o
    477	      0	      0	    477	    1dd	./arch/x86/lib/built-in.o

- next is mm
 
  - cut about 5k for percpu
  - cut about 40k for nommu

 119008	  13688	   1824	 134520	  20d78	./mm/built-in.o

   1358	      0	      0	   1358	    54e	./mm/gup.o
  10612	     32	     24	  10668	   29ac	./mm/memory.o
   1072	      0	      0	   1072	    430	./mm/mincore.o
   2453	      0	      0	   2453	    995	./mm/mlock.o
   9918	    176	      8	  10102	   2776	./mm/mmap.o
   1403	      0	      0	   1403	    57b	./mm/mprotect.o
   2155	      0	      0	   2155	    86b	./mm/mremap.o
    520	      0	      0	    520	    208	./mm/msync.o
   4358	      0	      8	   4366	   110e	./mm/rmap.o
   6355	     57	     28	   6440	   1928	./mm/vmalloc.o
    710	      0	      0	    710	    2c6	./mm/pagewalk.o
     92	      0	      0	     92	     5c	./mm/pgtable-generic.o
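
The "about 40k for nommu" estimate looks like the summed text of the per-object rows above; that reading is my assumption, not something the notes state. A small sketch in Python that parses rows in `size` (Berkeley) format, checks that each row is self-consistent (text + data + bss equals both the dec and hex columns), and sums the text column. The function name is made up for illustration.

```python
# Rows copied from the mm listing above: text data bss dec hex filename.
MM_ROWS = """\
1358 0 0 1358 54e ./mm/gup.o
10612 32 24 10668 29ac ./mm/memory.o
1072 0 0 1072 430 ./mm/mincore.o
2453 0 0 2453 995 ./mm/mlock.o
9918 176 8 10102 2776 ./mm/mmap.o
1403 0 0 1403 57b ./mm/mprotect.o
2155 0 0 2155 86b ./mm/mremap.o
520 0 0 520 208 ./mm/msync.o
4358 0 8 4366 110e ./mm/rmap.o
6355 57 28 6440 1928 ./mm/vmalloc.o
710 0 0 710 2c6 ./mm/pagewalk.o
92 0 0 92 5c ./mm/pgtable-generic.o
"""


def parse_size_rows(output):
    """Parse `size` rows into dicts, skipping the header and blank lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        text, data, bss, dec = (int(f) for f in fields[:4])
        rows.append({"text": text, "data": data, "bss": bss, "dec": dec,
                     "hex": int(fields[4], 16), "file": fields[5]})
    return rows


rows = parse_size_rows(MM_ROWS)
consistent = all(r["text"] + r["data"] + r["bss"] == r["dec"] == r["hex"]
                 for r in rows)
total_text = sum(r["text"] for r in rows)
print(f"{len(rows)} objects, consistent={consistent}, text total={total_text}")
```

The same parser can be pointed at the output of `size` on any of the built-in.o lists in these notes.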

- next is lib

  - no need for vsprintf if printk off, 10k

  30654	  24647	      5	  55306	   d80a	./lib/built-in.o

   9964	      0	      0	   9964	   26ec	./lib/zlib_inflate/built-in.o

- next is init

   8456	  16437	     81	  24974	   618e	./init/built-in.o



----
Net sizes, maybe later...

galileo SMALLEST_SIZE_NET with LTO off

- this is without ipv4 net-diet
- includes ipv6

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
1368973	 181184	2288560	3838717	 3a92fd	vmlinux

---
NET details
---


- net now largest, larger than drivers (and drivers goes up too)

 465384	  13818	  17364	 496566	  793b6	./net/built-in.o

 183144	   5409	   7948	 196501	  2ff95	./net/ipv4/built-in.o
 128583	   4648	   6432	 139663	  2218f	./net/ipv6/built-in.o
 108158	   2092	   2804	 113054	  1b99e	./net/core/built-in.o
  15268	    264	      0	  15532	   3cac	./net/packet/built-in.o
  14787	    465	    148	  15400	   3c28	./net/netlink/built-in.o
   4011	    676	      0	   4687	   124f	./net/sched/built-in.o
    967	     12	      0	    979	    3d3	./net/ethernet/built-in.o

- drivers second largest

 255026	  30512	   6604	 292142	  4752e	./drivers/built-in.o

    359	     20	      0	    379	    17b	./drivers/reset/built-in.o
   2155	    152	     32	   2339	    923	./drivers/pps/built-in.o
   8870	    580	      0	   9450	   24ea	./drivers/net/phy/built-in.o
  42421	    861	      8	  43290	   a91a	./drivers/net/built-in.o
  30650	    233	      8	  30891	   78ab	./drivers/net/ethernet/stmicro/stmmac/built-in.o
  30650	    233	      8	  30891	   78ab	./drivers/net/ethernet/stmicro/built-in.o
  30650	    233	      8	  30891	   78ab	./drivers/net/ethernet/built-in.o
  47144	   1172	   2100	  50416	   c4f0	./drivers/pci/built-in.o
  11504	    136	     28	  11668	   2d94	./drivers/clk/built-in.o
  25389	   1324	    112	  26825	   68c9	./drivers/base/built-in.o
  15733	    636	     20	  16389	   4005	./drivers/spi/built-in.o
   5066	    624	    912	   6602	   19ca	./drivers/char/built-in.o
   9931	    548	     76	  10555	   293b	./drivers/thermal/built-in.o
   4927	    224	     36	   5187	   1443	./drivers/ptp/built-in.o
  65740	  16888	   3132	  85760	  14f00	./drivers/tty/built-in.o
  32077	  16680	   2688	  51445	   c8f5	./drivers/tty/serial/built-in.o
  21628	  15892	   2644	  40164	   9ce4	./drivers/tty/serial/8250/built-in.o
   8531	    480	     36	   9047	   2357	./drivers/i2c/built-in.o

- kernel next

 157407	   6376	   8209	 171992	  29fd8	./kernel/built-in.o

   9800	    388	   1328	  11516	   2cfc	./kernel/irq/built-in.o
  40951	   1105	   4720	  46776	   b6b8	./kernel/time/built-in.o
   6665	      0	      0	   6665	   1a09	./kernel/bpf/built-in.o
   1408	    356	     44	   1808	    710	./kernel/power/built-in.o
  21760	   1318	    112	  23190	   5a96	./kernel/sched/built-in.o
   4956	      4	      4	   4964	   1364	./kernel/locking/built-in.o
   1757	     33	      0	   1790	    6fe	./kernel/rcu/built-in.o
   1847	     88	    184	   2119	    847	./kernel/printk/built-in.o

- fs next

 134562	   1534	   1552	 137648	  219b0	./fs/built-in.o

   1395	    276	      4	   1675	    68b	./fs/ramfs/built-in.o
  22743	    168	     40	  22951	   59a7	./fs/proc/built-in.o
   1446	     44	      8	   1498	    5da	./fs/devpts/built-in.o

- arch/x86 next

 120755	  50209	  52712	 223676	  369bc	./arch/x86/built-in.o

    379	  12500	      8	  12887	   3257	./arch/x86/realmode/built-in.o
  14276	    412	    256	  14944	   3a60	./arch/x86/pci/built-in.o
    590	   8228	     16	   8834	   2282	./arch/x86/vdso/built-in.o
  18237	    208	  30776	  49221	   c045	./arch/x86/mm/built-in.o
    477	      0	      0	    477	    1dd	./arch/x86/lib/built-in.o
   1345	      8	     28	   1381	    565	./arch/x86/platform/intel-quark/built-in.o
   1345	      8	     28	   1381	    565	./arch/x86/platform/built-in.o
  17480	   5486	   6324	  29290	   726a	./arch/x86/kernel/apic/built-in.o
  21713	   8693	    720	  31126	   7996	./arch/x86/kernel/cpu/built-in.o
  10385	   4365	    532	  15282	   3bb2	./arch/x86/kernel/cpu/mcheck/built-in.o
 100201	  29261	  19828	 149290	  2472a	./arch/x86/kernel/built-in.o

- mm next

 119008	  13688	   1824	 134520	  20d78	./mm/built-in.o

- lib next

  33042	  24647	      5	  57694	   e15e	./lib/built-in.o

   9964	      0	      0	   9964	   26ec	./lib/zlib_inflate/built-in.o

- crypto next

  30068	    284	      0	  30352	   7690	./crypto/built-in.o

- init next

   8456	  16437	     81	  24974	   618e	./init/built-in.o
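
As a cross-check on the NET numbers, the per-directory text growth between the NONET and NET builds can be compared with the growth of vmlinux itself. A minimal sketch in Python; it assumes net/ and crypto/ contribute nothing in the NONET build (they are not listed there), and it leaves out arch/x86, mm and init, whose rows are identical in both builds.

```python
# vmlinux text sizes in bytes (LTO off) from the two `size vmlinux` runs.
NONET_TEXT = 815190
NET_TEXT = 1368973

# (NONET text, NET text) for each top-level built-in.o that changed.
# Assumption: net/ and crypto/ are absent from the NONET build, hence 0.
dirs = {
    "net": (0, 465384),
    "drivers": (206992, 255026),
    "kernel": (150742, 157407),
    "fs": (133526, 134562),
    "lib": (30654, 33042),
    "crypto": (0, 30068),
}

growth = {name: net - nonet for name, (nonet, net) in dirs.items()}
accounted = sum(growth.values())
total_growth = NET_TEXT - NONET_TEXT

for name, delta in sorted(growth.items(), key=lambda item: -item[1]):
    print(f"{name:8s} +{delta:6d} bytes")
print(f"accounted {accounted} of {total_growth} bytes "
      f"({total_growth - accounted} unaccounted)")
```

The kernel/ growth equals the kernel/bpf row in the NET listing, and the per-directory deltas cover all but a couple of hundred bytes of the vmlinux growth.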



