[LEDE-DEV] What optimizations does low-end MIPS userspace need?

Fri Apr 14 16:00:50 PDT 2017

I’ve been fooling around with various userspace toolchain hacks for MIPS, trying to make things faster globally. But then I realized: I don’t know what “faster” means. Or what “better” means, in general.

I want my Python/Firmata apps to launch faster on the MT7688, but there is a clear metric for winning there; I don’t need your help. :-)

The routine operation that takes the most time for me on low-end boxes is reloading dhcp, especially from LuCI. With 56 hosts, it’s noticeable.[1] I could instrument /etc/init.d/dhcp. That seems crazy, but there is some method to that madness: anything that speeds up an unmodified init.d/dhcp is probably also speeding up busybox and/or libc.

(I barely notice dnsmasq reload on the MT7621: 0.75s wall time. Nice CPU for $30-40. Shame mine are still crashy...)

Making busybox or libc faster for low-end machines has some benefit. But is speed there really an issue?

There was a very useful discussion about free space on jffs2 in terms of erase blocks, since some machines have run out of them. 32M/4M machines are on the chopping block, right? So anything that can shrink the squashfs by an erase block or two might be a win. For example, I got musl building in mips16 mode, and it was a lot less hacky than I expected. But mips16 executables are not that much smaller than plain mips ones once they’re in squashfs. libc went from 600k to 450k, but after xz -0 it was 236k -> 211k. All that for 25k in squashfs. Of course, that’s still 150k less code to be faulted in.

I’m looking at some MTD mappings. I can’t tell whether the kernel partition is fixed-length on the important platforms. If it isn’t, then I can start saving memory there; I’ve always wanted to see whether you could realistically run MIPS16 in parts of the kernel...

--
Jay Carlson
nop at nop.com

[1]: Yes, I could put all those extra hosts in /etc/dnsmasq.conf, but what’s the fun in that?

==== stop reading here to avoid my repeated craziness ====

For the little routers, I still think it would be fun to borrow a trick from AIX and use MIPS “wired” TLB entries to map 512k or 1M of physical memory into every process; then you put the text of libc, libgcc, and/or busybox in there. There’d be less special handling than you’d think: you’d just have to special-case mmapping the text segments of those libraries, since we already have a copy. Once that text is fixed in memory, we’ve also fixed the load address for the DSO. mmap the .data normally.

The big motivation is that you would never take a page fault, or even a TLB refill, for code in libc/busybox. On faster processors with bigger, more associative caches, this shouldn’t be as big a deal, though.

The system-wide nature of the page makes it a little like the vdso. A really *big* vdso.