[PATCH v3 0/4] ARM: kernel: module PLT optimizations

Ard Biesheuvel ard.biesheuvel at linaro.org
Thu Aug 18 03:02:39 PDT 2016


As reported by Jongsung, the O(n^2) search in the PLT allocation code may
disproportionately affect module load time for modules with a larger number
of relocations.
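To make the quadratic cost concrete, here is a minimal sketch of a brute force duplicate count. It assumes a hypothetical, simplified relocation record (`struct demo_rel`) rather than the kernel's `Elf32_Rel`, and it illustrates the pattern only. It is not the code in module-plts.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified relocation: branch target symbol index plus
 * addend. (Real ARM REL entries keep the addend in the instruction.)
 */
struct demo_rel {
	uint32_t sym;
	int32_t addend;
};

/*
 * Naive PLT count: for each relocation, scan all earlier ones to see
 * whether the same (sym, addend) pair was already counted. With n
 * relocations this is about n * (n - 1) / 2 comparisons, i.e. O(n^2).
 */
static unsigned int demo_count_plts_naive(const struct demo_rel *rel, size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (rel[j].sym == rel[i].sym &&
			    rel[j].addend == rel[i].addend)
				break;
		if (j == i)	/* no earlier duplicate found */
			count++;
	}
	return count;
}
```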

Since the existing routines rather naively take into account branch
instructions that are internal to the module, we can improve the situation
significantly by checking the symbol section index first, and disregarding
symbols that are defined in the same module. Also, we can reduce the
algorithmic complexity to O(n log n) by sorting the reloc section before
processing it, and disregarding non-zero addends in the optimization.

Patch #1 merges the core and init PLTs, since the latter is virtually empty
anyway.

Patch #2 implements the optimization to only take SHN_UNDEF symbols into
account.
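A minimal sketch of the filter, assuming a hypothetical trimmed-down symbol type (`struct demo_sym`) and locally defined constants whose values follow the ELF for the Arm Architecture ABI. Only branch relocations look at the symbol at all, which is also the shape of the v3 change that moved the SHN_UNDEF check into the switch statement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Relocation type values from the ELF for the Arm Architecture ABI. */
enum {
	DEMO_R_ARM_PC24       = 1,
	DEMO_R_ARM_THM_CALL   = 10,
	DEMO_R_ARM_CALL       = 28,
	DEMO_R_ARM_JUMP24     = 29,
	DEMO_R_ARM_THM_JUMP24 = 30,
};

#define DEMO_SHN_UNDEF 0	/* section index of an undefined symbol */

/* Hypothetical, trimmed-down symbol table entry. */
struct demo_sym {
	uint16_t st_shndx;
};

/*
 * Returns true if a relocation of type @type against @sym should be
 * considered for a PLT entry. @sym is only dereferenced for branch
 * relocations, so it may be NULL for any other type.
 */
static bool demo_needs_plt(uint32_t type, const struct demo_sym *sym)
{
	switch (type) {
	case DEMO_R_ARM_PC24:
	case DEMO_R_ARM_CALL:
	case DEMO_R_ARM_JUMP24:
	case DEMO_R_ARM_THM_CALL:
	case DEMO_R_ARM_THM_JUMP24:
		/* Branches to symbols defined in this module are skipped. */
		return sym->st_shndx == DEMO_SHN_UNDEF;
	default:
		return false;
	}
}
```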

Patch #3 sorts the reloc section, so that the duplicate check can be done by
comparing an entry with the previous one. Since REL entries (as opposed to
RELA entries) do not contain the addend, it simply disregards non-zero
addends in the optimization, since those are rare anyway.
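The sort-then-compare-with-previous idea can be sketched as follows. This is a minimal illustration under stated assumptions: `struct demo_rel` is hypothetical, its `zero_addend` flag stands in for inspecting the addend encoded in the instruction, and the (type, sym, zero_addend) sort key is this sketch's choice, not necessarily the one used by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified relocation record. */
struct demo_rel {
	uint32_t type;
	uint32_t sym;
	bool zero_addend;	/* stands in for decoding the in-place addend */
};

/* Order by (type, sym), zero-addend entries first, so duplicates are adjacent. */
static int demo_cmp_rel(const void *a, const void *b)
{
	const struct demo_rel *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->sym != y->sym)
		return x->sym < y->sym ? -1 : 1;
	if (x->zero_addend != y->zero_addend)
		return x->zero_addend ? -1 : 1;
	return 0;
}

/*
 * Sort once (O(n log n)), then count PLT slots in a single linear pass.
 * A zero-addend entry equal to its predecessor reuses that slot; every
 * non-zero-addend entry gets its own slot, since those are rare.
 */
static unsigned int demo_count_plts_sorted(struct demo_rel *rel, size_t n)
{
	unsigned int count = 0;

	qsort(rel, n, sizeof(*rel), demo_cmp_rel);
	for (size_t i = 0; i < n; i++) {
		bool dup = i > 0 && rel[i].zero_addend &&
			   demo_cmp_rel(&rel[i], &rel[i - 1]) == 0;

		if (!dup)
			count++;
	}
	return count;
}
```

Over-counting (for example, for non-zero addends) only wastes a few PLT slots, whereas under-counting would be a bug, which is why the sketch errs on the side of counting.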

Patch #4 replaces the brute force search for a matching existing entry in
the PLT generation routine with a simple check against the last entry that
was emitted. This is now sufficient since the relocation section is sorted,
and presented at relocation time in the same order.
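A minimal sketch of the emission side, assuming a hypothetical PLT that records only each slot's branch target (`struct demo_plt`, `demo_get_plt_slot`). Because relocations arrive in the same sorted order used to size the PLT, a repeated target can only match the most recently emitted slot.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PLT_MAX 16	/* hypothetical capacity; size must not exceed it */

/* Hypothetical PLT: each slot holds the absolute address it jumps to. */
struct demo_plt {
	uint32_t target[DEMO_PLT_MAX];
	size_t used;		/* slots emitted so far */
	size_t size;		/* slots reserved at allocation time */
};

/*
 * Return the slot index for @target. One comparison against the last
 * emitted slot replaces a search over all slots. Returns -1 if the
 * reserved space is exhausted, meaning the counting and emission passes
 * disagree.
 */
static long demo_get_plt_slot(struct demo_plt *plt, uint32_t target)
{
	if (plt->used > 0 && plt->target[plt->used - 1] == target)
		return (long)(plt->used - 1);

	if (plt->used == plt->size)
		return -1;

	plt->target[plt->used] = target;
	return (long)plt->used++;
}
```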

Note that this implementation is now mostly aligned with the arm64 version
(with the exception that the arm64 implementation stashes the address of the
PLT entry in the symtab instead of comparing against the last emitted entry).

v3:
- move the SHN_UNDEF check into the switch statement, so that we only
  dereference the symbol for relocations we care about (#2)
- compare the undecoded addend values bitwise when checking for zero addends,
  rather than fully decoding the offsets and doing an arithmetic comparison
  against '-8' (or '-4' for Thumb)
- added patch #4
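A minimal sketch of the bitwise zero-addend check. It assumes the instruction words have already been converted to host order, and the masks are derived here from the A32 BL/B and Thumb-2 BL/B.W (T4) encodings. A branch whose target is exactly the symbol carries an in-place offset of -8 (ARM) or -4 (Thumb), because the PC reads ahead by that amount.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM B/BL: imm24 holds offset >> 2, so -8 encodes as 0xfffffe. */
static bool demo_arm_zero_addend(uint32_t insn)
{
	return (insn & 0x00ffffff) == 0x00fffffe;
}

/*
 * Thumb-2 BL / B.W: -4 gives S = 1, imm10 = 0x3ff, J1 = J2 = 1 and
 * imm11 = 0x7fe. @upper and @lower are the two halfwords in
 * instruction order.
 */
static bool demo_thumb_zero_addend(uint16_t upper, uint16_t lower)
{
	return (upper & 0x07ff) == 0x07ff &&	/* S:imm10 */
	       (lower & 0x2fff) == 0x2ffe;	/* J1, J2:imm11 */
}
```

Masking out the opcode bits (bit 12 of the lower halfword, for instance) lets one check cover both BL and B.W without decoding the offset.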
   
v2:
- added patch #3

Ard Biesheuvel (4):
  ARM: kernel: merge core and init PLTs
  ARM: kernel: allocate PLT entries only for external symbols
  ARM: kernel: sort relocation sections before allocating PLTs
  ARM: kernel: avoid brute force search on PLT generation

 arch/arm/include/asm/module.h |   6 +-
 arch/arm/kernel/module-plts.c | 246 ++++++++++++--------
 arch/arm/kernel/module.lds    |   3 +-
 3 files changed, 147 insertions(+), 108 deletions(-)

-- 
2.7.4




More information about the linux-arm-kernel mailing list