[PATCH v4 1/1] riscv: optimize ELF relocation function in riscv

Charlie Jenkins charlie at rivosinc.com
Mon Dec 11 17:51:06 PST 2023


On Thu, Dec 07, 2023 at 05:02:16PM -0800, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> > The patch can optimize the running times of insmod command by modify ELF
> > relocation function.
> > In the 5.10 and latest kernel, when install the riscv ELF drivers which
> > contains multiple symbol table items to be relocated, kernel takes a lot
> > of time to execute the relocation. For example, we install a 3+MB driver
> > need 180+s.
> > We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> > type items relocation function in the arch\riscv\kernel\module.c and
> > find that there are two-loops in the function. If we modify the begin
> > number in the second for-loops iteration, we could save significant time
> > for installation. We install the same 3+MB driver could just need 2s.
> > 
> > Signed-off-by: Amma Lee <lixiaoyun at binary-semi.com>
> > Signed-off-by: Maxim Kochetkov <fido_max at inbox.ru>
> > ---
> > Changes in v4:
> > - use 'while' loop instead of 'for' loop to avoid code duplicate
> > ---
> >  arch/riscv/kernel/module.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 7c651d55fcbd..8c9b644ebfdb 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  	Elf_Sym *sym;
> >  	u32 *location;
> >  	unsigned int i, type;
> > +	unsigned int j_idx = 0;
> >  	Elf_Addr v;
> >  	int res;
> >  
> > @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  		v = sym->st_value + rel[i].r_addend;
> >  
> >  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> > -			unsigned int j;
> > +			unsigned int j = j_idx;
> > +			bool found = false;
> >  
> > -			for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > +			do {
> >  				unsigned long hi20_loc =
> >  					sechdrs[sechdrs[relsec].sh_info].sh_addr
> >  					+ rel[j].r_offset;
> > @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  					hi20 = (offset + 0x800) & 0xfffff000;
> >  					lo12 = offset - hi20;
> >  					v = lo12;
> > +					found = true;
> >  
> >  					break;
> >  				}
> > -			}
> > -			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > +
> > +				j++;
> > +				if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> > +					j = 0;
> Very interesting algorithm here. Assuming the hi relocation is after the
> previous one seems to be a good heuristic. However I think we can do
> better. In GNU ld, a hashmap of all of the hi relocations is stored and
> a list of all of the lo relocations. After all of the other relocations
> have been parsed, it iterates through all of the lo relocations and
> looks up the associated hi relocation in the hashmap.
> 
> There is more memory overhead here but I suspect it will be faster. I
> had started to mock up a hashmap implementation to see if it was faster
> but decided I should mention it here first in case somebody had some
> additional insight. 

Turns out this is a fantastic heuristic. Using a hashmap is
significantly faster than the default implementation but this algorithm
above is significantly faster than the hashmap. Using the amdgpu driver
(which is actually a collection of drivers) and is a size of about 469M
I found that the hashmap implementation is about 30% faster than the
current implementation, but this patch is 50% faster than the current
implementation. It is probably possible to write an ELF header with the
relocations sufficiently scrambled to make the hashmap faster, but I
suspect that for all "normal" programs this algorithm is faster.

I also tried a couple other smaller modules and it was faster or around
the same as the hashmap in all of them.

A lot of code has changed in this file since this patch was submitted,
can you rebase onto 6.7-rc1? Otherwise this patch is great.

Reviewed-by: Charlie Jenkins <charlie at rivosinc.com>

> 
> - Charlie
> 
> > +
> > +			} while (j_idx != j);
> > +
> > +			if (!found) {
> >  				pr_err(
> >  				  "%s: Can not find HI20 relocation information\n",
> >  				  me->name);
> >  				return -EINVAL;
> >  			}
> > +
> > +			/* Record the previous j-loop end index */
> > +			j_idx = j;
> >  		}
> >  
> >  		res = handler(me, location, v);
> > -- 
> > 2.40.1
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv



More information about the linux-riscv mailing list