[PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

Dave Martin dave.martin at linaro.org
Thu Mar 1 04:51:39 EST 2012


On Wed, Feb 29, 2012 at 02:52:38PM +0000, Stefano Stabellini wrote:
> On Wed, 29 Feb 2012, Dave Martin wrote:
> > On Wed, Feb 29, 2012 at 09:56:02AM +0000, Ian Campbell wrote:
> > > On Wed, 2012-02-29 at 09:34 +0000, Dave Martin wrote:
> > > > On Tue, Feb 28, 2012 at 12:28:29PM +0000, Stefano Stabellini wrote:
> > > 
> > > > > I don't have a very strong opinion on which register we should use, but
> > > > > I would like to avoid r7 if it is already actively used by gcc.
> > > > 
> > > > But there is no framepointer for Thumb-2 code (?)
> > > 
> > > Peter Maydell suggested there was:
> > > > r7 is (used by gcc as) the Thumb frame pointer; I don't know if this
> > > > makes it worth avoiding in this context.
> > > 
> > > Sounds like it might be a gcc-ism, possibly a non-default option?
> > > 
> > > Anyway, I think r12 will be fine for our purposes so the point is rather
> > > moot.
> > 
> > Just had a chat with some tools guys -- apparently, when passing register
> > arguments to gcc inline asms there really isn't a guarantee that those
> > variables will be in the expected registers on entry to the inline asm.
> > 
> > If gcc reorders other function calls or other code around the inline asm
> > (which it can do, except under certain controlled situations), then
> > intervening code can clobber any registers in general.
> > 
> > Or, to summarise another way, there is no way to control which register
> > is used to pass something to an inline asm in general (often we get away
> > with this, and there are a lot of inline asms in the kernel that assume
> > it works, but the more you inline the more likely you are to get nasty
> > surprises).  There is no workaroud, except on some architectures where
> > special asm constraints allow specific individual registers to be
> > specified for operands (i386 for example).
> > 
> > If you need a specific register, this means that you must set up that
> > register explicitly inside the asm if you want a guarantee that the
> > code will work:
> > 
> > 	asm volatile (
> > 		"movw	r12, %[hvc_num]\n\t"
> > 		...
> > 		"hvc	#0"
> > 		:: [hvc_num] "i" (NUMBER) : "r12"
> > 	);
> > 
> 
> OK, we can arrange the hypercall code to be like that.
> Also with your patch series it would be "_hvc" because of the .macro,
> right?

Yes, but I would avoid making too many assumptions about the final form
of that patch -- it looks like there's significant work to do there,
since I made some unsafe assumptions about how the tools work...

We might end up with a magic #define after all.

> > This is the kind of problem which goes away when out-of-lining the
> > hvc wrapper behind a C function interface, since the ABI then provides
> > guarantees about how values are mershaled into and out of that code.
> 
> Do you mean implementing the entire HYPERVISOR_example_op in assembly
> and calling it from C?
> Because I guess that gcc would still be free to mess with the registers
> between the C function entry point and any inline assembly code.

gcc can arrange for the relevant things to be already in r0-r3 and the
relevant stack slots before branching to a function just as for inline
asm.  The only differences are that the compiler cannot choose which
registers to use, and the branch cannot be optimised away by the compiler
(the CPU may be able to optimise the branch away at runtime of course,
but that's another story...)

What libc appears to do is wrap each syscall in a separate function.
This means that it's not necessary to shuffle all the arguments by
one position when invoking the actual syscall.  (The generic "syscall"
function does of course need to shuffle the arguments so as to
displace the syscall number from the first argument to r7 --
but that's hard to avoid without inlining.)

For example:

00090b50 <shmdt>:
   90b50:       e52d7004        push    {r7}            ; (str r7, [sp, #-4]!)
   90b54:       e59f7010        ldr     r7, [pc, #16]   ; 90b6c <shmdt+0x1c>
   90b58:       ef000000        svc     0x00000000
   90b5c:       e49d7004        pop     {r7}            ; (ldr r7, [sp], #4)
   90b60:       e3700a01        cmn     r0, #4096       ; 0x1000
...

Syscalls with more than 4 args still need to load the extra ones
from the stack, of course:

00090090 <getsockopt>:
   90090:       e92d0090        push    {r4, r7}
   90094:       e59d4008        ldr     r4, [sp, #8]
   90098:       e59f7010        ldr     r7, [pc, #16]   ; 900b0 <getsockopt+0x20>
   9009c:       ef000000        svc     0x00000000
...


I don't know whether that makes sense for a hypervisor... it partly
depends on how many different hypercalls there are.

By all means implement it both ways and measure the performance
difference, if possible.

Cheers
---Dave



More information about the linux-arm-kernel mailing list