[PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

Fri Mar 9 10:58:03 EST 2012

On Thu, Mar 8, 2012 at 6:47 PM, Richard Earnshaw
<Richard.Earnshaw at arm.com> wrote:
> On 08/03/12 17:21, Nicolas Pitre wrote:
>> On Thu, 8 Mar 2012, Richard Earnshaw wrote:
>>
>>> On 02/03/12 21:15, Nicolas Pitre wrote:
>>>> So, to me, the gcc documentation is perfectly clear on this topic.
>>>> there really _is_ a guarantee that those asm marked variables will be in
>>>> the expected registers on entry to the inline asm, given that the
>>>> variable is _also_ listed as an operand to the asm statement.  But only
>>>> in that case.
>>>>
>>>> It is true that gcc may reorder other function calls or other code
>>>> around the inline asm and then intervening code can clobber any
>>>> registers.  Then it is up to gcc to preserve the variable's content
>>>> elsewhere when its register is used for other purposes, and restore it
>>>> when some inline asm statement is referring to it.
>>>>
>>>> And if gcc does not do this then it is buggy.  Version 3.4.0 of gcc was
>>>> buggy.  No other gcc versions in the last 7 years had such a problem or
>>>> the __asmeq macro in the kernel would have told us.
>>>>
>>>>> Or, to summarise another way, there is no way to control which register
>>>>> is used to pass something to an inline asm in general (often we get away
>>>>> with this, and there are a lot of inline asms in the kernel that assume
>>>>> it works, but the more you inline the more likely you are to get nasty
>>>>> surprises).
>>>>
>>>> This statement is therefore unfounded and wrong.  Please direct the
>>>> tools guy who mislead you to the above gcc documentation.
>>>>
>>>
>>> The problem is not really about re-ordering functions but about implicit
>>> functions that come from the source code; for example
>>>
>>> int foo (int a, int b)
>>> {
>>>   register int x __asm__("r0") = 33;
>>>
>>>   register int c __asm__("r1") = a / b; /* Ooops, clobbers r0 with
>>> division function call.  */
>>>
>>>   asm ("svc 0" : : "r" (x));
>>> }
>>>
>>> There's nothing in the specification to say what happens if there's a
>>> statement in the code that causes an implicit clobber of your assembly
>>> register.
>>
>> I'm sure gcc is full of implicit behaviors that are not mentioned in
>> the specification.  But as long as the specification is respected, then
>> there is no need to mention any unobservable side effects from a program
>> flow point of view, right?
>>
>> Why wouldn't gcc be able to respect the documented feature by
>> preventing live variable from being clobbered and reloading them in
>> the specified register at the inline asm entry point, just like it does
>> for function calls?
>>
>> Here's an example code that shows that, unfortunately, gcc is still
>> broken with regards to the documented behavior:
>>
>> extern int bar(int);
>> int foo(int y)
>> {
>>         register int x __asm__("r1") = 33;
>>         y += bar(x);
>>         asm ("@ x should be live in %0 here" : "+r" (x) : "r" (y));
>>         y += bar(x);
>>         asm ("@ x should be live in %0 here" : "+r" (x) : "r" (y));
>>         return x;
>> }
>>
>> Result is:
>>
>> foo:
>>         stmfd   sp!, {r4, lr}
>>         mov     r4, r0
>>         mov     r0, #33
>>         bl      bar
>>         add     r4, r0, r4
>>         @ x should be live in r1 here
>>         mov     r0, r1
>>         bl      bar
>>         add     r0, r0, r4
>>         @ x should be live in r1 here
>>         mov     r0, r1
>>         ldmfd   sp!, {r4, lr}
>>         bx      lr
>>
>> To me this is clearly a bug if gcc is not able to meet the documented
>> expectation.  And the documented expectation is not at all unreasonable.
>>
> No, in this case it is presumed that /you/ know that calling bar() will
> modify x.  Thus the code is either well defined (if you know what is in
> r1 after each call to bar), or undefined (if you can't say anything
> about r1 after each call).

It could be argued that, since the set of registers involved in the PCS
is well known, a programmer who assigns a variable to one of those
registers is consciously aliasing the variable with a global register
which can be destroyed at any time as a consequence of the ABI.  Because
there are few guarantees about how the compiler will or won't transform
the code, this suggests that asm("rX") annotations can't work reliably
for r0-r3 or r12 with the ARM PCS.

Indeed, the GCC docs do in fact have this to say:

    "register int *p1 asm ("r0") = ...;
    register int *p2 asm ("r1") = ...;
    register int *result asm ("r0");
    asm ( [...] );

[...] beware that a register that is call-clobbered by the target ABI
will be overwritten by any function call in the assignment including
library calls for arithmetic operators.  Also a register may be
clobbered when generating some operations, like variable shift, memory
copy or memory move on x86.  Assuming it is a call-clobbered register,
this may happen to `r0' above by the assignment to `p2'.  If you have
to use such a register, use temporary variables for expressions
between the register assignment and use:

    int t1 = ...;
    register int *p1 asm("r0") = ...;
    register int *p2 asm("r1") = t1;
    register int *result asm("r0");
    asm ( [...] )"

But this is at least somewhat in conflict with "The compiler's data
flow analysis is capable of determining where the specified registers
contain live values, and where they are available for other uses."

It also seems to assume -O0 type behaviour, where the compiler is doing
a straightforward sequential translation of the code.  I have no idea
why it is guaranteed that the assignment to p2 now certainly does not
clobber p1 (even as a side effect), what the implied aliasing of result
with p1 actually guarantees (or whether the compiler really understands
it at all), or what constraints there are on the compiler reordering or
inserting random extraneous code into the above.  Such assumptions don't
feel very safe in the presence of optimisation.

In other words, all sorts of undocumented guarantees beyond the C
language are needed for it even to be possible to interpret what the
above code examples should mean in the first place.

The documentation leaves a lot of questions unanswered, but it does at
least suggest that other arches have the same kind of potential
pitfalls that we observed on ARM.

Register variables feel like a red herring though.  We're only using
those because we can't do what is actually needed and describe
these constraints in the asm constraints (which would seem to be the
right place).  We specifically don't care where those values are
except at the boundaries of the asm block itself.
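
For illustration, here is a minimal sketch of the temporaries pattern the
docs recommend, confined to a small helper so that the register variables
only exist right at the asm boundary.  The helper name pinned_call, the
ARG0_REG/ARG1_REG macros and the empty asm template (standing in for
something like "svc 0") are all hypothetical.  The sketch also assumes the
compiler inserts nothing clobbering between the register assignments and
the asm, which is exactly the part the documentation does not promise.

```c
/* Hypothetical register names, selected per target only so the sketch
 * builds off ARM as well; the discussion here concerns r0-r3 and r12. */
#if defined(__arm__)
#define ARG0_REG "r0"
#define ARG1_REG "r1"
#elif defined(__aarch64__)
#define ARG0_REG "x0"
#define ARG1_REG "x1"
#elif defined(__x86_64__)
#define ARG0_REG "rdi"
#define ARG1_REG "rsi"
#else
#error "pick two call-clobbered registers for this target"
#endif

/* Hypothetical helper, not taken from the patch under review. */
static inline long pinned_call(long nr, long a, long b)
{
        /* a / b may turn into a library call on some targets, so it is
         * evaluated into an ordinary temporary before any register
         * variable is assigned. */
        long quotient = a / b;

        /* Only plain copies sit between these assignments and the asm. */
        register long r_nr  __asm__(ARG0_REG) = nr;
        register long r_arg __asm__(ARG1_REG) = quotient;

        /* Empty template as a placeholder for the real instruction;
         * both register variables are listed as operands. */
        __asm__ volatile("" : "+r" (r_nr) : "r" (r_arg) : "memory");

        return r_nr;
}
```

This only narrows the window in which something can go wrong; it does not
replace a real exact-register constraint.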

Is there a reason why ARM gcc doesn't provide the ability to specify
such exact-register constraints, or is this more for historical
reasons?  Is it possible?

Cheers
---Dave