[PATCH 1/1] ARM : missing corrupted reg in __do_div_asm
Dave Martin
Dave.Martin at arm.com
Thu Mar 31 04:41:07 PDT 2016
On Thu, Mar 31, 2016 at 11:20:24AM +0000, 陈刚(Gangchen) wrote:
>
>
> On 03/31/2016 06:30 PM, Dave Martin wrote:
> > On Thu, Mar 31, 2016 at 07:56:05AM +0000, 陈刚(Gangchen) wrote:
> >> On 03/30/2016 10:07 PM, Dave Martin wrote:
> >>> On Wed, Mar 30, 2016 at 03:27:01AM +0000, 陈刚(Gangchen) wrote:
> >>>> On 03/29/2016 06:56 PM, Dave Martin wrote:
> > [...]
> >
> >>>>> I wonder whether the following would be cleaner than having these
> >>>>> aliased arguments:
> >>>>>
> >>>>> asm( /* ... */
> >>>>> "bl __do_div64"
> >>>>> : "+r" (__n), "=r" (__res)
> >>>>> : "r" (__base)
> >>>>> : "ip", "lr", "cc");
> >>>>> *n = __res;
> >>>>> return __n >> 32;
> >>>>>
> >>>>> (providing that GCC doesn't make a mess of the "easy" shift).
> >>>> I tried your proposal. It didn't make any difference: this is inline
> >>>> function and gcc just ignores your trick.
> >>> What doesn't work for you when using this method?
> >>>
> >>> Why does the fact that this is an inline function make a difference?
> >> With the help of other colleagues, I understand your proposal now.
> >> I create a patch and I can verify that it works!
> > Ah, OK. I was wondering whether I made a mistake somewhere.
> >
> >> Should I submit it, as it seems better than this one I sent?
> > It's up to you -- I think my approach is a bit cleaner, but your
> > approach worked too and is not vulnerable to compilers that generate
> > silly code for (uint64_t) >> 32.
> (uint64_t) >> 32 is pretty much the standard way to get upper half of an
> uint64 variable, compiler should not
> generate silly code for this.
> > Note that I only tested my code for little endian -- it should do the
> > right thing for BE, but I recommend that you try it and examine the
> > generated code, to make sure.
> I don't have a BE system to test, but I did check assembly code
> generated for BE system and didn't find
> anything wrong.
That should be enough -- the code generation is simple enough in this
case to review directly.
> I will update the new patch soon.
>
> The following is my test code and assembly dump for BE system of the
> function.
>
> typedef unsigned long long ull;
> ull mydiv64y(ull tt, unsigned base, unsigned *p)
> {
> ull t = tt;
> *p = do_div(t, base);
> *p = do_div(tt, base+1);
> return tt;
> }
>
> /mnt/2nd_disk/rdaMicro/aosp_4.4/test_modules/div_test/.tmp_test_div.o:
> file format elf32-bigarm
>
>
> Disassembly of section .text:
>
> 00000000 <mydiv64y>:
> 0: e92d 47f0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, lr}
> 4: 4607 mov r7, r0
> 6: 2600 movs r6, #0
> 8: 4698 mov r8, r3
> a: ea57 0306 orrs.w r3, r7, r6
[...]
> 14: d111 bne.n 3a <mydiv64y+0x3a>
[...]
> 30: f8c8 7000 str.w r7, [r8]
> 34: 4610 mov r0, r2
> 36: e8bd 87f0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, pc}
[...]
> 3a: 4614 mov r4, r2
> 3c: f7ff fffe bl 0 <__do_div64>
[...]
> 44: f8c8 0000 str.w r0, [r8]
[...]
> 4c: f7ff fffe bl 0 <__do_div64>
> 50: 4619 mov r1, r3
> 52: 4607 mov r7, r0
> 54: e7ec b.n 30 <mydiv64y+0x30>
This looks sensible for the BE case. I see the high part of the
remainder (r0) being stored back to *p in each case...
For the LE case, we should see r1 stored instead, IIUC.
Cheers
---Dave
More information about the linux-arm-kernel
mailing list