[PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses

Thu Oct 13 02:58:53 PDT 2016

Hi Pavel,

Thanks a lot for your testing.

On Wed, Oct 12, 2016 at 7:20 PM, Pavel Labath <labath at google.com> wrote:
> Hello Pratyush,
>
> I am sorry about the delay. I have finally had a chance to try out
> your changes today. Response inline.
>
> On 8 October 2016 at 06:10, Pratyush Anand <panand at redhat.com> wrote:
>> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath at google.com> wrote:
>>> The thing is, I have observed different behavior here depending on the
>>> exact hardware used. I don't have the exact parameters with me now,
>>> but I can look it up next week.
>>>
>>> The thing is that the spec is imprecise about what exact address the
>>> hardware can report for the watchpoint hit. I presume that is
>>> deliberate to give some leeway to implementers. The spec says the
>>> address can be anywhere in the range from the lowest memory address
>>> accessed by the instruction to the highest address watched by the
>>> watchpoint,
>>
>> I think, my patches should  be able to take care of the above condition.
> Unfortunately, the patch does not solve the problem for my hardware,
> because of the leeway you give in watchpoint_handler is not big
> enough. It does work however, if I change the line
>> if (addr + 7 < val + lens || addr > val + lene)
> to
>> if (addr + 15 < val + lens || addr > val + lene)

Yes, I missed that floating point register will be of size 16.

> I do not think we can assume that address reported by the hardware
> will be at most 7 bytes off from the address we put the watchpoint at.
> There is nothing in the spec that guarantees that, and it does not
> seem to be enough for some hardware. In fact, I am not sure we can
> assume 15 is enough either, but maybe it can do for now, until we can

Right. It might even be bigger, in case of cache maintenance instructions.

> find hardware that does not work with that (I haven't yet tried what
> happens it the watchpoint is triggered by cache management
> instructions, which can access much larger blocks of memory).

Not, sure, may be it can lie in cache line size range.

>
> For reference, the hardware in question is:
>> Processor : AArch64 Processor rev 0 (aarch64)
>> processor : 0
>> min_vddcx : 400000
>> min_vddmx : 490000
>> BogoMIPS : 38.00
>> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x51
>> CPU architecture: 8
>> CPU variant : 0x2
>> CPU part : 0x201
>> CPU revision : 0
>> CPU param : 300 468 468 621 939 299 445 445 621 1077
>> Hardware : Qualcomm Technologies, Inc MSM8996pro
>
> And this is how it behaves:
> The output of the test app triggering the watchpoint (I have set a
> single-byte watchpoint at 555556705f)
>>
>> Writing to: 555556705f, size: 1
>> Writing to: 555556705e, size: 2
>> Writing to: 555556705c, size: 4
>> Writing to: 5555567058, size: 8
>> Writing to: 5555567050, size: 16
>> Writing to: 5555567040, size: 32
>
> The addresses received by the kernel:
> [  251.812166] c1   3780 hw-breakpoint: watchpoint_handler: addr:
> 555556705f, val+lens: 555556705f, val+lene: 555556705f
> [  251.820341] c1   3781 hw-breakpoint: watchpoint_handler: addr:
> 555556705e, val+lens: 555556705f, val+lene: 555556705f
> [  251.825572] c0   3782 hw-breakpoint: watchpoint_handler: addr:
> 555556705c, val+lens: 555556705f, val+lene: 555556705f
> [  251.831085] c0   3783 hw-breakpoint: watchpoint_handler: addr:
> 5555567058, val+lens: 555556705f, val+lene: 555556705f
> [  251.835804] c0   3784 hw-breakpoint: watchpoint_handler: addr:
> 5555567050, val+lens: 555556705f, val+lene: 555556705f
> [  251.841350] c0   3785 hw-breakpoint: watchpoint_handler: addr:
> 5555567050, val+lens: 555556705f, val+lene: 555556705f
>
> Note that for the case of 16 and 32-byte access it returns the address
> 5555567050 -- this is why the "+15" is sufficient for me.
>
>
> The other thing I am not so sure about in your patch is that it has
> potential to mis-attribute the watchpoint hit if we have two
> watchpoints close together. For example, if I have first watchpoint on
> 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application
> writes one byte to 0x1004, then your code will still attribute the hit
> to the first watchpoint, even though it was not really triggered. This

Hummm..yes, thanks for pointing it out.
There could be only two solutions for it:
(1) We read instruction at the location regs->pc and analyse it and
find the size of read/write.
or(2) What you have suggested in your patch.

I think, its easier to go with your implementation. So, I have taken
your patch and updated my perf/upstream_arm64_devel branch. May be you
can give it a test for your test cases.

> is the reason I implemented my fix as a two-stage process, first
> looking for exact hits, and then falling back to the nearest one. That
> said, I don't know enough about the codebase to say if this is a real
> problem.
>
> On the plus side, I like the fact that we can watch arbitrary memory
> regions now, and the feature is working perfectly. :)
>

Thanks :-)

>
> My proposal would be to combine the two patches - take the byte mask
> handling code from yours, and the hit-attribution code from my patch.
> What do you think?

I am ok with merging them together as well as sending them as
different patch in my v2 series.

~Pratyush