[PATCH v2 1/2] arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

Robin Murphy robin.murphy at arm.com
Thu Nov 18 03:42:15 PST 2021


On 2021-11-18 08:18, Reiji Watanabe wrote:
> On Tue, Nov 16, 2021 at 3:00 PM Robin Murphy <robin.murphy at arm.com> wrote:
>>
>> On 2021-11-08 07:11, Reiji Watanabe wrote:
>>> Currently, clear_page() uses DC ZVA instruction unconditionally.  But it
>>> should make sure that DCZID_EL0.DZP, which indicates whether or not use
>>> of DC ZVA instruction is prohibited, is zero when using the instruction.
>>> Use STP as memset does instead when DCZID_EL0.DZP == 1.
>>
>> Reviewed-by: Robin Murphy <robin.murphy at arm.com>
>>
>> FWIW I did eventually figure out the "pre-bias" trick from v1 thanks to
>> Mark's nod toward the original context, but a quick survey of various
>> optimisation guides implied that the explicit add should generally be
>> preferred over post-index writeback anyway, so I think we're all good here.
> 
> Thank you for the review!
> The original code, which used *pre*-index (not post-index) addressing,

Oops, in the context I think I meant writeback in general anyway. This 
is what happens when a sudden urge to review random patches at 11PM 
strikes :)

> made no significant difference in clear_page() performance in my test
> environment compared with the current code.
> 
> Now, I am looking at creating v3 patches to use stnp instead of stp
> in clear_page() (NOTE: DC ZVA shows much better performance on my test
> system than stp/stnp).
> 
> Although using stnp didn't show significant difference in clear_page()
> performance on my test system from stp (no significant difference in
> cache-misses, cache_refill, cache_wb, or cache_allocate event counter
> either), using stnp should be more appropriate for clear_page() than stp,
> and I understand it could show better performance on some CPUs.

Indeed - certainly most Arm Ltd. cores tend to be good at spotting the 
store pattern and switching into write-streaming mode automatically - 
but semantically, STNP probably is appropriate for the great majority of 
clear_page() usage. Feel free to keep my review tag with that change.
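
For anyone following along, the DCZID_EL0 check under discussion can be
sketched in portable C roughly as below. This is only an illustration, not
the patch itself: the helper names dczva_block_bytes() and
clear_page_stores() are hypothetical, the register value is passed in as a
plain integer rather than read with MRS, and the 16-byte memset() merely
stands in for one STP/STNP of a zero register pair. The field layout
(BS in bits [3:0], DZP in bit 4) is the architectural one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DCZID_DZP_BIT  (1u << 4)  /* DZP: 1 means DC ZVA is prohibited */
#define DCZID_BS_MASK  0xfu       /* BS: log2 of block size in 4-byte words */

/*
 * Hypothetical helper: decode a raw DCZID_EL0 value.
 * Returns the DC ZVA block size in bytes, or 0 if DC ZVA must not be used.
 */
static size_t dczva_block_bytes(uint64_t dczid)
{
	if (dczid & DCZID_DZP_BIT)
		return 0;
	return (size_t)4 << (dczid & DCZID_BS_MASK);
}

/*
 * Hypothetical stand-in for the store-pair fallback: zero the page
 * 16 bytes per step, as one STP/STNP of xzr, xzr would.
 */
static void clear_page_stores(void *page, size_t page_size)
{
	unsigned char *p = page;
	size_t off;

	assert(page_size % 16 == 0);
	for (off = 0; off < page_size; off += 16)
		memset(p + off, 0, 16);
}
```

A caller would use the DC ZVA loop only when dczva_block_bytes() returns
non-zero, and fall back to the store loop otherwise.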

Thanks,
Robin.


