[PATCH v4] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
robin.murphy at arm.com
Thu Feb 22 08:53:12 PST 2018
On 22/02/18 16:33, Mark Rutland wrote:
> On Thu, Feb 22, 2018 at 04:28:03PM +0000, Robin Murphy wrote:
>> [Apologies to keep elbowing in, and if I'm being thick here...]
>> On 22/02/18 15:22, Mark Rutland wrote:
>>> On Thu, Feb 22, 2018 at 08:51:30AM -0600, Shanker Donthineni wrote:
>>>> +#define CTR_B31_SHIFT 31
>>> Since this is just a RES1 bit, I think we don't need a mnemonic for it,
>>> but I'll defer to Will and Catalin on that.
>>>> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
>>>> +alternative_if ARM64_HAS_CACHE_DIC
>>>> + mov x0, xzr
>>>> + dsb ishst
>>>> + isb
>>>> + ret
>>> As commented on v3, I don't believe you need the DSB here. If prior
>>> stores haven't been completed at this point, the existing implementation
>>> would not work correctly here.
>> True in terms of ordering between stores prior to entry and the IC IVAU
>> itself, but what about the DSB ISH currently issued *after* the IC IVAU
>> before returning? Is it provably impossible that existing callers might be
>> relying on that ordering *anything*, or would we risk losing something
>> subtle by effectively removing it?
> AFAIK, the only caller of this is KVM, before page table updates occur
> to add execute permissions to the page this is applied to.
> At least in that case, I do not believe there would be breakage.
> If we're worried about subtleties in callers, then we'd need to stick
> with DSB ISH rather than optimising to DSB ISHST.
Hmm, I probably am just squawking needlessly. It is indeed hard to
imagine how callers could be relying on invalidating the I-cache for
ordering unless doing something unreasonably stupid, and if the current
caller is clearly OK then there should be nothing to worry about.
This *has* helped me realise that I was indeed being somewhat thick
before, because the existing barrier is of course not about memory
ordering per se, but about completing the maintenance operation. Hooray
for overloaded semantics...
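
For anyone following along, here is a rough sketch in plain C of the decision the two new CTR_EL0 bits encode. It is not code from the patch: the struct and function names are made up for illustration, and it takes a CTR_EL0 value as an argument rather than reading the register, so it builds anywhere. The bit positions (IDC at bit 28, DIC at bit 29) are the architectural ones. When IDC is set, the D-cache clean to the PoU is not needed for instruction/data coherence; when DIC is set, the I-cache invalidate to the PoU is not needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions within CTR_EL0. */
#define CTR_IDC_SHIFT 28
#define CTR_DIC_SHIFT 29

/* Hypothetical helper type: which by-VA maintenance steps are still needed. */
struct pou_sync_plan {
	bool clean_dcache;	/* DC CVAU over the range is required */
	bool inval_icache;	/* IC IVAU over the range is required */
};

/*
 * Hypothetical helper: derive the required steps from a CTR_EL0 value.
 * Barriers are deliberately not modelled here; they are needed either way.
 */
static inline struct pou_sync_plan pou_sync_plan_from_ctr(uint64_t ctr)
{
	struct pou_sync_plan plan;

	plan.clean_dcache = !((ctr >> CTR_IDC_SHIFT) & 1);
	plan.inval_icache = !((ctr >> CTR_DIC_SHIFT) & 1);
	return plan;
}
```

With only DIC set, the plan keeps the D-cache clean and drops the IC IVAU loop, which is the case the quoted hunk short-circuits with barriers alone.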
On a different track, I'm now wondering whether the extra complexity of
these alternatives might justify removing some obvious duplication and
letting __flush_cache_user_range() branch directly into
invalidate_icache_range(), or might that adversely affect the user fault