[PATCH 8/8] arm64: Work around systems with mismatched cache line sizes

Wed Aug 24 06:23:02 PDT 2016

On 22/08/16 14:02, Will Deacon wrote:
> On Thu, Aug 18, 2016 at 02:10:32PM +0100, Suzuki K Poulose wrote:
>> Systems with differing CPU i-cache/d-cache line sizes can cause
>> problems with the cache management by software when the execution
>> is migrated from one to another. Usually, the application reads
>> the cache size on a CPU and then uses that length to perform cache
>> operations. However, if it gets migrated to another CPU with a smaller
>> cache line size, things could go completely wrong. To prevent such
>> cases, always use the smallest cache line size among the CPUs. The
>> kernel CPU feature infrastructure already keeps track of the safe
>> value for all CPUID registers including CTR. This patch works around
>> the problem by :
>>
>> For kernel, dynamically patch the kernel to read the cache size
>> from the system wide copy of CTR_EL0.
>
> Is it only CTR that is mismatched in practice, or do we need to worry
> about DCZID_EL0 too?

A mismatched DCZID_EL0 is quite possible. However, there is no way to
trap accesses to DCZID_EL0 itself. We can trap DC ZVA by clearing
SCTLR_EL1.DZE, but clearing SCTLR_EL1.DZE also makes DCZID_EL0.DZP read
as 1, indicating that DC ZVA is not supported. So if a well-behaved
application checks DZP before issuing a DC ZVA, it will never issue one
and we never get the chance to emulate it. In other words, if there is a
mismatch, the workaround is to disable DC ZVA altogether (which could
affect existing, incorrect userspace applications that assume DC ZVA is
supported without checking the DZP bit).
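
For reference, the check a well-behaved application is expected to make
looks roughly like the sketch below. It is untested and not part of this
series; the helper names are made up for illustration, and the DCZID_EL0
value is passed in as a plain integer so that the decode logic can be read
(and built) on any host. Only the field layout (BS in bits [3:0], DZP in
bit 4) comes from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCZID_EL0 layout: BS in bits [3:0], DZP in bit 4. */
#define DCZID_BS_MASK	0xfu
#define DCZID_DZP	(1u << 4)

/* Hypothetical helper: DC ZVA may only be used when DZP reads as 0. */
static bool dczid_zva_allowed(uint64_t dczid)
{
	return (dczid & DCZID_DZP) == 0;
}

/* Hypothetical helper: BS is log2 of the block size in 4-byte words. */
static unsigned int dczid_block_bytes(uint64_t dczid)
{
	return 4u << (dczid & DCZID_BS_MASK);
}

/*
 * Hypothetical helper: number of bytes out of 'len' that can be zeroed
 * with whole DC ZVA blocks. Returns 0 when DZP is set, in which case the
 * caller has to fall back to ordinary stores (e.g. memset()).
 */
static unsigned int zva_bytes_usable(uint64_t dczid, unsigned int len)
{
	unsigned int bs;

	if (!dczid_zva_allowed(dczid))
		return 0;

	bs = dczid_block_bytes(dczid);
	return len - (len % bs);
}
```

With SCTLR_EL1.DZE cleared, such an application sees DZP == 1, takes the
fallback path and never executes DC ZVA, so there is nothing for us to trap.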

>>  static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index 93c5287..db2d6cb 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
>>  		regs->pc += 4;
>>  }
>>
>> +static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
>> +{
>> +	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
>> +
>> +	regs->regs[rt] = sys_ctr_ftr->sys_val;
>> +	regs->pc += 4;
>> +}
>
> Whilst this is correct, I wonder if there's any advantage in reporting a
> *larger* size to userspace and avoid incurring additional trap overhead?

If we combine this with the trapping of user space DC operations that is
done for the erratum workaround for cache clean operations, we could
possibly report a larger size and emulate it properly in the kernel. But I
think that can be an enhancement on top of this series.
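
To illustrate what "emulate it properly" would involve: if user space is
told a line size larger than what a given CPU really has, each trapped
operation would have to be expanded into one operation per real line. The
sketch below is untested and only models the address arithmetic.
expand_dc_op() and its parameters are hypothetical names, not anything in
this series, and it assumes both sizes are powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine: bits [19:16], log2 of the smallest D-cache line in words. */
static unsigned int ctr_dminline_bytes(uint64_t ctr)
{
	return 4u << ((ctr >> 16) & 0xf);
}

/*
 * Hypothetical helper: expand one trapped cache op on 'addr', issued by
 * user space under the assumption of 'reported'-byte lines, into the
 * addresses that need an op on a CPU with 'actual'-byte lines.
 * Stores at most 'max' addresses in 'out' and returns how many were stored.
 */
static size_t expand_dc_op(uint64_t addr, unsigned int reported,
			   unsigned int actual, uint64_t *out, size_t max)
{
	/* Align down to the start of the line user space believes in. */
	uint64_t start = addr & ~((uint64_t)reported - 1);
	size_t n = 0;

	for (uint64_t a = start; a < start + reported && n < max; a += actual)
		out[n++] = a;

	return n;
}
```

For example, with a reported size of 128 bytes on a CPU with 64-byte lines,
a single op on 0x1050 would turn into ops on 0x1000 and 0x1040.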

>
> Any idea what sort of size typical JITs are using?

I have no idea. I have Cc'ed Rodolph and Stuart, who may have a better
idea of how JITs use this.

Suzuki