[PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context

James Clark james.clark at linaro.org
Tue Feb 17 01:18:03 PST 2026



On 16/02/2026 5:05 pm, Will Deacon wrote:
> On Mon, Feb 16, 2026 at 03:13:54PM +0000, James Clark wrote:
>>
>>
>> On 16/02/2026 1:09 pm, Will Deacon wrote:
>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>>> generation in guest context when self-hosted TRBE is in use by the host.
>>>
>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>>> Trace Buffer Unit can perform address translation for the "owning
>>> exception level" even when it is out of context.
>>>
>>> Consequently, we can end up in a state where TRBE performs speculative
>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>>> result appears to be a heady mixture of data corruption and hardware
>>> lockups.
>>>
>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>>> draining the buffer, restoring the register on return to the host.
>>>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Cc: Oliver Upton <oupton at kernel.org>
>>> Cc: James Clark <james.clark at linaro.org>
>>> Cc: Leo Yan <leo.yan at arm.com>
>>> Cc: Suzuki K Poulose <suzuki.poulose at arm.com>
>>> Cc: Fuad Tabba <tabba at google.com>
>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>>> Signed-off-by: Will Deacon <will at kernel.org>
>>> ---
>>>
>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>>> run upstream but I noticed this by inspection when triaging occasional
>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>>> at the same time as a vCPU is loaded. This code has changed quite a bit
>>> over time, so stable backports are not entirely straightforward.
>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>>> general approach taken here.
>>>
>>>    arch/arm64/include/asm/kvm_host.h  |  1 +
>>>    arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>>>    2 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index ac7f970c7883..a932cf043b83 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>>>    		u64 pmscr_el1;
>>>    		/* Self-hosted trace */
>>>    		u64 trfcr_el1;
>>> +		u64 trblimitr_el1;
>>>    		/* Values of trap registers for the host before guest entry. */
>>>    		u64 mdcr_el2;
>>>    		u64 brbcr_el1;
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> index 2a1c0f49792b..fd389a26bc59 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>>    	write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>>    }
>>> -static bool __trace_needs_drain(void)
>>> +static void __trace_drain_and_disable(void)
>>>    {
>>> -	if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>>> -		return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>>> +	u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>> -	return host_data_test_flag(TRBE_ENABLED);
>>> +	*trblimitr_el1 = 0;
>>> +
>>> +	if (is_protected_kvm_enabled()) {
>>> +		if (!host_data_test_flag(HAS_TRBE))
>>> +			return;
>>> +	} else {
>>> +		if (!host_data_test_flag(TRBE_ENABLED))
>>> +			return;
>>> +	}
>>> +
>>> +	*trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>>> +	if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>>> +		isb();
>>> +		tsb_csync();
>>> +		write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>> +		isb();
>>> +	}
>>>    }
>>>    static bool __trace_needs_switch(void)
>>> @@ -79,15 +94,18 @@ static void __trace_switch_to_guest(void)
>>>    	__trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
>>>    			  *host_data_ptr(trfcr_while_in_guest));
>>> -
>>> -	if (__trace_needs_drain()) {
>>> -		isb();
>>> -		tsb_csync();
>>> -	}
>>> +	__trace_drain_and_disable();
>>>    }
>>>    static void __trace_switch_to_host(void)
>>>    {
>>> +	u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
>>> +
>>> +	if (trblimitr_el1 & TRBLIMITR_EL1_E) {
>>> +		write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
>>
>> Will this restore a stale value if you do kvm_enable_trbe() then later
>> kvm_disable_trbe()? Looks like the read and save will be skipped unless
>> host_data_test_flag(TRBE_ENABLED) is true, so it will never save a disabled
>> value.
> 
> __trace_drain_and_disable() sets the saved limit to 0 if TRBE_ENABLED is
> not set, so this shouldn't do anything in that case. Or did I
> misunderstand your scenario?
> 

No, you're right. I saw the early return for !TRBE_ENABLED and assumed
everything only happened after it, but the zeroing comes first, so it's fine.

>> kvm_disable_trbe() might need to clear host_debug_state.trblimitr_el1.
> 
> pKVM can't rely on that thing being called, so the context switch still
> needs to be self-contained there.
> 
> Will
