[PATCH v4 6/8] arm64: uprobes: Add GCS support to uretprobes

Thu Jul 24 13:41:06 PDT 2025

Hi,

On 7/23/25 5:09 AM, Catalin Marinas wrote:
> On Fri, Jul 18, 2025 at 11:37:38PM -0500, Jeremy Linton wrote:
>> @@ -159,11 +160,41 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr,
>>   				  struct pt_regs *regs)
>>   {
>>   	unsigned long orig_ret_vaddr;
>> +	unsigned long gcs_ret_vaddr;
>> +	int err = 0;
>> +	u64 gcspr;
>>   
>>   	orig_ret_vaddr = procedure_link_pointer(regs);
>> +
>> +	if (task_gcs_el0_enabled(current)) {
>> +		gcspr = read_sysreg_s(SYS_GCSPR_EL0);
>> +		gcs_ret_vaddr = load_user_gcs((unsigned long __user *)gcspr, &err);
>> +		if (err) {
>> +			force_sig(SIGSEGV);
>> +			goto out;
>> +		}
> 
> Nit: add an empty line here, I find it easier to read.
> 
>> +		/*
>> +		 * If the LR and GCS entry don't match, then some kind of PAC/control
>> +		 * flow happened. Likely because the user is attempting to retprobe
> 
> I don't full get the first sentence.

I'm trying to succinctly warn people about some non-obvious behavior 
that is being maintained.

Really long version:

So a retprobe is intended to catch the function returning and run the
user-specified probe logic. But the breakpoint itself isn't placed at
the 'ret', because there may be multiple 'ret's. Rather, it's intended
to be placed at the function entry point. When the breakpoint fires, it
runs this code to hijack the LR and point it at the actual probe
routine. Except, ha!, the breakpoint for the retprobe may not be at
the beginning of the function. Which is perfectly ok, and in some cases
even desirable.

But if the user, say, places it after LR has been spilled to the stack,
the hijack will be discarded when LR is restored and the probe will
silently fail to run. The user will then eventually figure out that they
are dropping a retprobe in a location where it's basically a NOP. PAC
messes with this behavior in an inconsistent manner: is the target
function just signing the LR, or is it signing and spilling it? In the
latter case the probe is again just a NOP; otherwise you get a PAC
fault.
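
To make the "silently a NOP" case concrete, here is a rough userspace
model, not kernel code. Every name in it (model_frame, probe_site,
model_retprobe_fires) is made up for illustration, and it ignores PAC
entirely. It only shows that a hijack applied after the prologue has
spilled LR is overwritten when the epilogue reloads LR from the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a non-leaf function's return address state. */
struct model_frame {
	uint64_t lr;		/* live link register (x30) */
	uint64_t stack_lr;	/* stack slot the prologue spills LR into */
};

enum probe_site { PROBE_AT_ENTRY, PROBE_AFTER_SPILL };

/* Returns true if the function ends up returning via the trampoline. */
static bool model_retprobe_fires(enum probe_site site, uint64_t caller,
				 uint64_t trampoline)
{
	struct model_frame f = { .lr = caller, .stack_lr = 0 };

	if (site == PROBE_AT_ENTRY)
		f.lr = trampoline;	/* hijack happens before the spill */

	f.stack_lr = f.lr;		/* prologue: spill LR to the stack */

	if (site == PROBE_AFTER_SPILL)
		f.lr = trampoline;	/* hijack happens after the spill */

	f.lr = f.stack_lr;		/* epilogue: reload LR from the stack */

	return f.lr == trampoline;	/* 'ret' branches to f.lr */
}

/* Usage: the two placements described above. */
static void model_retprobe_examples(void)
{
	assert(model_retprobe_fires(PROBE_AT_ENTRY, 0x1000, 0x2000));
	assert(!model_retprobe_fires(PROBE_AFTER_SPILL, 0x1000, 0x2000));
}
```
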

But then GCS comes along, and it needs to also update the GCS region.
But if we update it and the LR then gets restored from the stack, it's
going to result in a GCS exception where previously the behavior was
just the probe being NOPed. Now though, we have the advantage that, for
the most part, anyplace that GCS is enabled we are also going to have
PAC signing the LR. So checking for LR != GCS value acts as both a
sanity check and a bit of safety that we aren't inside a
sign/authenticate block, or that the LR hasn't been tampered with via a
blr/etc., in which case we would restore an LR from the stack that won't
match the now-updated GCS region.
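
The decision the patch makes can be sketched as a small userspace model.
This is an illustration only: model_task, model_hijack and
MODEL_NO_RETPROBE are hypothetical names, the GCS is reduced to its top
entry, and the fault paths (the force_sig(SIGSEGV) cases in the patch)
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch setting orig_ret_vaddr = -1 to abort the retprobe. */
#define MODEL_NO_RETPROBE ((uint64_t)-1)

/* Hypothetical stand-in for the bits of task state the hijack touches. */
struct model_task {
	bool gcs_enabled;
	uint64_t lr;		/* stands in for the LR in pt_regs */
	uint64_t gcs_top;	/* stands in for the entry at GCSPR_EL0 */
};

static uint64_t model_hijack(struct model_task *t, uint64_t trampoline)
{
	uint64_t orig = t->lr;

	if (t->gcs_enabled) {
		/* Mismatch: abort and leave both LR and the GCS alone. */
		if (t->gcs_top != orig)
			return MODEL_NO_RETPROBE;

		t->gcs_top = trampoline;
	}

	t->lr = trampoline;
	return orig;
}

/* Usage: a matching entry is hijacked, a mismatching one is left alone. */
static void model_hijack_examples(void)
{
	struct model_task ok = { true, 0x1000, 0x1000 };
	struct model_task bad = { true, 0x1000, 0x3000 };

	assert(model_hijack(&ok, 0x2000) == 0x1000);
	assert(ok.lr == 0x2000 && ok.gcs_top == 0x2000);

	assert(model_hijack(&bad, 0x2000) == MODEL_NO_RETPROBE);
	assert(bad.lr == 0x1000 && bad.gcs_top == 0x3000);
}
```

With GCS disabled the model degenerates to the pre-patch behavior: only
LR is replaced and the original value is returned.
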

Hence the comment.

:)

> 
>> +		 * on something that isn't a function boundary or inside a leaf
>> +		 * function. Explicitly abort this retprobe because it will generate
>> +		 * a GCS exception.
>> +		 */
>> +		if (gcs_ret_vaddr != orig_ret_vaddr)	{
>> +			orig_ret_vaddr = -1;
>> +			goto out;
>> +		}
> 
> Nit: another empty line here.
> 
>> +		put_user_gcs(trampoline_vaddr, (unsigned long __user *) gcspr, &err);
> 
> Nit: (... *)gcspr (no space after cast).
> 
>> +		if (err) {
>> +			force_sig(SIGSEGV);
>> +			goto out;
>> +		}
>> +	}
>> +
>>   	/* Replace the return addr with trampoline addr */
>>   	procedure_link_pointer_set(regs, trampoline_vaddr);
>>   
>> +out:
>>   	return orig_ret_vaddr;
>>   }
> 
> Reviewed-by: Catalin Marinas <catalin.marinas at arm.com>