[PATCH v3 12/25] arm64: handle PKEY/POE faults

Catalin Marinas catalin.marinas at arm.com
Mon Dec 11 10:18:17 PST 2023


On Fri, Nov 24, 2023 at 04:34:57PM +0000, Joey Gouly wrote:
> @@ -497,6 +498,23 @@ static void do_bad_area(unsigned long far, unsigned long esr,
>  #define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
>  #define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)
>  
> +static bool fault_from_pkey(unsigned long esr, struct vm_area_struct *vma,
> +			unsigned int mm_flags)
> +{
> +	unsigned long iss2 = ESR_ELx_ISS2(esr);
> +
> +	if (!arch_pkeys_enabled())
> +		return false;
> +
> +	if (iss2 & ESR_ELx_Overlay)
> +		return true;
> +
> +	return !arch_vma_access_permitted(vma,
> +			mm_flags & FAULT_FLAG_WRITE,
> +			mm_flags & FAULT_FLAG_INSTRUCTION,
> +			mm_flags & FAULT_FLAG_REMOTE);
> +}

Do we actually need this additional arch_vma_access_permitted() check?
The ESR should tell us if it was a POE fault. Permission overlay faults
have priority over the base permission faults, so we shouldn't need to
fall back to these additional checks. That said, see below: we could do
something slightly smarter here.

I can see x86 and powerpc have similar checks (though at a different
point, under the mmap lock) but I'm not familiar with their exception
models or exception priorities.

> +
>  static vm_fault_t __do_page_fault(struct mm_struct *mm,
>  				  struct vm_area_struct *vma, unsigned long addr,
>  				  unsigned int mm_flags, unsigned long vm_flags,
> @@ -688,9 +706,29 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>  		 * Something tried to access memory that isn't in our memory
>  		 * map.
>  		 */
> -		arm64_force_sig_fault(SIGSEGV,
> -				      fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR,
> -				      far, inf->name);
> +		int fault_kind;
> +		/*
> +		 * The pkey value that we return to userspace can be different
> +		 * from the pkey that caused the fault.
> +		 *
> +		 * 1. T1   : mprotect_key(foo, PAGE_SIZE, pkey=4);
> +		 * 2. T1   : set AMR to deny access to pkey=4, touches, page
> +		 * 3. T1   : faults...
> +		 * 4.    T2: mprotect_key(foo, PAGE_SIZE, pkey=5);
> +		 * 5. T1   : enters fault handler, takes mmap_lock, etc...
> +		 * 6. T1   : reaches here, sees vma_pkey(vma)=5, when we really
> +		 *	     faulted on a pte with its pkey=4.
> +		 */
> +		int pkey = vma_pkey(vma);

Other than the vma_pkey() race, I'm more worried about the vma
disappearing completely, e.g. munmap() in another thread. We'd end up
dereferencing a freed pointer here. We need to do this under the mmap
lock.

Since we need to do this check under the mmap lock anyway, we could add
a further check to see whether the pkey fault we got was racy, and just
restart the instruction if it no longer faults according to
por_el0_allows_pkey(). However, the code below is too late in the fault
handling to do much other than SIGSEGV.

> +
> +		if (fault_from_pkey(esr, vma, mm_flags))
> +			fault_kind = SEGV_PKUERR;
> +		else
> +			fault_kind = fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR;
> +
> +		arm64_force_sig_fault_pkey(SIGSEGV,
> +				      fault_kind,
> +				      far, inf->name, pkey);
>  	}
>  
>  	return 0;

-- 
Catalin


