[PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl

Topi Miettinen toiwoton at gmail.com
Tue Nov 15 11:31:02 PST 2022


On 15.11.2022 17.35, Catalin Marinas wrote:
> On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote:
>> On 10.11.2022 14.03, Catalin Marinas wrote:
>>> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
>>>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
>>>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>> index 099468aee4d8..42eaf6683216 100644
>>>>>> --- a/mm/mmap.c
>>>>>> +++ b/mm/mmap.c
>>>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>>>>    			vm_flags |= VM_NORESERVE;
>>>>>>    	}
>>>>>> +	if (map_deny_write_exec(NULL, vm_flags))
>>>>>> +		return -EACCES;
>>>>>> +
>>>>>
>>>>> This seems like the wrong place to do the check -- that the vma argument
>>>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
>>>>> it live in mmap_region()? What happens with MAP_FIXED, when there is
>>>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
>>>>> check. For example, we had "c" above:
>>>>>
>>>>>        c)	mmap(PROT_READ);
>>>>> 	mprotect(PROT_READ|PROT_EXEC);		// fails
>>>>>
>>>>> But this would allow another case:
>>>>>
>>>>>        e)	addr = mmap(..., PROT_READ, ...);
>>>>> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
>>>>
>>>> I can move the check into mmap_region() but it won't fix the MAP_FIXED
>>>> example that you showed here.
>>>>
>>>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
>>>> However the `vma` for the 'old' region is not kept around, and a new vma will
>>>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
>>>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
>>>> will just be as good as passing NULL.
>>>>
>>>> It's possible to save the vm_flags from the region that is unmapped, but Catalin
>>>> suggested it might be better if that is part of a later extension, what do you
>>>> think?
>>>
>>> I thought initially we should keep the behaviour close to what systemd
>>> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
>>> vma is already executable (i.e. check actual permission change not just
>>> the PROT_* flags).
>>>
>>> We could pass the old vm_flags for that region (and maybe drop the vma
>>> pointer entirely, just check old and new vm_flags). But this feels like
>>> tightening slightly systemd's MDWE approach. If user-space doesn't get
>>> confused by this, I'm fine to go with it. Otherwise we can add a new
>>> flag later for this behaviour
>>>
>>> I guess that's more of a question for Topi on whether point tightening
>>> point (e) is feasible/desirable.
>>
>> I think we want 1:1 compatibility with seccomp() for the basic version, so
>> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions
>> (perhaps even less strict, too) when it's requested by configuration, like
>> MemoryDenyWriteExecute=[relaxed | strict].
> 
> Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is
> already PROT_EXEC? Or you'd rather reject that as well?
> 

I think that it's OK to allow that. It's an incompatible change, but it 
shouldn't break anything.

-Topi




More information about the linux-arm-kernel mailing list