[PATCH] ARC: Improve cmpxchng syscall implementation

Wed Apr 4 01:56:13 PDT 2018

Hi Vineet, Peter,

On Wed, 2018-03-21 at 14:54 +0300, Alexey Brodkin wrote:
> Hi Vineet,
> 
> On Mon, 2018-03-19 at 11:29 -0700, Vineet Gupta wrote:
> > On 03/19/2018 04:00 AM, Alexey Brodkin wrote:
> > > arc_usr_cmpxchg syscall is supposed to be used on platforms
> > > that lack support of Load-Locked/Store-Conditional instructions
> > > in hardware. And in that case we mimic missing hardware features
> > > with help of kernel's syscall that "atomically" checks current
> > > value in memory and then if it matches caller expectation new
> > > value is written to that same location.
> > > 
> > 
> > ...
> > ...
> > 
> > > 
> > > 2. What's worse if we're dealing with data from not yet allocated
> > >     page (think of pre-copy-on-write state) we'll successfully
> > >     read data but on write we'll silently return to user-space
> > >     with correct result 
> > 
> > This is technically incorrect: even for reading you need a page, which could be
> > the common zero page in certain cases.
> 
> Ok I'll reword it like.
> 
> > 
> > (which we really read just before). That leads
> > >     to very strange problems in user-space app further down the line
> > >     because new value was never written to the destination.
> > > 
> > > 3. Regardless of what went wrong we'll return from syscall
> > >     and user-space application will continue to execute.
> > >     Even if user's pointer was completely bogus.
> > 
> > Again we are exaggerating (from technical correctness POV) - if user pointer was
> > bogus, the read would not have worked in the first place etc. So let's tone down the
> > rhetoric.
> 
> Ok here I may rephrase it like that:
> ------------------------------->8-----------------------------
> 3. Regardless of what went wrong we'll return from syscall
>    and user-space application will continue to execute.
> ------------------------------->8-----------------------------
> 
> > 
> > >     In case of hardware LL/SC that app would have been killed
> > >     by the kernel.
> > > 
> > > With that change we attempt to improve on all 3 items above:
> > > 
> > > 1. We still disable preemption around read-and-write of
> > >     user's data, but if either of them fails
> > >     we enable preemption and try to force a page fault so
> > >     that we have a correct mapping in the TLB. Then we retry
> > >     in "atomic" context.
> > > 
> > > 2. If real page fault fails or even access_ok() returns false
> > >     we send SIGSEGV to the user-space process so if something goes
> > >     seriously wrong we'll know about it much earlier.
> > > 
> > 
> > 
> > >   
> > >   	/*
> > >   	 * This is only for old cores lacking LLOCK/SCOND, which by defintion
> > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
> > >   	/* Z indicates to userspace if operation succeded */
> > >   	regs->status32 &= ~STATUS_Z_MASK;
> > >   
> > > -	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
> > > -		return -EFAULT;
> > > +	ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr));
> > > +	if (!ret)
> > > +		goto fail;
> > >   
> > > +again:
> > >   	preempt_disable();
> > >   
> > > -	if (__get_user(uval, uaddr))
> > > -		goto done;
> > > -
> > > -	if (uval == expected) {
> > > -		if (!__put_user(new, uaddr))
> > > +	ret = __get_user(val, uaddr);
> > > +	if (ret == -EFAULT) {
> > 
> > 
> > Let's see if this warrants adding complexity! This implies that TLB entry with
> > Read permissions didn't exist for reading the var and page fault handler could not 
> > wire up even a zero page due to preempt_disable, meaning it was something not 
> > touched by userspace already - sort of uninitialized variable in user code.
> 
> Ok, I completely missed the fact that the fast path TLB miss handler is being
> executed even if we have preemption disabled. So given the mapping exists
> we do not need to retry with enabled preemption.
> 
> Still, maybe I'm a bit paranoid here, but IMHO it's good to be ready for a corner-case
> when the pointer is completely bogus and there's no mapping for it.
> I understand that today we only expect this syscall to be used from libc's
> internals but as long as syscall exists nobody stops anybody from using it
> directly without libc. So maybe instead of doing get_user_pages_fast() just
> send a SIGSEGV to the process? At least user will realize there's some problem
> at earlier stage.
> 
> > Otherwise it is extremely unlikely to start with a TLB entry with Read 
> > permissions, followed by syscall Trap only to find the entry missing, unless a 
> > global TLB flush came from other cores, right in the middle. But this syscall is 
> > not guaranteed to work with SMP anyways, so let's ignore any SMP misdoings here.
> 
> Well, but that's exactly the situation I was debugging: we start with data in a read-only
> page, and on the attempt to write back the modified value the COW machinery gets involved.
> 
> That was on UP platform.
> 
> > Now in case it was *an* uninitialized var, do we have to guarantee any well
> > defined semantics for the kernel emulation of cmpxchg? IMO it should be fine to
> > return 0 or -EFAULT etc. In fact -EFAULT is better as it will force a retry loop on
> > user side, given the typical cmpxchg usage pattern.
> 
> The problem is libc only expects to get a value read from memory.
> And in theory expected value might be -14 which is basically -EFAULT.
> I'm not talking about 0 at all because in some cases that's exactly what
> user-space expects.
> 
> So if we read unexpected value then we'll just return it without even attempting
> to write.
> 
> If we read expected data but fail to write then we'll send a SIGSEGV and
> return whatever... let it be -EFAULT - anyways the app will be killed on exit from
> this syscall.
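
To illustrate the return-value concern above, here is a minimal user-space
model in C. It is only a sketch: struct cmpxchg_result, model_cmpxchg() and
model_fetch_add() are hypothetical names, not kernel or libc code, and the
"swapped" field merely stands in for the Z flag that the syscall sets in
status32. It shows why the value read has to come back unmodified (-EFAULT,
i.e. -14, is perfectly valid memory content) and how a typical caller retries
until the swap succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the emulated cmpxchg result; not the kernel ABI. */
struct cmpxchg_result {
	int  old;      /* value read from *addr, which is what libc expects back */
	bool swapped;  /* stands in for the Z flag: true if new_val was stored */
};

/* Single-threaded model: compare *addr with expected, store new_val on match. */
static struct cmpxchg_result model_cmpxchg(int *addr, int expected, int new_val)
{
	struct cmpxchg_result r;

	assert(addr != NULL);
	r.old = *addr;
	r.swapped = false;

	if (r.old == expected) {
		*addr = new_val;
		r.swapped = true;
	}
	return r;
}

/* Typical usage pattern: retry until the swap succeeds, return the old value. */
static int model_fetch_add(int *addr, int delta)
{
	struct cmpxchg_result r;
	int seen;

	do {
		seen = *addr;
		r = model_cmpxchg(addr, seen, seen + delta);
	} while (!r.swapped);

	return r.old;
}
```

If the memory location holds -EFAULT and the caller expects -EFAULT, the model
returns -EFAULT with "swapped" set, so an error code smuggled through the same
return value would be indistinguishable from success. Hence a signal, rather
than a return code, for the failed-write case.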

Any comments on my comments above?

-Alexey