[PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

David Laight David.Laight at ACULAB.COM
Sat Sep 19 10:53:08 EDT 2020

From: Al Viro
> Sent: 18 September 2020 14:58
> On Fri, Sep 18, 2020 at 03:44:06PM +0200, Christoph Hellwig wrote:
> > On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > > >  	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > > > -	return pt_regs_trap_type(current_pt_regs()) == 0x110;
> > > > +	return pt_regs_trap_type(current_pt_regs()) == 0x110 ||
> > > > +		(current->flags & PF_FORCE_COMPAT);
> > >
> > > Can't say I like that approach ;-/  Reasoning about the behaviour is much
> > > harder when it's controlled like that - witness set_fs() shite...
> >
> > I don't particularly like it either.  But do you have a better idea
> > how to deal with io_uring vs compat tasks?
> <wry> git rm fs/io_uring.c would make a good starting point </wry>
> Yes, I know it's not going to happen, but one can dream...

Maybe the io_uring code needs some changes to make it vaguely safe.
- No support for 32-bit compat mixed working (or at all?).
  Plausibly a special worker could do 32bit work.
- ring structure (I'm assuming mapped by mmap()) never mapped
  in more than one process (not cloned by fork()).
- No implicit handover of files to another process.
  Would need an munmap, handover, mmap sequence.

In any case the io_ring rather abuses the import_iovec() interface.

The canonical sequence is (types from memory):
	struct iovec cache[8], *iov = cache;
	struct iter iter;
	rval = import_iovec(..., &iov, 8, &iter);
	// Do read/write user using 'iter'

I don't think there is any strict requirement that iter.iov
is set to either 'cache' or 'iov' (it probably must point
into one of them.)
But the io_uring code will make that assumption because the
actual copies can be done much later and it doesn't save 'iter'.
It gets itself in a right mess because it doesn't separate
the 'address I need to free' from 'the iov[] for any transfers'.

io_uring is also the only code that relies on import_iovec()
returning the iter.count on success.
It would be much better to have:
	iov = import_iovec(..., &cache, ...);
and use ERR_PTR() et al for error detectoion.


Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

More information about the linux-arm-kernel mailing list