[kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

Andy Lutomirski luto at kernel.org
Tue May 9 06:00:01 PDT 2017


On Tue, May 9, 2017 at 1:56 AM, Christoph Hellwig <hch at infradead.org> wrote:
> On Tue, May 09, 2017 at 08:45:22AM +0200, Ingo Molnar wrote:
>> We only have ~115 code blocks in the kernel that set/restore KERNEL_DS, it would
>> be a pity to add a runtime check to every system call ...
>
> I think we should simply strive to remove all of them that aren't
> in core scheduler / arch code.  Basically every time we do the
>
>         oldfs = get_fs();
>         set_fs(KERNEL_DS);
>         ..
>         set_fs(oldfs);
>
> trick we're doing something wrong, and there should always be better
> ways to achieve it.  E.g. using iov_iter with an ITER_KVEC type
> consistently would already remove most of them.

How about trying to remove all of them?  If we could actually get rid
of all of them, we could drop the arch support, and we'd get faster,
simpler, shorter uaccess code throughout the kernel.

The ones in kernel/compat.c are generally garbage.  They should be
using compat_alloc_user_space().  Ditto for kernel/power/user.c.

flush_module_icache() is a potentially silly arch thing.  Does the
code in kernel/module.c that uses set_fs() actually work?

kernel/signal.c's set_fs() is laziness.

__probe_kernel_read() and __probe_kernel_write() use set_fs(), but
that usage only matters on sane arches* like s390x.  We should
arguably have a set_uaccess_address_space() or similar for this
purpose that's a nop on normal arches like x86.

fs/splice.c has some, ahem, interesting uses that have been the source
of nasty exploits in the past.  Converting them to use iov_iter
properly would be really, really nice.  Christoph, I don't suppose
you'd like to do that?
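
To make the iov_iter idea concrete, here is a rough userspace sketch, not
kernel code.  Every toy_* name is hypothetical and only loosely modeled
on iov_iter with ITER_KVEC.  The point it tries to show: the iterator
itself records whether its buffer is a kernel or a user buffer, so the
copy routine decides whether a range check applies without anyone
touching a global address limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the user-backed and ITER_KVEC iterator types. */
enum toy_iter_type { TOY_ITER_USER, TOY_ITER_KVEC };

struct toy_iter {
	enum toy_iter_type type;	/* who owns the buffer */
	char *base;			/* current position */
	size_t len;			/* bytes remaining */
};

/* Assumption: "user memory" is modeled as this single static buffer. */
char toy_user_mem[64];

/* Range check applied only to user-typed iterators. */
bool toy_user_range_ok(const char *p, size_t n)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)toy_user_mem;

	return n <= sizeof(toy_user_mem) && a >= lo &&
	       a - lo <= sizeof(toy_user_mem) - n;
}

/* Copy up to n bytes into the iterator; returns the number of bytes copied. */
size_t toy_copy_to_iter(const void *src, size_t n, struct toy_iter *it)
{
	if (n > it->len)
		n = it->len;
	/* The type tag, not a global limit, selects the check. */
	if (it->type == TOY_ITER_USER && !toy_user_range_ok(it->base, n))
		return 0;
	memcpy(it->base, src, n);
	it->base += n;
	it->len -= n;
	return n;
}
```

In this model a caller with a kernel buffer builds a TOY_ITER_KVEC
iterator and never needs the set_fs(KERNEL_DS) trick, while a kernel
pointer smuggled into a TOY_ITER_USER iterator is rejected.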

The others seem to mostly be fixable, but I haven't looked that closely.

Overall, I suspect that a big part of why mitigations like the one
being discussed in this thread were developed is that addr_limit
used to be on the stack, making it (along with restart_block) a really
nice target.  This is fixed now on x86, arm64, and s390x, I believe,
and other arches can easily opt in to the fix.
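
For readers following along, here is a minimal userspace model of the
problem and of the check named in the patch subject.  It is a sketch
under simplifying assumptions, not the kernel implementation: the names
mirror the kernel's, but addr_limit is a plain global here (it is
per-task in a real kernel), the USER_DS value is an illustrative
x86-64-style constant, and buggy_helper() and addr_limit_ok_on_exit()
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mm_segment_t;

/* Illustrative limits: top of the lower half, and "everything". */
#define USER_DS   ((mm_segment_t)0x00007fffffffffffULL)
#define KERNEL_DS ((mm_segment_t)UINT64_MAX)

mm_segment_t addr_limit = USER_DS;	/* per-task in a real kernel */

mm_segment_t get_fs(void) { return addr_limit; }
void set_fs(mm_segment_t fs) { addr_limit = fs; }

/* A range is acceptable if it ends at or below the current limit. */
bool access_ok(uint64_t addr, uint64_t size)
{
	return size <= addr_limit && addr <= addr_limit - size;
}

/* Hypothetical helper with the classic bug: an early return skips
 * set_fs(oldfs), so KERNEL_DS leaks out of the function. */
int buggy_helper(bool fail)
{
	mm_segment_t oldfs = get_fs();

	set_fs(KERNEL_DS);
	if (fail)
		return -1;	/* bug: address limit is not restored */
	set_fs(oldfs);
	return 0;
}

/* Model of the proposed check: before returning to user mode, the
 * limit must be back at USER_DS. */
bool addr_limit_ok_on_exit(void)
{
	return get_fs() == USER_DS;
}
```

After buggy_helper(true), access_ok() accepts kernel addresses such as
0xffff800000000000, which is exactly the state an exit-time check would
catch.  Removing the set_fs() callers removes the need for the check.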

* I'm strongly in favor of arches that have totally separate user and
kernel address spaces.  Sadly, the most common arches don't do this.


