[RFC PATCH] vfs: Fix might sleep in load_unaligned_zeropad() with rcu read lock held
david laight
david.laight at runbox.com
Wed Nov 26 14:25:05 PST 2025
On Wed, 26 Nov 2025 20:02:21 +0000
Al Viro <viro at zeniv.linux.org.uk> wrote:
> On Wed, Nov 26, 2025 at 07:51:54PM +0000, Russell King (Oracle) wrote:
>
> > I don't understand how that helps. Wasn't the report that the filename
> > crosses a page boundary in userspace, but the following page is
> > inaccessible which causes a fault to be taken (as it always would do).
> > Thus, wouldn't "addr" be a userspace address (that the kernel is
> > accessing) and thus be below TASK_SIZE ?
> >
> > I'm also confused - if we can't take a fault and handle it while
> > reading the filename from userspace, how are pages that have been
> > swapped out or evicted from the page cache read back in from storage
> > which invariably results in sleeping - which we can't do here because
> > of the RCU context (not that I've ever understood RCU, which is why
> > I've always referred those bugs to Paul.)
>
> No, the filename is already copied in kernel space *and* it's long enough
> to end right next to the end of page. There's NUL before the end of page,
> at that, with '/' a couple of bytes prior. We attempt to save on memory
> accesses, doing word-by-word fetches, starting from the beginning of
> component. We *will* detect NUL and ignore all subsequent bytes; the
> problem is that the last 3 bytes of page might be '/', 'x' and '\0'.
> We call load_unaligned_zeropad() on page + PAGE_SIZE - 2. And get
> a fetch that spans the end of page.
>
> We don't care what's in the next page, if there is one mapped there
> to start with. If there's nothing mapped, we want zeroes read from
> it, but all we really care about is having the bytes within *our*
> page read correctly - and no oops happening, obviously.
>
> That fault is an extremely cold case on a fairly hot path. We don't
> want to mess with disabling pagefaults, etc. - not for the sake
> of that.
>
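For illustration, the boundary case described above can be modelled in userspace. This is only a sketch under stated assumptions: `SIM_PAGE_SIZE`, `sim_load_zeropad()` and `sim_component_len()` are hypothetical names, and the bounds check stands in for the exception fixup that the real load_unaligned_zeropad() relies on. Bytes past the end of the page are treated as unmapped and read back as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u /* hypothetical page size for this sketch */

/*
 * Userspace model of a zero-padded word load. Bytes at or beyond
 * SIM_PAGE_SIZE are treated as unmapped and read back as zero.
 * The kernel helper instead performs the full-word load and lets an
 * exception fixup supply the zeroes; the bounds check here is only
 * a stand-in for that fixup.
 */
static uint64_t sim_load_zeropad(const unsigned char *page, size_t off)
{
	unsigned char buf[sizeof(uint64_t)] = { 0 };
	size_t avail;
	uint64_t word;

	assert(off <= SIM_PAGE_SIZE);

	/* Number of bytes that still lie inside "our" page. */
	avail = SIM_PAGE_SIZE - off;
	if (avail > sizeof(buf))
		avail = sizeof(buf);

	memcpy(buf, page + off, avail);
	memcpy(&word, buf, sizeof(word));
	return word;
}

/*
 * Length of the path component starting at off, looking only at the
 * first fetched word and stopping at NUL or '/'. Returns the word size
 * if no terminator is found in that word.
 */
static size_t sim_component_len(const unsigned char *page, size_t off)
{
	uint64_t word = sim_load_zeropad(page, off);
	unsigned char bytes[sizeof(word)];
	size_t i;

	memcpy(bytes, &word, sizeof(bytes));
	for (i = 0; i < sizeof(bytes); i++) {
		if (bytes[i] == '\0' || bytes[i] == '/')
			break;
	}
	return i;
}
```

With the last three bytes of the page set to '/', 'x' and '\0', calling sim_component_len() at offset SIM_PAGE_SIZE - 2 fetches a word that spans the end of the page but still finds the NUL inside the page, so nothing beyond the page influences the result.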
Can you fix it with a flag on the exception table entry that means
'don't try to fault in a page'?
I think the logic would be the same as 'disabling pagefaults', just
checking a different flag.
After all, the fault itself happens in both cases.
David
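The suggestion above might look roughly like the following userspace model. Everything in it is an assumption made for illustration: `SIM_EX_FLAG_NO_FAULTIN`, `struct sim_ex_entry` and `sim_classify_fault()` are hypothetical names, not existing kernel structures or flags, and a real fault handler has many more cases. The sketch only shows the per-entry flag being checked at the same point as a global "pagefaults disabled" state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: never try to fault a page in for this entry. */
#define SIM_EX_FLAG_NO_FAULTIN 0x1u

/* Hypothetical, simplified exception table entry. */
struct sim_ex_entry {
	uintptr_t insn;     /* address of the instruction that may fault */
	uintptr_t fixup;    /* address to resume at after the fault */
	unsigned int flags; /* SIM_EX_FLAG_* bits */
};

enum sim_fault_action {
	SIM_FAULT_HANDLE_MM, /* normal path: may sleep while paging in */
	SIM_FAULT_FIXUP,     /* jump straight to the fixup, no sleeping */
	SIM_FAULT_OOPS,      /* no fixup is available */
};

/* Linear search is enough for a sketch. */
static const struct sim_ex_entry *
sim_search(const struct sim_ex_entry *tbl, size_t n, uintptr_t pc)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].insn == pc)
			return &tbl[i];
	}
	return NULL;
}

/*
 * Decide what to do with a fault taken at pc. The per-entry flag is
 * checked alongside the "pagefaults disabled" state, so a flagged
 * instruction skips the sleeping path without the caller having to
 * disable pagefaults on the hot path.
 */
static enum sim_fault_action
sim_classify_fault(const struct sim_ex_entry *tbl, size_t n,
		   uintptr_t pc, bool pagefaults_disabled)
{
	const struct sim_ex_entry *e = sim_search(tbl, n, pc);
	bool no_faultin = e && (e->flags & SIM_EX_FLAG_NO_FAULTIN);

	if (pagefaults_disabled || no_faultin)
		return e ? SIM_FAULT_FIXUP : SIM_FAULT_OOPS;

	return SIM_FAULT_HANDLE_MM;
}
```

In this model the fault is taken in both cases; the only difference is which condition routes it to the fixup instead of the path that may sleep.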