[Bug report] hash_name() may cross page boundary and trigger sleep in RCU context
David Laight
david.laight.linux at gmail.com
Mon Dec 8 01:26:54 PST 2025
On Mon, 8 Dec 2025 10:32:06 +0800
Xie Yuanbin <xieyuanbin1 at huawei.com> wrote:
> On Fri, 5 Dec 2025 12:08:14 +0000, Russell King wrote:
> > On Wed, Dec 03, 2025 at 09:48:00AM +0800, Xie Yuanbin wrote:
> >> On Tue, 2 Dec 2025 14:07:25 -0800, Linus Torvalds wrote:
> >> > On Tue, 2 Dec 2025 at 04:43, Russell King (Oracle)
> >> > <linux at armlinux.org.uk> wrote:
> >> >>
> >> >> What I'm thinking is to address both of these by handling kernel space
> >> >> page faults (which will be permission or PTE-not-present) separately
> >> >> (not even build tested):
> >> >
> >> > That patch looks sane to me.
> >> >
> >> > But I also didn't build test it, just scanned it visually ;)
> >>
> >> That patch removes harden_branch_predictor() from __do_user_fault(), and
> >> moves it to do_page_fault()->do_kernel_address_page_fault().
> >> This resolves the previously mentioned kernel warning issue. However,
> >> __do_user_fault() is not only called by do_page_fault(), it is
> >> also called by do_bad_area(), do_sect_fault() and do_translation_fault().
> >>
> >> So I think that some harden_branch_predictor() calls are missing on other
> >> paths. According to my tests, when CONFIG_ARM_LPAE=n,
> >> harden_branch_predictor() will never be called anymore, even if a user
> >> program tries to access a kernel address.
> >>
> >> Or perhaps I've misunderstood something, could you please point it out?
> >> Thank you very much.
> >
> > Right, let's split these issues into separate patches. Please test this
> > patch, which should address only the hash_name() fault issue, and
> > provides the basis for fixing the branch predictor issue.
>
> I conducted a simple test, and it seems that both the hash_name()
> "might sleep" issue and the branch predictor issue have been fixed.
>
> BTW, even with this patch, the test case may still fail. There is another
> bug in hash_name() that is also triggered by the test case; it is fixed
> by this patch:
> Link: https://lore.kernel.org/20251127025848.363992-1-pangliyuan1@huawei.com
>
> Test case is from:
> Link: https://lore.kernel.org/20251127140109.191657-1-xieyuanbin1@huawei.com
>
> Tested on commit 6987d58a9cbc5bd57c98 ("Add linux-next specific files for
> 20251205") from the linux-next branch.
>
> I still have a question about this patch: Is
> ```patch
> +	if (interrupts_enabled(regs))
> +		local_irq_enable();
> ```
> necessary? Although this implementation is closer to the original code,
> which can reduce side effects, do_bad_area(), do_sect_fault(),
> and do_translation_fault() all call __do_kernel_fault() with interrupts
> disabled.
It has to be safer to leave them disabled, although you don't want to
do that over long code paths.
I'd have thought the 'act on an exception table entry or panic'
path wouldn't be long compared to an actual ISR (or other code that
disables interrupts), so there is no real point enabling them here.
But that is just my 2c.
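
For a sense of scale, here is a rough userspace model of what I mean by
'act on an exception table entry or panic'. It is only a sketch, not the
ARM fault code: the struct layout, the table contents and the names
search_ex_table() and fixup_or_die() are all made up for illustration,
and the real kernel table format differs. It assumes a table sorted by
faulting instruction address, so the lookup is a bounded binary search.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model: each entry maps the address of an
 * instruction that is allowed to fault to the address of its fixup code.
 */
struct ex_entry {
	uintptr_t insn;
	uintptr_t fixup;
};

/* Illustrative contents only; must be sorted by insn for the search below. */
static const struct ex_entry ex_table[] = {
	{ 0x1000, 0x9000 },
	{ 0x1040, 0x9010 },
	{ 0x2000, 0x9020 },
};

/* Binary search: return the entry for pc, or NULL if pc has no fixup. */
static const struct ex_entry *search_ex_table(uintptr_t pc)
{
	size_t lo = 0, hi = sizeof(ex_table) / sizeof(ex_table[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ex_table[mid].insn == pc)
			return &ex_table[mid];
		if (ex_table[mid].insn < pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Model of the decision itself. Returns 1 and redirects *pc to the fixup
 * when an entry exists; returns 0 (leaving *pc untouched) when the caller
 * would have to oops or panic.
 */
static int fixup_or_die(uintptr_t *pc)
{
	const struct ex_entry *e = search_ex_table(*pc);

	if (!e)
		return 0;
	*pc = e->fixup;
	return 1;
}
```

With the table above, a fault at 0x1040 is redirected to 0x9010, while a
fault at 0x1044 has no entry and falls through to the fatal case. Either
way the work is a handful of comparisons, which is why I don't see much
to gain from enabling interrupts around it.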
David
>
> Thanks very much!
>