[Bug report] hash_name() may cross page boundary and trigger sleep in RCU context

Sun Nov 30 18:08:00 PST 2025

On 2025/11/29 9:35, Linus Torvalds wrote:
> On Fri, 28 Nov 2025 at 17:01, Zizhi Wo <wozizhi at huaweicloud.com> wrote:
>>
>> Thank you for your answer. In fact, this solution is similar to the one
>> provided by Al.
> 
> Hmm. I'm not seeing the replies from Al for some reason. Maybe he didn't cc me.
> 
>> It has an additional check on regs:
>>
>> if (unlikely(addr > TASK_SIZE) && !user_mode(regs))
>>          goto no_context;
>>
>> I'd like to ask whether this "regs" check also needs to be
>> carried over.
> 
> That seems unnecessary.
> 
> Yes, in this case the original problem you reported with sleeping in
> an RCU region was triggered by a kernel access, and a user-space
> access would never have caused any such issues.
> 
> So checking for !user_mode(regs) isn't exactly *wrong*.
> 
> But while it isn't wrong, I think it's also kind of pointless.
> 
> Because regardless of whether it's a kernel or user space access, an
> access outside TASK_SIZE shouldn't be associated with a valid user
> space context, so the code might as well just go to the "no_context"
> label directly.
> 
> That said, somebody should definitely double-check me - because I
> think arm also did the vdso trick at high addresses that i386 used to
> do, so there is the fake VDSO thing up there.
> 
> But if you get a page fault on that, it's not going to be fixed up, so
> even if user space can access it, there's no point in looking that
> fake vm area up for page faults.
> 
> I think.
> 
>> I'm also wondering: if we directly replaced the corresponding entry
>> with do_translation_fault(), would that also be correct?
>>
>> ```
>> -       { do_page_fault,        SIGSEGV, SEGV_MAPERR,   "page translation fault"           },
>> +       { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "page translation fault"           },
>> ```
> 
> I think that might break kprobes.
> 
> Looking around, I think my patch might also be a bit broken: I think
> it might be better to move it further down to below the check for
> FSR_LNX_PF.
> 
> But somebody who knows the exact arm page fault handling better than
> me should verify both that and my VDSO gate page thinking.
> 
>             Linus
> 
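
The difference between the check quoted from Al's version and Linus's suggestion can be sketched as a small user-space C model. Everything in it is hypothetical: SKETCH_TASK_SIZE assumes the 32-bit ARM 3G/1G split (PAGE_OFFSET minus 16M), the VMA table is invented, vma_covers() is only a stand-in for the real VMA lookup, and took_mm_lookup merely marks the path that would take the mmap lock and may therefore sleep. It is not the arm fault handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit ARM 3G/1G split, TASK_SIZE = PAGE_OFFSET - 16M. */
#define SKETCH_TASK_SIZE 0xbf000000UL

/* Hypothetical user VMA covering [start, end). */
struct vma_sketch {
	uintptr_t start;
	uintptr_t end;
};

/* Invented user mappings; the model requires all of them below TASK_SIZE. */
static const struct vma_sketch user_vmas[] = {
	{ 0x00010000UL, 0x00020000UL },	/* program text */
	{ 0xbe000000UL, 0xbf000000UL },	/* stack, ending at TASK_SIZE */
};

/* Stand-in for the VMA lookup: does any user VMA cover addr? */
static bool vma_covers(uintptr_t addr)
{
	for (size_t i = 0; i < sizeof(user_vmas) / sizeof(user_vmas[0]); i++) {
		/* Model invariant: no user VMA extends past TASK_SIZE. */
		assert(user_vmas[i].end <= SKETCH_TASK_SIZE);
		if (addr >= user_vmas[i].start && addr < user_vmas[i].end)
			return true;
	}
	return false;
}

struct fault_route {
	bool took_mm_lookup;	/* would take the mmap lock, so may sleep */
	bool resolved;		/* a user VMA covers the faulting address */
};

/* Variant quoted in the thread: short-circuit kernel-mode faults only. */
static struct fault_route route_with_mode_check(uintptr_t addr, bool user)
{
	struct fault_route r = { false, false };

	if (addr > SKETCH_TASK_SIZE && !user)
		return r;		/* models "goto no_context" */
	r.took_mm_lookup = true;
	r.resolved = vma_covers(addr);
	return r;
}

/* Variant Linus suggests: short-circuit every fault above TASK_SIZE. */
static struct fault_route route_addr_only(uintptr_t addr, bool user)
{
	struct fault_route r = { false, false };

	(void)user;			/* the access mode no longer matters */
	if (addr > SKETCH_TASK_SIZE)
		return r;		/* models "goto no_context" */
	r.took_mm_lookup = true;
	r.resolved = vma_covers(addr);
	return r;
}
```

In this model, a kernel-mode fault at 0xc0001000 (standing in for the cross-page read in hash_name() from the bug report) skips the lookup under both variants. A user-mode fault at the same address takes the lookup only under the mode-checking variant, and is still unresolved, which is the sense in which the extra !user_mode(regs) test changes nothing. The model deliberately ignores the gate/vectors page Linus mentions above TASK_SIZE.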

Thank you for your reply! I will go back over the existing discussion in
the community, re-examine the logic here, and digest it.

Thanks,
Zizhi Wo