[PATCH -next v5 6/8] arm64: add support for machine check error safe

Tong Tiangen tongtiangen at huawei.com
Sun Jun 19 18:53:26 PDT 2022



On 2022/6/18 20:52, Mark Rutland wrote:
> On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote:
>> On 2022/6/17 16:55, Mark Rutland wrote:
>>> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
>>>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
>>>> +				     struct pt_regs *regs, int sig, int code)
>>>> +{
>>>> +	if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>>>> +		return false;
>>>> +
>>>> +	if (user_mode(regs) || !current->mm)
>>>> +		return false;
>>>
>>> What's the `!current->mm` check for?
>>
>> At first, I assumed that only user processes have the opportunity to
>> recover when they trigger a memory error.
>>
>> But that restriction seems unreasonable. When a kernel thread triggers
>> a memory error, it can also be recovered. For instance:
>>
>> https://lore.kernel.org/linux-mm/20220527190731.322722-1-jiaqiyan@google.com/
>>
>> And I think an if(!current->mm) check should be added, as below:
>>
>> if(!current->mm) {
>> 	set_thread_esr(0, esr);
>> 	arm64_force_sig_fault(...);
>> }
>> return true;
> 
> Why does 'current->mm' have anything to do with this, though?

Sorry, that was a typo. My intended logic was:
if(current->mm) {
	[...]
}

> 
> There can be kernel threads with `current->mm` set in unusual circumstances
> (and there's a lot of kernel code out there which handles that wrong), so if
> you want to treat user tasks differently, we should be doing something like
> checking PF_KTHREAD, or adding something like an is_user_task() helper.
> 

OK, I do want to treat user tasks differently here, and I hadn't taken
into account what you said. This will be fixed in the next version
according to your suggestion.

As follows:
if (!(current->flags & PF_KTHREAD)) {
	set_thread_esr(0, esr);
	arm64_force_sig_fault(...);
}
return true;
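
To illustrate the distinction Mark describes, here is a minimal userspace
sketch, not kernel code. `struct mock_task`, `is_user_task()` and `has_mm()`
are hypothetical stand-ins introduced only for this example; only the
`PF_KTHREAD` value mirrors the kernel's definition. It models a kernel thread
that has temporarily attached an mm (for example after kthread_use_mm()),
where an `->mm` check would misclassify the task but a flag check would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_KTHREAD 0x00200000u /* same value as the kernel's flag */

/* Hypothetical stand-in for the two task_struct fields of interest. */
struct mock_task {
	unsigned int flags;
	void *mm; /* non-NULL when an address space is attached */
};

/* Hypothetical helper: classify by flag, not by whether ->mm is set. */
static bool is_user_task(const struct mock_task *t)
{
	assert(t != NULL);
	return !(t->flags & PF_KTHREAD);
}

/* The ->mm check under discussion, kept for comparison. */
static bool has_mm(const struct mock_task *t)
{
	assert(t != NULL);
	return t->mm != NULL;
}
```

For a task with `PF_KTHREAD` set and a non-NULL `mm`, `has_mm()` is true while
`is_user_task()` is false, which is the case the `->mm` check gets wrong.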


> [...]
> 
>>>> +
>>>> +	if (apei_claim_sea(regs) < 0)
>>>> +		return false;
>>>> +
>>>> +	if (!fixup_exception_mc(regs))
>>>> +		return false;
>>>
>>> I thought we still wanted to signal the task in this case? Or do you expect to
>>> add that into `fixup_exception_mc()` ?
>>
>> Yeah, here we return false, and the task is then signalled in do_sea() ->
>> arm64_notify_die().
> 
> I mean when we do the fixup.
> 
> I thought the idea was to apply the fixup (to stop the kernel from crashing),
> but still to deliver a fatal signal to the user task since we can't do what the
> user task asked us to.
> 

Yes, that's what I mean. :)
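
The flow agreed on above can be sketched as a small userspace model. Everything
here is a hypothetical stand-in: `struct sea_ctx`, `enum sea_result` and
`model_kernel_sea()` exist only for this example, and the boolean inputs
replace the real calls to apei_claim_sea() and fixup_exception_mc(). The
CONFIG_ARCH_HAS_COPY_MC check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sea_result {
	SEA_UNHANDLED,        /* caller falls back to the existing do_sea() path */
	SEA_FIXED,            /* fixup applied, no signal (kernel thread) */
	SEA_FIXED_AND_SIGNAL, /* fixup applied, signal sent to the user task */
};

struct sea_ctx {
	bool user_mode; /* models user_mode(regs) */
	bool claimed;   /* models apei_claim_sea(regs) >= 0 */
	bool has_fixup; /* models fixup_exception_mc(regs) succeeding */
	bool kthread;   /* models current->flags & PF_KTHREAD */
};

/* Mirrors the order of checks in the patch hunk quoted in this thread. */
static enum sea_result model_kernel_sea(const struct sea_ctx *c)
{
	assert(c != NULL);

	if (c->user_mode)
		return SEA_UNHANDLED;
	if (!c->claimed)
		return SEA_UNHANDLED;
	if (!c->has_fixup)
		return SEA_UNHANDLED;

	/* The fixup keeps the kernel alive; only user tasks get the signal. */
	return c->kthread ? SEA_FIXED : SEA_FIXED_AND_SIGNAL;
}
```

In this model a user task that hits a claimed error with a fixup available
ends in SEA_FIXED_AND_SIGNAL, while a kernel thread in the same situation ends
in SEA_FIXED.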

>>>> +
>>>> +	set_thread_esr(0, esr);
>>>
>>> Why are we not setting the address? Is that deliberate, or an oversight?
>>
>> Here fault_address is set to 0; I followed the logic of arm64_notify_die().
>>
>> void arm64_notify_die(...)
>> {
>>           if (user_mode(regs)) {
>>                   WARN_ON(regs != current_pt_regs());
>>                   current->thread.fault_address = 0;
>>                   current->thread.fault_code = err;
>>
>>                   arm64_force_sig_fault(signo, sicode, far, str);
>>           } else {
>>                   die(str, regs, err);
>>           }
>> }
>>
>> I don't know exactly why. Do you know why arm64_notify_die() does this? :)
> 
> To be honest, I don't know, and that looks equally suspicious to me.
> 
> Looking at the git history, that was added in commit:
> 
>    9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers")
> 
> ... so maybe Catalin recalls why.
> 
> Perhaps the assumption is just that this will be fatal and so unimportant? ...
> but in that case the same logic would apply to the ESR value, so it's not clear
> to me.

OK, let's proceed with setting it to 0. If this changes later, both
places should be changed together.

Thanks,
Tong.

> 
> Mark.
> 


