BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Mukesh Ojha
quic_mojha at quicinc.com
Thu Sep 29 08:13:43 PDT 2022
Hi All,
On 9/23/2022 7:50 PM, Mukesh Ojha wrote:
> Hi Peter,
>
>
> On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
>> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>>
>> I've not followed the earlier stuff due to being unreadable; just
>> reacting to this..
>
> We are able to reproduce the issue described at this link:
>
> https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/
>
>
>
>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 838623b68031..5d9ea1553ec0 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>> if (cpumask_equal(&p->cpus_mask, new_mask))
>>> goto out;
>>>
>>> - if (WARN_ON_ONCE(p == current &&
>>> - is_migration_disabled(p) &&
>>> - !cpumask_test_cpu(task_cpu(p), new_mask))) {
>>> + if (is_migration_disabled(p) &&
>>> + !cpumask_test_cpu(task_cpu(p), new_mask)) {
>>> + WARN_ON_ONCE(p == current);
>>> ret = -EBUSY;
>>> goto out;
>>> }
>>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>> if (flags & SCA_USER)
>>> user_mask = clear_user_cpus_ptr(p);
>>>
>>> - ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> + if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>>> + ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> + } else {
>>> + task_rq_unlock(rq, p, rf);
>>> + }
>>
>> This cannot be right. There might be previous set_cpus_allowed_ptr()
>> callers that are blocked and waiting for the task to land on a valid
>> CPU.
>>
>
> I was wondering whether just skipping, as below, would help here, but I am not sure.
>
> What if we keep the task as it is on the same CPU and wait for migration
> to be enabled for the task, so that it is taken care of later?
>
> ------------------->O------------------------------------------
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d90d37c..7717733 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
> * we're holding p->pi_lock.
> */
> if (task_rq(p) == rq) {
> - if (is_migration_disabled(p))
> + if (is_migration_disabled(p)) {
> + complete = true;
> goto out;
> + }
>
> if (pending) {
>
Any suggestions on this bug?
-Mukesh