BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Waiman Long longman at redhat.com
Tue Sep 6 13:01:06 PDT 2022


On 9/6/22 14:30, Tejun Heo wrote:
> Hello,
>
> (cc'ing Waiman in case he has a better idea)
>
> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/ is for
>> fix cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>> But this issue is cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
> If I'm understanding what you're writing correctly, this isn't a deadlock.
> The cpuset_hotplug_workfn simply isn't being woken up while holding
> cpuset_rwsem and others are just waiting for that lock to be released.

I believe this is probably a bug in the scheduler core code.
__set_cpus_allowed_ptr_locked() calls affine_move_task() to move the task
to a random CPU within the new set of allowable CPUs. However, if migration
is disabled, it shouldn't call affine_move_task() at all. Instead, I would
suggest that if the current CPU is within the new set of allowable CPUs, it
should just skip affine_move_task(). Otherwise,
__set_cpus_allowed_ptr_locked() should fail.
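
To make the suggestion concrete, here is a rough userspace model of the
decision, not kernel code and not a patch. struct task_model,
set_cpus_allowed_model(), the 64-bit mask and the -EBUSY return value are
all made up for illustration; the real function works on struct task_struct
and cpumasks under the rq lock, and the right error code would need to be
decided separately.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant task state (not struct task_struct). */
struct task_model {
	int cpu;                 /* CPU the task is currently on */
	bool migration_disabled; /* models a task inside migrate_disable() */
	uint64_t cpus_mask;      /* allowed CPUs, one bit per CPU (0..63) */
};

/* What the caller would do next; MOVE_AFFINE models calling affine_move_task(). */
enum move_action { MOVE_NONE, MOVE_AFFINE };

static bool cpu_in_mask(int cpu, uint64_t mask)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u);
}

/*
 * Model of the proposed behaviour:
 *  - migration disabled, current CPU in new mask: accept the mask, no move
 *  - migration disabled, current CPU not in new mask: fail (assumed -EBUSY)
 *  - migration enabled: accept the mask and hand off to the affine move path
 */
static int set_cpus_allowed_model(struct task_model *p, uint64_t new_mask,
				  enum move_action *action)
{
	*action = MOVE_NONE;

	if (new_mask == 0)
		return -EINVAL;

	if (p->migration_disabled) {
		if (!cpu_in_mask(p->cpu, new_mask))
			return -EBUSY; /* assumed error code, mask left unchanged */
		p->cpus_mask = new_mask;
		return 0;          /* affine_move_task() is skipped */
	}

	p->cpus_mask = new_mask;
	*action = MOVE_AFFINE;     /* this is where affine_move_task() would run */
	return 0;
}
```

With a migration-disabled task on CPU 2, a new mask of 0x4 is accepted
without any move, while a new mask of 0x3 is rejected and the old mask is
kept.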

My 2 cents.

Cheers,
Longman



