[PATCH 3/3] Revert "lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly"

Ming Lei ming.lei at redhat.com
Mon Mar 2 06:12:49 PST 2026


On Mon, Mar 2, 2026 at 10:05 PM Daniel Wagner <dwagner at suse.de> wrote:
>
> Hi Ming,
>
> Sorry for the late response. Last week the mail server took a break...
>
> On Thu, Feb 26, 2026 at 10:04:18PM +0800, Ming Lei wrote:
> > On Thu, Feb 26, 2026 at 02:40:37PM +0100, Daniel Wagner wrote:
> > > This reverts commit 0263f92fadbb9d294d5971ac57743f882c93b2b3.
> > >
> > > The reason the lock was removed was that the nvme-pci driver reset
> > > handler attempted to acquire the CPU read lock during CPU hotplug
> > > offlining (which holds the CPU write lock). Consequently, the block
> > > layer offline notifier callback could not make progress because
> > > in-flight requests were detected.
> > >
> > > Since then, in-flight detection has been improved, and the nvme-pci
> > > driver now explicitly updates the hctx state when it is safe to ignore
> > > detected in-flight requests. As a result, it's possible to reintroduce
> > > the CPU read lock in group_cpus_evenly.
> >
> > Can you explain your motivation a bit? Especially since adding back the
> > lock makes the API hard to use. Any benefit?
>
> Sure, I would like to add the lock back to group_cpus_evenly so it's
> possible to add support for the isolcpus use case. For the isolcpus case,
> it's necessary to access the cpu_online_mask when creating a
> housekeeping cpu mask. I failed to find a good solution that doesn't
> introduce horrible hacks (see Thomas' feedback on this [1]).
>
> Anyway, I am not totally set on this solution, but I think having a
> proper lock in this code path would make the isolcpus extension way
> cleaner.

Then please include this patch with an explanation in your isolcpus patch set.

>
> What exactly do you mean by 'API hard to use'? The problem that the
> caller/driver has to make sure it doesn't do anything like the nvme-pci
> driver did?

This API is usually called in slow paths, where subsystem locks are often
already held. Taking cpus_read_lock inside the API then creates a lock
dependency from each caller's subsystem lock to cpus_read_lock, which every
driver has to keep in mind.

Thanks,
Ming
