[PATCH 1/1] psi: stop relying on timer_pending for poll_work rescheduling
Johannes Weiner
hannes at cmpxchg.org
Tue Jun 22 08:07:51 PDT 2021
On Tue, Jun 22, 2021 at 03:56:03PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 02:26:54PM -0700, Suren Baghdasaryan wrote:
> > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
> > Reported-by: Kathleen Chang <yt.chang at mediatek.com>
> > Reported-by: Wenju Xu <wenju.xu at mediatek.com>
> > Reported-by: Jonathan Chen <jonathan.jmchen at mediatek.com>
> > Signed-off-by: Suren Baghdasaryan <surenb at google.com>
>
> Johannes?
Looks generally good to me, I'd just want to get to the bottom of the
memory ordering before acking...
> > -/* Schedule polling if it's not already scheduled. */
> > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > +/* Schedule polling if it's not already scheduled or forced. */
> > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > + bool force)
> > {
> > struct task_struct *task;
> >
> > - /*
> > - * Do not reschedule if already scheduled.
> > - * Possible race with a timer scheduled after this check but before
> > - * mod_timer below can be tolerated because group->polling_next_update
> > - * will keep updates on schedule.
> > - */
> > - if (timer_pending(&group->poll_timer))
> > + /* cmpxchg should be called even when !force to set poll_scheduled */
> > + if (atomic_cmpxchg(&group->poll_scheduled, 0, 1) != 0 && !force)
>
> Do you care about memory ordering here? Afaict the whole thing is
> supposed to be ordered by ->trigger_lock, so you don't.
It's not always held when we get here.
The worker holds it when it reschedules itself, to serialize against
userspace destroying the trigger itself. But the scheduler doesn't
hold it when it kicks the worker on an actionable task change.
That said, I think the ordering we care about there is that when the
scheduler side sees the worker still queued, the worker must see the
scheduler's updates to the percpu states and process them correctly.
But that should be ensured already by the ordering implied by the
seqcount sections around both the writer and the reader side.
Is there another possible race that I'm missing?
More information about the Linux-mediatek
mailing list