Debugging a TTY race condition on M1 (memory ordering dragons)

Hector Martin marcan at marcan.st
Mon Aug 15 12:15:00 PDT 2022


On 16/08/2022 03.58, Boqun Feng wrote:
> I agree this is handy, but an unconditional full barrier may be costly
> for some users, and probably unnecessary if the users periodically
> queue the work. In that case, some successful enqueue will eventually
> make all memory accesses observable. And if workqueue users do their
> own locking in the work function, then the barrier is also unnecessary.
> 
> The documentation part of course needs some help to clear things up.
> But I'm not sure strengthening the ordering guarantee of queue_work()
> is a good idea. Maybe a dedicated API, like:
> 
> // Queue more work for @work. This has the same semantics as
> // queue_work() if @work is not pending. If @work is already pending,
> // it additionally ensures the work function observes all memory
> // accesses made before this call.
> void queue_more_work(struct workqueue_struct *wq, struct work_struct *work)
> {
> 	smp_mb();
> 	queue_work(wq, work);
> }
> 
> Regards,
> Boqun
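
For context, the pattern at issue looks something like this (a minimal
sketch; tty_state, process_byte() and the field names are made up for
illustration, not the actual TTY code):

#include <linux/workqueue.h>

struct tty_state {
	int new_byte;			/* data shared with the work fn */
	struct work_struct work;
};

static void tty_work_fn(struct work_struct *work)
{
	struct tty_state *s = container_of(work, struct tty_state, work);

	/* Must observe the store to new_byte made before queueing. */
	process_byte(s->new_byte);
}

static void producer(struct tty_state *s, int byte)
{
	s->new_byte = byte;
	/*
	 * If the work was already pending, queue_work() returns false
	 * and today guarantees no ordering, so the running work
	 * function may miss the store above.
	 */
	queue_work(system_wq, &s->work);
}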

FWIW, I didn't actually use a full barrier in my patch. I just replaced
the test_and_set_bit() with the underlying atomic op, sans the
early-exit path.
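
Roughly, the shape of that change (a sketch of the idea, not the exact
patch; the helper name is mine):

#include <linux/atomic.h>
#include <linux/workqueue.h>

/*
 * Do the atomic RMW unconditionally so it is fully ordered even when
 * PENDING was already set. A test_and_set_bit() that finds the bit set
 * may take an early exit, and then it provides no ordering.
 */
static bool work_claim_pending(struct work_struct *work)
{
	long old = atomic_long_fetch_or(WORK_STRUCT_PENDING,
					&work->data);

	/* True if we set the bit, i.e. the work was not pending. */
	return !(old & WORK_STRUCT_PENDING);
}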

Personally, though, I think it makes more sense for the default
function to provide the guarantees; if someone *really* needs the
performance gain from eliding the implicit barrier, they can use an
alternate API for that (after showing useful gains). This stuff is too
subtle to expect every caller to wrap their head around memory
ordering, and having queue_work() always provide ordering against
prior stores *feels* intuitive.
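
Concretely, I'm imagining a split along these lines (names made up on
the spot, not a real proposal):

/* Default: always orders prior stores against the work function,
 * even when the work was already pending. */
bool queue_work(struct workqueue_struct *wq, struct work_struct *work);

/* Explicit opt-out for callers who have measured the barrier cost and
 * provide their own ordering (e.g. a lock taken in the work fn). */
bool queue_work_relaxed(struct workqueue_struct *wq,
			struct work_struct *work);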

But let's see what the workqueue folks say :)

- Hector


