[PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC

Thu Jul 13 07:07:10 PDT 2023

On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
> On 2023/7/13 12:52, Paul E. McKenney wrote:
> > On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> >>
> >>
>
> ...
>
> >>
> >> There are lots of performance issues here and even a plumber
> >> topic last year to show that, see:
> >>
> >> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org
> >> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
> >> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com
> >> [4] https://lpc.events/event/16/contributions/1338/
> >> and more.
> >>
> >> I'm not sure if it's necessary to look info all of that,
> >> andSandeep knows more than I am (the scheduling issue
> >> becomes vital on some aarch64 platform.)
> >
> > Hmmm...  Please let me try again.
> >
> > Assuming that this approach turns out to make sense, the resulting
> > patch will need to clearly state the performance benefits directly in
> > the commit log.
> >
> > And of course, for the approach to make sense, it must avoid breaking
> > the existing lockdep-RCU debugging code.
> >
> > Is that more clear?
>
> Personally I'm not working on Android platform any more so I don't
> have a way to reproduce, hopefully Sandeep could give actually
> number _again_ if dm-verity is enabled and trigger another
> workqueue here and make a comparsion why the scheduling latency of
> the extra work becomes unacceptable.
>

Question from my side, are we talking about only performance issues or
also a crash? It appears z_erofs_decompress_pcluster() takes
mutex_lock(&pcl->lock);

So if it is either in an RCU read-side critical section or in an
atomic section, like the softirq path, then it may
schedule-while-atomic or trigger RCU warnings.

z_erofs_decompressqueue_endio
-> z_erofs_decompress_kickoff
 ->z_erofs_decompressqueue_work
  ->z_erofs_decompress_queue
   -> z_erofs_decompress_pcluster
    -> mutex_lock

Per Sandeep in [1], this stack happens under RCU read-lock in:

#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
[...]
                rcu_read_lock();
                (dispatch_ops);
                rcu_read_unlock();
[...]

Coming from:
blk_mq_flush_plug_list ->
                           blk_mq_run_dispatch_ops(q,
                                __blk_mq_flush_plug_list(q, plug));

and __blk_mq_flush_plug_list does this:
          q->mq_ops->queue_rqs(&plug->mq_list);

This somehow ends up calling the bio_endio and the
z_erofs_decompressqueue_endio which grabs the mutex.

So... I have a question, it looks like one of the paths in
__blk_mq_run_dispatch_ops() uses SRCU.  Where are as the alternate
path uses RCU. Why does this alternate want to block even if it is not
supposed to? Is the real issue here that the BLK_MQ_F_BLOCKING should
be set? It sounds like you want to block in the "else" path even
though BLK_MQ_F_BLOCKING is not set:

/* run the code block in @dispatch_ops with rcu/srcu read lock held */
#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
do {                                                            \
        if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {          \
                struct blk_mq_tag_set *__tag_set = (q)->tag_set; \
                int srcu_idx;                                   \
                                                                \
                might_sleep_if(check_sleep);                    \
                srcu_idx = srcu_read_lock(__tag_set->srcu);     \
                (dispatch_ops);                                 \
                srcu_read_unlock(__tag_set->srcu, srcu_idx);    \
        } else {                                                \
                rcu_read_lock();                                \
                (dispatch_ops);                                 \
                rcu_read_unlock();                              \
        }                                                       \
} while (0);

It seems quite incorrect to me that erofs wants to block even though
clearly BLK_MQ_F_BLOCKING is not set. Did I miss something? I believe
it may be more beneficial to correct this properly now, rather than
maintaining the appearance of non-blocking operations and potentially
having to manage atomicity detection with preempt counters at a later
stage.

thanks,

 - Joel

[1] https://lore.kernel.org/all/CAB=BE-QV0PiKFpCOcdEUFxS4wJHsLCcsymAw+nP72Yr3b1WMng@mail.gmail.com/