[PATCH] nvme: add cond_resched() to nvme_complete_batch()

Jiwei Sun jiweisun126 at 126.com
Tue May 16 06:58:57 PDT 2023


Hi Keith,

On 2023/5/16 04:40, Keith Busch wrote:
> On Tue, May 02, 2023 at 08:54:12PM +0800, jiweisun126 at 126.com wrote:
>> From: Jiwei Sun <sunjw10 at lenovo.com>
>>
>> A soft lockup issue will be triggered when do fio test on a 448-core
>> server, such as the following warning:
> ...
>
>> According to the above two logs, we can know the nvme_irq() cost too much
>> time, in the above case, about 4.8 second. And we can also know that the
>> main bottlenecks is in the competition for the spin lock pool->lock.
> The most recent 6.4-rc has included a significant changeset to the pool
> allocator that may show a considerable difference in pool->lock timing.
> It would be interesting to hear if it changes your observation with your
> 448-core setup. Would you be able to re-run your experiments that
> produced the soft lockup with this kernel on that machine?
We ran some tests with the latest kernel, and the issue can no longer be
reproduced. We also analyzed the ftrace log of nvme_irq(): we did NOT find
any contention on the spin lock pool->lock, and every dma_pool_free() call
completed within 2us.

  287)               |        dma_pool_free() {
  287)   0.150 us    |          _raw_spin_lock_irqsave();
  287)   0.421 us    |          _raw_spin_unlock_irqrestore();
  287)   1.472 us    |        }
+-- 63 lines: 287)               |        mempool_free() {-----------
  435)               |        dma_pool_free() {
  435)   0.170 us    |          _raw_spin_lock_irqsave();
  435)   0.210 us    |          _raw_spin_unlock_irqrestore();
  435)   1.172 us    |        }
+--145 lines: 435)               |        mempool_free() {---------
  317)               |        dma_pool_free() {
  317)   0.160 us    |          _raw_spin_lock_irqsave();
  317)   0.401 us    |          _raw_spin_unlock_irqrestore();
  317)   1.252 us    |        }
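
For anyone who wants to repeat this check on a longer log, here is a minimal
sketch of how the per-call times could be pulled out of function_graph output.
It assumes the line layout shown above ("CPU) duration us | function"); the
helper name block_durations and the SAMPLE string are made up for
illustration and are not part of any kernel tool.

```python
import re

# One function_graph line: "  287)   0.150 us    |    func();"
# The duration column is empty on lines that open a block ("func() {").
LINE = re.compile(r"^\s*(\d+)\)\s+(?:([\d.]+) us)?\s*\|\s*(.*)$")

def block_durations(trace, func):
    """Return the total time (in us) of every `func() { ... }` block."""
    stacks = {}          # per-CPU stack of currently open function names
    durations = []
    for raw in trace.splitlines():
        if raw.lstrip().startswith("+--"):
            continue     # editor fold marker, not real trace output
        m = LINE.match(raw)
        if not m:
            continue
        cpu, dur, body = m.groups()
        body = body.strip()
        stack = stacks.setdefault(cpu, [])
        if body.endswith("{"):
            # Block entry: remember which function was opened on this CPU.
            stack.append(body[:-1].strip().rstrip("()"))
        elif body == "}" and stack:
            # Block exit: the duration on this line covers the whole call.
            name = stack.pop()
            if name == func and dur is not None:
                durations.append(float(dur))
    return durations

# Sample copied from the trace excerpt above.
SAMPLE = """\
  287)               |        dma_pool_free() {
  287)   0.150 us    |          _raw_spin_lock_irqsave();
  287)   0.421 us    |          _raw_spin_unlock_irqrestore();
  287)   1.472 us    |        }
+-- 63 lines: 287)               |        mempool_free() {-----------
  435)               |        dma_pool_free() {
  435)   0.170 us    |          _raw_spin_lock_irqsave();
  435)   0.210 us    |          _raw_spin_unlock_irqrestore();
  435)   1.172 us    |        }
"""

if __name__ == "__main__":
    times = block_durations(SAMPLE, "dma_pool_free")
    print(times, "max:", max(times), "us")
```

Keeping one stack per CPU matters because lines from different CPUs can be
interleaved in a real log.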

Based on the test results and our analysis of the code, your
patch has fixed this performance issue.

By the way, a separate hung-task issue was triggered during the test. We are
analyzing it, but that is another story and we can discuss it in another
thread.

Thanks,
Regards,
Jiwei



