Observing Softlockups while running heavy IOs

Sreekanth Reddy sreekanth.reddy at broadcom.com
Fri Aug 19 04:44:45 PDT 2016


First of all, thanks to Robert and Bart for the replies.

Robert,
Thanks for the URL; I have gone through it.
Yes, I am also observing that all the interrupts are routed to one CPU.
However, I am still observing softlockups (and sometimes hardlockups)
even when I set rq_affinity to 2.
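
For reference, this is roughly how I am setting rq_affinity from
userspace (a minimal sketch; the device names are placeholders, not my
actual topology):

    # Minimal sketch, assuming placeholder device names for the SSDs.
    from pathlib import Path

    devices = ["sdb", "sdc", "sdd", "sde"]  # hypothetical names

    for dev in devices:
        # 2 = complete the request on the CPU that submitted it
        Path(f"/sys/block/{dev}/queue/rq_affinity").write_text("2\n")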

Is there any way to route the interrupts to the same CPUs that
submitted the corresponding IOs?
or
Is there any way/option in irqbalance or the kernel that can route
interrupts to the CPUs enabled in affinity_hint in a round-robin manner
after a specific time period?
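
For illustration, here is a rough userspace sketch of the kind of
rotation I have in mind (the IRQ number and interval are only example
values, not a proposed implementation):

    # Rough sketch: rotate an IRQ's smp_affinity over the CPUs listed
    # in its affinity_hint mask, at a fixed interval.
    import time
    from pathlib import Path

    IRQ = 219       # example IRQ from my setup
    INTERVAL = 5    # seconds between rotations (arbitrary)

    irq_dir = Path(f"/proc/irq/{IRQ}")

    # Decode the hex bitmask in affinity_hint into a list of CPU numbers.
    hint = int(irq_dir.joinpath("affinity_hint").read_text().replace(",", ""), 16)
    cpus = [cpu for cpu in range(hint.bit_length()) if hint & (1 << cpu)]

    i = 0
    while True:
        cpu = cpus[i % len(cpus)]
        # Steer the IRQ to a single CPU from the hint mask (needs root).
        irq_dir.joinpath("smp_affinity").write_text(format(1 << cpu, "x") + "\n")
        i += 1
        time.sleep(INTERVAL)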


Bart,
I have tried your patch, and here are the logs:

[Aug19 13:48] __blk_mq_run_hw_queue() finished after 1 ms
[  +1.196454] __blk_mq_run_hw_queue() finished after 11 ms
[Aug19 13:49] __blk_mq_run_hw_queue() finished after 20 ms
[ +14.173018] __blk_mq_run_hw_queue() finished after 278 ms
[ +14.066448] __blk_mq_run_hw_queue() finished after 588 ms
[  +5.394698] __blk_mq_run_hw_queue() finished after 1360 ms
[Aug19 13:51] __blk_mq_run_hw_queue() finished after 1539 ms
[Aug19 13:54] __blk_mq_run_hw_queue() finished after 1762 ms
[Aug19 13:55] __blk_mq_run_hw_queue() finished after 2087 ms
[Aug19 13:57] __blk_mq_run_hw_queue() finished after 2915 ms
[Aug19 14:06] perf interrupt took too long (2509 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000
[Aug19 14:10] __blk_mq_run_hw_queue() finished after 3266 ms
[Aug19 14:15] __blk_mq_run_hw_queue() finished after 3953 ms
[Aug19 14:22] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[llvmpipe-9:3152]
[  +0.000439] __blk_mq_run_hw_queue() finished after 10058 ms
[  +0.007206] Modules linked in: mpt3sas(OE) xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack tun bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables raid456 async_raid6_recov async_memcpy
async_pq async_xor xor intel_rapl async_tx iosf_mbi
x86_pkg_temp_thermal raid6_pq coretemp kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel mei_me joydev pcspkr sb_edac i2c_i801
mei iTCO_wdt iTCO_vendor_support edac_core lpc_ich ipmi_ssif ipmi_si
ipmi_msghandler shpchp tpm_tis ioatdma acpi_power_meter tpm wmi
acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ast
i2c_algo_bit drm_kms_helper ttm ixgbe drm mdio vxlan ip6_udp_tunnel
udp_tunnel crc32c_intel raid_class ptp scsi_transport_sas
[  +0.000063]  pps_core nvme dca [last unloaded: mpt3sas]
[  +0.000007] CPU: 6 PID: 3152 Comm: llvmpipe-9 Tainted: G        W
OE   4.2.0 #1
[  +0.000002] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+,
BIOS 1.1 07/22/2015
[  +0.000003] task: ffff883f5cf557c0 ti: ffff883f5afd8000 task.ti:
ffff883f5afd8000
[  +0.000002] RIP: 0010:[<ffffffff810a159b>]  [<ffffffff810a159b>]
__do_softirq+0x7b/0x290
[  +0.000008] RSP: 0000:ffff883f7f183f08  EFLAGS: 00000206
[  +0.000002] RAX: ffff883f5afdc000 RBX: ffff883f7f190080 RCX: 00000000000006e0
[  +0.000002] RDX: 0000000000003508 RSI: 0000000071c139c0 RDI: ffff883f5cf557c0
[  +0.000002] RBP: ffff883f7f183f68 R08: 00003508717f8da4 R09: ffff883f7f183d80
[  +0.000002] R10: 0000000000000000 R11: 0000000000000004 R12: ffff883f7f183e78
[  +0.000001] R13: ffffffff8177304b R14: ffff883f7f183f68 R15: 0000000000000000
[  +0.000003] FS:  00007fa76b7f6700(0000) GS:ffff883f7f180000(0000)
knlGS:0000000000000000

Message from syslogd at dhcp-135-24-192-127 at Aug 19 14:22:42 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [llvmpipe-9:3152]
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 00007fa03a2d90c0 CR3: 0000003f61fe4000 CR4: 00000000001406e0
[  +0.000002] Stack:
[  +0.000001]  ffff883f7f18fc40 404040407f18fce8 ffff883f5afdc000
0000000100aecc1a
[  +0.000004]  00000ab10000000a 000002027f18fc58 ffff883f7f183f48
0000000000000000
[  +0.000003]  ffff883f7f1967c0 ffff883f448d57c0 0000000000000000
0000000000000001
[  +0.000003] Call Trace:
[  +0.000002]  <IRQ>
[  +0.000005]  [<ffffffff810a19c6>] irq_exit+0x116/0x120
[  +0.000005]  [<ffffffff81774eb6>] smp_apic_timer_interrupt+0x46/0x60
[  +0.000005]  [<ffffffff8177304b>] apic_timer_interrupt+0x6b/0x70
[  +0.000002]  <EOI>
[  +0.000005]  [<ffffffff810c392b>] ? finish_task_switch+0x6b/0x200
[  +0.000006]  [<ffffffff8176dabc>] __schedule+0x36c/0x950
[  +0.000002]  [<ffffffff8176e0d7>] schedule+0x37/0x80
[  +0.000006]  [<ffffffff81116a1c>] futex_wait_queue_me+0xbc/0x120
[  +0.000004]  [<ffffffff81116da9>] futex_wait+0x119/0x270
[  +0.000004]  [<ffffffff81116800>] ? futex_wake+0x90/0x180
[  +0.000003]  [<ffffffff81118d6b>] do_futex+0x12b/0xb00
[  +0.000005]  [<ffffffff810d348e>] ? set_next_entity+0x23e/0x440
[  +0.000007]  [<ffffffff810136f1>] ? __switch_to+0x261/0x4b0
[  +0.000004]  [<ffffffff811197c1>] SyS_futex+0x81/0x180
[  +0.000002]  [<ffffffff8176e0d7>] ? schedule+0x37/0x80
[  +0.000004]  [<ffffffff817721ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[  +0.000001] Code: 7e 00 01 00 00 65 48 8b 04 25 c4 3c 01 00 c7 45 c0
0a 00 00 00 48 89 45 b0 65 c7 05 6c 26 f7 7e 00 00 00 00 fb 66 0f 1f
44 00 00 <b8> ff ff ff ff 49 c7 c4 c0 a0 c0 81 0f bc 45 cc 83 c0 01 89
45

and here is the output of 'sar -u ALL 1 -P ALL 1':
02:22:43 PM     CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
02:22:44 PM     all      5.73      0.60     29.31     14.90      0.00      0.00      1.57      0.00      0.00     47.89
02:22:44 PM       0      8.82      0.00     48.53     38.24      0.00      0.00      0.00      0.00      0.00      4.41
02:22:44 PM       1     10.29      0.00     63.24     26.47      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM       2      8.96      0.00     65.67     25.37      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM       3     11.94      0.00     61.19     26.87      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM       4      8.96      0.00     64.18     26.87      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM       5      7.46      0.00     67.16     25.37      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM       6      0.00      0.00      0.00      0.00      0.00      0.00    100.00      0.00      0.00      0.00
02:22:44 PM       7      1.33      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00     98.67
02:22:44 PM       8      5.19      0.00      1.30      0.00      0.00      0.00      0.00      0.00      0.00     93.51
02:22:44 PM       9      2.67      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00     97.33
02:22:44 PM      10      2.67      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00     97.33
02:22:44 PM      11      3.95      0.00      1.32      1.32      0.00      0.00      0.00      0.00      0.00     93.42
02:22:44 PM      12     11.11      0.00     83.33      5.56      0.00      0.00      0.00      0.00      0.00      0.00
02:22:44 PM      13      8.70      0.00     52.17     36.23      0.00      0.00      0.00      0.00      0.00      2.90
02:22:44 PM      14      6.15      0.00     53.85     38.46      0.00      0.00      0.00      0.00      0.00      1.54
02:22:44 PM      15      5.97      0.00     55.22     37.31      0.00      0.00      0.00      0.00      0.00      1.49
02:22:44 PM      16      8.70      0.00     52.17     36.23      0.00      0.00      0.00      0.00      0.00      2.90
02:22:44 PM      17      7.35      0.00     55.88     35.29      0.00      0.00      0.00      0.00      0.00      1.47
02:22:44 PM      18      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00    100.00
02:22:44 PM      19      2.63      0.00      1.32      2.63      0.00      0.00      0.00      0.00      0.00     93.42
02:22:44 PM      20      2.67      0.00      1.33      0.00      0.00      0.00      0.00      0.00      0.00     96.00
02:22:44 PM      21      7.89      5.26      2.63      0.00      0.00      0.00      0.00      0.00      0.00     84.21
02:22:44 PM      22      1.32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00     98.68
02:22:44 PM      23      4.05      8.11      2.70      4.05      0.00      0.00      0.00      0.00      0.00     81.08

I am still continuing my investigation into this.

Note:
I am on vacation next week, so please expect some delay in my responses.

Thanks,
Sreekanth

On Fri, Aug 19, 2016 at 2:38 AM, Elliott, Robert (Persistent Memory)
<elliott at hpe.com> wrote:
>
>
>> -----Original Message-----
>> From: linux-kernel-owner at vger.kernel.org [mailto:linux-kernel-
>> owner at vger.kernel.org] On Behalf Of Sreekanth Reddy
>> Sent: Thursday, August 18, 2016 12:56 AM
>> Subject: Observing Softlockup's while running heavy IOs
>>
>> Problem statement:
>> Observing softlockups while running heavy IOs on 8 SSD drives
>> connected behind our LSI SAS 3004 HBA.
>>
> ...
>> We are observing a loop in the IO path, i.e. only one CPU is busy
>> processing the interrupts while the other CPUs (in the affinity_hint
>> mask) are busy submitting the IOs (those CPUs are not receiving any
>> interrupts at all). For example, only CPU6 is busy processing the
>> interrupts from IRQ 219, while the remaining CPUs, i.e. CPU 7, 8, 9,
>> 10 & 11, are just busy pumping IOs and never process any IO
>> interrupts from IRQ 219. So we are observing softlockups due to the
>> existence of this loop in the IO path.
>>
>> We might not observe these softlockups if irqbalance had balanced the
>> interrupts among the CPUs enabled in the particular irq's
>> affinity_hint mask, so that all the CPUs were equally busy both
>> sending IOs and processing the interrupts. I am not sure how
>> irqbalance balances the load among the CPUs, but here I see that only
>> one CPU from the irq's affinity_hint mask is busy with interrupts and
>> the remaining CPUs never receive any interrupts from this IRQ.
>>
>> Please help me with any suggestions/recommendations to solve/limit
>> these kinds of softlockups. Also, please let me know if I have missed
>> any setting in irqbalance.
>>
>
> The CPUs need to be forced to self-throttle by processing interrupts for
> their own submissions, which reduces the time they can submit more IOs.
>
> See https://lkml.org/lkml/2014/9/9/931 for discussion of this
> problem when blk-mq was added.
>
>
> ---
> Robert Elliott, HPE Persistent Memory
>
>
>


