[BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

Mon Feb 15 13:35:26 PST 2016

On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> On Mon, 15 Feb 2016 13:31:59 +0200
> "Kirill A. Shutemov" <kirill at shutemov.name> wrote:
> 
> > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > 
> > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > Could you check if reverting fecffad25458 helps?
> > > 
> > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > 
> > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %r12,8(%r13)
> > >                           000000000045d3b0: b9040039           lgr     %r3,%r9
> > >                          #000000000045d3b4: a53b0001           oill    %r3,1
> > >                          >000000000045d3b8: e33010000024       stg     %r3,0(%r1)
> > >                           000000000045d3be: ec28000e007c       cgij    %r2,0,8,45d3da
> > >                           000000000045d3c4: e34020000004       lg      %r4,0(%r2)
> > >                           000000000045d3ca: b904001c           lgr     %r1,%r12
> > >                           000000000045d3ce: ec143f3f0056       rosbg   %r1,%r4,63,63,0
> > > [ 1851.721269] Call Trace:
> > > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > > [ 1851.721319] INFO: lockdep is turned off.
> > > [ 1851.721321] Last Breaking-Event-Address:
> > > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > > [ 1851.721327]
> > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > 
> > > 
> > > > 
> > > > And could you share what the crashes look like? I haven't seen backtraces yet.
> > > > 
> > > 
> > > Sure. I didn't because they really looked random to me. Most of the time
> > > in rcu or list debugging but I thought these have just been the messenger
> > > observing a corruption first. Anyhow, here is an older one that might look
> > > interesting:
> > > 
> > > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> > 
> > This is kinda interesting: 0x400 is TAIL_MAPPING.. Hm..
> > 
> > Could you check if you see the problem on commit 1c290f642101 and its
> > immediate parent?
> > 
> 
> How should the page->mapping poison end up as next->prev in the list of
> pre-allocated THP splitting page tables?

Maybe pgtable was cast to struct page or something. I don't know.

> Also, commit 1c290f642101 is before the THP rework, at least the
> non-bisectable part, so we should expect not to see the problem there.

Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
crashes. Correct?

> 0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
> listheads are placed inside the pre-allocated pagetables instead of page->lru,
> because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.

0x400 from an empty pte makes more sense than TAIL_MAPPING. But I guess it's
worth changing TAIL_MAPPING to some other value to make sure.

> So, for example, two concurrent withdraws could produce such a list
> corruption, because the first withdraw will overwrite the listhead at the
> beginning of the pagetable with 2 empty ptes.
> 
> Has anything changed regarding the general THP deposit/withdraw logic?

I don't see any changes in this area.
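
To make the scenario described above concrete, here is a minimal userspace
sketch, not the actual arch/s390 code. It assumes a 2K table of 256 8-byte
entries whose first two entries double as the list head while the table is
deposited, and that withdraw overwrites them with the empty-pte value 0x400,
as Gerald describes. The names (union pgtable, deposit, withdraw,
simulate_stale_withdraw) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_TABLE 256               /* 256 * 8 bytes = 2K on a 64-bit build */
#define EMPTY_PTE ((uintptr_t)0x400)     /* empty pte value on s390, per the thread */

struct list_head { struct list_head *next, *prev; };

/* Hypothetical model: the list head overlays the first two pte slots. */
union pgtable {
    uintptr_t pte[PTRS_PER_TABLE];
    struct list_head lh;
};

/* Stands in for the per-pmd pointer to the most recently deposited table. */
static union pgtable *deposited;

static void deposit(union pgtable *pt)
{
    struct list_head *lh = &pt->lh;

    if (!deposited) {
        lh->next = lh;                   /* first table: self-linked ring */
        lh->prev = lh;
    } else {
        struct list_head *head = &deposited->lh;

        lh->next = head->next;           /* link in right after the current head */
        lh->prev = head;
        head->next->prev = lh;
        head->next = lh;
    }
    deposited = pt;
}

static union pgtable *withdraw(void)
{
    union pgtable *pt = deposited;
    struct list_head *lh = &pt->lh;

    if (lh->next == lh) {
        deposited = NULL;                /* last table in the ring */
    } else {
        deposited = (union pgtable *)lh->next;
        lh->prev->next = lh->next;       /* unlink */
        lh->next->prev = lh->prev;
    }
    /* The list head is destroyed here: both words become empty ptes. */
    pt->pte[0] = EMPTY_PTE;
    pt->pte[1] = EMPTY_PTE;
    return pt;
}

/*
 * Returns 1 if a stale pointer to an already withdrawn table would see
 * 0x400 in both list pointers. A second, unserialized withdraw holding
 * such a pointer would then treat 0x400 as next/prev.
 */
int simulate_stale_withdraw(void)
{
    static union pgtable a, b;
    union pgtable *stale;

    deposited = NULL;
    deposit(&a);
    deposit(&b);
    stale = deposited;                   /* second withdrawer samples the pointer */
    withdraw();                          /* first withdrawer wins */

    return (uintptr_t)stale->lh.next == EMPTY_PTE &&
           (uintptr_t)stale->lh.prev == EMPTY_PTE;
}
```

Calling simulate_stale_withdraw() from a small main() should return 1 in this
model. It only shows where a 0x400 list pointer can come from; it does not
reproduce the exact "next->prev" report, which needs a real interleaving.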

To eliminate one more variable, I would propose disabling the split pmd lock
for testing and checking whether it makes a difference.

Is there any chance that I'll be able to trigger the bug using QEMU?
Does anybody have a QEMU image I can use?

-- 
 Kirill A. Shutemov