2.6.34-rc4 : OOPS in unmap_vma

Linus Torvalds torvalds at linux-foundation.org
Wed Apr 14 10:32:08 EDT 2010



On Wed, 14 Apr 2010, Borislav Petkov wrote:
> 
> hmm, it doesn't look like it. Your code translates to something like
> 
>    0:   b8 00 00 00 00          mov    $0x0,%eax
>    5:   80 ff ff                cmp    $0xff,%bh
>    8:   ff 48 21                decl   0x21(%rax)
>    b:   45 80 48 8b 45          rex.RB orb    $0x45,-0x75(%r8)
>   10:   80 48 ff c8             orb    $0xc8,-0x1(%rax)

There's a large constant (0xffffff8000000000) in there at the beginning, 
and the disassembly hasn't found the start of the next instruction very 
cleanly. The same is true at the end: another large constant is cut off in 
the middle. 

The byte just before the dumped instruction stream is almost certainly 
0x48 (the REX.W prefix), and the last byte of the last constant is 0xff. 
With those two bytes added, the disassembly ends up being:

   0:	48 b8 00 00 00 00 80 	mov    $0xffffff8000000000,%rax
   7:	ff ff ff 
   a:	48 21 45 80          	and    %rax,-0x80(%rbp)
   e:	48 8b 45 80          	mov    -0x80(%rbp),%rax
  12:	48 ff c8             	dec    %rax
  15:	48 3b 85 40 ff ff ff 	cmp    -0xc0(%rbp),%rax
  1c:	48 8b 85 50 ff ff ff 	mov    -0xb0(%rbp),%rax
  23:	48 0f 42 7d 80       	cmovb  -0x80(%rbp),%rdi
  28:	48 89 7d 80          	mov    %rdi,-0x80(%rbp)
  2c:*	48 8b 38             	mov    (%rax),%rdi     <-- trapping instruction
  2f:	48 85 ff             	test   %rdi,%rdi
  32:	0f 84 f5 04 00 00    	je     0x52d
  38:	48 b8 fb 0f 00 00 00 	mov    $0xffffc00000000ffb,%rax
  3f:	c0 ff ff 
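
As a rough illustration of why the missing leading 0x48 and the trailing 
0xff matter, here is a small user-space sketch that pulls the 64-bit 
immediate out of a "REX.W + B8+r imm64" encoding. The helper names 
(imm64_le, decode_movabs) are made up for this example and are not part 
of any kernel or binutils interface; it only handles the REX.W = 0x48 
case seen in the listings here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian 64-bit immediate from 8 raw bytes. */
static uint64_t imm64_le(const unsigned char *p)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/*
 * Hypothetical helper: if p points at 0x48 followed by an opcode in
 * the range B8..BF (mov imm64 into one of %rax..%rdi), store the
 * immediate in *imm and return 1. Otherwise return 0.
 */
static int decode_movabs(const unsigned char *p, size_t len, uint64_t *imm)
{
	if (len < 10 || p[0] != 0x48 || (p[1] & 0xf8) != 0xb8)
		return 0;
	*imm = imm64_le(p + 2);
	return 1;
}
```

Fed the ten bytes 48 b8 00 00 00 00 80 ff ff ff, this yields 
0xffffff8000000000. Without the leading 0x48 the same bytes start with 
a 32-bit "mov $0x0,%eax", and everything after it is decoded out of sync.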

But yes, you found the right spot (that 0xffffff8000000000 constant is 
-549755813888 decimal):

> which I could correlate with what I get here (comments added):

Yup. Close enough. Btw, it's often good to look at both the *.s code _and_ 
the *.lst code. If you do "make mm/memory.lst", you'll find those big 
constants easily, and then you'll see the code this way:

	        do {
	                next = pgd_addr_end(addr, end);
	ffffffff81b2aa45:       48 b8 00 00 00 00 80    mov    $0x8000000000,%rax
	ffffffff81b2aa4c:       00 00 00
	ffffffff81b2aa4f:       49 8d 04 04             lea    (%r12,%rax,1),%rax
	ffffffff81b2aa53:       48 89 45 a8             mov    %rax,-0x58(%rbp)
	ffffffff81b2aa57:       48 b8 00 00 00 00 80    mov    $0xffffff8000000000,%rax
	ffffffff81b2aa5e:       ff ff ff
	ffffffff81b2aa61:       48 21 45 a8             and    %rax,-0x58(%rbp)
	ffffffff81b2aa65:       48 8b 45 b8             mov    -0x48(%rbp),%rax
	ffffffff81b2aa69:       48 8b 55 a8             mov    -0x58(%rbp),%rdx
	ffffffff81b2aa6d:       48 ff c8                dec    %rax
	ffffffff81b2aa70:       48 ff ca                dec    %rdx   
	ffffffff81b2aa73:       48 39 c2                cmp    %rax,%rdx
	ffffffff81b2aa76:       48 8b 45 b8             mov    -0x48(%rbp),%rax
	ffffffff81b2aa7a:       48 8b 55 90             mov    -0x70(%rbp),%rdx
	ffffffff81b2aa7e:       48 0f 42 45 a8          cmovb  -0x58(%rbp),%rax
	ffffffff81b2aa83:       48 89 45 a8             mov    %rax,-0x58(%rbp)
	ffffffff81b2aa87:       48 8b 02                mov    (%rdx),%rax
	void pud_clear_bad(pud_t *);
	void pmd_clear_bad(pmd_t *);
	
	static inline int pgd_none_or_clear_bad(pgd_t *pgd)
	{
	        if (pgd_none(*pgd))
	ffffffff81b2aa8a:       48 85 c0                test   %rax,%rax
	ffffffff81b2aa8d:       74 20                   je     ffffffff81b2aaaf <unmap_vmas+0x228>
	                return 1;
	        if (unlikely(pgd_bad(*pgd))) {
	ffffffff81b2aa8f:       48 ba fb 0f 00 00 00    mov    $0xffffc00000000ffb,%rdx
	ffffffff81b2aa96:       c0 ff ff
	ffffffff81b2aa99:       48 21 c2                and    %rax,%rdx
	ffffffff81b2aa9c:       48 83 fa 63             cmp    $0x63,%rdx
	ffffffff81b2aaa0:       0f 84 d9 04 00 00       je     ffffffff81b2af7f <unmap_vmas+0x6f8>

although Parag's compiler has generated much better code (possibly due to 
config differences, possibly due to compiler versions).
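
For reference, the lea/and/dec/cmp/cmovb sequence at the top of that 
listing is what pgd_addr_end() boils down to. The following is a 
user-space sketch modelled on the kernel macro, assuming x86-64 with 
4-level paging (PGDIR_SHIFT of 39); it is written as a plain function 
with fixed-width types so it can be compiled on its own, which is not 
how the kernel spells it.

```c
#include <stdint.h>

/* Assumed x86-64, 4-level paging values; names mirror the kernel's. */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)   /* 0x8000000000 */
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))            /* 0xffffff8000000000 */

/*
 * Return the end of the current pgd-sized region, or 'end' if that
 * comes first. Subtracting 1 on both sides keeps the comparison
 * correct when 'boundary' wraps around to 0 at the top of the
 * address space.
 */
static uint64_t pgd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}
```

The two big constants in the listing are PGDIR_SIZE and PGDIR_MASK, and 
PGDIR_MASK read as a signed 64-bit number is -549755813888, i.e. -(1 << 39).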

> So you oops when dereferencing that pgd value in %rax (%rdx in my case),
> *pgd in pgd_none_or_clear_bad(pgd) which is called in the below fragment
> of unmap_page_range().
> 
> 	pgd = pgd_offset(vma->vm_mm, addr);
> 	do {
> 		next = pgd_addr_end(addr, end);
> 		if (pgd_none_or_clear_bad(pgd)) {
> 			(*zap_work)--;
> 			continue;
> 		}
> 		next = zap_pud_range(tlb, vma, pgd, addr, next,
> 						zap_work, details);
> 	} while (pgd++, addr = next, (addr != end && *zap_work > 0));

Correct.
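
A sketch of the two checks behind pgd_none_or_clear_bad(), with the 
constants inferred from the disassembly (the 0xffffc00000000ffb mask and 
the compare against 0x63). The flag names drop the kernel's leading 
underscore, the entry is passed as a plain 64-bit value rather than a 
pgd_t, and the PTE_PFN_MASK value is an assumption chosen to match that 
mask, so treat this as illustration only.

```c
#include <stdint.h>

/* x86 page table flag bits (kernel names carry a leading underscore). */
#define PAGE_PRESENT  UINT64_C(0x001)
#define PAGE_RW       UINT64_C(0x002)
#define PAGE_USER     UINT64_C(0x004)
#define PAGE_ACCESSED UINT64_C(0x020)
#define PAGE_DIRTY    UINT64_C(0x040)
#define KERNPG_TABLE  (PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)

/* Assumed: physical address bits 12..45, matching the mask in the listing. */
#define PTE_PFN_MASK  UINT64_C(0x00003ffffffff000)

/* An all-zero entry means nothing is mapped at this level. */
static int pgd_none(uint64_t pgd_val)
{
	return pgd_val == 0;
}

/*
 * Ignoring the address bits and the USER bit, a sane entry has exactly
 * the KERNPG_TABLE flags set (0x63).
 */
static int pgd_bad(uint64_t pgd_val)
{
	return (pgd_val & ~(PTE_PFN_MASK | PAGE_USER)) != KERNPG_TABLE;
}
```

Note that in the oops neither check gets to run: the trap is on the load 
of *pgd itself, because the pgd pointer is what is bogus.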

> so it looks like it tries to find a page table rooted at that address
> but the pointer value of 0000000000002203 is bogus.

Yes, it does look like some strange page table corruption, and it doesn't 
look anon_vma related at all. It's intriguing that it started happening now, 
though, so.. 

				Linus


