[ammarfaizi2-block:dhowells/linux-fs/fscache-fixes] [mm, netfs, fscache] 6919cda8e0: canonical_address#:#[##]
Linus Torvalds
torvalds at linux-foundation.org
Sun Dec 11 10:27:48 PST 2022
On Sun, Dec 11, 2022 at 6:27 AM kernel test robot <oliver.sang at intel.com> wrote:
The disassembly isn't great, because the test robot doesn't try to
find where the instructions start, but before that
> 4: 48 8b 57 18 mov 0x18(%rdi),%rdx
instruction we also had a
    mov (%rdi),%rax
and it looks like this is the very top of 'filemap_release_folio()',
so '%rdi' contains the folio pointer coming into this.
End result:
> 4: 48 8b 57 18 mov 0x18(%rdi),%rdx
> 8: 83 e0 01 and $0x1,%eax
> b: 74 59 je 0x66
The
and $0x1,%eax
je 0x66
above is the test for
BUG_ON(!folio_test_locked(folio));
where it's jumping out to the 'ud2' in case the lock bit (bit #0) isn't set.
Then we have this:
> d: 48 f7 07 00 60 00 00 testq $0x6000,(%rdi)
> 14: 74 22 je 0x38
Which is testing PG_private | PG_private_2, and jumping out (which we
also don't do) if neither is set.
And then we have:
> 16: 48 8b 07 mov (%rdi),%rax
> 19: f6 c4 80 test $0x80,%ah
> 1c: 75 32 jne 0x50
Which is checking for PG_writeback.
So then we get to
if (mapping && mapping->a_ops->release_folio)
return mapping->a_ops->release_folio(folio, gfp);
which is this:
> 1e: 48 85 d2 test %rdx,%rdx
> 21: 74 34 je 0x57
This %rdx value is the early load from the top of the function, it's
checking 'mapping' for NULL.
It's not NULL, but it's some odd value according to the oops report:
RDX: ffff889f03987f71
which doesn't look like it's valid (well, it's a valid kernel pointer,
but it's not aligned like a 'mapping' pointer should be).
So now when we're going to load 'a_ops' from there, we load another
garbage value:
> 23: 48 8b 82 90 00 00 00 mov 0x90(%rdx),%rax
and we now have RAX: b000000000000000
and then the 'a_ops->release_folio' access will trap:
> 2a:* 48 8b 40 48 mov 0x48(%rax),%rax <-- trapping instruction
> 2e: 48 85 c0 test %rax,%rax
> 31: 74 24 je 0x57
The above is the "load a_ops->release_folio and test it for NULL", but
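the load faulted because RAX was garbage (0xb000000000000048 is not a
canonical x86-64 address, so this is a general protection fault rather
than an ordinary page fault).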
But RAX was garbage because we already had a bogus "mapping" pointer earlier.
Now, why 'mapping' was bogus, I don't know. Maybe that page wasn't a
page cache page at all? The mapping field is in a union and can
contain other things.
So I have no explanation for the oops, but I thought I'd just post the
decoding of the instruction stream in case that helps somebody else to
figure it out.
Linus