mtd: kernel BUG at arch/x86/mm/pat.c:279!

Linus Torvalds torvalds at linux-foundation.org
Fri Sep 7 19:09:59 EDT 2012


On Fri, Sep 7, 2012 at 3:42 PM, Suresh Siddha <suresh.b.siddha at intel.com> wrote:
> -       unsigned long start;
> -       unsigned long off;
> -       u32 len;
> +       resource_size_t start, off;
> +       unsigned long len;

So since the oops is on x86-64, I don't think it's the "unsigned long"
-> "resource_size_t" part (which can be an issue on 32-bit
architectures, though).

The "u32 len" -> "unsigned long len" thing *might* make a difference, though.

I also think your patch is incomplete even on 32-bit, because this:

>         if (mtd->type == MTD_RAM || mtd->type == MTD_ROM) {
>                 off = vma->vm_pgoff << PAGE_SHIFT;

is still wrong. It probably should be

    off = vma->vm_pgoff;
    off <<= PAGE_SHIFT;

because vm_pgoff may be a 32-bit type, while "resource_size_t" may be
64-bit. Shifting the 32-bit type without a cast (implicit or explicit)
isn't going to help.
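
To make the truncation concrete, here is a minimal userspace sketch (not kernel code). It assumes 4 KiB pages (PAGE_SHIFT of 12) and models vm_pgoff as a 32-bit value and resource_size_t as a 64-bit value; the function names are made up for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Problem pattern: the shift is evaluated in the 32-bit type, so the
 * high bits are already gone by the time the result is widened.
 */
static uint64_t offset_shift_then_widen(uint32_t pgoff)
{
    return pgoff << PAGE_SHIFT;
}

/*
 * Two-statement form: the assignment widens the value to 64 bits first,
 * and only then is it shifted.
 */
static uint64_t offset_widen_then_shift(uint32_t pgoff)
{
    uint64_t off = pgoff;

    off <<= PAGE_SHIFT;
    return off;
}
```

With a page offset of 0x100000 (a 4 GiB byte offset), the first function yields 0 while the second yields 0x100000000.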

That said, we have absolutely *tons* of bugs with this particular
pattern. Just do

    git grep 'vm_pgoff.*<<.*PAGE_SHIFT'

and there are distressingly few casts in there (there's a few, mainly
in fs/proc).

Now, I suspect many of them are fine just because most users probably
are size-limited anyway, but it's a bit distressing stuff. And I
suspect it means we might want to introduce a helper function like

    static inline u64 vm_offset(struct vm_area_struct *vma)
    {
        return (u64)vma->vm_pgoff << PAGE_SHIFT;
    }

or something. Maybe add the "vm_length()" helper while at it too,
since the whole "vma->vm_end - vma->vm_start" thing is so common.
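
As a rough userspace sketch of what the two helpers might look like side by side: the struct below is a stripped-down stand-in for the kernel's vm_area_struct, PAGE_SHIFT is assumed to be 12, and the return type of vm_length() is a guess, since only the name is suggested above.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Mock with only the fields used here; not the real kernel struct. */
struct vm_area_struct {
    unsigned long vm_start;  /* first address of the mapping */
    unsigned long vm_end;    /* first address past the mapping */
    unsigned long vm_pgoff;  /* offset into the backing object, in pages */
};

/* Byte offset of the mapping; the cast widens before the shift. */
static inline uint64_t vm_offset(const struct vm_area_struct *vma)
{
    return (uint64_t)vma->vm_pgoff << PAGE_SHIFT;
}

/* Hypothetical helper: length of the mapping in bytes. */
static inline unsigned long vm_length(const struct vm_area_struct *vma)
{
    return vma->vm_end - vma->vm_start;
}
```

Callers would then write vm_offset(vma) and vm_length(vma) instead of open-coding the shift and the subtraction at every site.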

Anyway, since Sasha's oops is clearly not 32-bit, the above issues
don't matter, and it would be interesting to hear if it's the 32-bit
'len' thing that triggers this problem. Still, I can't see how it
would - as far as I can tell, a truncated 'len' would at most result
in spurious early "return -EINVAL", not any real problem.

What are we missing?

Sasha, since you can apparently reproduce it, can you replace the
"BUG_ON()" with just a

    if (start >= end) {
        printk("bogus range %llx - %llx\n", start, end);
        return -EINVAL;
    }

or something.

I'm starting to suspect that maybe it's actually that the length is
*zero*, and start == end, and that we should just return zero for that
case. But let's see what Sasha finds..
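
If the zero-length theory holds, the check could tell an empty range apart from an inverted one. The following is only a userspace sketch of that idea: reserve_range() and its out-parameter are hypothetical names, not the actual code in arch/x86/mm/pat.c.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical range check:
 *   start == end  -> empty range, nothing to reserve, report success
 *   start >  end  -> genuinely bogus range, reject it
 *   start <  end  -> valid range, report its length
 */
static int reserve_range(uint64_t start, uint64_t end, uint64_t *len)
{
    if (start == end) {
        *len = 0;
        return 0;
    }
    if (start > end)
        return -EINVAL;

    *len = end - start;
    return 0;
}
```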

              Linus


