crash: struct command can read irrelevant pages.

Wed Feb 19 01:01:29 EST 2014

Hello,

I've finally found the cause of the issue I mentioned (quoted below)
when makedumpfile v1.5.5 was released:

> 2. At first, the supported kernel will be updated to 3.12, but I
> found an issue while testing for v1.5.5, which seems that the page
> filtering works wrongly on kernel 3.12. I couldn't investigate this
> yet and it will take some time to finish it.
> Therefore, the latest supported kernel version is 3.11 in v1.5.5.

This is neither a kernel issue nor a makedumpfile issue; it's a bug in crash.
It can happen when a slab cache is stored near the end of a page.

== Description ==

At first, I saw the error message below when I ran crash on
a dumpfile generated by makedumpfile -d2:

    please wait... (gathering kmem slab cache data)
    crash: page excluded: kernel virtual address: f4e87000  type: "kmem_cache buffer"

    crash: unable to initialize kmem slab cache subsystem

This message indicates that crash failed to read a slab cache during
kmem_cache_init(). According to the output below, the slab cache it
failed to read is the one stored at f4e86f40:

    crash> p kmem_cache
    kmem_cache = $1 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
    crash>
    crash> list kmem_cache.list -s kmem_cache.name -h 0xc0b1cbc0
    ...
    f4d37840
      name = 0xf4edf540 "uid_cache"
    f4e86f40
    list: page excluded: kernel virtual address: f4e87000  type: "gdb_readmem_callback"

It looks as if the slab cache spans two pages, [f4e86000-f4e87000] and
[f4e87000-f4e88000]. Let's confirm its *real* size.

Since slab caches other than kmem_cache_boot are themselves allocated as
slab objects from kmem_cache_boot, its object_size tells us their size:

  crash> p kmem_cache
  kmem_cache = $2 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
  crash> struct kmem_cache.object_size 0xc0b1cbc0
    object_size = 104
  crash>

In my environment, the size was 104 bytes. Therefore, the slab cache
stored at f4e86f40 ends at f4e86fa8 and fits in a single page
([f4e86000-f4e87000]); the excluded page ([f4e87000-f4e88000]) is
unrelated to it.
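As a rough check of this arithmetic, here is a small C sketch. It is not
crash code: the helper name crosses_page() and the 4 KiB page size are
assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as the page boundaries in this mail suggest. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true if an object of `size` bytes
 * (size > 0) starting at `addr` extends into the following page.
 */
static bool crosses_page(uint32_t addr, uint32_t size)
{
    uint32_t first_page = addr / SKETCH_PAGE_SIZE;
    uint32_t last_page  = (addr + size - 1) / SKETCH_PAGE_SIZE;

    return first_page != last_page;
}
```

With the values from this mail, crosses_page(0xf4e86f40, 104) is false,
while crosses_page(0xf4e86f40, 216) is true because the last byte would
be at f4e87017.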

On the other hand, crash gets the size from vmlinux through gdb,
and there it is 216 bytes:

    crash> struct kmem_cache
    struct kmem_cache {
        unsigned int batchcount;
        unsigned int limit;
        ...
        struct kmem_cache_node **node;
        struct array_cache *array[33];
    }
    SIZE: 216
    crash>

So crash assumed that the slab cache occupied both
[f4e86000-f4e87000] and [f4e87000-f4e88000], even though the latter
is an unrelated page.

This discrepancy comes from the fact that the size of a slab cache is variable:

    struct kmem_cache {
    ...
            struct kmem_cache_node **node;
            struct array_cache *array[NR_CPUS + MAX_NUMNODES];
            /*
             * Do not add fields after array[]
             */
    };

The size of "array" is the variable part of kmem_cache.
When vmlinux is built, the size of kmem_cache is calculated from
NR_CPUS and MAX_NUMNODES and recorded in vmlinux as debug information.
(Sorry, I don't know gcc well, so I may be misunderstanding this.)
However, the actual size can be smaller than the defined size because
it is decided at boot time from the actual number of CPUs and nodes.

void __init kmem_cache_init(void)::
...
        /*
         * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
         */
        create_boot_cache(kmem_cache, "kmem_cache",
                offsetof(struct kmem_cache, array[nr_cpu_ids]) +
                                  nr_node_ids * sizeof(struct kmem_cache_node *),  // object_size
                                  SLAB_HWCACHE_ALIGN);
        list_add(&kmem_cache->list, &slab_caches);
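The difference between the two sizes can be reproduced with a simplified
stand-in for struct kmem_cache. This is only a sketch: mock_kmem_cache,
its 20-word header, and the use of uint32_t in place of 32-bit pointers
are assumptions chosen so that the results match the numbers shown
earlier (216 and 104). It is not the real kernel layout.

```c
#include <stddef.h>
#include <stdint.h>

/* NR_CPUS + MAX_NUMNODES as seen in the gdb output (array[33]). */
#define MOCK_ARRAY_LEN 33

/* Hypothetical 32-bit layout; uint32_t stands in for each pointer. */
struct mock_kmem_cache {
    uint32_t header[20];             /* assumed: all fields before "node" */
    uint32_t node;                   /* struct kmem_cache_node **node */
    uint32_t array[MOCK_ARRAY_LEN];  /* struct array_cache *array[] */
};

/* Size as the debug information would describe it. */
static size_t static_size(void)
{
    return sizeof(struct mock_kmem_cache);
}

/*
 * Size as computed in kmem_cache_init(): the struct up to
 * array[nr_cpu_ids], plus one node pointer per possible node.
 */
static size_t runtime_size(unsigned int nr_cpu_ids, unsigned int nr_node_ids)
{
    return offsetof(struct mock_kmem_cache, array)
           + nr_cpu_ids * sizeof(uint32_t)
           + nr_node_ids * sizeof(uint32_t);
}
```

Here static_size() gives 216, and runtime_size(4, 1) gives 104. The
values 4 and 1 are hypothetical; they are just one combination that
yields 104 with this mock layout.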

As for kmem_cache, we can get its actual size from kmem_cache_boot,
but I suppose kmem_cache is not the only struct in the kernel whose size
is variable. So I think we should discuss how to address issues like this.

By the way, I described the *SLAB* case in this mail,
but SLUB seems to have the same issue.

Thanks
Atsushi Kumagai