[Crash-utility] crash: struct command can read irrelevant pages.

Wed Feb 19 09:50:15 EST 2014

----- Original Message -----
> Hello,
> 
> I've finally found the cause of the issue I mentioned when makedumpfile
> v1.5.5 was released (quoted below):
> 
> > 2. Initially, the supported kernel was going to be updated to 3.12, but
> > while testing v1.5.5 I found an issue: page filtering seems to work
> > incorrectly on kernel 3.12. I haven't investigated it yet, and it will
> > take some time to finish.
> > Therefore, the latest supported kernel version in v1.5.5 is 3.11.
> 
> This is neither a kernel issue nor a makedumpfile issue; it's a bug in crash.
> It can happen when a slab cache is stored near the end of a page.
> 
> == Description ==
> 
> First, I saw the error message below when running crash on a dumpfile
> generated by makedumpfile -d2:
> 
>     please wait... (gathering kmem slab cache data)
>     crash: page excluded: kernel virtual address: f4e87000  type: "kmem_cache buffer"
> 
>     crash: unable to initialize kmem slab cache subsystem
> 
> This message indicates that crash failed to read a slab cache during
> kmem_cache_init(), and the output below shows that the cache it failed
> to read is the one stored at f4e86f40:
> 
>     crash> p kmem_cache
>     kmem_cache = $1 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
>     crash>
>     crash> list kmem_cache.list -s kmem_cache.name -h 0xc0b1cbc0
>     ...
>     f4d37840
>       name = 0xf4edf540 "uid_cache"
>     f4e86f40
>     list: page excluded: kernel virtual address: f4e87000  type: "gdb_readmem_callback"
> 
> It seems that the slab cache spans two pages, [f4e86000-f4e87000] and
> [f4e87000-f4e88000]. Let's confirm its *real* size.
> 
> Since every slab cache except kmem_cache_boot is itself allocated as a
> slab object, we can confirm the size as follows:
> 
>   crash> p kmem_cache
>   kmem_cache = $2 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
>   crash> struct kmem_cache.object_size 0xc0b1cbc0
>     object_size = 104
>   crash>
> 
> In my environment, the size was 104 bytes. Since f4e86f40 lies 192 (0xc0)
> bytes before the end of its page, the slab cache stored there fits in a
> single page ([f4e86000-f4e87000]), and the excluded page
> ([f4e87000-f4e88000]) is unrelated to it.
> 
> On the other hand, crash gets the size from the vmlinux debuginfo via gdb,
> and that size is 216 bytes:
> 
>     crash> struct kmem_cache
>     struct kmem_cache {
>         unsigned int batchcount;
>         unsigned int limit;
>         ...
>         struct kmem_cache_node **node;
>         struct array_cache *array[33];
>     }
>     SIZE: 216
>     crash>
> 
> So crash assumed that the slab cache occupied both [f4e86000-f4e87000]
> and [f4e87000-f4e88000], even though the latter page is irrelevant.
> 
> This discrepancy arises because the size of a slab cache descriptor is variable:
> 
>     struct kmem_cache {
>     ...
>             struct kmem_cache_node **node;
>             struct array_cache *array[NR_CPUS + MAX_NUMNODES];
>             /*
>              * Do not add fields after array[]
>              */
>     };
> 
> The "array" member is what makes the size of kmem_cache variable.
> When vmlinux is built, the size of kmem_cache is calculated from
> NR_CPUS and MAX_NUMNODES and stored in vmlinux as debug information.
> (Sorry, I don't know gcc well, so I may be misunderstanding this.)
> However, the actual size can be smaller than the declared size, because
> it is determined at boot time by the actual numbers of CPUs and nodes:
> 
> void __init kmem_cache_init(void):
> ...
>         /*
>          * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
>          */
>         create_boot_cache(kmem_cache, "kmem_cache",
>                 offsetof(struct kmem_cache, array[nr_cpu_ids]) +
>                         nr_node_ids * sizeof(struct kmem_cache_node *),  // object_size
>                 SLAB_HWCACHE_ALIGN);
>         list_add(&kmem_cache->list, &slab_caches);
> 
> 
> For kmem_cache, we can get the actual size from kmem_cache_boot, but I
> suspect that kmem_cache is not the only kernel struct whose size is
> variable. So I think we should discuss how to address issues like this
> in general.
> 
> By the way, this mail describes the *SLAB* case,
> but SLUB seems to have the same issue.
> 
> 
> Thanks
> Atsushi Kumagai

This is a "known" issue that has been discussed on the crash-utility list in the past,
at least with respect to the kmem_cache data structure.  But for an arbitrary data
structure with such a construct, I'm not sure what can be done.

For the CONFIG_SLAB kmem_cache data structure, there is a function that is
supposed to "downsize" the structure size returned by gdb.  It is called here
in crash's kmem_cache_init(), just before the loop that cycles through all of
the kmem_cache structures, which is where the "page excluded" error shown
above occurred:

        if (!(pc->flags & RUNTIME))
                kmem_cache_downsize();

        cache_buf = GETBUF(SIZE(kmem_cache_s));
        hq_open();

        do {
                cache_count++;

                if (!readmem(cache, KVADDR, cache_buf, SIZE(kmem_cache_s),
                        "kmem_cache buffer", RETURN_ON_ERROR)) {
                        FREEBUF(cache_buf);
                        vt->flags |= KMEM_CACHE_UNAVAIL;
                        error(INFO,
                          "%sunable to initialize kmem slab cache subsystem\n\n",
                                DUMPFILE() ? "\n" : "");
                        hq_close();
                        return;
                }

The SIZE(kmem_cache_s) value should have been downsized by that function,
but presumably that did not work.  If CRASHDEBUG(1) had been turned on during
initialization, you would have seen one of these two messages from kmem_cache_downsize():

                if (CRASHDEBUG(1))
                        fprintf(fp, "kmem_cache_downsize: %ld to %ld\n",
                                STRUCT_SIZE("kmem_cache"), SIZE(kmem_cache_s));

or:

                if (CRASHDEBUG(1)) {
                        fprintf(fp,
                            "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
                            "cache_cache.buffer_size: %d\n",
                                STRUCT_SIZE("kmem_cache"), buffer_size);
                        fprintf(fp,
                            "kmem_cache_downsize: nr_node_ids: %ld\n",
                                vt->kmem_cache_len_nodes);
                }

The function probably failed due to some kernel change.  In fact,
I just checked a 3.13 CONFIG_SLAB kernel, and kmem_cache_downsize()
no longer works for that kernel.

I see that kmem_cache_boot would be a good alternative for determining
the size on CONFIG_SLAB kernels, at least on 3.7 and later kernels where
it was introduced.  And for CONFIG_SLUB, which doesn't currently have a
"downsize" function, it looks like its "kmem_cache" cache also has size
fields that could be used.

By any chance can you make the 32-bit vmlinux/vmcore pair available for
me to download?  Reply to me off-list if you can.

Thanks,
  Dave