crash: struct command can read irrelevant pages.

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Mon Feb 24 00:00:56 EST 2014


>Hello Atsushi,
>
>I've committed a SLAB/SLUB kmem_cache-specific fix for this issue:
>
>  https://github.com/crash-utility/crash/commit/c0b7a74fc13121203810d06d163550436b2d5476
>
>which is queued for crash-7.0.6.

Thanks Dave, I confirmed that this patch solves my problem.
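
For reference, the arithmetic behind the error can be sketched in C as
follows. This is only an illustration, not code from crash: the function
name read_crosses_page() and the SKETCH_PAGE_SIZE macro are made up for this
mail, and the 4 KiB page size is taken from the page ranges in my report
below. It checks whether a read of a given size touches more than one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the [f4e86000-f4e87000] range in the report. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: return true if reading `size` bytes starting at
 * the 32-bit kernel virtual address `addr` touches more than one page.
 */
bool read_crosses_page(uint32_t addr, uint32_t size)
{
    assert(size > 0);

    uint32_t first_page = addr / SKETCH_PAGE_SIZE;
    uint32_t last_page = (addr + size - 1) / SKETCH_PAGE_SIZE;

    return first_page != last_page;
}
```

With the values from my report, read_crosses_page(0xf4e86f40, 104) is false
and read_crosses_page(0xf4e86f40, 216) is true, which matches the
"page excluded" error at f4e87000.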

>> This is a "known" issue that has been discussed on the crash-utility list in
>> the past, at least with respect to the kmem_cache data structure.  But for any
>> random data structure that has such a construct, I'm not sure what can be done.

I also have no idea how to solve it, but it doesn't seem to have been a
practical problem so far. So I think your patch is enough for now.
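
To illustrate the general case, here is a minimal C sketch of a structure
that ends in an array sized by compile-time maximums. Everything in it is
hypothetical (struct sketch_cache, the SKETCH_* values and
sketch_runtime_size() are not kernel or crash names); it only mirrors the
offsetof() expression from kmem_cache_init() quoted below, written in a
portable form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time maximums; a real kernel takes these from its config. */
#define SKETCH_NR_CPUS      32
#define SKETCH_MAX_NUMNODES 1

/* Hypothetical stand-in for a struct that ends in a maximally sized array. */
struct sketch_cache {
    unsigned int batchcount;
    unsigned int limit;
    void **node;
    void *array[SKETCH_NR_CPUS + SKETCH_MAX_NUMNODES];
};

/*
 * Size that is actually needed for the given runtime counts:
 * everything up to array[], plus one pointer per CPU and per node.
 */
size_t sketch_runtime_size(size_t nr_cpu_ids, size_t nr_node_ids)
{
    assert(nr_cpu_ids + nr_node_ids <=
           (size_t)(SKETCH_NR_CPUS + SKETCH_MAX_NUMNODES));

    return offsetof(struct sketch_cache, array)
           + nr_cpu_ids * sizeof(void *)
           + nr_node_ids * sizeof(void *);
}
```

On a system with fewer CPUs and nodes than the compile-time maximums,
sketch_runtime_size() is smaller than sizeof(struct sketch_cache), which is
the same kind of gap as the one between the size in the debug information
and the real allocation.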

>> By any chance can you make the 32-bit vmlinux/vmcore pair available for
>> me to download?  Reply to me off-list if you can.

Sure, I'll send another mail.


Thanks
Atsushi Kumagai

>>
>>
>> ----- Original Message -----
>> > Hello,
>> >
>> > I've finally found the cause of the issue I mentioned, as quoted below,
>> > when makedumpfile v1.5.5 was released:
>> >
>> > > 2. At first, the supported kernel will be updated to 3.12, but I
>> > > found an issue while testing for v1.5.5: the page filtering seems
>> > > to work incorrectly on kernel 3.12. I couldn't investigate this
>> > > yet and it will take some time to finish it.
>> > > Therefore, the latest supported kernel version is 3.11 in v1.5.5.
>> >
>> > This is neither a kernel issue nor a makedumpfile issue; it's a bug in crash.
>> > It can happen when a slab cache is stored near the end of a page.
>> >
>> > == Description ==
>> >
>> > At first, I found the error message below when I used crash on
>> > a dumpfile generated by makedumpfile -d2:
>> >
>> >     please wait... (gathering kmem slab cache data)
>> >     crash: page excluded: kernel virtual address: f4e87000  type: "kmem_cache buffer"
>> >
>> >     crash: unable to initialize kmem slab cache subsystem
>> >
>> > This message indicated that crash failed to read a slab cache during
>> > kmem_cache_init(), and according to the output below, crash failed to
>> > read the slab cache stored at f4e86f40:
>> >
>> >     crash> p kmem_cache
>> >     kmem_cache = $1 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
>> >     crash>
>> >     crash> list kmem_cache.list -s kmem_cache.name -h 0xc0b1cbc0
>> >     ...
>> >     f4d37840
>> >       name = 0xf4edf540 "uid_cache"
>> >     f4e86f40
>> >     list: page excluded: kernel virtual address: f4e87000  type: "gdb_readmem_callback"
>> >
>> > It seems that the slab cache covered two pages, [f4e86000-f4e87000] and
>> > [f4e87000-f4e88000]. Well, let's confirm the *real* size of it.
>> >
>> > Since slab caches other than kmem_cache_boot are allocated as slab
>> > objects, we can confirm the size as follows:
>> >
>> >   crash> p kmem_cache
>> >   kmem_cache = $2 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
>> >   crash> struct kmem_cache.object_size 0xc0b1cbc0
>> >     object_size = 104
>> >   crash>
>> >
>> > In my environment, the size was 104 bytes. Therefore, the slab cache
>> > stored at f4e86f40 fits in a single page ([f4e86000-f4e87000]) and
>> > the excluded page ([f4e87000-f4e88000]) is unrelated to it.
>> >
>> > On the other hand, crash gets the size from vmlinux by using gdb,
>> > and that size was 216 bytes:
>> >
>> >     crash> struct kmem_cache
>> >     struct kmem_cache {
>> >         unsigned int batchcount;
>> >         unsigned int limit;
>> >         ...
>> >         struct kmem_cache_node **node;
>> >         struct array_cache *array[33];
>> >     }
>> >     SIZE: 216
>> >     crash>
>> >
>> > So crash mistook the pages backing the slab cache for
>> > [f4e86000-f4e87000] and [f4e87000-f4e88000] even though the latter
>> > was an irrelevant page.
>> >
>> > This gap comes from the fact that the size of a slab cache is variable.
>> >
>> >     struct kmem_cache {
>> >     ...
>> >             struct kmem_cache_node **node;
>> >             struct array_cache *array[NR_CPUS + MAX_NUMNODES];
>> >             /*
>> >              * Do not add fields after array[]
>> >              */
>> >     };
>> >
>> > The size of "array" is the variable part of kmem_cache.
>> > When vmlinux is built, the size of kmem_cache is calculated from
>> > NR_CPUS and MAX_NUMNODES and recorded in vmlinux as debug information.
>> > (Sorry, I don't know gcc well. I may have misunderstood this.)
>> > However, the actual size will be smaller than the defined size because
>> > it is decided based on the actual number of CPUs and nodes.
>> >
>> > void __init kmem_cache_init(void)::
>> > ...
>> >         /*
>> >          * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
>> >          */
>> >         create_boot_cache(kmem_cache, "kmem_cache",
>> >                 offsetof(struct kmem_cache, array[nr_cpu_ids]) +
>> >                 nr_node_ids * sizeof(struct kmem_cache_node *),  // object_size
>> >                 SLAB_HWCACHE_ALIGN);
>> >         list_add(&kmem_cache->list, &slab_caches);
>> >
>> >
>> > As for kmem_cache, we can get its actual size from kmem_cache_boot,
>> > but I suppose that kmem_cache is not the only struct in the kernel whose
>> > size is variable. So I think we should discuss how to address issues
>> > like this.
>> >
>> > By the way, I described the *SLAB* case in this mail,
>> > but SLUB seems to have the same issue.
>> >
>> >
>> > Thanks
>> > Atsushi Kumagai
>>
>>
>> This is a "known" issue that has been discussed on the crash-utility list
>> in the past, at least with respect to the kmem_cache data structure.  But
>> for any random data structure that has such a construct, I'm not sure what
>> can be done.
>>
>> In the case of the CONFIG_SLAB kmem_cache data structure, there is a function
>> that is supposed to "downsize" the size value of the kmem_cache data structure
>> that is returned by gdb.  It is called here in kmem_cache_init(), just
>> prior to cycling through all of the kmem_cache structures, where the
>> page excluded error shown above occurred:
>>
>>         if (!(pc->flags & RUNTIME))
>>                 kmem_cache_downsize();
>>
>>         cache_buf = GETBUF(SIZE(kmem_cache_s));
>>         hq_open();
>>
>>         do {
>>                 cache_count++;
>>
>>                 if (!readmem(cache, KVADDR, cache_buf, SIZE(kmem_cache_s),
>>                         "kmem_cache buffer", RETURN_ON_ERROR)) {
>>                         FREEBUF(cache_buf);
>>                         vt->flags |= KMEM_CACHE_UNAVAIL;
>>                         error(INFO,
>>                           "%sunable to initialize kmem slab cache subsystem\n\n",
>>                                 DUMPFILE() ? "\n" : "");
>>                         hq_close();
>>                         return;
>>                 }
>>
>> The SIZE(kmem_cache_s) value should have been downsized by that function,
>> but presumably it did not work.  If CRASHDEBUG(1) had been turned on during
>> initialization, you would have seen one of these two messages from
>> kmem_cache_downsize():
>>
>>                 if (CRASHDEBUG(1))
>>                         fprintf(fp, "kmem_cache_downsize: %ld to %ld\n",
>>                                 STRUCT_SIZE("kmem_cache"),
>>                                 SIZE(kmem_cache_s));
>>
>> or:
>>
>>                 if (CRASHDEBUG(1)) {
>>                         fprintf(fp,
>>                             "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
>>                             "cache_cache.buffer_size: %d\n",
>>                                 STRUCT_SIZE("kmem_cache"), buffer_size);
>>                         fprintf(fp,
>>                             "kmem_cache_downsize: nr_node_ids: %ld\n",
>>                                 vt->kmem_cache_len_nodes);
>>                 }
>>
>> The function probably failed due to some kernel change.  In fact,
>> I just checked a 3.13 CONFIG_SLAB kernel, and I see that
>> kmem_cache_downsize() no longer works for that kernel.
>>
>> I see that kmem_cache_boot would be a good alternative for determining
>> the size on CONFIG_SLAB kernels, at least on 3.7 and later kernels where
>> it was introduced.  And for CONFIG_SLUB, which doesn't currently have a
>> "downsize" function, it looks like its "kmem_cache" cache also has size
>> fields that could be used.
>>
>> By any chance can you make the 32-bit vmlinux/vmcore pair available for
>> me to download?  Reply to me off-list if you can.
>>
>> Thanks,
>>   Dave


