/proc/vmcore mmap() failure issue
d.hatayama at jp.fujitsu.com
Mon Nov 25 04:01:37 EST 2013
(2013/11/25 17:10), Atsushi Kumagai wrote:
> On 2013/11/22 1:53:14, kexec <kexec-bounces at lists.infradead.org> wrote:
>> On Thu, Nov 21, 2013 at 05:31:46PM +0900, HATAYAMA Daisuke wrote:
>>>> So I think the patch I sent is enough, the policy will be simpler as
>>>> "Don't use mmap() for buggy kernels".
>>>> [PATCH] Fall back to read() when mmap() fails.
>>> I think the logic does not become that complex. For example, if the input
>>> vmcore format is ELF, then:
>>> o in update_mmap_range():
>>> - first calculate a range of the corresponding PT_LOAD entry truncated with
>>> - Then, truncate the range of mmap() by the truncated range of the corresponding
>>> PT_LOAD entry, i.e., exclude partial pages from the mmap() target range.
>>> - Then determine the offsets of the two partial pages; the number of partial
>>> pages is always at most two. The offsets can easily be calculated from the
>>> original range of the corresponding PT_LOAD entry.
>>> o in read_from_vmcore(), if a given offset belongs to either of two partial
>>> pages, then go to read() path; if not, go to mmap() path.
>> I agree that we should do mmap() on all non-partial pages and do read()
>> on all partial pages. Otherwise we lose the benefit of faster speed of
> I agree to avoid this issue by fixing makedumpfile as a workaround while
> fixing the kernel is so tough and risky. However, it sounds strange to me to
> fix the userspace side elaborately for such a definite kernel issue whose
> cause is known, so we should fix the kernel itself.
> Otherwise, will you continue to add specific fixes to user tools to
> address kernel issues like this case?
makedumpfile supports a wide range of kernel versions and needs to maintain
backward compatibility. mmap() on /proc/vmcore might be backported to some
older kernel versions on some distributions if necessary. It would then be hard
to fix each old kernel at each backport. A method that can be applied to all
kernels in general is necessary.
Also, looking at the ia64 case, where there is boot loader data on partial
pages, there could be other environments where partial pages contain important
data belonging to other components. So the issue depends not only on kernels
but also on other components, such as boot loaders and firmware, that can put
data on partial pages. We need to read those pages as long as they hold
important data and we can access them.
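
To illustrate the policy quoted above (mmap() for whole pages, read() for
partial pages), here is a minimal sketch in C. It is not code from
makedumpfile: struct load_split, split_load() and needs_read() are
hypothetical names, and the sketch assumes the PT_LOAD range is given as file
offsets [start, end) and that the page size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type; not taken from makedumpfile. */
struct load_split {
    uint64_t mmap_start;  /* start of the page-aligned range usable by mmap() */
    uint64_t mmap_end;    /* end (exclusive) of that range */
    uint64_t partial[2];  /* page-aligned offsets of the partial pages */
    int nr_partial;       /* 0, 1 or 2 */
};

/* Split the PT_LOAD range [start, end) into whole pages and partial pages. */
static struct load_split split_load(uint64_t start, uint64_t end,
                                    uint64_t page_size)
{
    struct load_split s = {0};
    uint64_t mask = page_size - 1;
    uint64_t head = start & ~mask;  /* page containing start */
    uint64_t tail = end & ~mask;    /* page containing end */

    /* Assumption: page_size is a non-zero power of two. */
    assert(page_size && !(page_size & mask));

    s.mmap_start = (start + mask) & ~mask;  /* round start up */
    s.mmap_end = tail;                      /* round end down */
    if (s.mmap_end < s.mmap_start)
        s.mmap_end = s.mmap_start;          /* no whole page in this entry */

    if (start & mask)
        s.partial[s.nr_partial++] = head;
    /* Do not record the same page twice when head and tail coincide. */
    if ((end & mask) && (tail != head || !(start & mask)))
        s.partial[s.nr_partial++] = tail;
    return s;
}

/* True if offset lies on a partial page and must take the read() path. */
static bool needs_read(const struct load_split *s, uint64_t offset,
                       uint64_t page_size)
{
    uint64_t page = offset & ~(page_size - 1);

    for (int i = 0; i < s->nr_partial; i++)
        if (s->partial[i] == page)
            return true;
    return false;
}
```

With a 4 KiB page size and a PT_LOAD range of [0x1100, 0x3800), this gives an
mmap() range of [0x2000, 0x3000) and two partial pages at 0x1000 and 0x3000,
which matches the "at most two" observation above.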