[PATCH 2/2] makedumpfile: make the incomplete vmcore generated by ENOSPC error analyzable

HATAYAMA, Daisuke d.hatayama at jp.fujitsu.com
Thu Sep 18 05:29:03 PDT 2014


Hello Wang,

(2014/09/16 18:48), Wang, Xiao/Wang Xiao wrote:
> Since the incomplete vmcore generated by ENOSPC error can't be anylyzed by
> crash utility, but sometimes this file may contain important information
> and the panic problem won't be reproduced, then we came up with an idea to
> modify the exist data of the incomplete vmcore file to make it analyzable
> by crash utility. As there are two formats of the vmcore file, different
> methods are needed to deal with each of them,
>
> elf:
> Modify the value of the PT_LOAD program header to reflect the actual size
> of the incomplete vmcore file. This method can't be used to modify the
> vmcore written in flattened mode.
>
> kdump-compressed:
> Dump the bitmap before any page header and page data.
>

Rigorously, this design depends on implementation of filesystems where it's
possible to lseek and write data onto already allocated blocks even after the
filesystem is full. To show correctness of this design, you should show that
test results for a variety of filesystems such as ext3, ext4, xfs, btrfs, etc.

Of course, if this design doesn't work well on some filesystem, then we just
see the second ENOSPC and we give up addtional work. This is the same situation
as now. There's no demerit.

> Signed-of-by: Wang Xiao <wangx.fnst at cn.fujitsu.com>
> ---
>   makedumpfile.c |  150 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 files changed, 143 insertions(+), 7 deletions(-)
>
> diff --git a/makedumpfile.c b/makedumpfile.c
> index ce4a866..debc15b 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -3621,6 +3621,128 @@ write_cache(struct cache_data *cd, void *buf, size_t size)
>   }
>
>   int
> +reserve_diskspace(int fd, off_t start_offset, off_t end_offset, char *file_name)
> +{
> +    size_t buf_size;
> +    char *buf = NULL;
> +
> +    int ret = FALSE;
> +
> +    if (start_offset >= end_offset) {
> +        ERRMSG("The start offset of diskspace to be reserved must smaller than "
> +               "the end offset.\n");
> +        return FALSE;
> +    }

I feel this check should be assert()-like. This is not a message when makedumpfile runs.
At least there's no need to print this message.

> +
> +    buf_size = end_offset - start_offset;
> +
> +    if ((buf = malloc(buf_size)) == NULL) {
> +        ERRMSG("Can't allocate memory for the size of reserved diskspace. %s\n",
> +               strerror(errno));
> +        return FALSE;
> +    }
> +
> +    memset(buf, 0, buf_size);
> +    if (!write_buffer(fd, start_offset, buf, buf_size, file_name))
> +        goto out;
> +
> +    ret = TRUE;
> +out:
> +    if (buf != NULL) {
> +        free(buf);
> +    }
> +
> +    return ret;
> +}
> +
> +int
> +check_and_modify_elf_header() {

check_and_modify_elf_header*s*() is better?
It's confusing to the ELF header that always occurs at the head of ELF file.

> +    int i, phnum;
> +    off_t file_end, offset, end_offset;
> +    Elf64_Ehdr ehdr64;
> +    Elf32_Ehdr ehdr32;
> +    Elf64_Phdr phdr64;
> +    Elf32_Phdr phdr32;
> +
> +    /*
> +     * the is_elf64_memory() function still can be used.
> +     */
> +    if (is_elf64_memory()) { /* ELF64 */

ehdr64 should be declared in this scope.

> +        if (!get_elf64_ehdr(info->fd_dumpfile, info->name_dumpfile, &ehdr64)) {
> +            ERRMSG("Can't get ehdr64.\n");
> +            return FALSE;
> +        }
> +        phnum = ehdr64.e_phnum;
> +    } else { /* ELF32 */

ehdr32 should be declared in this scope.

> +        if (!get_elf32_ehdr(info->fd_dumpfile, info->name_dumpfile, &ehdr32)) {
> +            ERRMSG("Can't get ehdr32.\n");
> +            return FALSE;
> +        }
> +        phnum = ehdr32.e_phnum;
> +    }
> +
> +    file_end = lseek(info->fd_dumpfile, 0, SEEK_END);
> +    if (file_end < 0) {
> +        ERRMSG("Can't detect the size of %s. %s\n",
> +               info->name_dumpfile,
> +               strerror(errno));
> +        return FALSE;
> +    }
> +
> +    for (i = 0; i < phnum; i++) {
> +        if (is_elf64_memory()) {

phdr64.

> +            if (!get_elf64_phdr(info->fd_dumpfile,
> +                                info->name_dumpfile,
> +                                i, &phdr64)) {
> +                ERRMSG("Can't find Phdr %d.\n", i);
> +                return FALSE;
> +            }
> +
> +            offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * i;
> +            end_offset = phdr64.p_offset + phdr64.p_filesz;
> +
> +            /*
> +             * Check the program header and modify its value according
> +             * to the actual written size.
> +             */
> +            if (file_end >= end_offset)
> +                continue;
> +            else if (file_end >= phdr64.p_offset && file_end < end_offset)
> +                phdr64.p_filesz = file_end - phdr64.p_offset;

So, this is the first case that p_filesz becomes strictly less than p_memsz
except for filtered pages.

> +            else if (file_end < phdr64.p_offset)
> +                memset(&phdr64, 0, sizeof(Elf64_Phdr));

This should be phdr64.p_filesz = 0 and phdr64.p_offset = 0.
The phdr64 has still other information on system memory such p_memsz and p_paddr.
Please don't remove them.

> +
> +            if (!write_buffer(info->fd_dumpfile, offset, &phdr64,
> +                              sizeof(Elf64_Phdr), info->name_dumpfile))
> +                return FALSE;
> +        } else {

phdr32.

> +            if (!get_elf32_phdr(info->fd_dumpfile,
> +                                info->name_dumpfile,
> +                                i, &phdr32)) {
> +                ERRMSG("Can't find Phdr %d.\n", i);
> +                return FALSE;
> +            }
> +
> +            offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * i;
> +            end_offset = phdr32.p_offset + phdr32.p_filesz;
> +
> +            if (file_end >= end_offset)
> +                continue;
> +            else if (file_end >= phdr32.p_offset && file_end < end_offset)
> +                phdr32.p_filesz = file_end - phdr32.p_offset;
> +            else if (file_end < phdr32.p_offset)
> +                memset(&phdr32, 0, sizeof(Elf32_Phdr));

same as the above phdr64 case.

> +
> +            if (!write_buffer(info->fd_dumpfile, offset, &phdr32,
> +                              sizeof(Elf32_Phdr), info->name_dumpfile))
> +                return FALSE;
> +        }
> +    }
> +
> +    return TRUE;
> +}
> +
> +int
>   write_cache_bufsz(struct cache_data *cd)
>   {
>       if (!cd->buf_size)
> @@ -5432,6 +5554,13 @@ write_elf_header(struct cache_data *cd_header)
>       size_note          = note.p_filesz;
>
>       /*
> +     * Reserve a space to store the whole program headers.
> +     */
> +    if (!reserve_diskspace(cd_header->fd, cd_header->offset,
> +                           offset_note_dumpfile, cd_header->file_name))
> +        goto out;
> +
> +    /*
>        * Modify the note size in PT_NOTE header to accomodate eraseinfo data.
>        * Eraseinfo will be written later.
>        */
> @@ -6956,11 +7085,11 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
>           if (!exclude_unnecessary_pages_cyclic(&cycle))
>               return FALSE;
>
> -        if (!write_kdump_pages_cyclic(cd_header, cd_page, &pd_zero,
> -                    &offset_data, &cycle))
> +        if (!write_kdump_bitmap2_cyclic(&cycle))
>               return FALSE;
>
> -        if (!write_kdump_bitmap2_cyclic(&cycle))
> +        if (!write_kdump_pages_cyclic(cd_header, cd_page, &pd_zero,
> +                    &offset_data, &cycle))
>               return FALSE;
>       }
>
> @@ -7906,10 +8035,10 @@ writeout_dumpfile(void)
>               goto out;
>           if (info->flag_cyclic) {
>               if (!write_elf_pages_cyclic(&cd_header, &cd_page))
> -                goto out;
> +                goto chk_enospc;
>           } else {
>               if (!write_elf_pages(&cd_header, &cd_page))
> -                goto out;
> +                goto chk_enospc;
>           }
>           if (!write_elf_eraseinfo(&cd_header))
>               goto out;
> @@ -7923,12 +8052,12 @@ writeout_dumpfile(void)
>       } else {
>           if (!write_kdump_header())
>               goto out;
> +        if (!write_kdump_bitmap())
> +            goto out;
>           if (!write_kdump_pages(&cd_header, &cd_page))
>               goto out;
>           if (!write_kdump_eraseinfo(&cd_page))
>               goto out;
> -        if (!write_kdump_bitmap())
> -            goto out;
>       }
>       if (info->flag_flatten) {
>           if (!write_end_flat_header())
> @@ -7936,6 +8065,13 @@ writeout_dumpfile(void)
>       }
>
>       ret = TRUE;
> +chk_enospc:
> +    if ((ret == FALSE) && info->flag_nospace && !info->flag_flatten) {
> +        if (!write_cache_bufsz(&cd_header))
> +            goto out;
> +        if (!check_and_modify_elf_header())
> +            ERRMSG("Can't modify the elf header.\n");
> +    }

There's a feature to repeat trying to capture dumpfile in
multiple dump levels in increasing order if no space error is
returned. So, this part should be moved upward.

This is part of man 8 makedumpfile:

      -d dump_level
           Specify the type of unnecessary page for analysis.
           Pages  of   the  specified  type  are   not  copied  to
           DUMPFILE. The  page type marked in  the following table
           is excluded. A user can  specify multiple page types by
           setting the sum  of each page type  for dump_level. The
           maximum of dump_level is 31. Note that a dump_level for
           Xen dump  filtering is 0 or  1 on a machine  other than
           x86_64 (On an x86_64 machine, it is possible to specify
           2 or bigger as a dump_level).
           If specifying  multiple dump_levels with  the delimiter
           ',', makedumpfile retries to create a DUMPFILE by other
           dump_level when "No space on device" error happens. For
           example,  if  dump_level  is "11,31"  and  makedumpfile
           fails  by dump_level  11,  makedumpfile  retries it  by
           dump_level 31.
           Example:
           # makedumpfile -d 11 -x vmlinux /proc/vmcore dumpfile
           #  makedumpfile   -d  11,31  -x   vmlinux  /proc/vmcore
           dumpfile

The process of multiple dump levels is done in create_dumpfile() below.
The above check should be done after even the last trial fails.

         if (info->flag_split) {
                 if ((status = writeout_multiple_dumpfiles()) == FALSE)
                         return FALSE;
         } else {
                 if ((status = writeout_dumpfile()) == FALSE)
                         return FALSE;
         }
         if (status == NOSPACE) {
                 /*
                  * If specifying the other dump_level, makedumpfile tries
                  * to create a dumpfile with it again.
                  */
                 num_retry++;
                 if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
                         return FALSE;
                 MSG("Retry to create a dumpfile by dump_level(%d).\n",
                     info->dump_level);
                 if (!delete_dumpfile())
                         return FALSE;
                 goto retry;
         }

-- 
Thanks.
HATAYAMA, Daisuke




More information about the kexec mailing list