[Patch v3 7/7] add a new interface to show the memory usage of 1st kernel
bhe at redhat.com
Mon Aug 25 20:22:47 PDT 2014
On 08/26/14 at 02:28am, Atsushi Kumagai wrote:
> >On 08/01/14 at 07:12am, Atsushi Kumagai wrote:
> >> >Page number of memory in different use
> >> >--------------------------------------------------
> >> >TYPE    PAGES    EXCLUDABLE    DESCRIPTION
> >> >ZERO    0        yes           Pages filled with zero
> >>
> >> The number of zero pages is always 0 since they aren't counted during
> >> get_num_dumpable_cyclic(). To count them, we have to read all of the
> >> pages as exclude_zero_pages() does, so we need "exclude_zero_pages_cyclic()".
> >> My idea is to call it in get_num_dumpable_cyclic() like this:
> >>
> >>  for_each_cycle(0, info->max_mapnr, &cycle)
> >>  {
> >>          if (!exclude_unnecessary_pages_cyclic(&cycle))
> >>                  return FALSE;
> >>
> >> +        if (info->flag_mem_usage)
> >> +                exclude_zero_pages_cyclic(&cycle);
> >> +
> >>          for(pfn=cycle.start_pfn; pfn<cycle.end_pfn; pfn++)
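Counting zero pages means examining the contents of every page in the cycle, as the quoted suggestion says. A minimal sketch of that per-page test and a per-cycle count, assuming the page contents are already in a contiguous buffer (in makedumpfile they would be read from the memory image for each pfn). `page_is_all_zero()` and `count_zero_pages()` are hypothetical names for illustration, not makedumpfile's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not makedumpfile's code): true if every byte
 * of a page_size-byte buffer is zero. */
static bool page_is_all_zero(const unsigned char *buf, size_t page_size)
{
    assert(buf != NULL || page_size == 0);
    if (page_size == 0)
        return true;
    if (buf[0] != 0)
        return false;
    /* buf[0] == 0 and buf[i] == buf[i + 1] for every i => all bytes are zero. */
    return memcmp(buf, buf + 1, page_size - 1) == 0;
}

/* Hypothetical counter: number of all-zero pages among npages pages laid
 * out back to back in `pages` (e.g. the pages of one cycle). */
static size_t count_zero_pages(const unsigned char *pages, size_t npages,
                               size_t page_size)
{
    size_t zero = 0;

    for (size_t i = 0; i < npages; i++)
        if (page_is_all_zero(pages + i * page_size, page_size))
            zero++;
    return zero;
}
```

The overlapping `memcmp()` avoids a hand-written byte loop; the point is only that every byte of every page has to be touched, which is why the zero count stays 0 unless the pages are actually read.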
> >
> >
> >Hi Atsushi,
> >
> >I just introduced a new function, exclude_zero_pages_cyclic(), as you
> >suggested, but it always exits with the message below. I don't know
> >what's wrong with this function. Could you help take a look at it?
> >
> >"Program terminated with signal SIGKILL"
>
> Umm, the code looks fine to me and it works well at least on my
> machine (x86_64 on KVM), so I have no idea for now.
>
> Can strace and audit help your investigation? They may provide
> some hints (e.g. who sent the SIGKILL) for us.
It only happened on an AMD machine with a Quad-Core AMD Opteron(tm)
Processor 1352. I tested on my other two Intel machines, and both of
them are OK.
Just now I used strace to check it, and found it's caused by a read.
It's weird, since that page should be inside System RAM and be
readable. Also, hwpoison has already been checked before this handling.
I am wondering why it happened.
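In the strace output below, each page is fetched by an lseek() on the /proc/kcore descriptor followed by a read() of 4096 bytes. A minimal sketch of that pattern; `read_page_at()` is a hypothetical helper, not makedumpfile's function. Note that SIGKILL cannot be caught, blocked, or ignored, so in the failing case none of the error paths below would ever run: the process is gone before read() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the lseek() + read() pair seen in the trace.
 * Reads exactly `size` bytes at file offset `offset` into buf.
 * Returns 0 on success, -1 on error or unexpected end of file. */
static int read_page_at(int fd, off_t offset, void *buf, size_t size)
{
    assert(buf != NULL);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        return -1;
    }

    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, (char *)buf + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a catchable signal: retry */
            perror("read");
            return -1;
        }
        if (n == 0) {
            fprintf(stderr, "read: unexpected end of file\n");
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```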
[ ~]$ sudo readelf -l /proc/kcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
This is the LOAD segment where the page read error happened:
  LOAD           0x0000080080001000 0xffff880080000000 0x0000000000000000
                 0x000000004fee0000 0x000000004fee0000  RWE    1000
...
  LOAD           0x00006a0002001000 0xffffea0002000000 0x0000000000000000
                 0x00000000013fc000 0x00000000013fc000  RWE    1000
  LOAD           0x0000080100001000 0xffff880100000000 0x0000000000000000
                 0x0000000130000000 0x0000000130000000  RWE    1000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 8799351988224, SEEK_SET) = 8799351988224
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 8799351992320, SEEK_SET) = 8799351992320
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 8799351996416, SEEK_SET) = 8799351996416
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 8799365541888, SEEK_SET) = 8799365541888
read(3, "\340\216\274\226f\177\0\0PCD\224f\177\0\0\265\0\0\0\0\0\0\0p\217\274\226f\177\0\0"..., 4096) = 4096
-----------------------------------------
Here it uses lseek() to set the file position and then tries to read;
the read never completes and the process is killed by SIGKILL.
lseek(3, 8799381360640, SEEK_SET) = 8799381360640
read(3, <unfinished ...>
+++ killed by SIGKILL +++
Killed
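Each PT_LOAD header of /proc/kcore maps a file-offset range to a kernel virtual address range, so the offset passed to the last lseek() can be translated back to the address being read. A minimal sketch using only the three LOAD segments quoted above (the elided headers are not included). The struct and function names are hypothetical, and because kcore reports PhysAddr as 0, the physical translation is an assumption: it uses the pre-KASLR x86_64 direct-mapping window starting at 0xffff880000000000 (64 TB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of Elf64_Phdr fields for one PT_LOAD entry. */
struct load_seg {
    uint64_t offset;   /* p_offset */
    uint64_t vaddr;    /* p_vaddr  */
    uint64_t filesz;   /* p_filesz */
};

/* Assumption: x86_64 direct mapping of this era (no KASLR), 64 TB window. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define DIRECT_MAP_END  0xffffc80000000000ULL   /* exclusive */

/* The three LOAD segments from the readelf output above. */
static const struct load_seg kcore_segs[] = {
    { 0x0000080080001000ULL, 0xffff880080000000ULL, 0x000000004fee0000ULL },
    { 0x00006a0002001000ULL, 0xffffea0002000000ULL, 0x00000000013fc000ULL },
    { 0x0000080100001000ULL, 0xffff880100000000ULL, 0x0000000130000000ULL },
};

/* Translate a kcore file offset to the virtual address it maps.
 * Returns 0 on success, -1 if no listed segment contains `off`. */
static int kcore_off_to_vaddr(const struct load_seg *segs, size_t n,
                              uint64_t off, uint64_t *vaddr)
{
    assert(vaddr != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Check off >= offset first so the subtraction cannot wrap. */
        if (off >= segs[i].offset && off - segs[i].offset < segs[i].filesz) {
            *vaddr = segs[i].vaddr + (off - segs[i].offset);
            return 0;
        }
    }
    return -1;
}

/* Direct-map virtual address -> physical address (assumed layout above).
 * Returns -1 for addresses outside the direct map, e.g. vmemmap. */
static int direct_map_to_phys(uint64_t vaddr, uint64_t *phys)
{
    assert(phys != NULL);
    if (vaddr < DIRECT_MAP_BASE || vaddr >= DIRECT_MAP_END)
        return -1;
    *phys = vaddr - DIRECT_MAP_BASE;
    return 0;
}
```

With these values, offset 8799381360640 (0x800c4001000) falls 0x44000000 bytes into the first segment, which gives virtual address 0xffff8800c4000000, i.e. physical address 0xc4000000 under the assumed direct-map base.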
>
>
> Thanks
> Atsushi Kumagai
>