[RFC 0/3] kdump: Check mem_map of CMA area in kdump

Michal Hocko mhocko at suse.com
Mon Dec 18 07:19:20 PST 2023


On Mon 18-12-23 13:23:22, Pingfan Liu wrote:
> From: Pingfan Liu <piliu at redhat.com>
> 
> 
> First of all, this series is only for proof of concept. It only passes compilation.
> 
> For years, CMA is proposed to be used as crashkernel reserved memory.
> But DIO prevent us to follow it since DMA may be in-flight and ruin the
> kdump kernel.
> 
> This series exports the crash kernel's CMA area information through
> device-tree, and kdump kernel skips any page, which refcnt!=mapcount and
> has a potential DMA activity.

I haven't had time to look deeper into the implementation (and I will
get back to it only in early January), but mapcount-based checks are
really tricky and unreliable. folio_maybe_dma_pinned() sounds like a
better test. You definitely want to have that checked by more MM people,
so please CC linux-mm.
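
To illustrate the difference between the two tests, here is a minimal
userspace sketch, not kernel code. It assumes the I/O path takes its
references with pin_user_pages()-style FOLL_PIN, so each pin adds
GUP_PIN_COUNTING_BIAS (1U << 10) to the refcount, and it models only the
small-folio branch of folio_maybe_dma_pinned(). struct model_page,
suspect_by_mapcount(), suspect_by_pin_bias() and count_suspect() are
hypothetical names made up for this example, and mapcount is modelled
as a plain number of mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS. */
#define PIN_BIAS (1u << 10)

/* Hypothetical stand-in for the two counters the checks look at. */
struct model_page {
	unsigned int refcount;	/* all references to the page */
	unsigned int mapcount;	/* number of page table mappings */
};

typedef bool (*suspect_fn)(const struct model_page *);

/* Check from the cover letter: any reference beyond the mappings. */
static bool suspect_by_mapcount(const struct model_page *p)
{
	assert(p != NULL);
	return p->refcount != p->mapcount;
}

/*
 * Model of the small-folio case of folio_maybe_dma_pinned():
 * a refcount at or above the bias may mean a DMA pin. This is
 * a "maybe": 1024 ordinary references look the same as one pin.
 */
static bool suspect_by_pin_bias(const struct model_page *p)
{
	assert(p != NULL);
	return p->refcount >= PIN_BIAS;
}

/* Count how many pages of an array a given check would skip. */
static size_t count_suspect(const struct model_page *pages, size_t n,
			    suspect_fn check)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (check(&pages[i]))
			hits++;
	return hits;
}
```

In this model an unmapped page cache page (refcount 1, mapcount 0) is
flagged by the mapcount comparison although nothing has pinned it, while
a pinned page whose two counters happen to be equal is missed by the
mapcount comparison and still caught by the bias test.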
 
> The exported information include:
> 	u64 kdump_cma_pfn;
> 	u64 kdump_cma_pg_cnt;
> 	u64 kdump_cma_pg_paddr;
> 
> And they should be filled with Jiri's series "[PATCH 0/4] kdump:
> crashkernel reservation from CMA"
> 
> After the conjunction of two series, the CMA used for kdump has only the
> following risk, where the following conditions:
> 	-1.a wrong code forges _refcnt and mapcount to the same value
> 	-2.the page is also used by DIO
> 
> 
> Is it acceptable, or any rescue e.g. CRC on page?

We already have vm_debug=P, which enables init-time poisoning of all
struct pages. The value is then checked when the page is allocated.
 
> Please share your thoughts.

Having a sanity check on exported CMA pages makes some sense to me. The
exact check might be more involved and may produce false positives, but
those shouldn't be a major problem unless there are too many of them.

-- 
Michal Hocko
SUSE Labs


