[PATCH 0/4] kdump: crashkernel reservation from CMA
Donald Dutile
ddutile at redhat.com
Wed Nov 29 07:03:57 PST 2023
Baoquan,
hi!
On 11/29/23 3:10 AM, Baoquan He wrote:
> On 11/28/23 at 10:08am, Michal Hocko wrote:
>> On Tue 28-11-23 10:11:31, Baoquan He wrote:
>>> On 11/28/23 at 09:12am, Tao Liu wrote:
>> [...]
>>> Thanks for the effort to bring this up, Jiri.
>>>
>>> I am wondering how you will use this crashkernel=,cma parameter. I mean
>>> the scenario of crashkernel=,cma. Asking this because I don't know how
>>> SUSE deploys kdump in SUSE distros. In SUSE distros, will the kdump
>>> kernel's drivers be filtered out? If so, it could hit the in-flight
>>> DMA issue, e.g. a NIC has a DMA buffer in the CMA area but is not
>>> reset during kdump bootup because the NIC driver is not loaded to
>>> initialize it. Not sure if this is 100% certain, but possible in theory?
>>
>> NIC drivers do not allocate from movable zones (that includes the CMA
>> zone). In fact the kernel doesn't use GFP_MOVABLE for non-user requests.
>> RDMA drivers might and do transfer from user backed memory but for that
>> purpose they should be pinning memory (have a look at
>> __gup_longterm_locked and its callers) and that will migrate away from
>> any such zone.
>
> Add Don in this thread.
>
> I am not familiar with RDMA. Say we reserve a 1G range of memory as CMA in
> the 1st kernel, and RDMA or any other user space tool can use it. When a
> crash happens for whatever reason, that 1G of CMA memory will be reused as
> available MOVABLE memory by the kdump kernel. If there is no risk at all,
> I mean 100% safe from RDMA, that would be great.
>
My RDMA days are long behind me... more in mm space these days, so this still
interests me.
I thought, in general, userspace memory is not saved or used in kdumps, so
if RDMA is using CMA space for userspace-based IO (gup), then I would expect
it can be re-used for the kexec'd kernel.
So, I'm not sure what 'safe from RDMA' means, but I would expect RDMA queues
are in-kernel data structures, not userspace structures, and they would be
the more/most important ones to keep for kdump saving. The actual userspace
data ... same story as any other userspace data.
dma-bufs allocated from CMA, which are (typically) shared with GPUs
(& RDMA in GPU-direct configs), again, would be shared userspace memory, not
control/cmd/rsp queues, so I'm not seeing an issue there either.
I would poke the NVIDIA+Mellanox folks for further review in this space,
if my reply leaves you (or others) 'wanting'.
- Don
>>
>> [...]
>>> The crashkernel=,cma option requires that no userspace data is dumped.
>>> From our support engineers' feedback, customers have never said they
>>> don't need to dump user space data. Assume a server with a huge database
>>> deployed, the database has crashed often recently, and the database
>>> provider claims that it's not the database's fault, so the OS needs to
>>> prove its innocence. What will you do?
>>
>> Don't use CMA backed crash memory then? This is an optional feature.
>>
>>> So this looks like a nice-to-have to me. At least in fedora/rhel's
>>> usage, we may only backport this patch, and add one sentence in our
>>> user guide saying "there's a crashkernel=,cma added, can be used with
>>> crashkernel= to save memory. Please feel free to try if you like".
>>> Unless SUSE or other distros decide to use it as the default config or
>>> something like that. Please correct me if I missed anything or got
>>> anything wrong.
>>
>> Jiri will know better than me but for us a proper crash memory
>> configuration has become a real nut. You do not want to reserve too much
>> because it is effectively cutting off the usable memory and we regularly
>> hit "not enough memory" if we try to be savvy. The tighter you
>> try to configure it, the easier it is to fail. Even worse, any in-kernel
>> memory consumer can increase its memory demand and push the overall
>> consumption off the cliff. So this is not an easy to maintain solution.
>> CMA backed crash memory can be much more generous while still usable.
>> --
>> Michal Hocko
>> SUSE Labs
>>
>