[PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation

Mike Rapoport rppt at kernel.org
Tue Feb 2 07:48:57 EST 2021


On Tue, Feb 02, 2021 at 10:35:05AM +0100, Michal Hocko wrote:
> On Mon 01-02-21 08:56:19, James Bottomley wrote:
> 
> I have also proposed potential ways out of this. Either the pool is not
> fixed sized and you make it a regular unevictable memory (if direct map
> fragmentation is not considered a major problem)

I think that the direct map fragmentation is not a major problem, and the
data we have confirms it, so I'd be more than happy to entirely drop the
pool, allocate memory page by page and remove each page from the direct
map. 

Still, we cannot prove negative and it could happen that there is a
workload that would suffer a lot from the direct map fragmentation, so
having a pool of large pages upfront is better than trying to fix it
afterwards. As we get more confidence that the direct map fragmentation is
not an issue as it is common to believe we may remove the pool altogether.

I think that using PMD_ORDER allocations for the pool with a fallback to
order 0 will do the job, but unfortunately I doubt we'll reach a consensus
about this because dogmatic beliefs are hard to shake...

A more restrictive possibility is to still use plain PMD_ORDER allocations
to fill the pool, without relying on CMA. In this case there will be no
global secretmem specific pool to exhaust, but then it's possible to drain
high order free blocks in a system, so CMA has an advantage of limiting
secretmem pools to certain amount of memory with somewhat higher
probability for high order allocation to succeed. 

> or you need a careful access control 

Do you mind elaborating what do you mean by "careful access control"?

> or you need SIGBUS on the mmap failure (to allow at least some fallback
> mode to caller).

As I've already said, I agree that SIGBUS is way better than OOM at #PF
time.
And we can add some means to fail at mmap() time if the pools are running
low.

-- 
Sincerely yours,
Mike.



More information about the linux-riscv mailing list