[PATCH 07/12] huge_memory: Allow mappings of PMD sized pages

Dan Williams dan.j.williams at intel.com
Wed Oct 23 16:38:30 PDT 2024


Alistair Popple wrote:
> 
> Alistair Popple <apopple at nvidia.com> writes:
> 
> > Alistair Popple wrote:
> >> Dan Williams <dan.j.williams at intel.com> writes:
> 
> [...]
> 
> >>> +
> >>> +	return VM_FAULT_NOPAGE;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(dax_insert_pfn_pmd);
> >>
> >> Like I mentioned before, let's make the exported function
> >> vmf_insert_folio() and keep the pte, pmd, and pud variants as private /
> >> static details of the implementation. The "dax_" specific aspect of
> >> this was removed with the conversion from a dax_pfn to a folio.
> >
> > Ok, let me try that. Note that vmf_insert_pfn{_pmd|_pud} will have to
> > stick around though.
> 
> Creating a single vmf_insert_folio() seems somewhat difficult because it
> needs to be called from multiple fault paths (either PTE, PMD or PUD
> fault) and do something different for each.
> 
> Specifically the issue I ran into is that DAX does not downgrade PMD
> entries to PTE entries if they are backed by storage. So the PTE fault
> handler will get a PMD-sized DAX entry and therefore a PMD-sized folio.
> 
> The way I tried implementing vmf_insert_folio() was to look at
> folio_order() to determine which internal implementation to call. But
> that doesn't work for a PTE fault, because there's no way to determine
> if we should PTE map a subpage or PMD map the entire folio.

Ah, that conflict makes sense.
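
FWIW, a minimal sketch of the conflict as I understand it (the helper
names below are hypothetical, not from this series). Dispatching purely
on folio_order() inside a single vmf_insert_folio() cannot tell a PTE
fault that happens to find a PMD-sized DAX folio apart from a real PMD
fault:

static vm_fault_t vmf_insert_folio(struct vm_fault *vmf, struct folio *folio)
{
	/*
	 * On a PTE fault against a PMD-sized DAX entry, folio_order()
	 * is still PMD_ORDER, so this would wrongly take the PMD path
	 * instead of mapping only the subpage that faulted.
	 */
	if (folio_order(folio) >= PMD_ORDER)
		return insert_folio_pmd(vmf, folio);	/* hypothetical */

	return insert_folio_pte(vmf, folio);		/* hypothetical */
}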

> We could pass down some context as to what type of fault we're handling,
> or add it to the vmf struct, but that seems excessive given callers
> already know this and could just call a specific
> vmf_insert_page_{pte|pmd|pud}.

Ok, I think it would be good to capture that "because DAX does not
downgrade entries it may satisfy PTE faults with PMD inserts", or
something like that, in a comment or the changelog.
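
Something along these lines, for example (wording and helper signatures
are only a suggestion, to be adapted to whatever the per-level entry
points end up being called):

/*
 * DAX does not downgrade a PMD entry to PTE entries while it is backed
 * by storage, so a PTE fault may find a PMD-sized folio and satisfy the
 * fault with a PMD insert. Callers therefore pick the insertion level
 * explicitly rather than having it derived from folio_order().
 */
vm_fault_t vmf_insert_page_pte(struct vm_fault *vmf, struct folio *folio);
vm_fault_t vmf_insert_page_pmd(struct vm_fault *vmf, struct folio *folio);
vm_fault_t vmf_insert_page_pud(struct vm_fault *vmf, struct folio *folio);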
