[PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()

Barry Song 21cnbao at gmail.com
Mon Nov 27 00:42:17 PST 2023


>> +		for (i = 0; i < nr; i++, page++) {
>> +			if (anon) {
>> +				/*
>> +				 * If this page may have been pinned by the
>> +				 * parent process, copy the page immediately for
>> +				 * the child so that we'll always guarantee the
>> +				 * pinned page won't be randomly replaced in the
>> +				 * future.
>> +				 */
>> +				if (unlikely(page_try_dup_anon_rmap(
>> +						page, false, src_vma))) {
>> +					if (i != 0)
>> +						break;
>> +					/* Page may be pinned, we have to copy. */
>> +					return copy_present_page(
>> +						dst_vma, src_vma, dst_pte,
>> +						src_pte, addr, rss, prealloc,
>> +						page);
>> +				}
>> +				rss[MM_ANONPAGES]++;
>> +				VM_BUG_ON(PageAnonExclusive(page));
>> +			} else {
>> +				page_dup_file_rmap(page, false);
>> +				rss[mm_counter_file(page)]++;
>> +			}
>>   		}
>> -		rss[MM_ANONPAGES]++;
>> -	} else if (page) {
>> -		folio_get(folio);
>> -		page_dup_file_rmap(page, false);
>> -		rss[mm_counter_file(page)]++;
>> +
>> +		nr = i;
>> +		folio_ref_add(folio, nr);
> 
> You're changing the order of mapcount vs. refcount increment. Don't. 
> Make sure your refcount >= mapcount.
> 
> You can do that easily by doing the folio_ref_add(folio, nr) first and 
> then decrementing in case of error accordingly. Errors due to pinned 
> pages are the corner case.
> 
> I'll note that it will make a lot of sense to have batch variants of 
> page_try_dup_anon_rmap() and page_dup_file_rmap().
> 

i still don't understand why it is not a entire map+1, but an increment
in each basepage.

as long as it is a CONTPTE large folio, there is no much difference with
PMD-mapped large folio. it has all the chance to be DoubleMap and need
split.

When A and B share a CONTPTE large folio, we do madvise(DONTNEED) or any
similar things on a part of the large folio in process A,

this large folio will have partially mapped subpage in A (all CONTPE bits
in all subpages need to be removed though we only unmap a part of the
large folioas HW requires consistent CONTPTEs); and it has entire map in
process B(all PTEs are still CONPTES in process B).

isn't it more sensible for this large folios to have entire_map = 0(for
process B), and subpages which are still mapped in process A has map_count
=0? (start from -1).

> Especially, the batch variant of page_try_dup_anon_rmap() would only 
> check once if the folio maybe pinned, and in that case, you can simply 
> drop all references again. So you either have all or no ptes to process, 
> which makes that code easier.
> 
> But that can be added on top, and I'll happily do that.
> 
> -- 
> Cheers,
> 
> David / dhildenb

Thanks
Barry




More information about the linux-arm-kernel mailing list