[PATCH] refpage_create.2: Document refpage_create(2)

Alejandro Colomar (man-pages) alx.manpages at gmail.com
Thu Jul 29 05:09:54 PDT 2021


Hi Peter,

On 7/17/21 4:59 AM, Peter Collingbourne wrote:
> ---
> The syscall has not landed in the kernel yet.
> Therefore, as usual, the patch should not be taken yet
> and I've used 5.x as the introducing kernel version for now.

Thanks!  Please see a few comments below.
Apart from the formatting and code issues I noted,
the text looks good to me.

Please, ping us when this is merged in the kernel :)

Regards,

Alex

> 
>   man2/refpage_create.2 | 167 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 167 insertions(+)
>   create mode 100644 man2/refpage_create.2
> 
> diff --git a/man2/refpage_create.2 b/man2/refpage_create.2
> new file mode 100644
> index 000000000..c0b928b92
> --- /dev/null
> +++ b/man2/refpage_create.2
> @@ -0,0 +1,167 @@
> +.\" Copyright (C) 2021 Google LLC
> +.\" Author: Peter Collingbourne <pcc at google.com>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH REFPAGE_CREATE 2 2021-07-16 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +refpage_create \- create a reference page file descriptor
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <unistd.h>"
> +.PP
> +.BI "int syscall(SYS_refpage_create, void *" content ", unsigned int " size ,
> +.BI "            unsigned long " flags ");"
> +.fi
> +.PP
> +.IR Note :
> +glibc provides no wrapper for
> +.BR refpage_create (),
> +necessitating the use of
> +.BR syscall (2).
> +.SH DESCRIPTION
> +The
> +.BR refpage_create ()
> +system call is used to create a file descriptor
> +that conceptually refers to a read-only file
> +whose contents are an infinite repetition of
> +.I size
> +bytes of data read from the
> +.I content
> +argument to the system call,
> +and which may be mapped into memory with
> +.BR mmap (2).
> +The file descriptor is created as if by passing
> +.BR O_RDONLY | O_CLOEXEC
> +to
> +.BR open (2).
> +.PP
> +In reality, any read-only pages in the mapping are backed
> +by a so-called reference page,
> +whose contents are specified using the arguments to
> +.BR refpage_create ().
> +.PP
> +The reference page will consist of repetitions of
> +.I size
> +bytes read
> +from
> +.IR content ,
> +as many as are required to fill the page. The
> +.I size
> +argument must be a power of two less than or equal to the page size, and the
> +.I content
> +argument must have at least
> +.I size
> +alignment. The behavior is as if a copy of this data

s/\. /.\n/

Rationale: semantic newlines.

> +is made while servicing the system call;
> +any updates to the data after the system call has returned
> +will not be reflected in the reference page.
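
To make the rules in this paragraph concrete, here is a small userspace
model of how the reference page would be filled.  It is only a sketch:
model_refpage_fill() is a hypothetical name, it is not kernel code, and
both the rejection of size 0 and the -1 return value are assumptions
made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of the reference page contents.
 * Returns 0 on success, or -1 if the constraints described in the
 * text (power-of-two size, size <= page size, size alignment) are
 * violated.
 */
static int
model_refpage_fill(unsigned char *page, size_t page_size,
                   const void *content, size_t size)
{
    /* size must be a (nonzero) power of two no larger than the page */
    if (size == 0 || (size & (size - 1)) != 0 || size > page_size)
        return -1;

    /* Defensive check for this model: the page must hold whole copies */
    if (page_size % size != 0)
        return -1;

    /* content must have at least size alignment */
    if ((uintptr_t) content % size != 0)
        return -1;

    /* Repeat the size bytes as many times as needed to fill the page */
    for (size_t off = 0; off < page_size; off += size)
        memcpy(page + off, content, size);

    return 0;
}
```

Because the data is copied at call time in this model, later changes to
content do not affect page, which matches the "as if a copy" wording.
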
> +.PP
> +If the architecture specifies that // metadata may be associated /J/

Please, use semantic newlines (see man-pages(7))

> +with memory addresses, // that metadata if present is copied
> +into the reference page along with the data itself,
> +but only if the size argument is at least as large
> +as the granularity of the metadata.
> +For example, with the ARMv8.5 Memory Tagging Extension,
> +the memory tags are copied, // but only if the size is greater than /J/
> +or equal to // the architecturally specified tag granule size of 16 bytes.
> +.PP
> +Writable private mappings trigger specific copy-on-write behavior
> +when a page in the mapping is written to.
> +The behavior is as if the reference page is copied,
> +but the kernel may use a more efficient technique such as
> +.BR memset (3)
> +to produce the copy if the
> +.I size
> +argument originally used to create the reference page file descriptor
> +is sufficiently small.
> +For this reason it is recommended to specify as small of a
> +.I size
> +argument as possible
> +in order to activate any such optimizations implemented in the kernel.
> +.PP
> +The advantage of using this system call
> +over creating normal anonymous mappings
> +and manually initializing the pages from userspace
> +is that it is more efficient.
> +If it is not known that all of the pages in the mapping
> +will be faulted (for example, if the system call is used
> +by a general purpose memory allocator
> +where the behavior of the client program is unknown),
> +letting the pages be prepared on fault only if needed
> +is more efficient from both a performance
> +and memory consumption perspective.
> +Even if all of the pages would end up being faulted,
> +it would still be more efficient
> +to have the kernel initialize the pages with the required contents once
> +than to have the kernel zero initialize them on fault
> +and then have userspace initialize them again with different contents.
> +.SH EXAMPLES
> +The following program creates a 128KB memory mapping

The SI mandates that a space shall be inserted between a number and the 
associated unit.

Also, if it really means 128 KiB, which I guess it does, please use KiB.
See units(7).

Use a non-breaking space to make sure that the unit goes with the number.

With all that, it would be:

... creates a 128\ KiB memory ...

> +preinitialized with the pattern byte 0xAA
> +and verifies that the contents of the mapping are correct.
> +.PP
> +.EX
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +int main() {
> +    unsigned char pattern = 0xaa;

Please use capital AA to help visually differentiate x and a.

> +    unsigned long mmap_size = 131072;

Why that magic number?
Maybe a shift to indicate that it's a power of 2...  or 128 * 1024...
I don't know powers of 2 that high off the top of my head :)

Also, why 'unsigned long'?  The SYNOPSIS says it's an 'unsigned int'.

> +
> +    int fd = syscall(SYS_refpage_create, &pattern, 1, 0);

Please use sizeof(pattern) instead of 1 to communicate the relationship 
between them.

> +    if (fd < 0) {
> +        perror("refpage_create");
> +        return 1;

Please use EXIT_FAILURE (<stdlib.h>).  Also use exit(3) instead of 
return, as is common practice in manual pages.

> +    }
> +    unsigned char *p = mmap(0, mmap_size, PROT_READ | PROT_WRITE,

Use NULL instead of 0 for pointers.  The first argument of mmap(2) is 
'void *addr'.

> +                            MAP_PRIVATE, fd, 0);
> +    if (p == MAP_FAILED) {
> +        perror("mmap");
> +        return 1;
> +    }
> +    for (unsigned i = 0; i != mmap_size; ++i) {

s/unsigned/unsigned int/

> +        if (p[i] != pattern) {
> +            fprintf(stderr, "refpage failed contents check @ %u: "
> +                    "0x%x != 0x%x\n",

I prefer 0x%X, which is already in use in some manual pages (seccomp(2)).

Also, 'i' may be more readable in hex, given it's an offset from an
address (conceptually a size_t, even if the kernel doesn't use that
type), don't you think?

> +                    i, p[i], pattern);
> +            return 1;

exit(3)

> +        }
> +    }
> +}
> +.EE
> +.SH NOTE
> +Reading from a reference page file descriptor, e.g. with
> +.BR read (2),
> +is not supported, nor would this be particularly useful.
> +.SH VERSIONS
> +This system call first appeared in Linux 5.x.
> +.SH CONFORMING TO
> +The
> +.BR refpage_create ()
> +system call is Linux-specific.
> +.SH SEE ALSO
> +.BR mmap (2),
> +.BR open (2).
> 
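
For reference, the style comments on the example (128 * 1024 instead of
131072, NULL for the mmap(2) address, 0xAA and 0x%X, a hex offset,
EXIT_FAILURE with exit(3)) could be combined roughly as in the sketch
below.  Since the syscall is not merged, the sketch does not call it:
map_filled() is a hypothetical stand-in that uses an anonymous mapping
plus memset(3), which is the manual userspace initialization that
refpage_create() is meant to replace.  first_mismatch() and MMAP_SIZE
are likewise names invented for the illustration.

```c
#define _DEFAULT_SOURCE     /* MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define MMAP_SIZE  (128 * 1024)    /* 128 KiB */

/*
 * Hypothetical stand-in for refpage_create() followed by mmap():
 * create a private anonymous mapping and fill it from userspace.
 */
static unsigned char *
map_filled(size_t len, unsigned char pattern)
{
    unsigned char *p;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    memset(p, pattern, len);
    return p;
}

/*
 * Return the offset of the first byte that differs from pattern,
 * or len if the whole mapping matches.  The offset is printed in
 * hex, as suggested in the review.
 */
static size_t
first_mismatch(const unsigned char *p, size_t len, unsigned char pattern)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != pattern) {
            fprintf(stderr, "contents check failed @ 0x%zX: "
                    "0x%X != 0x%X\n",
                    i, p[i], pattern);
            return i;
        }
    }
    return len;
}
```

A caller would treat any return value other than len as a failure and
call exit(EXIT_FAILURE).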


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


