[PATCH] refpage_create.2: Document refpage_create(2)
Alejandro Colomar (man-pages)
alx.manpages at gmail.com
Thu Jul 29 05:09:54 PDT 2021
Hi Peter,
On 7/17/21 4:59 AM, Peter Collingbourne wrote:
> ---
> The syscall has not landed in the kernel yet.
> Therefore, as usual, the patch should not be taken yet
> and I've used 5.x as the introducing kernel version for now.
Thanks! Please see a few comments below.
Apart from the formatting and code issues I noted,
the text looks good to me.
Please ping us when this is merged into the kernel :)
Regards,
Alex
>
> man2/refpage_create.2 | 167 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 167 insertions(+)
> create mode 100644 man2/refpage_create.2
>
> diff --git a/man2/refpage_create.2 b/man2/refpage_create.2
> new file mode 100644
> index 000000000..c0b928b92
> --- /dev/null
> +++ b/man2/refpage_create.2
> @@ -0,0 +1,167 @@
> +.\" Copyright (C) 2021 Google LLC
> +.\" Author: Peter Collingbourne <pcc at google.com>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein. The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH REFPAGE_CREATE 2 2021-07-16 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +refpage_create \- create a reference page file descriptor
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <unistd.h>"
> +.PP
> +.BI "int syscall(SYS_refpage_create, void *" content ", unsigned int " size ,
> +.BI " unsigned long " flags ");"
> +.fi
> +.PP
> +.IR Note :
> +glibc provides no wrapper for
> +.BR refpage_create (),
> +necessitating the use of
> +.BR syscall (2).
> +.SH DESCRIPTION
> +The
> +.BR refpage_create ()
> +system call is used to create a file descriptor
> +that conceptually refers to a read-only file
> +whose contents are an infinite repetition of
> +.I size
> +bytes of data read from the
> +.I content
> +argument to the system call,
> +and which may be mapped into memory with
> +.BR mmap (2).
> +The file descriptor is created as if by passing
> +.BR O_RDONLY | O_CLOEXEC
> +to
> +.BR open (2).
> +.PP
> +In reality, any read-only pages in the mapping are backed
> +by a so-called reference page,
> +whose contents are specified using the arguments to
> +.BR refpage_create ().
> +.PP
> +The reference page will consist of repetitions of
> +.I size
> +bytes read
> +from
> +.IR content ,
> +as many as are required to fill the page. The
> +.I size
> +argument must be a power of two less than or equal to the page size, and the
> +.I content
> +argument must have at least
> +.I size
> +alignment. The behavior is as if a copy of this data
s/\. /.\n/
Rationale: semantic newlines.
> +is made while servicing the system call;
> +any updates to the data after the system call has returned
> +will not be reflected in the reference page.
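To check my reading of the paragraph above, here is a small userspace model
of how the reference page contents would be derived from content and size.
It is only a sketch: model_reference_page() is a name I made up, the
rejection of size 0 is my assumption, and the real kernel code may well
work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of the rules quoted above; not kernel code.
 * Fills 'page' (page_size bytes, assumed to be a power of two) with
 * repetitions of the first 'size' bytes of 'content'.
 * Returns 0 on success, -1 if the arguments break the documented rules.
 */
int
model_reference_page(unsigned char *page, size_t page_size,
                     const void *content, size_t size)
{
    /* size must be a power of two no larger than the page size
       (treating 0 as invalid is an assumption of this sketch) */
    if (size == 0 || (size & (size - 1)) != 0 || size > page_size)
        return -1;

    /* content must have at least size alignment */
    if ((uintptr_t) content % size != 0)
        return -1;

    /* Caller precondition: with a power-of-two page size,
       size divides page_size evenly */
    assert(page_size % size == 0);

    /* Repeat the data as many times as needed to fill the page */
    for (size_t off = 0; off < page_size; off += size)
        memcpy(page + off, content, size);

    return 0;
}
```

With a one-byte content of 0xAA and a 4096-byte buffer, every byte of the
buffer ends up as 0xAA; a size of 3, or a content pointer that is not
size-aligned, is rejected.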
> +.PP
> +If the architecture specifies that // metadata may be associated /J/
Please, use semantic newlines (see man-pages(7))
> +with memory addresses, // that metadata if present is copied
> +into the reference page along with the data itself,
> +but only if the size argument is at least as large
> +as the granularity of the metadata.
> +For example, with the ARMv8.5 Memory Tagging Extension,
> +the memory tags are copied, // but only if the size is greater than /J/
> +or equal to // the architecturally specified tag granule size of 16 bytes.
> +.PP
> +Writable private mappings trigger specific copy-on-write behavior
> +when a page in the mapping is written to.
> +The behavior is as if the reference page is copied,
> +but the kernel may use a more efficient technique such as
> +.BR memset (3)
> +to produce the copy if the
> +.I size
> +argument originally used to create the reference page file descriptor
> +is sufficiently small.
> +For this reason it is recommended to specify as small of a
> +.I size
> +argument as possible
> +in order to activate any such optimizations implemented in the kernel.
> +.PP
> +The advantage of using this system call
> +over creating normal anonymous mappings
> +and manually initializing the pages from userspace
> +is that it is more efficient.
> +If it is not known that all of the pages in the mapping
> +will be faulted (for example, if the system call is used
> +by a general purpose memory allocator
> +where the behavior of the client program is unknown),
> +letting the pages be prepared on fault only if needed
> +is more efficient from both a performance
> +and memory consumption perspective.
> +Even if all of the pages would end up being faulted,
> +it would still be more efficient
> +to have the kernel initialize the pages with the required contents once
> +than to have the kernel zero initialize them on fault
> +and then have userspace initialize them again with different contents.
> +.SH EXAMPLES
> +The following program creates a 128KB memory mapping
The SI mandates that a space shall be inserted between a number and the
associated unit.
Also, if it really means 128 KiB, which I guess it does, please use KiB. See
units(7).
Use a non-breaking space to make sure that the unit goes with the number.
With all that, it would be:
... creates a 128\ KiB memory ...
> +preinitialized with the pattern byte 0xAA
> +and verifies that the contents of the mapping are correct.
> +.PP
> +.EX
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +int main() {
> + unsigned char pattern = 0xaa;
Please use capital AA to help visually differentiate x and a.
> + unsigned long mmap_size = 131072;
Why that magic number?
Maybe a shift to indicate that it's a power of 2... or 128 * 1024...
I don't know powers of 2 that high off the top of my head :)
Also, why 'unsigned long'? The SYNOPSIS says it's an 'unsigned int'.
> +
> + int fd = syscall(SYS_refpage_create, &pattern, 1, 0);
Please use sizeof(pattern) instead of 1 to communicate the relationship
between them.
> + if (fd < 0) {
> + perror("refpage_create");
> + return 1;
Please use EXIT_FAILURE (<stdlib.h>). Also use exit(3) instead of
return, as is common practice in manual pages.
> + }
> + unsigned char *p = mmap(0, mmap_size, PROT_READ | PROT_WRITE,
Use NULL instead of 0 for pointers. The first argument of mmap(2) is
'void *addr'.
> + MAP_PRIVATE, fd, 0);
> + if (p == MAP_FAILED) {
> + perror("mmap");
> + return 1;
> + }
> + for (unsigned i = 0; i != mmap_size; ++i) {
s/unsigned/unsigned int/
> + if (p[i] != pattern) {
> + fprintf(stderr, "refpage failed contents check @ %u: "
> + "0x%x != 0x%x\n",
I prefer 0x%X, which is already in use in some manual pages (seccomp(2)).
Also, 'i' may be more readable in hex, given that it's an offset from an
address (conceptually a size_t, even if the kernel doesn't use that type),
don't you think?
> + i, p[i], pattern);
> + return 1;
exit(3)
> + }
> + }
> +}
> +.EE
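Putting the comments on the example together, the core of it might look
roughly like the sketch below. Several things here are my assumptions, not
part of the patch: the helper name check_refpage_mapping(), the use of
size_t for the mapping length (that is what mmap(2) takes), and the #else
branch. Since the syscall is not merged, I don't expect SYS_refpage_create
to be defined by any installed headers, so the #else branch stands in with
an anonymous mapping filled by memset(3), which is the manual approach the
DESCRIPTION compares against. I have only been able to exercise that
stand-in path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if every byte of the mapping equals the pattern, -1 otherwise.
 */
int
check_refpage_mapping(void)
{
    unsigned char  pattern = 0xAA;
    size_t         mmap_size = 128 * 1024;    /* 128 KiB */
    unsigned char  *p;

#ifdef SYS_refpage_create
    /* Path for a kernel and headers that provide the proposed syscall */
    int fd = syscall(SYS_refpage_create, &pattern, sizeof(pattern), 0);
    if (fd < 0) {
        perror("refpage_create");
        return -1;
    }
    p = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);    /* an established mapping does not need the descriptor */
#else
    /* Stand-in (assumption): anonymous mapping, initialized by hand below */
    p = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#endif
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
#ifndef SYS_refpage_create
    memset(p, pattern, mmap_size);
#endif

    /* Verify the contents, reporting the offset and bytes in hex */
    for (size_t i = 0; i != mmap_size; ++i) {
        if (p[i] != pattern) {
            fprintf(stderr, "refpage failed contents check @ 0x%zX: "
                    "0x%X != 0x%X\n",
                    i, (unsigned int) p[i], (unsigned int) pattern);
            munmap(p, mmap_size);
            return -1;
        }
    }

    munmap(p, mmap_size);
    return 0;
}
```

In the manual page itself this would live directly in main(), with
exit(EXIT_FAILURE) from <stdlib.h> in place of the "return -1" lines.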
> +.SH NOTE
> +Reading from a reference page file descriptor, e.g. with
> +.BR read (2),
> +is not supported, nor would this be particularly useful.
> +.SH VERSIONS
> +This system call first appeared in Linux 5.x.
> +.SH CONFORMING TO
> +The
> +.BR refpage_create ()
> +system call is Linux-specific.
> +.SH SEE ALSO
> +.BR mmap (2),
> +.BR open (2).
>
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/