[RFC PATCH v4 1/5] mm/readahead: Honour new_order in page_cache_ra_order()

Jan Kara jack at suse.cz
Mon May 5 03:09:57 PDT 2025


On Mon 05-05-25 11:51:43, David Hildenbrand wrote:
> On 30.04.25 16:59, Ryan Roberts wrote:
> > page_cache_ra_order() takes a parameter called new_order, which is
> > intended to express the preferred order of the folios that will be
> > allocated for the readahead operation. Most callers indeed call this
> > with their preferred new order. But page_cache_async_ra() calls it with
> > the preferred order of the previous readahead request (actually the
> > order of the folio that had the readahead marker, which may be smaller
> > when alignment comes into play).
> > 
> > And despite the parameter name, page_cache_ra_order() always treats it
> > as the old order, adding 2 to it on entry. As a result, a cold readahead
> > always starts with order-2 folios.
> > 
> > Let's fix this behaviour by always passing in the *new* order.
> > 
> > Worked example:
> > 
> > Prior to the change, mmapping an 8MB file and touching each page
> > sequentially resulted in the following, where we start with order-2
> > folios for the first 128K then ramp up to order-4 for the next 128K,
> > then get clamped to order-5 for the rest of the file because ra_pages is
> > limited to 128K:
> > 
> > TYPE    STARTOFFS     ENDOFFS       SIZE  STARTPG    ENDPG   NRPG  ORDER
> > -----  ----------  ----------  ---------  -------  -------  -----  -----
> > FOLIO  0x00000000  0x00004000      16384        0        4      4      2
> > FOLIO  0x00004000  0x00008000      16384        4        8      4      2
> > FOLIO  0x00008000  0x0000c000      16384        8       12      4      2
> > FOLIO  0x0000c000  0x00010000      16384       12       16      4      2
> > FOLIO  0x00010000  0x00014000      16384       16       20      4      2
> > FOLIO  0x00014000  0x00018000      16384       20       24      4      2
> > FOLIO  0x00018000  0x0001c000      16384       24       28      4      2
> > FOLIO  0x0001c000  0x00020000      16384       28       32      4      2
> > FOLIO  0x00020000  0x00030000      65536       32       48     16      4
> > FOLIO  0x00030000  0x00040000      65536       48       64     16      4
> > FOLIO  0x00040000  0x00060000     131072       64       96     32      5
> > FOLIO  0x00060000  0x00080000     131072       96      128     32      5
> > FOLIO  0x00080000  0x000a0000     131072      128      160     32      5
> > FOLIO  0x000a0000  0x000c0000     131072      160      192     32      5
> 
> Interesting, I would have thought we'd ramp up earlier.
> 
> > ...
> > 
> > After the change, the same operation results in the first 128K being
> > order-0, then we start ramping up to order-2, -4, and finally get
> > clamped at order-5:
> > 
> > TYPE    STARTOFFS     ENDOFFS       SIZE  STARTPG    ENDPG   NRPG  ORDER
> > -----  ----------  ----------  ---------  -------  -------  -----  -----
> > FOLIO  0x00000000  0x00001000       4096        0        1      1      0
> > FOLIO  0x00001000  0x00002000       4096        1        2      1      0
> > FOLIO  0x00002000  0x00003000       4096        2        3      1      0
> > FOLIO  0x00003000  0x00004000       4096        3        4      1      0
> > FOLIO  0x00004000  0x00005000       4096        4        5      1      0
> > FOLIO  0x00005000  0x00006000       4096        5        6      1      0
> > FOLIO  0x00006000  0x00007000       4096        6        7      1      0
> > FOLIO  0x00007000  0x00008000       4096        7        8      1      0
> > FOLIO  0x00008000  0x00009000       4096        8        9      1      0
> > FOLIO  0x00009000  0x0000a000       4096        9       10      1      0
> > FOLIO  0x0000a000  0x0000b000       4096       10       11      1      0
> > FOLIO  0x0000b000  0x0000c000       4096       11       12      1      0
> > FOLIO  0x0000c000  0x0000d000       4096       12       13      1      0
> > FOLIO  0x0000d000  0x0000e000       4096       13       14      1      0
> > FOLIO  0x0000e000  0x0000f000       4096       14       15      1      0
> > FOLIO  0x0000f000  0x00010000       4096       15       16      1      0
> > FOLIO  0x00010000  0x00011000       4096       16       17      1      0
> > FOLIO  0x00011000  0x00012000       4096       17       18      1      0
> > FOLIO  0x00012000  0x00013000       4096       18       19      1      0
> > FOLIO  0x00013000  0x00014000       4096       19       20      1      0
> > FOLIO  0x00014000  0x00015000       4096       20       21      1      0
> > FOLIO  0x00015000  0x00016000       4096       21       22      1      0
> > FOLIO  0x00016000  0x00017000       4096       22       23      1      0
> > FOLIO  0x00017000  0x00018000       4096       23       24      1      0
> > FOLIO  0x00018000  0x00019000       4096       24       25      1      0
> > FOLIO  0x00019000  0x0001a000       4096       25       26      1      0
> > FOLIO  0x0001a000  0x0001b000       4096       26       27      1      0
> > FOLIO  0x0001b000  0x0001c000       4096       27       28      1      0
> > FOLIO  0x0001c000  0x0001d000       4096       28       29      1      0
> > FOLIO  0x0001d000  0x0001e000       4096       29       30      1      0
> > FOLIO  0x0001e000  0x0001f000       4096       30       31      1      0
> > FOLIO  0x0001f000  0x00020000       4096       31       32      1      0
> > FOLIO  0x00020000  0x00024000      16384       32       36      4      2
> > FOLIO  0x00024000  0x00028000      16384       36       40      4      2
> > FOLIO  0x00028000  0x0002c000      16384       40       44      4      2
> > FOLIO  0x0002c000  0x00030000      16384       44       48      4      2
> > FOLIO  0x00030000  0x00034000      16384       48       52      4      2
> > FOLIO  0x00034000  0x00038000      16384       52       56      4      2
> > FOLIO  0x00038000  0x0003c000      16384       56       60      4      2
> > FOLIO  0x0003c000  0x00040000      16384       60       64      4      2
> > FOLIO  0x00040000  0x00050000      65536       64       80     16      4
> > FOLIO  0x00050000  0x00060000      65536       80       96     16      4
> > FOLIO  0x00060000  0x00080000     131072       96      128     32      5
> > FOLIO  0x00080000  0x000a0000     131072      128      160     32      5
> > FOLIO  0x000a0000  0x000c0000     131072      160      192     32      5
> > FOLIO  0x000c0000  0x000e0000     131072      192      224     32      5
> 
> Similar here, do you know why we don't ramp up earlier? Allocating that many
> order-0 + order-2 pages looks a bit suboptimal to me for a sequential read.

Note that this is reading through mmap using the mmap readahead code. If
you use standard read(2), the readahead window starts small as well and
ramps up along with the desired order so we don't allocate that many small
order pages in that case.
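
For reference, the ramp visible in the two tables quoted above can be
reproduced with a toy model. This is only a sketch that mimics the observed
output, not how mm/readahead.c computes anything: it assumes a fixed 128K
window of 4K pages, an order that grows by 2 per window, and a clamp at
order-5. The names window_order() and folios() are made up for illustration.

```python
PAGE_SIZE = 4096     # 4K pages, as in the quoted tables
WINDOW_PAGES = 32    # assumed fixed 128K readahead window
MAX_ORDER = 5        # largest folio that fits the window: 2**5 == 32 pages


def window_order(window, bump_on_entry):
    """Folio order used for the n-th 128K window in this toy model.

    bump_on_entry=True models the old behaviour, where the order passed in
    gets 2 added on entry; False models the behaviour after the patch.
    """
    steps = window + 1 if bump_on_entry else window
    return min(2 * steps, MAX_ORDER)


def folios(nr_windows, bump_on_entry):
    """Yield (start_page, end_page, order) for each folio, window by window."""
    page = 0
    for window in range(nr_windows):
        order = window_order(window, bump_on_entry)
        nr_pages = 1 << order
        for _ in range(WINDOW_PAGES // nr_pages):
            yield page, page + nr_pages, order
            page += nr_pages


if __name__ == "__main__":
    for label, bump in (("before", True), ("after", False)):
        print(label)
        for start, end, order in folios(4, bump):
            print(f"  0x{start * PAGE_SIZE:08x}  0x{end * PAGE_SIZE:08x}  {order}")
```

With bump_on_entry=True the windows come out as order 2, 4, 5, 5; with
False they come out as 0, 2, 4, 5, which is the same shape as the "before"
and "after" tables for the first 512K.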

								Honza
-- 
Jan Kara <jack at suse.com>
SUSE Labs, CR


