Race condition in build_all_zonelists() when offlining movable zone

Patrick Daly quic_pdaly at quicinc.com
Tue Aug 16 20:42:50 PDT 2022


System: arm64 with a 5.15-based kernel. CONFIG_NUMA=n.

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - before offline operation
[0] - ZONE_MOVABLE
[1] - ZONE_NORMAL
[2] - NULL

For a GFP_KERNEL allocation, __alloc_pages_slowpath() saves the position of
the ZONE_NORMAL entry ([1] above) in ac->preferred_zoneref. If a concurrent
memory_offline operation removes the last page from ZONE_MOVABLE,
build_all_zonelists() and build_zonerefs_node() update node_zonelists as
shown below. Only populated zones are added.

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - after offline operation
[0] - ZONE_NORMAL
[1] - NULL
[2] - NULL  
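
The transition between the two listings can be sketched in a few lines of
userspace C. This is a simplified model, not the kernel code: the types, the
present_pages field used as the "populated" test, and the build_zonerefs()
helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; not the real definitions. */
enum zone_type { ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

struct zone {
    enum zone_type type;
    unsigned long present_pages;   /* 0 means the zone is unpopulated */
};

struct zoneref {
    struct zone *zone;             /* NULL terminates the list */
};

#define MAX_ZONEREFS (MAX_NR_ZONES + 1)

/*
 * Hypothetical model of the rebuild: walk zones from the highest type to
 * the lowest, append only populated ones, then NULL-fill the remainder.
 * Returns the number of zones added.
 */
static int build_zonerefs(struct zone zones[MAX_NR_ZONES],
                          struct zoneref refs[MAX_ZONEREFS])
{
    int n = 0;

    for (int t = MAX_NR_ZONES - 1; t >= 0; t--) {
        if (zones[t].present_pages > 0)
            refs[n++].zone = &zones[t];
    }
    assert(n < MAX_ZONEREFS);      /* always room for the terminator */

    for (int i = n; i < MAX_ZONEREFS; i++)
        refs[i].zone = NULL;
    return n;
}
```

Calling build_zonerefs() with both zones populated yields the "before"
layout; setting present_pages of the movable zone to 0 and calling it again
on the same array yields the "after" layout, with ZONE_NORMAL moved from
[1] to [0].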

The thread in __alloc_pages_slowpath() then calls get_page_from_freelist()
repeatedly, trying to allocate from the zones in the zonelist starting at
preferred_zoneref. Since that entry ([1]) is now NULL, no zone is ever
examined and the allocation never succeeds, so the OOM killer ends up
killing all killable processes.
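
The failure mode can be reproduced in a small userspace model. The sketch
below assumes, as described above, that the starting entry is not recomputed
between retries; scan_from(), alloc_with_retries() and stale_zoneref_demo()
are hypothetical helpers, and the structures are simplified stand-ins for
the kernel ones.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not their real layout. */
struct zone {
    const char *name;
    unsigned long free_pages;
};

struct zoneref {
    struct zone *zone;            /* NULL marks the end of the zonelist */
};

#define NR_ZONEREFS 3

/*
 * Model of one get_page_from_freelist() pass: walk forward from the
 * cached starting entry and return the first zone with free pages.
 * The walk stops at the first NULL entry.
 */
static struct zone *scan_from(const struct zoneref *start)
{
    for (const struct zoneref *z = start; z->zone != NULL; z++) {
        if (z->zone->free_pages > 0)
            return z->zone;
    }
    return NULL;
}

/* Model of the slowpath retry loop: it reuses the same starting entry. */
static struct zone *alloc_with_retries(const struct zoneref *preferred,
                                       int max_retries)
{
    for (int i = 0; i < max_retries; i++) {
        struct zone *z = scan_from(preferred);
        if (z != NULL)
            return z;
    }
    return NULL;   /* in the report, this is where the OOM killer runs */
}

/* Returns 1 if the stale starting entry hides a zone that is still usable. */
static int stale_zoneref_demo(void)
{
    struct zone movable = { "Movable", 512 };
    struct zone normal  = { "Normal", 4096 };
    struct zoneref refs[NR_ZONEREFS] = { { &movable }, { &normal }, { NULL } };

    /* GFP_KERNEL cannot use ZONE_MOVABLE, so the walk starts at [1]. */
    const struct zoneref *preferred = &refs[1];
    assert(alloc_with_retries(preferred, 3) == &normal);

    /* Concurrent offline: the list is rewritten in place to the "after" state. */
    refs[0].zone = &normal;
    refs[1].zone = NULL;

    /* preferred still points at [1], which is now the terminator. */
    return alloc_with_retries(preferred, 3) == NULL &&
           scan_from(&refs[0]) == &normal;
}
```

In this model ZONE_NORMAL still has free pages and is reachable from [0],
but every retry that starts from the stale entry sees only the terminator.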

I noticed a comment on the recent change bb7645c33869
("mm, page_alloc: fix build_zonerefs_node()") that appears to be relevant,
but later replies raised concerns about its performance implications:
https://lore.kernel.org/linux-mm/Yk7NqTlw7lmFzpKb@dhcp22.suse.cz/



