Race condition in build_all_zonelists() when offlining movable zone

Michal Hocko mhocko at suse.com
Tue Aug 16 23:38:12 PDT 2022


[Cc Mel, David and Juergen]

On Tue 16-08-22 20:42:50, Patrick Daly wrote:
> System: arm64 with 5.15 based kernel. CONFIG_NUMA=n.
> 
> NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - before offline operation
> [0] - ZONE_MOVABLE
> [1] - ZONE_NORMAL
> [2] - NULL
> 
> For a GFP_KERNEL allocation, alloc_pages_slowpath() will save the offset of
> ZONE_NORMAL in ac->preferred_zoneref. If a concurrent memory_offline operation
> removes the last page from ZONE_MOVABLE, build_all_zonelists() &
> build_zonerefs_node() will update node_zonelists as shown below. Only
> populated zones are added.
>
> NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - after offline operation
> [0] - ZONE_NORMAL
> [1] - NULL
> [2] - NULL
> 
> The thread in alloc_pages_slowpath() will call get_page_from_freelist()
> repeatedly to allocate from the zones in zonelist beginning from
> preferred_zoneref. Since this is now NULL, it will never succeed, and OOM
> killer will kill all killable processes.
>
> I noticed a comment on a recent change bb7645c33869 ("mm, page_alloc:
> fix build_zonerefs_node()") which appeared to be relevant, but later
> replies indicated concerns with performance implications.
> https://lore.kernel.org/linux-mm/Yk7NqTlw7lmFzpKb@dhcp22.suse.cz/

I guess you mean e553f62f10d9 here. After re-reading the discussion, I
remember it now. We decided to go with a simple fix (the said commit),
but I do not think we realized that it has the side effect of
invalidating zonelist indices.

To address that, we would either have to call first_zones_zonelist()
inside get_page_from_freelist() when the zoneref doesn't correspond to a
real zone in the zonelist, or revisit my older approach referenced
above.
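
To make the stale index and the first option concrete, here is a minimal
userspace sketch in C. It is not kernel code and not a proposed patch:
zone_model, zoneref_model, zonelist_model, rebuild(), first_usable(),
scan_from() and scan_revalidated() are all hypothetical names that only
mimic the shape of the zonelist for the two-zone case from the report.
The model is single-threaded and ignores locking and memory ordering, so
it shows only the effect of the rebuild on a saved zoneref.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these types mimic, but are not, the kernel's. */
enum zone_id { ZONE_NORMAL_ID, ZONE_MOVABLE_ID };

struct zone_model {
    enum zone_id id;
};

struct zoneref_model {
    struct zone_model *zone;    /* NULL terminates the list */
};

#define MAX_ZONEREFS 3

struct zonelist_model {
    struct zoneref_model refs[MAX_ZONEREFS];
};

struct zone_model zone_normal  = { ZONE_NORMAL_ID };
struct zone_model zone_movable = { ZONE_MOVABLE_ID };

/* Rebuild the list with populated zones only, as described in the report. */
void rebuild(struct zonelist_model *zl, int movable_populated)
{
    int i = 0;

    if (movable_populated)
        zl->refs[i++].zone = &zone_movable;
    zl->refs[i++].zone = &zone_normal;
    assert(i < MAX_ZONEREFS);   /* room for at least one NULL terminator */
    while (i < MAX_ZONEREFS)
        zl->refs[i++].zone = NULL;
}

/* Stand-in for the first-zone lookup of a GFP_KERNEL-like request:
 * skip the movable zone and return the first other entry (or the
 * terminator if there is none). */
struct zoneref_model *first_usable(struct zonelist_model *zl)
{
    struct zoneref_model *z = zl->refs;

    while (z->zone && z->zone->id == ZONE_MOVABLE_ID)
        z++;
    return z;
}

/* Stand-in for the freelist scan: walk from the saved position to the
 * terminator and return the first non-movable zone, or NULL. */
struct zone_model *scan_from(struct zoneref_model *cursor)
{
    for (; cursor->zone; cursor++)
        if (cursor->zone->id != ZONE_MOVABLE_ID)
            return cursor->zone;
    return NULL;
}

/* Option one from the text above, in model form: if the saved zoneref no
 * longer points at a real zone, look the starting position up again. */
struct zone_model *scan_revalidated(struct zonelist_model *zl,
                                    struct zoneref_model *cursor)
{
    if (!cursor->zone)
        cursor = first_usable(zl);
    return scan_from(cursor);
}
```

In this model, saving first_usable() while both zones are populated yields
&refs[1]. After rebuild(&zl, 0) that slot holds NULL, so scan_from() on the
saved position finds nothing, while scan_revalidated() restarts from the
head of the list and finds the normal zone.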

Thanks for the report!
-- 
Michal Hocko
SUSE Labs


