mm, page_alloc: fix check for NULL preferred_zone

commit ea57485af8f4221312a5a95d63c382b45e7840dc upstream. Patch series "fix premature OOM regression in 4.7+ due to cpuset races". This is v2 of my attempt to fix the recent report based on LTP cpuset stress test [1]. The intention is to go to stable 4.9 LTSS with this, as triggering repeated OOMs is not nice. That's why the patches try to be not too intrusive. Unfortunately why investigating I found that modifying the testcase to use per-VMA policies instead of per-task policies will bring the OOM's back, but that seems to be much older and harder to fix problem. I have posted a RFC [2] but I believe that fixing the recent regressions has a higher priority. Longer-term we might try to think how to fix the cpuset mess in a better and less error prone way. I was for example very surprised to learn, that cpuset updates change not only task->mems_allowed, but also nodemask of mempolicies. Until now I expected the parameter to alloc_pages_nodemask() to be stable. I wonder why do we then treat cpusets specially in get_page_from_freelist() and distinguish HARDWALL etc, when there's unconditional intersection between mempolicy and cpuset. I would expect the nodemask adjustment for saving overhead in g_p_f(), but that clearly doesn't happen in the current form. So we have both crazy complexity and overhead, AFAICS. [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz This patch (of 4): Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") we have a wrong check for NULL preferred_zone, which can theoretically happen due to concurrent cpuset modification. We check the zoneref pointer which is never NULL and we should check the zone pointer. Also document this in first_zones_zonelist() comment per Michal Hocko. Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Vlastimil Babka <vbabka@suse.cz> 2017-01-24 15:18:32 -0800
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2017-02-01 08:33:04 +0100
commit: ade7afe9dca6b13919f88abd38eefe32f22eaeb3 (patch)
tree: 0a81138f31ff4ec6e24e1cf7698c0997486f92ab /mm
parent: 9b1a1ae9b5281e12893a34bcbd1686b5bcd2cd82 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 34ada718ef47..593a11d8bc6b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3763,7 +3763,7 @@ retry_cpuset:
 	 */
 	ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
 					ac.high_zoneidx, ac.nodemask);
-	if (!ac.preferred_zoneref) {
+	if (!ac.preferred_zoneref->zone) {
 		page = NULL;
 		goto no_zone;
 	}
author	Vlastimil Babka <vbabka@suse.cz>	2017-01-24 15:18:32 -0800
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-02-01 08:33:04 +0100
commit	ade7afe9dca6b13919f88abd38eefe32f22eaeb3 (patch)
tree	0a81138f31ff4ec6e24e1cf7698c0997486f92ab /mm
parent	9b1a1ae9b5281e12893a34bcbd1686b5bcd2cd82 (diff)