linux-toradex.git/include/linux/memcontrol.h, branch v5.14-rc6

Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu

2021-07-02T00:17:24+00:00

Pull percpu updates from Dennis Zhou:

 - percpu chunk depopulation - depopulate backing pages for chunks with
   empty pages when we exceed a global threshold without those pages.
   This lets us reclaim a portion of memory that would previously be
   lost until the full chunk would be freed (possibly never).

 - memcg accounting cleanup - previously separate chunks were managed
   for normal allocations and __GFP_ACCOUNT allocations. These are now
   consolidated which cleans up the code quite a bit.

 - a few misc clean ups for clang warnings

* 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: optimize locking in pcpu_balance_workfn()
  percpu: initialize best_upa variable
  percpu: rework memcg accounting
  mm, memcg: introduce mem_cgroup_kmem_disabled()
  mm, memcg: mark cgroup_memory_nosocket, nokmem and noswap as __ro_after_init
  percpu: make symbol 'pcpu_free_slot' static
  percpu: implement partial chunk depopulation
  percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1
  percpu: factor out pcpu_check_block_hint()
  percpu: split __pcpu_balance_workfn()
  percpu: fix a comment about the chunks ordering

mm: memcontrol: remove trailing semicolon in macros

2021-06-29T17:53:50+00:00

Macros should not use a trailing semicolon.

Link: https://lkml.kernel.org/r/20210614091530.22117-1-denghuilong@cdjrlc.com
Signed-off-by: Huilong Deng 
Reviewed-by: Andrew Morton 
Reviewed-by: Shakeel Butt 
Cc: Roman Gushchin 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

loop: charge i/o to mem and blk cg

2021-06-29T17:53:50+00:00

The current code only associates with the existing blkcg when aio is used
to access the backing file.  This patch covers all types of i/o to the
backing file and also associates the memcg so if the backing file is on
tmpfs, memory is charged appropriately.

This patch also exports cgroup_get_e_css and int_active_memcg so it can be
used by the loop module.

Link: https://lkml.kernel.org/r/20210610173944.1203706-4-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg 
Acked-by: Johannes Weiner 
Acked-by: Jens Axboe 
Cc: Chris Down 
Cc: Michal Hocko 
Cc: Ming Lei 
Cc: Shakeel Butt 
Cc: Tejun Heo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memcontrol: use flexible-array member

2021-06-29T17:53:50+00:00

Change deprecated zero-length-and-one-element-arrays into flexible array
member.Zero-length and one-element arrays detected by Lukas's CodeChecker.
Zero/one element arrays cause undefined behaviours if sizeof() used.

Link: https://lkml.kernel.org/r/20210518200910.29912-1-wenhui@gwmail.gwu.edu
Signed-off-by: wenhuizhang 
Reviewed-by: Muchun Song 
Acked-by: Michal Hocko 
Cc: Shakeel Butt 
Cc: Johannes Weiner 
Cc: Roman Gushchin 
Cc: Muchun Song 
Cc: Yang Shi 
Cc: Alex Shi 
Cc: Alexander Duyck 
Cc: Wei Yang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: rename lruvec_holds_page_lru_lock to page_matches_lruvec

2021-06-29T17:53:50+00:00

lruvec_holds_page_lru_lock() doesn't check anything about locking and is
used to check whether the page belongs to the lruvec.  So rename it to
page_matches_lruvec().

Link: https://lkml.kernel.org/r/20210417043538.9793-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Acked-by: Michal Hocko 
Acked-by: Johannes Weiner 
Reviewed-by: Shakeel Butt 
Cc: Roman Gushchin 
Cc: Vladimir Davydov 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: simplify lruvec_holds_page_lru_lock

2021-06-29T17:53:50+00:00

We already have a helper lruvec_memcg() to get the memcg from lruvec, we
do not need to do it ourselves in the lruvec_holds_page_lru_lock().  So
use lruvec_memcg() instead.  And if mem_cgroup_disabled() returns false,
the page_memcg(page) (the LRU pages) cannot be NULL.  So remove the odd
logic of "memcg = page_memcg(page) ?  : root_mem_cgroup".  And use
lruvec_pgdat to simplify the code.  We can have a single definition for
this function that works for !CONFIG_MEMCG, CONFIG_MEMCG +
mem_cgroup_disabled() and CONFIG_MEMCG.

Link: https://lkml.kernel.org/r/20210417043538.9793-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Acked-by: Johannes Weiner 
Reviewed-by: Shakeel Butt 
Acked-by: Roman Gushchin 
Acked-by: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec

2021-06-29T17:53:50+00:00

All the callers of mem_cgroup_page_lruvec() just pass page_pgdat(page) as
the 2nd parameter to it (except isolate_migratepages_block()).  But for
isolate_migratepages_block(), the page_pgdat(page) is also equal to the
local variable of @pgdat.  So mem_cgroup_page_lruvec() do not need the
pgdat parameter.  Just remove it to simplify the code.

Link: https://lkml.kernel.org/r/20210417043538.9793-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Acked-by: Johannes Weiner 
Reviewed-by: Shakeel Butt 
Acked-by: Roman Gushchin 
Acked-by: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, memcg: introduce mem_cgroup_kmem_disabled()

2021-06-05T20:41:14+00:00

Introduce a new mem_cgroup_kmem_disabled() helper, similar to
mem_cgroup_disabled(), to check whether the kernel memory accounting
is off. A user could disable it using a boot option to eliminate
some associated costs.

The helper can be used outside of memcontrol.c to dynamically disable
the kmem-related code. The returned value is stable after the kernel
initialization is finished.

Signed-off-by: Roman Gushchin 
Signed-off-by: Dennis Zhou

mm: memcontrol: reparent nr_deferred when memcg offline

2021-05-05T18:27:23+00:00

Now shrinker's nr_deferred is per memcg for memcg aware shrinkers, add
to parent's corresponding nr_deferred when memcg offline.

Link: https://lkml.kernel.org/r/20210311190845.9708-13-shy828301@gmail.com
Signed-off-by: Yang Shi 
Acked-by: Vlastimil Babka 
Acked-by: Kirill Tkhai 
Acked-by: Roman Gushchin 
Reviewed-by: Shakeel Butt 
Cc: Dave Chinner 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: vmscan: add per memcg shrinker nr_deferred

2021-05-05T18:27:23+00:00

Currently the number of deferred objects are per shrinker, but some
slabs, for example, vfs inode/dentry cache are per memcg, this would
result in poor isolation among memcgs.

The deferred objects typically are generated by __GFP_NOFS allocations,
one memcg with excessive __GFP_NOFS allocations may blow up deferred
objects, then other innocent memcgs may suffer from over shrink,
excessive reclaim latency, etc.

For example, two workloads run in memcgA and memcgB respectively,
workload in B is vfs heavy workload.  Workload in A generates excessive
deferred objects, then B's vfs cache might be hit heavily (drop half of
caches) by B's limit reclaim or global reclaim.

We observed this hit in our production environment which was running vfs
heavy workload shown as the below tracing log:

  <...>-409454 [016] .... 28286961.747146: mm_shrink_slab_start: super_cache_scan+0x0/0x1a0 ffff9a83046f3458:
  nid: 1 objects to shrink 3641681686040 gfp_flags GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721
  cache items 246404277 delta 31345 total_scan 123202138
  <...>-409454 [022] .... 28287105.928018: mm_shrink_slab_end: super_cache_scan+0x0/0x1a0 ffff9a83046f3458:
  nid: 1 unused scan count 3641681686040 new scan count 3641798379189 total_scan 602
  last shrinker return val 123186855

The vfs cache and page cache ratio was 10:1 on this machine, and half of
caches were dropped.  This also resulted in significant amount of page
caches were dropped due to inodes eviction.

Make nr_deferred per memcg for memcg aware shrinkers would solve the
unfairness and bring better isolation.

The following patch will add nr_deferred to parent memcg when memcg
offline.  To preserve nr_deferred when reparenting memcgs to root, root
memcg needs shrinker_info allocated too.

When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the
shrinker's nr_deferred would be used.  And non memcg aware shrinkers use
shrinker's nr_deferred all the time.

Link: https://lkml.kernel.org/r/20210311190845.9708-10-shy828301@gmail.com
Signed-off-by: Yang Shi 
Acked-by: Roman Gushchin 
Acked-by: Kirill Tkhai 
Reviewed-by: Shakeel Butt 
Cc: Dave Chinner 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds