From 5d833062139d290adb8b62c093b654a01a353448 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Feb 2015 14:58:16 -0800 Subject: mm: numa: do not dereference pmd outside of the lock during NUMA hinting fault Automatic NUMA balancing depends on being able to protect PTEs to trap a fault and gather reference locality information. Very broadly speaking it would mark PTEs as not present and use another bit to distinguish between NUMA hinting faults and other types of faults. It was universally loved by everybody and caused no problems whatsoever. That last sentence might be a lie. This series is very heavily based on patches from Linus and Aneesh to replace the existing PTE/PMD NUMA helper functions with normal change protections. I did alter and add parts of it but I consider them relatively minor contributions. At their suggestion, acked-bys are in there but I've no problem converting them to Signed-off-by if requested. AFAIK, this has received no testing on ppc64 and I'm depending on Aneesh for that. I tested trinity under kvm-tool and passed and ran a few other basic tests. At the time of writing, only the short-lived tests have completed but testing of V2 indicated that long-term testing had no surprises. In most cases I'm leaving out detail as it's not that interesting. specjbb single JVM: There was negligible performance difference in the benchmark itself for short runs. However, system activity is higher and interrupts are much higher over time -- possibly TLB flushes. Migrations are also higher. Overall, this is more overhead but considering the problems faced with the old approach I think we just have to suck it up and find another way of reducing the overhead. specjbb multi JVM: Negligible performance difference to the actual benchmark but like the single JVM case, the system overhead is noticeably higher. Again, interrupts are a major factor. autonumabench: This was all over the place and about all that can be reasonably concluded is that it's different but not necessarily better or worse. autonumabench 3.18.0-rc5 3.18.0-rc5 mmotm-20141119 protnone-v3r3 User NUMA01 32380.24 ( 0.00%) 21642.92 ( 33.16%) User NUMA01_THEADLOCAL 22481.02 ( 0.00%) 22283.22 ( 0.88%) User NUMA02 3137.00 ( 0.00%) 3116.54 ( 0.65%) User NUMA02_SMT 1614.03 ( 0.00%) 1543.53 ( 4.37%) System NUMA01 322.97 ( 0.00%) 1465.89 (-353.88%) System NUMA01_THEADLOCAL 91.87 ( 0.00%) 49.32 ( 46.32%) System NUMA02 37.83 ( 0.00%) 14.61 ( 61.38%) System NUMA02_SMT 7.36 ( 0.00%) 7.45 ( -1.22%) Elapsed NUMA01 716.63 ( 0.00%) 599.29 ( 16.37%) Elapsed NUMA01_THEADLOCAL 553.98 ( 0.00%) 539.94 ( 2.53%) Elapsed NUMA02 83.85 ( 0.00%) 83.04 ( 0.97%) Elapsed NUMA02_SMT 86.57 ( 0.00%) 79.15 ( 8.57%) CPU NUMA01 4563.00 ( 0.00%) 3855.00 ( 15.52%) CPU NUMA01_THEADLOCAL 4074.00 ( 0.00%) 4136.00 ( -1.52%) CPU NUMA02 3785.00 ( 0.00%) 3770.00 ( 0.40%) CPU NUMA02_SMT 1872.00 ( 0.00%) 1959.00 ( -4.65%) System CPU usage of NUMA01 is worse but it's an adverse workload on this machine so I'm reluctant to conclude that it's a problem that matters. On the other workloads that are sensible on this machine, system CPU usage is great. Overall time to complete the benchmark is comparable 3.18.0-rc5 3.18.0-rc5 mmotm-20141119protnone-v3r3 User 59612.50 48586.44 System 460.22 1537.45 Elapsed 1442.20 1304.29 NUMA alloc hit 5075182 5743353 NUMA alloc miss 0 0 NUMA interleave hit 0 0 NUMA alloc local 5075174 5743339 NUMA base PTE updates 637061448 443106883 NUMA huge PMD updates 1243434 864747 NUMA page range updates 1273699656 885857347 NUMA hint faults 1658116 1214277 NUMA hint local faults 959487 754113 NUMA hint local percent 57 62 NUMA pages migrated 5467056 61676398 The NUMA pages migrated look terrible but when I looked at a graph of the activity over time I see that the massive spike in migration activity was during NUMA01. This correlates with high system CPU usage and could be simply down to bad luck but any modifications that affect that workload would be related to scan rates and migrations, not the protection mechanism. For all other workloads, migration activity was comparable. Overall, headline performance figures are comparable but the overhead is higher, mostly in interrupts. To some extent, higher overhead from this approach was anticipated but not to this degree. It's going to be necessary to reduce this again with a separate series in the future. It's still worth going ahead with this series though as it's likely to avoid constant headaches with Xen and is probably easier to maintain. This patch (of 10): A transhuge NUMA hinting fault may find the page is migrating and should wait until migration completes. The check is race-prone because the pmd is deferenced outside of the page lock and while the race is tiny, it'll be larger if the PMD is cleared while marking PMDs for hinting fault. This patch closes the race. Signed-off-by: Mel Gorman Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Dave Jones Cc: Hugh Dickins Cc: Ingo Molnar Cc: Kirill Shutemov Cc: Linus Torvalds Cc: Paul Mackerras Cc: Rik van Riel Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/migrate.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index fab9b32ace8e..78baed5f2952 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -67,7 +67,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #ifdef CONFIG_NUMA_BALANCING extern bool pmd_trans_migrating(pmd_t pmd); -extern void wait_migrate_huge_page(struct anon_vma *anon_vma, pmd_t *pmd); extern int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node); extern bool migrate_ratelimited(int node); @@ -76,9 +75,6 @@ static inline bool pmd_trans_migrating(pmd_t pmd) { return false; } -static inline void wait_migrate_huge_page(struct anon_vma *anon_vma, pmd_t *pmd) -{ -} static inline int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node) { -- cgit v1.2.3 From 4d9424669946532be754a6e116618dcb58430cb4 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Feb 2015 14:58:28 -0800 Subject: mm: convert p[te|md]_mknonnuma and remaining page table manipulations With PROT_NONE, the traditional page table manipulation functions are sufficient. [andre.przywara@arm.com: fix compiler warning in pmdp_invalidate()] [akpm@linux-foundation.org: fix build with STRICT_MM_TYPECHECKS] Signed-off-by: Mel Gorman Acked-by: Linus Torvalds Acked-by: Aneesh Kumar Tested-by: Sasha Levin Cc: Benjamin Herrenschmidt Cc: Dave Jones Cc: Hugh Dickins Cc: Ingo Molnar Cc: Kirill Shutemov Cc: Paul Mackerras Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/huge_mm.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index f10b20f05159..062bd252e994 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -31,8 +31,7 @@ extern int move_huge_pmd(struct vm_area_struct *vma, unsigned long new_addr, unsigned long old_end, pmd_t *old_pmd, pmd_t *new_pmd); extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long addr, pgprot_t newprot, - int prot_numa); + unsigned long addr, pgprot_t newprot); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_FLAG, -- cgit v1.2.3 From 21d9ee3eda7792c45880b2f11bff8e95c9a061fb Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Feb 2015 14:58:32 -0800 Subject: mm: remove remaining references to NUMA hinting bits and helpers This patch removes the NUMA PTE bits and associated helpers. As a side-effect it increases the maximum possible swap space on x86-64. One potential source of problems is races between the marking of PTEs PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that a PTE being protected is not faulted in parallel, seen as a pte_none and corrupting memory. The base case is safe but transhuge has problems in the past due to an different migration mechanism and a dependance on page lock to serialise migrations and warrants a closer look. task_work hinting update parallel fault ------------------------ -------------- change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault pmd_none do_huge_pmd_anonymous_page read? pmd_lock blocks until hinting complete, fail !pmd_none test write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none pmd_modify set_pmd_at task_work hinting update parallel migration ------------------------ ------------------ change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault do_huge_pmd_numa_page migrate_misplaced_transhuge_page pmd_lock waits for updates to complete, recheck pmd_same pmd_modify set_pmd_at Both of those are safe and the case where a transhuge page is inserted during a protection update is unchanged. The case where two processes try migrating at the same time is unchanged by this series so should still be ok. I could not find a case where we are accidentally depending on the PTE not being cleared and flushed. If one is missed, it'll manifest as corruption problems that start triggering shortly after this series is merged and only happen when NUMA balancing is enabled. Signed-off-by: Mel Gorman Tested-by: Sasha Levin Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Dave Jones Cc: Hugh Dickins Cc: Ingo Molnar Cc: Kirill Shutemov Cc: Linus Torvalds Cc: Paul Mackerras Cc: Rik van Riel Cc: Mark Brown Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/swapops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 831a3168ab35..cedf3d3c373f 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -54,7 +54,7 @@ static inline pgoff_t swp_offset(swp_entry_t entry) /* check whether a pte points to a swap entry */ static inline int is_swap_pte(pte_t pte) { - return !pte_none(pte) && !pte_present_nonuma(pte); + return !pte_none(pte) && !pte_present(pte); } #endif -- cgit v1.2.3 From e944fd67b625c02bda4a78ddf85e413c5e401474 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Feb 2015 14:58:35 -0800 Subject: mm: numa: do not trap faults on the huge zero page Faults on the huge zero page are pointless and there is a BUG_ON to catch them during fault time. This patch reintroduces a check that avoids marking the zero page PAGE_NONE. Signed-off-by: Mel Gorman Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Dave Jones Cc: Hugh Dickins Cc: Ingo Molnar Cc: Kirill Shutemov Cc: Linus Torvalds Cc: Paul Mackerras Cc: Rik van Riel Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/huge_mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 062bd252e994..f10b20f05159 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -31,7 +31,8 @@ extern int move_huge_pmd(struct vm_area_struct *vma, unsigned long new_addr, unsigned long old_end, pmd_t *old_pmd, pmd_t *new_pmd); extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long addr, pgprot_t newprot); + unsigned long addr, pgprot_t newprot, + int prot_numa); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_FLAG, -- cgit v1.2.3 From 503c358cf1925853195ee39ec437e51138bbb7df Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:58:47 -0800 Subject: list_lru: introduce list_lru_shrink_{count,walk} Kmem accounting of memcg is unusable now, because it lacks slab shrinker support. That means when we hit the limit we will get ENOMEM w/o any chance to recover. What we should do then is to call shrink_slab, which would reclaim old inode/dentry caches from this cgroup. This is what this patch set is intended to do. Basically, it does two things. First, it introduces the notion of per-memcg slab shrinker. A shrinker that wants to reclaim objects per cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be passed the memory cgroup to scan from in shrink_control->memcg. For such shrinkers shrink_slab iterates over the whole cgroup subtree under the target cgroup and calls the shrinker for each kmem-active memory cgroup. Secondly, this patch set makes the list_lru structure per-memcg. It's done transparently to list_lru users - everything they have to do is to tell list_lru_init that they want memcg-aware list_lru. Then the list_lru will automatically distribute objects among per-memcg lists basing on which cgroup the object is accounted to. This way to make FS shrinkers (icache, dcache) memcg-aware we only need to make them use memcg-aware list_lru, and this is what this patch set does. As before, this patch set only enables per-memcg kmem reclaim when the pressure goes from memory.limit, not from memory.kmem.limit. Handling memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and it is still unclear whether we will have this knob in the unified hierarchy. This patch (of 9): NUMA aware slab shrinkers use the list_lru structure to distribute objects coming from different NUMA nodes to different lists. Whenever such a shrinker needs to count or scan objects from a particular node, it issues commands like this: count = list_lru_count_node(lru, sc->nid); freed = list_lru_walk_node(lru, sc->nid, isolate_func, isolate_arg, &sc->nr_to_scan); where sc is an instance of the shrink_control structure passed to it from vmscan. To simplify this, let's add special list_lru functions to be used by shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which consolidate the nid and nr_to_scan arguments in the shrink_control structure. This will also allow us to avoid patching shrinkers that use list_lru when we make shrink_slab() per-memcg - all we will have to do is extend the shrink_control structure to include the target memcg and make list_lru_shrink_{count,walk} handle this appropriately. Signed-off-by: Vladimir Davydov Suggested-by: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index f3434533fbf8..f500a2e39b13 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -9,6 +9,7 @@ #include #include +#include /* list_lru_walk_cb has to always return one of those */ enum lru_status { @@ -81,6 +82,13 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item); * Callers that want such a guarantee need to provide an outer lock. */ unsigned long list_lru_count_node(struct list_lru *lru, int nid); + +static inline unsigned long list_lru_shrink_count(struct list_lru *lru, + struct shrink_control *sc) +{ + return list_lru_count_node(lru, sc->nid); +} + static inline unsigned long list_lru_count(struct list_lru *lru) { long count = 0; @@ -119,6 +127,14 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate, void *cb_arg, unsigned long *nr_to_walk); +static inline unsigned long +list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc, + list_lru_walk_cb isolate, void *cb_arg) +{ + return list_lru_walk_node(lru, sc->nid, isolate, cb_arg, + &sc->nr_to_scan); +} + static inline unsigned long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, void *cb_arg, unsigned long nr_to_walk) -- cgit v1.2.3 From 4101b624352fddb5ed72e7a1b6f8be8cffaa20fa Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:58:51 -0800 Subject: fs: consolidate {nr,free}_cached_objects args in shrink_control We are going to make FS shrinkers memcg-aware. To achieve that, we will have to pass the memcg to scan to the nr_cached_objects and free_cached_objects VFS methods, which currently take only the NUMA node to scan. Since the shrink_control structure already holds the node, and the memcg to scan will be added to it when we introduce memcg-aware vmscan, let us consolidate the methods' arguments in this structure to keep things clean. Signed-off-by: Vladimir Davydov Suggested-by: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/fs.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index cdcb1e9d9613..a20d6586e517 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1635,8 +1635,10 @@ struct super_operations { struct dquot **(*get_dquots)(struct inode *); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); - long (*nr_cached_objects)(struct super_block *, int); - long (*free_cached_objects)(struct super_block *, long, int); + long (*nr_cached_objects)(struct super_block *, + struct shrink_control *); + long (*free_cached_objects)(struct super_block *, + struct shrink_control *); }; /* -- cgit v1.2.3 From cb731d6c62bbc2f890b08ea3d0386d5dad887326 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:58:54 -0800 Subject: vmscan: per memory cgroup slab shrinkers This patch adds SHRINKER_MEMCG_AWARE flag. If a shrinker has this flag set, it will be called per memory cgroup. The memory cgroup to scan objects from is passed in shrink_control->memcg. If the memory cgroup is NULL, a memcg aware shrinker is supposed to scan objects from the global list. Unaware shrinkers are only called on global pressure with memcg=NULL. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 7 +++++++ include/linux/mm.h | 5 ++--- include/linux/shrinker.h | 6 +++++- 3 files changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6cfd934c7c9b..54992fe0959f 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -413,6 +413,8 @@ static inline bool memcg_kmem_enabled(void) return static_key_false(&memcg_kmem_enabled_key); } +bool memcg_kmem_is_active(struct mem_cgroup *memcg); + /* * In general, we'll do everything in our power to not incur in any overhead * for non-memcg users for the kmem functions. Not even a function call, if we @@ -542,6 +544,11 @@ static inline bool memcg_kmem_enabled(void) return false; } +static inline bool memcg_kmem_is_active(struct mem_cgroup *memcg) +{ + return false; +} + static inline bool memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) { diff --git a/include/linux/mm.h b/include/linux/mm.h index a4d24f3c5430..af4ff88a11e0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2168,9 +2168,8 @@ int drop_caches_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); #endif -unsigned long shrink_node_slabs(gfp_t gfp_mask, int nid, - unsigned long nr_scanned, - unsigned long nr_eligible); +void drop_slab(void); +void drop_slab_node(int nid); #ifndef CONFIG_MMU #define randomize_va_space 0 diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index f4aee75f00b1..4fcacd915d45 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -20,6 +20,9 @@ struct shrink_control { /* current node being shrunk (for NUMA aware shrinkers) */ int nid; + + /* current memcg being shrunk (for memcg aware shrinkers) */ + struct mem_cgroup *memcg; }; #define SHRINK_STOP (~0UL) @@ -61,7 +64,8 @@ struct shrinker { #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */ /* Flags */ -#define SHRINKER_NUMA_AWARE (1 << 0) +#define SHRINKER_NUMA_AWARE (1 << 0) +#define SHRINKER_MEMCG_AWARE (1 << 1) extern int register_shrinker(struct shrinker *); extern void unregister_shrinker(struct shrinker *); -- cgit v1.2.3 From dbcf73e26cd0b3d66e6db65ab595e664a55e58ff Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:58:57 -0800 Subject: memcg: rename some cache id related variables memcg_limited_groups_array_size, which defines the size of memcg_caches arrays, sounds rather cumbersome. Also it doesn't point anyhow that it's related to kmem/caches stuff. So let's rename it to memcg_nr_cache_ids. It's concise and points us directly to memcg_cache_id. Also, rename kmem_limited_groups to memcg_cache_ida. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 54992fe0959f..2607c91230af 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -398,7 +398,7 @@ static inline void sock_release_memcg(struct sock *sk) #ifdef CONFIG_MEMCG_KMEM extern struct static_key memcg_kmem_enabled_key; -extern int memcg_limited_groups_array_size; +extern int memcg_nr_cache_ids; /* * Helper macro to loop through all memcg-specific caches. Callers must still @@ -406,7 +406,7 @@ extern int memcg_limited_groups_array_size; * the slab_mutex must be held when looping through those caches */ #define for_each_memcg_cache_index(_idx) \ - for ((_idx) = 0; (_idx) < memcg_limited_groups_array_size; (_idx)++) + for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++) static inline bool memcg_kmem_enabled(void) { -- cgit v1.2.3 From 05257a1a3dcc196c197714b5c9a8dd35b7f6aefc Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:01 -0800 Subject: memcg: add rwsem to synchronize against memcg_caches arrays relocation We need a stable value of memcg_nr_cache_ids in kmem_cache_create() (memcg_alloc_cache_params() wants it for root caches), where we only hold the slab_mutex and no memcg-related locks. As a result, we have to update memcg_nr_cache_ids under the slab_mutex, which we can only take on the slab's side (see memcg_update_array_size). This looks awkward and will become even worse when per-memcg list_lru is introduced, which also wants stable access to memcg_nr_cache_ids. To get rid of this dependency between the memcg_nr_cache_ids and the slab_mutex, this patch introduces a special rwsem. The rwsem is held for writing during memcg_caches arrays relocation and memcg_nr_cache_ids updates. Therefore one can take it for reading to get a stable access to memcg_caches arrays and/or memcg_nr_cache_ids. Currently the semaphore is taken for reading only from kmem_cache_create, right before taking the slab_mutex, so right now there's no much point in using rwsem instead of mutex. However, once list_lru is made per-memcg it will allow list_lru initializations to proceed concurrently. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 2607c91230af..dbc4baa3619c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -399,6 +399,8 @@ static inline void sock_release_memcg(struct sock *sk) extern struct static_key memcg_kmem_enabled_key; extern int memcg_nr_cache_ids; +extern void memcg_get_cache_ids(void); +extern void memcg_put_cache_ids(void); /* * Helper macro to loop through all memcg-specific caches. Callers must still @@ -434,8 +436,6 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order); int memcg_cache_id(struct mem_cgroup *memcg); -void memcg_update_array_size(int num_groups); - struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep); void __memcg_kmem_put_cache(struct kmem_cache *cachep); @@ -569,6 +569,14 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg) return -1; } +static inline void memcg_get_cache_ids(void) +{ +} + +static inline void memcg_put_cache_ids(void) +{ +} + static inline struct kmem_cache * memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) { -- cgit v1.2.3 From ff0b67ef5b1687692bc1fd3ce4bc3d1ff83587c7 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:04 -0800 Subject: list_lru: get rid of ->active_nodes The active_nodes mask allows us to skip empty nodes when walking over list_lru items from all nodes in list_lru_count/walk. However, these functions are never called from hot paths, so it doesn't seem we need such kind of optimization there. OTOH, removing the mask will make it easier to make list_lru per-memcg. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index f500a2e39b13..53c1d6b78270 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -31,7 +31,6 @@ struct list_lru_node { struct list_lru { struct list_lru_node *node; - nodemask_t active_nodes; }; void list_lru_destroy(struct list_lru *lru); @@ -94,7 +93,7 @@ static inline unsigned long list_lru_count(struct list_lru *lru) long count = 0; int nid; - for_each_node_mask(nid, lru->active_nodes) + for_each_node_state(nid, N_NORMAL_MEMORY) count += list_lru_count_node(lru, nid); return count; @@ -142,7 +141,7 @@ list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, long isolated = 0; int nid; - for_each_node_mask(nid, lru->active_nodes) { + for_each_node_state(nid, N_NORMAL_MEMORY) { isolated += list_lru_walk_node(lru, nid, isolate, cb_arg, &nr_to_walk); if (nr_to_walk <= 0) -- cgit v1.2.3 From c0a5b560938a0f2fd2fbf66ddc446c7c2b41383a Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:07 -0800 Subject: list_lru: organize all list_lrus to list To make list_lru memcg aware, we need all list_lrus to be kept on a list protected by a mutex, so that we could sleep while walking over the list. Therefore after this change list_lru_destroy may sleep. Fortunately, there is only one user that calls it from an atomic context - it's put_super - and we can easily fix it by calling list_lru_destroy before put_super in destroy_locked_super - anyway we don't longer need lrus by that time. Another point that should be noted is that list_lru_destroy is allowed to be called on an uninitialized zeroed-out object, in which case it is a no-op. Before this patch this was guaranteed by kfree, but now we need an explicit check there. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 53c1d6b78270..ee9486ac0621 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -31,6 +31,9 @@ struct list_lru_node { struct list_lru { struct list_lru_node *node; +#ifdef CONFIG_MEMCG_KMEM + struct list_head list; +#endif }; void list_lru_destroy(struct list_lru *lru); -- cgit v1.2.3 From 60d3fd32a7a9da4c8c93a9f89cfda22a0b4c65ce Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:10 -0800 Subject: list_lru: introduce per-memcg lists There are several FS shrinkers, including super_block::s_shrink, that keep reclaimable objects in the list_lru structure. Hence to turn them to memcg-aware shrinkers, it is enough to make list_lru per-memcg. This patch does the trick. It adds an array of lru lists to the list_lru_node structure (per-node part of the list_lru), one for each kmem-active memcg, and dispatches every item addition or removal to the list corresponding to the memcg which the item is accounted to. So now the list_lru structure is not just per node, but per node and per memcg. Not all list_lrus need this feature, so this patch also adds a new method, list_lru_init_memcg, which initializes a list_lru as memcg aware. Otherwise (i.e. if initialized with old list_lru_init), the list_lru won't have per memcg lists. Just like per memcg caches arrays, the arrays of per-memcg lists are indexed by memcg_cache_id, so we must grow them whenever memcg_nr_cache_ids is increased. So we introduce a callback, memcg_update_all_list_lrus, invoked by memcg_alloc_cache_id if the id space is full. The locking is implemented in a manner similar to lruvecs, i.e. we have one lock per node that protects all lists (both global and per cgroup) on the node. Signed-off-by: Vladimir Davydov Cc: Dave Chinner Cc: Johannes Weiner Cc: Michal Hocko Cc: Greg Thelen Cc: Glauber Costa Cc: Alexander Viro Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 52 +++++++++++++++++++++++++++++++++++----------- include/linux/memcontrol.h | 14 +++++++++++++ 2 files changed, 54 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index ee9486ac0621..305b598abac2 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -11,6 +11,8 @@ #include #include +struct mem_cgroup; + /* list_lru_walk_cb has to always return one of those */ enum lru_status { LRU_REMOVED, /* item removed from list */ @@ -22,11 +24,26 @@ enum lru_status { internally, but has to return locked. */ }; -struct list_lru_node { - spinlock_t lock; +struct list_lru_one { struct list_head list; /* kept as signed so we can catch imbalance bugs */ long nr_items; +}; + +struct list_lru_memcg { + /* array of per cgroup lists, indexed by memcg_cache_id */ + struct list_lru_one *lru[0]; +}; + +struct list_lru_node { + /* protects all lists on the node, including per cgroup */ + spinlock_t lock; + /* global list, used for the root cgroup in cgroup aware lrus */ + struct list_lru_one lru; +#ifdef CONFIG_MEMCG_KMEM + /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ + struct list_lru_memcg *memcg_lrus; +#endif } ____cacheline_aligned_in_smp; struct list_lru { @@ -37,11 +54,14 @@ struct list_lru { }; void list_lru_destroy(struct list_lru *lru); -int list_lru_init_key(struct list_lru *lru, struct lock_class_key *key); -static inline int list_lru_init(struct list_lru *lru) -{ - return list_lru_init_key(lru, NULL); -} +int __list_lru_init(struct list_lru *lru, bool memcg_aware, + struct lock_class_key *key); + +#define list_lru_init(lru) __list_lru_init((lru), false, NULL) +#define list_lru_init_key(lru, key) __list_lru_init((lru), false, (key)) +#define list_lru_init_memcg(lru) __list_lru_init((lru), true, NULL) + +int memcg_update_all_list_lrus(int num_memcgs); /** * list_lru_add: add an element to the lru list's tail @@ -75,20 +95,23 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item); bool list_lru_del(struct list_lru *lru, struct list_head *item); /** - * list_lru_count_node: return the number of objects currently held by @lru + * list_lru_count_one: return the number of objects currently held by @lru * @lru: the lru pointer. * @nid: the node id to count from. + * @memcg: the cgroup to count from. * * Always return a non-negative number, 0 for empty lists. There is no * guarantee that the list is not updated while the count is being computed. * Callers that want such a guarantee need to provide an outer lock. */ +unsigned long list_lru_count_one(struct list_lru *lru, + int nid, struct mem_cgroup *memcg); unsigned long list_lru_count_node(struct list_lru *lru, int nid); static inline unsigned long list_lru_shrink_count(struct list_lru *lru, struct shrink_control *sc) { - return list_lru_count_node(lru, sc->nid); + return list_lru_count_one(lru, sc->nid, sc->memcg); } static inline unsigned long list_lru_count(struct list_lru *lru) @@ -105,9 +128,10 @@ static inline unsigned long list_lru_count(struct list_lru *lru) typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg); /** - * list_lru_walk_node: walk a list_lru, isolating and disposing freeable items. + * list_lru_walk_one: walk a list_lru, isolating and disposing freeable items. * @lru: the lru pointer. * @nid: the node id to scan from. + * @memcg: the cgroup to scan from. * @isolate: callback function that is resposible for deciding what to do with * the item currently being scanned * @cb_arg: opaque type that will be passed to @isolate @@ -125,6 +149,10 @@ typedef enum lru_status * * Return value: the number of objects effectively removed from the LRU. */ +unsigned long list_lru_walk_one(struct list_lru *lru, + int nid, struct mem_cgroup *memcg, + list_lru_walk_cb isolate, void *cb_arg, + unsigned long *nr_to_walk); unsigned long list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate, void *cb_arg, unsigned long *nr_to_walk); @@ -133,8 +161,8 @@ static inline unsigned long list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc, list_lru_walk_cb isolate, void *cb_arg) { - return list_lru_walk_node(lru, sc->nid, isolate, cb_arg, - &sc->nr_to_scan); + return list_lru_walk_one(lru, sc->nid, sc->memcg, isolate, cb_arg, + &sc->nr_to_scan); } static inline unsigned long diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dbc4baa3619c..72dff5fb0d0c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -439,6 +439,8 @@ int memcg_cache_id(struct mem_cgroup *memcg); struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep); void __memcg_kmem_put_cache(struct kmem_cache *cachep); +struct mem_cgroup *__mem_cgroup_from_kmem(void *ptr); + int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, unsigned long nr_pages); void memcg_uncharge_kmem(struct mem_cgroup *memcg, unsigned long nr_pages); @@ -535,6 +537,13 @@ static __always_inline void memcg_kmem_put_cache(struct kmem_cache *cachep) if (memcg_kmem_enabled()) __memcg_kmem_put_cache(cachep); } + +static __always_inline struct mem_cgroup *mem_cgroup_from_kmem(void *ptr) +{ + if (!memcg_kmem_enabled()) + return NULL; + return __mem_cgroup_from_kmem(ptr); +} #else #define for_each_memcg_cache_index(_idx) \ for (; NULL; ) @@ -586,6 +595,11 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) static inline void memcg_kmem_put_cache(struct kmem_cache *cachep) { } + +static inline struct mem_cgroup *mem_cgroup_from_kmem(void *ptr) +{ + return NULL; +} #endif /* CONFIG_MEMCG_KMEM */ #endif /* _LINUX_MEMCONTROL_H */ -- cgit v1.2.3 From f7ce3190c4a35bf887adb7a1aa1ba899b679872d Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:20 -0800 Subject: slab: embed memcg_cache_params to kmem_cache Currently, kmem_cache stores a pointer to struct memcg_cache_params instead of embedding it. The rationale is to save memory when kmem accounting is disabled. However, the memcg_cache_params has shrivelled drastically since it was first introduced: * Initially: struct memcg_cache_params { bool is_root_cache; union { struct kmem_cache *memcg_caches[0]; struct { struct mem_cgroup *memcg; struct list_head list; struct kmem_cache *root_cache; bool dead; atomic_t nr_pages; struct work_struct destroy; }; }; }; * Now: struct memcg_cache_params { bool is_root_cache; union { struct { struct rcu_head rcu_head; struct kmem_cache *memcg_caches[0]; }; struct { struct mem_cgroup *memcg; struct kmem_cache *root_cache; }; }; }; So the memory saving does not seem to be a clear win anymore. OTOH, keeping a pointer to memcg_cache_params struct instead of embedding it results in touching one more cache line on kmem alloc/free hot paths. Besides, it makes linking kmem caches in a list chained by a field of struct memcg_cache_params really painful due to a level of indirection, while I want to make them linked in the following patch. That said, let us embed it. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Tejun Heo Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Dave Chinner Cc: Dan Carpenter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/slab.h | 17 +++++++---------- include/linux/slab_def.h | 2 +- include/linux/slub_def.h | 2 +- 3 files changed, 9 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 2e3b448cfa2d..1e03c11bbfbd 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -473,14 +473,14 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) #ifndef ARCH_SLAB_MINALIGN #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long) #endif + +struct memcg_cache_array { + struct rcu_head rcu; + struct kmem_cache *entries[0]; +}; + /* * This is the main placeholder for memcg-related information in kmem caches. - * struct kmem_cache will hold a pointer to it, so the memory cost while - * disabled is 1 pointer. The runtime cost while enabled, gets bigger than it - * would otherwise be if that would be bundled in kmem_cache: we'll need an - * extra pointer chase. But the trade off clearly lays in favor of not - * penalizing non-users. - * * Both the root cache and the child caches will have it. For the root cache, * this will hold a dynamically allocated array large enough to hold * information about the currently limited memcgs in the system. To allow the @@ -495,10 +495,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) struct memcg_cache_params { bool is_root_cache; union { - struct { - struct rcu_head rcu_head; - struct kmem_cache *memcg_caches[0]; - }; + struct memcg_cache_array __rcu *memcg_caches; struct { struct mem_cgroup *memcg; struct kmem_cache *root_cache; diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index b869d1662ba3..33d049066c3d 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -70,7 +70,7 @@ struct kmem_cache { int obj_offset; #endif /* CONFIG_DEBUG_SLAB */ #ifdef CONFIG_MEMCG_KMEM - struct memcg_cache_params *memcg_params; + struct memcg_cache_params memcg_params; #endif struct kmem_cache_node *node[MAX_NUMNODES]; diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index d82abd40a3c0..9abf04ed0999 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -85,7 +85,7 @@ struct kmem_cache { struct kobject kobj; /* For sysfs */ #endif #ifdef CONFIG_MEMCG_KMEM - struct memcg_cache_params *memcg_params; + struct memcg_cache_params memcg_params; int max_attr_size; /* for propagation, maximum size of a stored attr */ #ifdef CONFIG_SYSFS struct kset *memcg_kset; -- cgit v1.2.3 From 426589f571f7d6d5ab2ca33ece73164149279ca1 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:23 -0800 Subject: slab: link memcg caches of the same kind into a list Sometimes, we need to iterate over all memcg copies of a particular root kmem cache. Currently, we use memcg_cache_params->memcg_caches array for that, because it contains all existing memcg caches. However, it's a bad practice to keep all caches, including those that belong to offline cgroups, in this array, because it will be growing beyond any bounds then. I'm going to wipe away dead caches from it to save space. To still be able to perform iterations over all memcg caches of the same kind, let us link them into a list. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Tejun Heo Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/slab.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 1e03c11bbfbd..26d99f41b410 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -491,9 +491,13 @@ struct memcg_cache_array { * * @memcg: pointer to the memcg this cache belongs to * @root_cache: pointer to the global, root cache, this cache was derived from + * + * Both root and child caches of the same kind are linked into a list chained + * through @list. */ struct memcg_cache_params { bool is_root_cache; + struct list_head list; union { struct memcg_cache_array __rcu *memcg_caches; struct { -- cgit v1.2.3 From 2a4db7eb9391a544ff58f4fa11d35246e87c87af Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:32 -0800 Subject: memcg: free memcg_caches slot on css offline We need to look up a kmem_cache in ->memcg_params.memcg_caches arrays only on allocations, so there is no need to have the array entries set until css free - we can clear them on css offline. This will allow us to reuse array entries more efficiently and avoid costly array relocations. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Tejun Heo Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/slab.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 26d99f41b410..ed2ffaab59ea 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -115,13 +115,12 @@ int slab_is_available(void); struct kmem_cache *kmem_cache_create(const char *, size_t, size_t, unsigned long, void (*)(void *)); -#ifdef CONFIG_MEMCG_KMEM -void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); -void memcg_destroy_kmem_caches(struct mem_cgroup *); -#endif void kmem_cache_destroy(struct kmem_cache *); int kmem_cache_shrink(struct kmem_cache *); -void kmem_cache_free(struct kmem_cache *, void *); + +void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); +void memcg_deactivate_kmem_caches(struct mem_cgroup *); +void memcg_destroy_kmem_caches(struct mem_cgroup *); /* * Please use this macro to create slab caches. Simply specify the @@ -288,6 +287,7 @@ static __always_inline int kmalloc_index(size_t size) void *__kmalloc(size_t size, gfp_t flags); void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags); +void kmem_cache_free(struct kmem_cache *, void *); #ifdef CONFIG_NUMA void *__kmalloc_node(size_t size, gfp_t flags, int node); -- cgit v1.2.3 From 3f97b163207c67a3b35931494ad3db1de66356f0 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:35 -0800 Subject: list_lru: add helpers to isolate items Currently, the isolate callback passed to the list_lru_walk family of functions is supposed to just delete an item from the list upon returning LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by __list_lru_walk_one after the callback returns. Since the callback is allowed to drop the lock after removing an item (it has to return LRU_REMOVED_RETRY then), the nr_items can be less than the actual number of elements on the list even if we check them under the lock. This makes it difficult to move items from one list_lru_one to another, which is required for per-memcg list_lru reparenting - we can't just splice the lists, we have to move entries one by one. This patch therefore introduces helpers that must be used by callback functions to isolate items instead of raw list_del/list_move. These are list_lru_isolate and list_lru_isolate_move. They not only remove the entry from the list, but also fix the nr_items counter, making sure nr_items always reflects the actual number of elements on the list if checked under the appropriate lock. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Tejun Heo Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 305b598abac2..7edf9c9ab9eb 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -125,8 +125,13 @@ static inline unsigned long list_lru_count(struct list_lru *lru) return count; } -typedef enum lru_status -(*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg); +void list_lru_isolate(struct list_lru_one *list, struct list_head *item); +void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item, + struct list_head *head); + +typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item, + struct list_lru_one *list, spinlock_t *lock, void *cb_arg); + /** * list_lru_walk_one: walk a list_lru, isolating and disposing freeable items. * @lru: the lru pointer. -- cgit v1.2.3 From 2788cf0c401c268b4819c5407493a8769b7007aa Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 12 Feb 2015 14:59:38 -0800 Subject: memcg: reparent list_lrus and free kmemcg_id on css offline Now, the only reason to keep kmemcg_id till css free is list_lru, which uses it to distribute elements between per-memcg lists. However, it can be easily sorted out - we only need to change kmemcg_id of an offline cgroup to its parent's id, making further list_lru_add()'s add elements to the parent's list, and then move all elements from the offline cgroup's list to the one of its parent. It will work, because a racing list_lru_del() does not need to know the list it is deleting the element from. It can decrement the wrong nr_items counter though, but the ongoing reparenting will fix it. After list_lru reparenting is done we are free to release kmemcg_id saving a valuable slot in a per-memcg array for new cgroups. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Tejun Heo Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 7edf9c9ab9eb..2a6b9947aaa3 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -26,7 +26,7 @@ enum lru_status { struct list_lru_one { struct list_head list; - /* kept as signed so we can catch imbalance bugs */ + /* may become negative during memcg reparenting */ long nr_items; }; @@ -62,6 +62,7 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, #define list_lru_init_memcg(lru) __list_lru_init((lru), true, NULL) int memcg_update_all_list_lrus(int num_memcgs); +void memcg_drain_all_list_lrus(int src_idx, int dst_idx); /** * list_lru_add: add an element to the lru list's tail -- cgit v1.2.3 From 2d2f5119b8bb057595e18f5b2f07aa097ea1b233 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Thu, 12 Feb 2015 14:59:59 -0800 Subject: mm: do not use mm->nr_pmds on !MMU configurations mm->nr_pmds doesn't make sense on !MMU configurations Signed-off-by: Kirill A. Shutemov Cc: Guenter Roeck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index af4ff88a11e0..bd52e2f14027 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1447,13 +1447,15 @@ static inline int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address); #endif -#ifdef __PAGETABLE_PMD_FOLDED +#if defined(__PAGETABLE_PMD_FOLDED) || !defined(CONFIG_MMU) static inline int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address) { return 0; } +static inline void mm_nr_pmds_init(struct mm_struct *mm) {} + static inline unsigned long mm_nr_pmds(struct mm_struct *mm) { return 0; @@ -1465,6 +1467,11 @@ static inline void mm_dec_nr_pmds(struct mm_struct *mm) {} #else int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address); +static inline void mm_nr_pmds_init(struct mm_struct *mm) +{ + atomic_long_set(&mm->nr_pmds, 0); +} + static inline unsigned long mm_nr_pmds(struct mm_struct *mm) { return atomic_long_read(&mm->nr_pmds); -- cgit v1.2.3 From 3eba0c6a56c04f2b017b43641a821f1ebfb7fb4c Mon Sep 17 00:00:00 2001 From: Ganesh Mahendran Date: Thu, 12 Feb 2015 15:00:51 -0800 Subject: mm/zpool: add name argument to create zpool Currently the underlay of zpool: zsmalloc/zbud, do not know who creates them. There is not a method to let zsmalloc/zbud find which caller they belong to. Now we want to add statistics collection in zsmalloc. We need to name the debugfs dir for each pool created. The way suggested by Minchan Kim is to use a name passed by caller(such as zram) to create the zsmalloc pool. /sys/kernel/debug/zsmalloc/zram0 This patch adds an argument `name' to zs_create_pool() and other related functions. Signed-off-by: Ganesh Mahendran Acked-by: Minchan Kim Cc: Seth Jennings Cc: Nitin Gupta Cc: Dan Streetman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/zpool.h | 5 +++-- include/linux/zsmalloc.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/zpool.h b/include/linux/zpool.h index f14bd75f08b3..56529b34dc63 100644 --- a/include/linux/zpool.h +++ b/include/linux/zpool.h @@ -36,7 +36,8 @@ enum zpool_mapmode { ZPOOL_MM_DEFAULT = ZPOOL_MM_RW }; -struct zpool *zpool_create_pool(char *type, gfp_t gfp, struct zpool_ops *ops); +struct zpool *zpool_create_pool(char *type, char *name, + gfp_t gfp, struct zpool_ops *ops); char *zpool_get_type(struct zpool *pool); @@ -80,7 +81,7 @@ struct zpool_driver { atomic_t refcount; struct list_head list; - void *(*create)(gfp_t gfp, struct zpool_ops *ops); + void *(*create)(char *name, gfp_t gfp, struct zpool_ops *ops); void (*destroy)(void *pool); int (*malloc)(void *pool, size_t size, gfp_t gfp, diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 05c214760977..3283c6a55425 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -36,7 +36,7 @@ enum zs_mapmode { struct zs_pool; -struct zs_pool *zs_create_pool(gfp_t flags); +struct zs_pool *zs_create_pool(char *name, gfp_t flags); void zs_destroy_pool(struct zs_pool *pool); unsigned long zs_malloc(struct zs_pool *pool, size_t size); -- cgit v1.2.3 From 695f055936938c674473ea071ca7359a863551e7 Mon Sep 17 00:00:00 2001 From: Petr Cermak Date: Thu, 12 Feb 2015 15:01:00 -0800 Subject: fs/proc/task_mmu.c: add user-space support for resetting mm->hiwater_rss (peak RSS) Peak resident size of a process can be reset back to the process's current rss value by writing "5" to /proc/pid/clear_refs. The driving use-case for this would be getting the peak RSS value, which can be retrieved from the VmHWM field in /proc/pid/status, per benchmark iteration or test scenario. [akpm@linux-foundation.org: clarify behaviour in documentation] Signed-off-by: Petr Cermak Cc: Bjorn Helgaas Cc: Primiano Tucci Cc: Petr Cermak Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index bd52e2f14027..9bee7ec0c31f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1408,6 +1408,11 @@ static inline void update_hiwater_vm(struct mm_struct *mm) mm->hiwater_vm = mm->total_vm; } +static inline void reset_mm_hiwater_rss(struct mm_struct *mm) +{ + mm->hiwater_rss = get_mm_rss(mm); +} + static inline void setmax_mm_hiwater_rss(unsigned long *maxrss, struct mm_struct *mm) { -- cgit v1.2.3 From f56141e3e2d9aabf7e6b89680ab572c2cdbb2a24 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 12 Feb 2015 15:01:14 -0800 Subject: all arches, signal: move restart_block to struct task_struct If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by: Andy Lutomirski Cc: Thomas Gleixner Cc: Al Viro Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Kees Cook Cc: David Miller Acked-by: Richard Weinberger Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Vineet Gupta Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Haavard Skinnemoen Cc: Hans-Christian Egtvedt Cc: Steven Miao Cc: Mark Salter Cc: Aurelien Jacquiot Cc: Mikael Starvik Cc: Jesper Nilsson Cc: David Howells Cc: Richard Kuo Cc: "Luck, Tony" Cc: Geert Uytterhoeven Cc: Michal Simek Cc: Ralf Baechle Cc: Jonas Bonn Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Acked-by: Michael Ellerman (powerpc) Tested-by: Michael Ellerman (powerpc) Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Chen Liqin Cc: Lennox Wu Cc: Chris Metcalf Cc: Guan Xuetao Cc: Chris Zankel Cc: Max Filippov Cc: Oleg Nesterov Cc: Guenter Roeck Signed-off-by: James Hogan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/init_task.h | 3 +++ include/linux/sched.h | 2 ++ 2 files changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 3037fc085e8e..d3d43ecf148c 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -193,6 +193,9 @@ extern struct task_group root_task_group; .nr_cpus_allowed= NR_CPUS, \ .mm = NULL, \ .active_mm = &init_mm, \ + .restart_block = { \ + .fn = do_no_restart_syscall, \ + }, \ .se = { \ .group_node = LIST_HEAD_INIT(tsk.se.group_node), \ }, \ diff --git a/include/linux/sched.h b/include/linux/sched.h index 8db31ef98d2f..22ee0d5d7f8c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1370,6 +1370,8 @@ struct task_struct { unsigned long atomic_flags; /* Flags needing atomic access. */ + struct restart_block restart_block; + pid_t pid; pid_t tgid; -- cgit v1.2.3 From dd4a5c1e6531b9c6ba6ee1f3cb11f0aeea25881e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 12 Feb 2015 15:01:22 -0800 Subject: linux/types.h: Always use unsigned long for pgoff_t Everybody uses unsigned long for pgoff_t, and no one ever overrode the definition of pgoff_t. Keep it that way, and remove the option of overriding it. Signed-off-by: Geert Uytterhoeven Cc: Randy Dunlap Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/types.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/types.h b/include/linux/types.h index 62323825cff9..6747247e3f9f 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -135,12 +135,9 @@ typedef unsigned long blkcnt_t; #endif /* - * The type of an index into the pagecache. Use a #define so asm/types.h - * can override it. + * The type of an index into the pagecache. */ -#ifndef pgoff_t #define pgoff_t unsigned long -#endif /* A dma_addr_t can hold any valid DMA or bus address for the platform */ #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT -- cgit v1.2.3 From 545a2bf742fb41f17d03486dd8a8c74ad511dec2 Mon Sep 17 00:00:00 2001 From: Cyril Bur Date: Thu, 12 Feb 2015 15:01:24 -0800 Subject: kernel/sched/clock.c: add another clock for use with the soft lockup watchdog When the hypervisor pauses a virtualised kernel the kernel will observe a jump in timebase, this can cause spurious messages from the softlockup detector. Whilst these messages are harmless, they are accompanied with a stack trace which causes undue concern and more problematically the stack trace in the guest has nothing to do with the observed problem and can only be misleading. Futhermore, on POWER8 this is completely avoidable with the introduction of the Virtual Time Base (VTB) register. This patch (of 2): This permits the use of arch specific clocks for which virtualised kernels can use their notion of 'running' time, not the elpased wall time which will include host execution time. Signed-off-by: Cyril Bur Cc: Michael Ellerman Cc: Andrew Jones Acked-by: Don Zickus Cc: Ingo Molnar Cc: Ulrich Obergfell Cc: chai wen Cc: Fabian Frederick Cc: Aaron Tomlin Cc: Ben Zhang Cc: Martin Schwidefsky Cc: John Stultz Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 22ee0d5d7f8c..048b91b983ed 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2147,6 +2147,7 @@ extern unsigned long long notrace sched_clock(void); */ extern u64 cpu_clock(int cpu); extern u64 local_clock(void); +extern u64 running_clock(void); extern u64 sched_clock_cpu(int cpu); -- cgit v1.2.3 From 02f1f2170d2831b3233e91091c60a66622f29e82 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:01:31 -0800 Subject: kernel.h: remove ancient __FUNCTION__ hack __FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so this only helps people who only test-compile using 3.3 (compiler-gcc3.h barks at anything older than that). Besides, there are almost no occurrences of __FUNCTION__ left in the tree. [akpm@linux-foundation.org: convert remaining __FUNCTION__ references] Signed-off-by: Rasmus Villemoes Cc: Michal Nazarewicz Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kernel.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index e42e7dc34c68..d6d630d31ef3 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -800,9 +800,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { } const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) );}) -/* Trap pasters of __FUNCTION__ at compile-time */ -#define __FUNCTION__ (__func__) - /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */ #ifdef CONFIG_FTRACE_MCOUNT_RECORD # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD -- cgit v1.2.3 From d1214c65c02d503330ce86bd38e344a36599e055 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:01:50 -0800 Subject: libstring_helpers.c:string_get_size(): return void string_get_size() was documented to return an error, but in fact always returned 0. Since the output always fits in 9 bytes, just document that and let callers do what they do now: pass a small stack buffer and ignore the return value. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/string_helpers.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 6eb567ac56bc..657571817260 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -10,8 +10,8 @@ enum string_size_units { STRING_UNITS_2, /* use binary powers of 2^10 */ }; -int string_get_size(u64 size, enum string_size_units units, - char *buf, int len); +void string_get_size(u64 size, enum string_size_units units, + char *buf, int len); #define UNESCAPE_SPACE 0x01 #define UNESCAPE_OCTAL 0x02 -- cgit v1.2.3 From 8b4daad52fee7731ea4bae22a99ece2d4ba7ba43 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:01:53 -0800 Subject: lib/bitmap.c: more signed->unsigned conversions For consistency with the other bitmap_* functions, also make the nbits parameter of bitmap_zero, bitmap_fill and bitmap_copy unsigned. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 202e4034fe26..1406d5453781 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -185,33 +185,33 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, #define small_const_nbits(nbits) \ (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) -static inline void bitmap_zero(unsigned long *dst, int nbits) +static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { if (small_const_nbits(nbits)) *dst = 0UL; else { - int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); memset(dst, 0, len); } } -static inline void bitmap_fill(unsigned long *dst, int nbits) +static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) { - size_t nlongs = BITS_TO_LONGS(nbits); + unsigned int nlongs = BITS_TO_LONGS(nbits); if (!small_const_nbits(nbits)) { - int len = (nlongs - 1) * sizeof(unsigned long); + unsigned int len = (nlongs - 1) * sizeof(unsigned long); memset(dst, 0xff, len); } dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits); } static inline void bitmap_copy(unsigned long *dst, const unsigned long *src, - int nbits) + unsigned int nbits) { if (small_const_nbits(nbits)) *dst = *src; else { - int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); memcpy(dst, src, len); } } -- cgit v1.2.3 From 33c4fa8c6763f1ba9f4ea64079882eaa6d7957b7 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:01:56 -0800 Subject: linux/nodemask.h: update bitmap wrappers to take unsigned int Since the various bitmap_* functions now take an unsigned int as nbits parameter, it makes sense to also update the various wrappers. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/nodemask.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 83a6aeda899d..21cef483dc1b 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -120,13 +120,13 @@ static inline void __node_clear(int node, volatile nodemask_t *dstp) } #define nodes_setall(dst) __nodes_setall(&(dst), MAX_NUMNODES) -static inline void __nodes_setall(nodemask_t *dstp, int nbits) +static inline void __nodes_setall(nodemask_t *dstp, unsigned int nbits) { bitmap_fill(dstp->bits, nbits); } #define nodes_clear(dst) __nodes_clear(&(dst), MAX_NUMNODES) -static inline void __nodes_clear(nodemask_t *dstp, int nbits) +static inline void __nodes_clear(nodemask_t *dstp, unsigned int nbits) { bitmap_zero(dstp->bits, nbits); } @@ -144,7 +144,7 @@ static inline int __node_test_and_set(int node, nodemask_t *addr) #define nodes_and(dst, src1, src2) \ __nodes_and(&(dst), &(src1), &(src2), MAX_NUMNODES) static inline void __nodes_and(nodemask_t *dstp, const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits); } @@ -152,7 +152,7 @@ static inline void __nodes_and(nodemask_t *dstp, const nodemask_t *src1p, #define nodes_or(dst, src1, src2) \ __nodes_or(&(dst), &(src1), &(src2), MAX_NUMNODES) static inline void __nodes_or(nodemask_t *dstp, const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits); } @@ -160,7 +160,7 @@ static inline void __nodes_or(nodemask_t *dstp, const nodemask_t *src1p, #define nodes_xor(dst, src1, src2) \ __nodes_xor(&(dst), &(src1), &(src2), MAX_NUMNODES) static inline void __nodes_xor(nodemask_t *dstp, const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits); } @@ -168,7 +168,7 @@ static inline void __nodes_xor(nodemask_t *dstp, const nodemask_t *src1p, #define nodes_andnot(dst, src1, src2) \ __nodes_andnot(&(dst), &(src1), &(src2), MAX_NUMNODES) static inline void __nodes_andnot(nodemask_t *dstp, const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits); } @@ -176,7 +176,7 @@ static inline void __nodes_andnot(nodemask_t *dstp, const nodemask_t *src1p, #define nodes_complement(dst, src) \ __nodes_complement(&(dst), &(src), MAX_NUMNODES) static inline void __nodes_complement(nodemask_t *dstp, - const nodemask_t *srcp, int nbits) + const nodemask_t *srcp, unsigned int nbits) { bitmap_complement(dstp->bits, srcp->bits, nbits); } @@ -184,7 +184,7 @@ static inline void __nodes_complement(nodemask_t *dstp, #define nodes_equal(src1, src2) \ __nodes_equal(&(src1), &(src2), MAX_NUMNODES) static inline int __nodes_equal(const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { return bitmap_equal(src1p->bits, src2p->bits, nbits); } @@ -192,7 +192,7 @@ static inline int __nodes_equal(const nodemask_t *src1p, #define nodes_intersects(src1, src2) \ __nodes_intersects(&(src1), &(src2), MAX_NUMNODES) static inline int __nodes_intersects(const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { return bitmap_intersects(src1p->bits, src2p->bits, nbits); } @@ -200,25 +200,25 @@ static inline int __nodes_intersects(const nodemask_t *src1p, #define nodes_subset(src1, src2) \ __nodes_subset(&(src1), &(src2), MAX_NUMNODES) static inline int __nodes_subset(const nodemask_t *src1p, - const nodemask_t *src2p, int nbits) + const nodemask_t *src2p, unsigned int nbits) { return bitmap_subset(src1p->bits, src2p->bits, nbits); } #define nodes_empty(src) __nodes_empty(&(src), MAX_NUMNODES) -static inline int __nodes_empty(const nodemask_t *srcp, int nbits) +static inline int __nodes_empty(const nodemask_t *srcp, unsigned int nbits) { return bitmap_empty(srcp->bits, nbits); } #define nodes_full(nodemask) __nodes_full(&(nodemask), MAX_NUMNODES) -static inline int __nodes_full(const nodemask_t *srcp, int nbits) +static inline int __nodes_full(const nodemask_t *srcp, unsigned int nbits) { return bitmap_full(srcp->bits, nbits); } #define nodes_weight(nodemask) __nodes_weight(&(nodemask), MAX_NUMNODES) -static inline int __nodes_weight(const nodemask_t *srcp, int nbits) +static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits) { return bitmap_weight(srcp->bits, nbits); } -- cgit v1.2.3 From f5ac1f55204fec9d9c63644bc1de0ab6a59af9f1 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:01:59 -0800 Subject: linux/cpumask.h: update bitmap wrappers to take unsigned int Since the various bitmap_* functions now take an unsigned int as nbits parameter, it makes sense to also update the various wrappers, even though they're marked as obsolete. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/cpumask.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index b950e9d6008b..ff9044286d88 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -905,13 +905,13 @@ static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp) } #define cpus_setall(dst) __cpus_setall(&(dst), NR_CPUS) -static inline void __cpus_setall(cpumask_t *dstp, int nbits) +static inline void __cpus_setall(cpumask_t *dstp, unsigned int nbits) { bitmap_fill(dstp->bits, nbits); } #define cpus_clear(dst) __cpus_clear(&(dst), NR_CPUS) -static inline void __cpus_clear(cpumask_t *dstp, int nbits) +static inline void __cpus_clear(cpumask_t *dstp, unsigned int nbits) { bitmap_zero(dstp->bits, nbits); } @@ -927,21 +927,21 @@ static inline int __cpu_test_and_set(int cpu, cpumask_t *addr) #define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS) static inline int __cpus_and(cpumask_t *dstp, const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { return bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits); } #define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS) static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits); } #define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS) static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits); } @@ -949,40 +949,40 @@ static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p, #define cpus_andnot(dst, src1, src2) \ __cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS) static inline int __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { return bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits); } #define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS) static inline int __cpus_equal(const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { return bitmap_equal(src1p->bits, src2p->bits, nbits); } #define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS) static inline int __cpus_intersects(const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { return bitmap_intersects(src1p->bits, src2p->bits, nbits); } #define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS) static inline int __cpus_subset(const cpumask_t *src1p, - const cpumask_t *src2p, int nbits) + const cpumask_t *src2p, unsigned int nbits) { return bitmap_subset(src1p->bits, src2p->bits, nbits); } #define cpus_empty(src) __cpus_empty(&(src), NR_CPUS) -static inline int __cpus_empty(const cpumask_t *srcp, int nbits) +static inline int __cpus_empty(const cpumask_t *srcp, unsigned int nbits) { return bitmap_empty(srcp->bits, nbits); } #define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS) -static inline int __cpus_weight(const cpumask_t *srcp, int nbits) +static inline int __cpus_weight(const cpumask_t *srcp, unsigned int nbits) { return bitmap_weight(srcp->bits, nbits); } -- cgit v1.2.3 From eb5698837881687841c9e477e4162ac3387c6b59 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:01 -0800 Subject: lib/bitmap.c: update bitmap_onto to unsigned Change the nbits parameter of bitmap_onto to unsigned int for consistency with other bitmap_* functions. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 1406d5453781..d0c6214eb190 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -164,7 +164,7 @@ extern void bitmap_remap(unsigned long *dst, const unsigned long *src, extern int bitmap_bitremap(int oldbit, const unsigned long *old, const unsigned long *new, int bits); extern void bitmap_onto(unsigned long *dst, const unsigned long *orig, - const unsigned long *relmap, int bits); + const unsigned long *relmap, unsigned int bits); extern void bitmap_fold(unsigned long *dst, const unsigned long *orig, int sz, int bits); extern int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order); -- cgit v1.2.3 From b26ad5836c3a0a9d456eb60b9f841ca15403ee59 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:04 -0800 Subject: lib/bitmap.c: change parameters of bitmap_fold to unsigned Change the sz and nbits parameters of bitmap_fold to unsigned int for consistency with other bitmap_* functions, and to save another few bytes in the generated code. [akpm@linux-foundation.org: fix kerneldoc] Signed-off-by: Rasmus Villemoes Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index d0c6214eb190..95dcd2f76e1a 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -166,7 +166,7 @@ extern int bitmap_bitremap(int oldbit, extern void bitmap_onto(unsigned long *dst, const unsigned long *orig, const unsigned long *relmap, unsigned int bits); extern void bitmap_fold(unsigned long *dst, const unsigned long *orig, - int sz, int bits); + unsigned int sz, unsigned int nbits); extern int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order); extern void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order); extern int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order); -- cgit v1.2.3 From f6a1f5db8d7a7a94ff07251996959d27daba4ee7 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:10 -0800 Subject: lib/bitmap.c: simplify bitmap_ord_to_pos Make the return value and the ord and nbits parameters of bitmap_ord_to_pos unsigned. Also, simplify the implementation and as a side effect make the result fully defined, returning nbits for ord >= weight, in analogy with what find_{first,next}_bit does. This is a better sentinel than the former ("unofficial") 0. No current users are affected by this change. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 95dcd2f76e1a..1e74fe7aa167 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -171,7 +171,7 @@ extern int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int extern void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order); extern int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order); extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits); -extern int bitmap_ord_to_pos(const unsigned long *bitmap, int n, int bits); +extern unsigned int bitmap_ord_to_pos(const unsigned long *bitmap, unsigned int ord, unsigned int nbits); extern int bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp, int nmaskbits); -- cgit v1.2.3 From 9814ec135dedb70a9daa41e68798d540d7ba71f2 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:13 -0800 Subject: lib/bitmap.c: make the bits parameter of bitmap_remap unsigned Also, rename bits to nbits. Both changes for consistency with other bitmap_* functions. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 1e74fe7aa167..5f5c00de39f0 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -160,7 +160,7 @@ extern int bitmap_parselist(const char *buf, unsigned long *maskp, extern int bitmap_parselist_user(const char __user *ubuf, unsigned int ulen, unsigned long *dst, int nbits); extern void bitmap_remap(unsigned long *dst, const unsigned long *src, - const unsigned long *old, const unsigned long *new, int bits); + const unsigned long *old, const unsigned long *new, unsigned int nbits); extern int bitmap_bitremap(int oldbit, const unsigned long *old, const unsigned long *new, int bits); extern void bitmap_onto(unsigned long *dst, const unsigned long *orig, -- cgit v1.2.3 From af3cd13501eb04ca61d017ff4406f1cbffafdc04 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:15 -0800 Subject: lib/string.c: remove strnicmp() Now that all in-tree users of strnicmp have been converted to strncasecmp, the wrapper can be removed. Signed-off-by: Rasmus Villemoes Cc: David Howells Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/string.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/string.h b/include/linux/string.h index 2e22a2e58f3a..b9bc9a5d9e21 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -40,9 +40,6 @@ extern int strcmp(const char *,const char *); #ifndef __HAVE_ARCH_STRNCMP extern int strncmp(const char *,const char *,__kernel_size_t); #endif -#ifndef __HAVE_ARCH_STRNICMP -#define strnicmp strncasecmp -#endif #ifndef __HAVE_ARCH_STRCASECMP extern int strcasecmp(const char *s1, const char *s2); #endif -- cgit v1.2.3 From 114fc1afb2de7dec40da137dc2a55cd38fc220f2 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 12 Feb 2015 15:02:29 -0800 Subject: hexdump: make it return number of bytes placed in buffer This patch makes hexdump return the number of bytes placed in the buffer excluding trailing NUL. In the case of overflow it returns the desired amount of bytes to produce the entire dump. Thus, it mimics snprintf(). This will be useful for users that would like to repeat with a bigger buffer. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/printk.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 4d5bf5726578..baa3f97d8ce8 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -417,9 +417,9 @@ enum { DUMP_PREFIX_ADDRESS, DUMP_PREFIX_OFFSET }; -extern void hex_dump_to_buffer(const void *buf, size_t len, - int rowsize, int groupsize, - char *linebuf, size_t linebuflen, bool ascii); +extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize, + int groupsize, char *linebuf, size_t linebuflen, + bool ascii); #ifdef CONFIG_PRINTK extern void print_hex_dump(const char *level, const char *prefix_str, int prefix_type, int rowsize, int groupsize, -- cgit v1.2.3 From 3248340d3f10871d2ae2d1df9e323f284fa1fdb6 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 12 Feb 2015 15:02:40 -0800 Subject: lib/halfmd4.c: simplify includes We only need EXPORT_SYMBOL, so compiler.h and export.h suffice. This means linux/types.h is no longer implicitly included, so add an include of uapi/linux/types.h to linux/cryptohash.h for __u32. Other users of cryptohash.h cannot be affected, since they must already have been including uapi/linux/types.h in order for gcc not to complain about unknown types. Signed-off-by: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/cryptohash.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cryptohash.h b/include/linux/cryptohash.h index 2cd9f1cf9fa3..f4754282c9c2 100644 --- a/include/linux/cryptohash.h +++ b/include/linux/cryptohash.h @@ -1,6 +1,8 @@ #ifndef __CRYPTOHASH_H #define __CRYPTOHASH_H +#include + #define SHA_DIGEST_WORDS 5 #define SHA_MESSAGE_BYTES (512 /*bits*/ / 8) #define SHA_WORKSPACE_WORDS 16 -- cgit v1.2.3