linux-toradex.git/mm/slab.c, branch v5.12

kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations

2021-03-13T19:27:30+00:00

cache_alloc_debugcheck_after() performs checks on an object, including
adjusting the returned pointer.  None of this should apply to KFENCE
objects.  While for non-bulk allocations, the checks are skipped when we
allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after()
is called via cache_alloc_debugcheck_after_bulk().

Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.
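
As a rough illustration of the fix described above, here is a minimal
user-space sketch.  It is not the kernel code: kfence_pool,
in_kfence_pool(), debugcheck_after(), debugcheck_after_bulk() and REDZONE
are hypothetical stand-ins, and the "pointer adjustment" is modelled as
skipping a fixed-size red zone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REDZONE 8	/* hypothetical size of the debug red zone */

/* Hypothetical stand-in for the memory pool that KFENCE hands out. */
static char kfence_pool[4096];

static bool in_kfence_pool(const void *p)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t start = (uintptr_t)kfence_pool;

	return a >= start && a < start + sizeof(kfence_pool);
}

/* Model of the per-object debug check: it adjusts the returned pointer. */
static void *debugcheck_after(void *objp)
{
	/* The fix: leave NULL and KFENCE objects completely untouched. */
	if (!objp || in_kfence_pool(objp))
		return objp;

	return (char *)objp + REDZONE;
}

/* The bulk path simply applies the per-object check to every element. */
static void debugcheck_after_bulk(void **p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[i] = debugcheck_after(p[i]);
}
```

Placing the test inside the per-object helper (rather than in the bulk
loop) is one way to cover both the bulk and the non-bulk path with a
single check.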

Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
Signed-off-by: Marco Elver 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: Andrey Konovalov 
Cc: Jann Horn 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, kfence: insert KFENCE hooks for SLAB

2021-02-26T17:41:02+00:00

Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
use size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if the object's requested size is not equal to the kmalloc size
class it was bucketed into.
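
The arithmetic behind this can be sketched in user space as follows.
This is a simplified model under stated assumptions: size_class() rounds
up to the next power of two (real kmalloc has additional classes),
PAGE_SZ is assumed to be 4096, and hits_guard_page() is a hypothetical
helper that places an object so that it ends exactly at the page
boundary, with a guard page assumed to follow.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size */

/* Simplified size-class model: round up to a power of two, minimum 8. */
static size_t size_class(size_t req)
{
	size_t c = 8;

	while (c < req)
		c <<= 1;
	return c;
}

/*
 * Place an object of 'placed_size' bytes (assumed <= PAGE_SZ) so that it
 * ends at the end of the page, then report whether an access at byte
 * offset 'index' from the object start lands in the following guard page.
 */
static bool hits_guard_page(size_t placed_size, size_t index)
{
	size_t start = PAGE_SZ - placed_size;

	return start + index >= PAGE_SZ;
}
```

With the example from the text, size_class(272) is 512.  An overflow by
one byte (index 272) is caught when the object is placed using the
requested 272 bytes, but goes unnoticed when it is placed using the
512-byte class size.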

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
Signed-off-by: Marco Elver 
Signed-off-by: Alexander Potapenko 
Reviewed-by: Dmitry Vyukov 
Co-developed-by: Marco Elver 

Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Dave Hansen 
Cc: Eric Dumazet 
Cc: Greg Kroah-Hartman 
Cc: Hillf Danton 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Joern Engel 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Mark Rutland 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: SeongJae Park 
Cc: Thomas Gleixner 
Cc: Vlastimil Babka 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kasan: move _RET_IP_ to inline wrappers

2021-02-24T21:38:31+00:00

Generic mm functions that call KASAN annotations that might report a bug
pass _RET_IP_ to them as an argument. This allows KASAN to include the
name of the function that called the mm function in its report's header.

Now that KASAN has inline wrappers for all of its annotations, move
_RET_IP_ to those wrappers to simplify annotation call sites.
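
The pattern can be sketched in user space with GCC or Clang.  In the
kernel, _RET_IP_ expands to (unsigned long)__builtin_return_address(0);
everything else below (RET_IP, check_free(), check_free_impl(), my_free(),
last_ip) is a hypothetical stand-in and not the KASAN API.

```c
/* User-space equivalent of the kernel's _RET_IP_ (GCC/Clang builtin). */
#define RET_IP ((unsigned long)__builtin_return_address(0))

/* Records the address that the out-of-line check was given. */
static unsigned long last_ip;

/* Out-of-line implementation: takes the caller address explicitly. */
static __attribute__((noinline)) int check_free_impl(void *obj,
						     unsigned long ip)
{
	last_ip = ip;
	return obj != 0;
}

/*
 * Inline wrapper: because it is always inlined into its caller, RET_IP
 * here evaluates to the return address of the enclosing mm function, so
 * call sites no longer have to pass it themselves.
 */
static inline __attribute__((always_inline)) int check_free(void *obj)
{
	return check_free_impl(obj, RET_IP);
}

/* Model of a generic mm function: the call site is now a one-liner. */
static __attribute__((noinline)) void my_free(void *obj)
{
	check_free(obj);
}
```

After a call to my_free(), last_ip holds an address inside the function
that called my_free(), which is what a report header would want to name.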

Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67
Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov 
Reviewed-by: Marco Elver 
Reviewed-by: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Branislav Rankov 
Cc: Catalin Marinas 
Cc: Dmitry Vyukov 
Cc: Evgenii Stepanov 
Cc: Kevin Brodsky 
Cc: Peter Collingbourne 
Cc: Vincenzo Frascino 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT

2021-02-24T21:38:29+00:00

In general it's unknown in advance if a slab page will contain accounted
objects or not.  In order to avoid memory waste, an obj_cgroup vector is
allocated dynamically when the need to account a new object arises.  Such
an approach is memory efficient, but requires an expensive cmpxchg() to set
up the memcg/objcgs pointer, because an allocation can race with a
different allocation on another cpu.

But in some common cases it's known for sure that a slab page will contain
accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT
flag set.  This includes such popular objects as vm_area_struct, anon_vma,
task_struct, etc.

In such cases we can pre-allocate the objcgs vector and simply assign it
to the page without any atomic operations, because at this early stage the
page is not visible to anyone else.
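
The difference between the two paths can be sketched with C11 atomics in
user space.  This is only a model: struct page_model, objcgs_get_lazy()
and page_init_accounted() are hypothetical names, and calloc() stands in
for the kernel allocator.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a slab page carrying an objcgs vector pointer. */
struct page_model {
	_Atomic(void *) objcgs;
};

/*
 * Lazy path: the page is already visible to other CPUs, so installing
 * the vector may race and needs a compare-and-exchange.  The loser of
 * the race frees its own vector and uses the winner's.
 */
static void *objcgs_get_lazy(struct page_model *pg, size_t nr_objs)
{
	void *cur = atomic_load(&pg->objcgs);
	void *expected = NULL;
	void *vec;

	if (cur)
		return cur;

	vec = calloc(nr_objs, sizeof(void *));
	if (!vec)
		return NULL;

	if (atomic_compare_exchange_strong(&pg->objcgs, &expected, vec))
		return vec;

	free(vec);
	return expected;	/* vector installed by the other side */
}

/*
 * Pre-allocation path: called while the page is still private to the
 * allocating context, so a plain (relaxed) store is sufficient.
 */
static int page_init_accounted(struct page_model *pg, size_t nr_objs)
{
	void *vec = calloc(nr_objs, sizeof(void *));

	if (!vec)
		return -1;

	atomic_store_explicit(&pg->objcgs, vec, memory_order_relaxed);
	return 0;
}
```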

A very simplistic benchmark (allocating 10000000 64-byte objects in a
row) shows a ~15% win.  In real life it seems that most workloads are
not very sensitive to the speed of (accounted) slab allocations.

[guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes]
  Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com
[akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch]

Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com
Signed-off-by: Roman Gushchin 
Acked-by: Johannes Weiner 
Reviewed-by: Shakeel Butt 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/slab: minor coding style tweaks

2021-02-24T21:38:27+00:00

Fix some coding style issues to improve code readability.  Add whitespace
to clearly separate the parameters.

Link: https://lkml.kernel.org/r/1612841499-32166-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/sl?b.c: remove ctor argument from kmem_cache_flags

2021-02-24T21:38:27+00:00

This argument hasn't been used since e153362a50a3 ("slub: Remove objsize
check in kmem_cache_flags()") so simply remove it.

Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov 
Reviewed-by: Miaohe Lin 
Reviewed-by: Vlastimil Babka 
Acked-by: Christoph Lameter 
Acked-by: David Rientjes 
Cc: Pekka Enberg 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, tracing: record slab name for kmem_cache_free()

2021-02-24T21:38:26+00:00

Currently, a trace record generated by the RCU core is as below.

... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66

It doesn't tell us what the RCU core has freed.

This patch adds the slab name to trace_kmem_cache_free().
The new format is as follows.

... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node

We can use it to understand what the RCU core is going to free. For
example, some users may be interested in when the RCU core starts
freeing reclaimable slabs like dentry to reduce memory pressure.

Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.com
Signed-off-by: Jacob Wen 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Steven Rostedt 
Cc: "Paul E. McKenney" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: Add mem_dump_obj() to print source of memory block

2021-01-22T23:16:01+00:00

There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.
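
To illustrate why a pointer into the middle of a block is enough, here is
a minimal user-space sketch that maps an interior pointer back to the
start of its object in a slab of equally sized objects.  How the kernel
actually does this lookup is not described here; struct slab_model and
object_start() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one slab: nr_objs objects of obj_size bytes. */
struct slab_model {
	char *base;
	size_t obj_size;
	size_t nr_objs;
};

/*
 * Return the start of the object containing 'ptr', or NULL if 'ptr'
 * does not point into this slab at all.
 */
static void *object_start(const struct slab_model *s, const void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;
	uintptr_t b = (uintptr_t)s->base;
	size_t idx;

	if (p < b || p >= b + s->obj_size * s->nr_objs)
		return NULL;

	idx = (p - b) / s->obj_size;	/* index of the enclosing object */
	return s->base + idx * s->obj_size;
}
```

An RCU callback or timer handler that only holds a pointer to an embedded
member could pass that pointer as-is; the division recovers the enclosing
object.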

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or use the SLAB_STORE_USER flag when creating the
kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.

Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Reported-by: Andrii Nakryiko 
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
[ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
Acked-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Tested-by: Naresh Kamboju 
Signed-off-by: Paul E. McKenney

mm: introduce debug_pagealloc_{map,unmap}_pages() helpers

2020-12-15T20:13:43+00:00

Patch series "arch, mm: improve robustness of direct map manipulation", v7.

During recent discussion about KVM protected memory, David raised a
concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
scope [1].

Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.

Moreover, there is a lack of consistency in __kernel_map_pages() semantics
across architectures as some guard this function with #ifdef
DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
debugging is disabled at run time and some allow modifying the direct map
regardless of DEBUG_PAGEALLOC settings.

This set straightens this out by restoring dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.

Since currently the only user of __kernel_map_pages() outside
DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
there more explicit.

[1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com

This patch (of 4):

When CONFIG_DEBUG_PAGEALLOC is enabled, pages are unmapped from the kernel
direct mapping after free_pages().  The pages then need to be mapped back
before they can be used.  These mapping operations use
__kernel_map_pages() guarded with debug_pagealloc_enabled().

The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set.  Still,
on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
enabled but set_direct_map_invalid_noflush() may render some pages not
present in the direct map and hibernation code won't be able to save such
pages.

To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
made more explicit.

Start with combining the guard condition and the call to
__kernel_map_pages() into debug_pagealloc_map_pages() and
debug_pagealloc_unmap_pages() functions to emphasize that
__kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
these new functions to map/unmap pages when page allocation debugging is
enabled.
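
The shape of such helpers can be sketched in user space as below.  This
is a model only: every name carries a _model suffix to mark it as a
hypothetical stand-in, the run-time switch is a plain variable, and the
mapping function merely records that it was called.

```c
#include <stdbool.h>

/* Hypothetical run-time switch for page allocation debugging. */
static bool debug_pagealloc_on;

/* Bookkeeping so the effect of the helpers can be observed. */
static int map_calls;
static int last_enable = -1;

static bool debug_pagealloc_enabled_model(void)
{
	return debug_pagealloc_on;
}

/* Stand-in for the low-level map/unmap primitive. */
static void kernel_map_pages_model(void *page, int numpages, int enable)
{
	(void)page;
	(void)numpages;
	map_calls++;
	last_enable = enable;
}

/* Guard and call combined: callers can no longer forget the check. */
static inline void debug_pagealloc_map_pages_model(void *page, int numpages)
{
	if (debug_pagealloc_enabled_model())
		kernel_map_pages_model(page, numpages, 1);
}

static inline void debug_pagealloc_unmap_pages_model(void *page,
						     int numpages)
{
	if (debug_pagealloc_enabled_model())
		kernel_map_pages_model(page, numpages, 0);
}
```

With the switch off, both helpers are no-ops; with it on, they forward to
the primitive with enable set to 1 or 0 respectively.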

Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org
Signed-off-by: Mike Rapoport 
Reviewed-by: David Hildenbrand 
Acked-by: Kirill A. Shutemov 
Acked-by: Vlastimil Babka 
Cc: Albert Ou 
Cc: Andy Lutomirski 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christian Borntraeger 
Cc: Christoph Lameter 
Cc: "David S. Miller" 
Cc: Dave Hansen 
Cc: David Rientjes 
Cc: "Edgecombe, Rick P" 
Cc: "H. Peter Anvin" 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Joonsoo Kim 
Cc: Len Brown 
Cc: Michael Ellerman 
Cc: Palmer Dabbelt 
Cc: Paul Mackerras 
Cc: Paul Walmsley 
Cc: Pavel Machek 
Cc: Pekka Enberg 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Thomas Gleixner 
Cc: Vasily Gorbik 
Cc: Will Deacon 
Cc: Rafael J. Wysocki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/slab: perform init_on_free earlier

2020-12-15T20:13:37+00:00

Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine without being erased.

Let's move init_on_free clearing before calling kasan_slab_free().  In that
case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y
behavior.
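
The ordering can be sketched in user space as follows.  This is a model
under stated assumptions: struct quarantine, quarantine_put() and
free_object() are hypothetical, and the quarantine is a small fixed-size
array rather than the real KASAN data structure.

```c
#include <stddef.h>
#include <string.h>

#define QUARANTINE_SLOTS 4	/* hypothetical capacity */

/* Hypothetical model of a quarantine that delays reuse of freed objects. */
struct quarantine {
	void *slot[QUARANTINE_SLOTS];
	size_t n;
};

/* Returns 1 if the object was parked in the quarantine, 0 if it is full. */
static int quarantine_put(struct quarantine *q, void *obj)
{
	if (q->n == QUARANTINE_SLOTS)
		return 0;
	q->slot[q->n++] = obj;
	return 1;
}

/*
 * Free path with the erase step placed first, so that an object that
 * ends up in the quarantine has already been cleared.
 */
static void free_object(struct quarantine *q, void *obj, size_t size,
			int init_on_free)
{
	if (init_on_free)
		memset(obj, 0, size);

	if (quarantine_put(q, obj))
		return;		/* parked, and already erased */

	/* Otherwise the object would go back to the free list (omitted). */
}
```

If the memset() came after the quarantine check instead, the early return
would leave the old contents in place for as long as the object stays
parked.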

Link: https://lkml.kernel.org/r/20201210183729.1261524-1-alex.popov@linux.com
Signed-off-by: Alexander Popov 
Reviewed-by: Alexander Potapenko 
Acked-by: David Rientjes 
Acked-by: Joonsoo Kim 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds