summaryrefslogtreecommitdiff
path: root/mm/slab.c
AgeCommit message (Collapse)Author
2010-08-09gcc-4.6: mm: fix unused but set warningsAndi Kleen
No real bugs, just some dead code and some fixups. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Allow removal of slab caches during boot Revert "slub: Allow removal of slab caches during boot" slub numa: Fix rare allocation from unexpected node slab: use deferable timers for its periodic housekeeping slub: Use kmem_cache flags to detect if slab is in debugging mode. slub: Allow removal of slab caches during boot slub: Check kasprintf results in kmem_cache_init() SLUB: Constants need UL slub: Use a constant for a unspecified node. SLOB: Free objects to their own list slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACING
2010-07-20slab: use deferable timers for its periodic housekeepingArjan van de Ven
slab has a "once every 2 second" timer for its housekeeping. As the number of logical processors is growing, its more and more common that this 2 second timer becomes the primary wakeup source. This patch turns this housekeeping timer into a deferable timer, which means that the timer does not interrupt idle, but just runs at the next event that wakes the cpu up. The impact is that the timer likely runs a bit later, but during the delay no code is running so there's not all that much reason for a difference in housekeeping to occur because of this delay. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-06-09tracing: Remove kmemtrace ftrace pluginLi Zefan
We have been resisting new ftrace plugins and removing existing ones, and kmemtrace has been superseded by kmem trace events and perf-kmem, so we remove it. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> [ remove kmemtrace from the makefile, handle slob too ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-27numa: slab: use numa_mem_id() for slab local memory nodeLee Schermerhorn
Example usage of generic "numa_mem_id()": The mainline slab code, since ~ 2.6.19, does not handle memoryless nodes well. Specifically, the "fast path"--____cache_alloc()--will never succeed as slab doesn't cache offnode object on the per cpu queues, and for memoryless nodes, all memory will be "off node" relative to numa_node_id(). This adds significant overhead to all kmem cache allocations, incurring a significant regression relative to earlier kernels [from before slab.c was reorganized]. This patch uses the generic topology function "numa_mem_id()" to return the "effective local memory node" for the calling context. This is the first node in the local node's generic fallback zonelist-- the same node that "local" mempolicy-based allocations would use. This lets slab cache these "local" allocations and avoid fallback/refill on every allocation. N.B.: Slab will need to handle node and memory hotplug events that could change the value returned by numa_mem_id() for any given node if recent changes to address memory hotplug don't already address this. E.g., flush all per cpu slab queues before rebuilding the zonelists while the "machine" is held in the stopped state. Performance impact on "hackbench 400 process 200" 2.6.34-rc3-mmotm-100405-1609 no-patch this-patch ia64 no memoryless nodes [avg of 10]: 11.713 11.637 ~0.65 diff ia64 cpus all on memless nodes [10]: 228.259 26.484 ~8.6x speedup The slowdown of the patched kernel from ~12 sec to ~28 seconds when configured with memoryless nodes is the result of all cpus allocating from a single node's mm pagepool. The cache lines of the single node are distributed/interleaved over the memory of the real physical nodes, but the zone lock, list heads, ... of the single node with memory still each live in a single cache line that is accessed from all processors. x86_64 [8x6 AMD] [avg of 40]: 2.883 2.845 Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27slab: convert cpu notifier to return encapsulate errno valueAkinobu Mita
By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for slab. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27cpusets: new round-robin rotor for SLAB allocationsJack Steiner
We have observed several workloads running on multi-node systems where memory is assigned unevenly across the nodes in the system. There are numerous reasons for this but one is the round-robin rotor in cpuset_mem_spread_node(). For example, a simple test that writes a multi-page file will allocate pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it allocates on odd nodes & skips even nodes). An example is shown below. The program "lfile" writes a file consisting of 10 pages. The program then mmaps the file & uses get_mempolicy(..., MPOL_F_NODE) to determine the nodes where the file pages were allocated. The output is shown below: # ./lfile allocated on nodes: 2 4 6 0 1 2 6 0 2 There is a single rotor that is used for allocating both file pages & slab pages. Writing the file allocates both a data page & a slab page (buffer_head). This advances the RR rotor 2 nodes for each page allocated. A quick confirmation seems to confirm this is the cause of the uneven allocation: # echo 0 >/dev/cpuset/memory_spread_slab # ./lfile allocated on nodes: 6 7 8 9 0 1 2 3 4 5 This patch introduces a second rotor that is used for slab allocations. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25cpuset,mm: fix no node to alloc memory when changing cpuset's memsMiao Xie
Before applying this patch, cpuset updates task->mems_allowed and mempolicy by setting all new bits in the nodemask first, and clearing all old unallowed bits later. But in the way, the allocator may find that there is no node to alloc memory. The reason is that cpuset rebinds the task's mempolicy, it cleans the nodes which the allocater can alloc pages on, for example: (mpol: mempolicy) task1 task1's mpol task2 alloc page 1 alloc on node0? NO 1 1 change mems from 1 to 0 1 rebind task1's mpol 0-1 set new bits 0 clear disallowed bits alloc on node1? NO 0 ... can't alloc page goto oom This patch fixes this problem by expanding the nodes range first(set newly allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a variable to tell the write-side task that read-side task is reading nodemask, and the write-side task clears newly disallowed nodes after read-side task ends the current memory allocation. [akpm@linux-foundation.org: fix spello] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-22Merge branches 'slab/align', 'slab/cleanups', 'slab/fixes', 'slab/memhotadd' ↵Pekka Enberg
and 'slub/fixes' into slab-for-linus
2010-05-19mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h>David Woodhouse
Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-04-14slab: Fix missing DEBUG_SLAB last userShiyong Li
Even with SLAB_RED_ZONE and SLAB_STORE_USER enabled, kernel would NOT store redzone and last user data around allocated memory space if "arch cache line > sizeof(unsigned long long)". As a result, last user information is unexpectedly MISSED while dumping slab corruption log. This fix makes sure that redzone and last user tags get stored unless the required alignment breaks redzone's. Signed-off-by: Shiyong Li <shi-yong.li@motorola.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-04-09slab: Generify kernel pointer validationPekka Enberg
As suggested by Linus, introduce a kern_ptr_validate() helper that does some sanity checks to make sure a pointer is a valid kernel pointer. This is a preparational step for fixing SLUB kmem_ptr_validate(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07slab: add memory hotplug supportDavid Rientjes
Slab lacks any memory hotplug support for nodes that are hotplugged without cpus being hotplugged. This is possible at least on x86 CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a seperate node. It can also be done manually by writing the start address to /sys/devices/system/memory/probe for kernels that have CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and then onlining the new memory region. When a node is hotadded, a nodelist for that node is allocated and initialized for each slab cache. If this isn't completed due to a lack of memory, the hotadd is aborted: we have a reasonable expectation that kmalloc_node(nid) will work for all caches if nid is online and memory is available. Since nodelists must be allocated and initialized prior to the new node's memory actually being online, the struct kmem_list3 is allocated off-node due to kmalloc_node()'s fallback. When an entire node would be offlined, its nodelists are subsequently drained. If slab objects still exist and cannot be freed, the offline is aborted. It is possible that objects will be allocated between this drain and page isolation, so it's still possible that the offline will still fail, however. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-03-28slab: Fix continuation linesJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-03-04Merge branches 'slab/cleanups', 'slab/failslab', 'slab/fixes' and ↵Pekka Enberg
'slub/percpu' into slab-for-linus
2010-02-26failslab: add ability to filter slab cachesDmitry Monakhov
This patch allow to inject faults only for specific slabs. In order to preserve default behavior cache filter is off by default (all caches are faulty). One may define specific set of slabs like this: # mark skbuff_head_cache as faulty echo 1 > /sys/kernel/slab/skbuff_head_cache/failslab # Turn on cache filter (off by default) echo 1 > /sys/kernel/debug/failslab/cache-filter # Turn on fault injection echo 1 > /sys/kernel/debug/failslab/times echo 1 > /sys/kernel/debug/failslab/probability Acked-by: David Rientjes <rientjes@google.com> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-01-30slab: fix regression in touched logicNick Piggin
When factoring common code into transfer_objects in commit 3ded175 ("slab: add transfer_objects() function"), the 'touched' logic got a bit broken. When refilling from the shared array (taking objects from the shared array), we are making use of the shared array so it should be marked as touched. Subsequently pulling an element from the cpu array and allocating it should also touch the cpu array, but that is taken care of after the alloc_done label. (So yes, the cpu array was getting touched = 1 twice). So revert this logic to how it worked in earlier kernels. This also affects the behaviour in __drain_alien_cache, which would previously 'touch' the shared array and now does not. I think it is more logical not to touch there, because we are pushing objects into the shared array rather than pulling them off. So there is no good reason to postpone reaping them -- if the shared array is getting utilized, then it will get 'touched' in the alloc path (where this patch now restores the touch). Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-01-11slab: initialize unused alien cache entry as NULL at alloc_alien_cache().Haicheng Li
Comparing with existing code, it's a simpler way to use kzalloc_node() to ensure that each unused alien cache entry is NULL. CC: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-12-28SLAB: Fix lockdep annotation breakagePekka Enberg
Commit ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations for CPU hotplug") broke init_node_lock_keys() off-slab logic which causes lockdep false positives. Fix that up by reverting the logic back to original while keeping CPU hotplug fixes intact. Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-and-tested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-12-17Merge branch 'cpumask-cleanups' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: cpumask: rename tsk_cpumask to tsk_cpus_allowed cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt cpumask: avoid dereferencing struct cpumask cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c cpumask: avoid deprecated function in mm/slab.c cpumask: use cpu_online in kernel/perf_event.c
2009-12-17Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6Linus Torvalds
* 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: fix kconfig for crc32 build error kmemleak: Reduce the false positives by checking for modified objects kmemleak: Show the age of an unreferenced object kmemleak: Release the object lock before calling put_object() kmemleak: Scan the _ftrace_events section in modules kmemleak: Simplify the kmemleak_scan_area() function prototype kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACE
2009-12-17cpumask: avoid deprecated function in mm/slab.cRusty Russell
These days we use cpumask_empty() which takes a pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christoph Lameter <cl@linux-foundation.org>
2009-12-14Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf sched: Fix build failure on sparc perf bench: Add "all" pseudo subsystem and "all" pseudo suite perf tools: Introduce perf_session class perf symbols: Ditch dso->find_symbol perf symbols: Allow lookups by symbol name too perf symbols: Add missing "Variables" entry to map_type__name perf symbols: Add support for 'variable' symtabs perf symbols: Introduce ELF counterparts to symbol_type__is_a perf symbols: Introduce symbol_type__is_a perf symbols: Rename kthreads to kmaps, using another abstraction for it perf tools: Allow building for ARM hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value perf tools: Allow cross compiling tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING Trivial conflict due to different fixes to modify_user_hw_breakpoint() in include/linux/hw_breakpoint.h
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-12Merge branches 'slab/fixes', 'slab/kmemleak', 'slub/perf' and 'slub/stats' ↵Pekka Enberg
into for-linus
2009-12-11tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACELi Zefan
For slab, if CONFIG_KMEMTRACE and CONFIG_DEBUG_SLAB are not set, __do_kmalloc() will not track callers: # ./perf record -f -a -R -e kmem:kmalloc ^C # ./perf trace ... perf-2204 [000] 147.376774: kmalloc: call_site=c0529d2d ... perf-2204 [000] 147.400997: kmalloc: call_site=c0529d2d ... Xorg-1461 [001] 147.405413: kmalloc: call_site=0 ... Xorg-1461 [001] 147.405609: kmalloc: call_site=0 ... konsole-1776 [001] 147.405786: kmalloc: call_site=0 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: linux-mm@kvack.org <linux-mm@kvack.org> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <4B21F8AE.6020804@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-11tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACINGLi Zefan
Define kmem_trace_alloc_{,node}_notrace() if CONFIG_TRACING is enabled, otherwise perf-kmem will show wrong stats ifndef CONFIG_KMEM_TRACE, because a kmalloc() memory allocation may be traced by both trace_kmalloc() and trace_kmem_cache_alloc(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: linux-mm@kvack.org <linux-mm@kvack.org> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <4B21F89A.7000801@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06slab, kmemleak: pass the correct pointer to kmemleak_erase()J. R. Okajima
In ____cache_alloc(), the variable 'ac' may be changed after cache_alloc_refill() and the following kmemleak_erase() may get an incorrect pointer. Update 'ac' after cache_alloc_refill() unconditionally. See the following URL for the discussion of this patch: http://marc.info/?l=linux-kernel&m=125873373124187&w=2 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-12-06slab, kmemleak: stop calling kmemleak_erase() unconditionallyJ. R. Okajima
When the gotten object is NULL (probably due to ENOMEM), kmemleak_erase() is unnecessary here, It just sets NULL to where already is NULL. Add a condition. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-12-06SLAB: Fix unlikely() annotation in __cache_alloc_node()Tim Blechmann
Branch profiling on my nehalem machine showed 99% incorrect branch hints: 28459 7678524 99 __cache_alloc_node slab.c 3551 Discussion on lkml [1] led to the solution to remove this hint. [1] http://patchwork.kernel.org/patch/63517/ Signed-off-by: Tim Blechmann <tim@klingt.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-11-30SLAB: Fix lockdep annotations for CPU hotplugPekka Enberg
As reported by Paul McKenney: I am seeing some lockdep complaints in rcutorture runs that include frequent CPU-hotplug operations. The tests are otherwise successful. My first thought was to send a patch that gave each array_cache structure's ->lock field its own struct lock_class_key, but you already have a init_lock_keys() that seems to be intended to deal with this. ------------------------------------------------------------------------ ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc4-autokern1 #1 --------------------------------------------- syslogd/2908 is trying to acquire lock: (&nc->lock){..-...}, at: [<c0000000001407f4>] .kmem_cache_free+0x118/0x2d4 but task is already holding lock: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324 other info that might help us debug this: 3 locks held by syslogd/2908: #0: (&u->readlock){+.+.+.}, at: [<c0000000004556f8>] .unix_dgram_recvmsg+0x70/0x338 #1: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324 #2: (&parent->list_lock){-.-...}, at: [<c000000000140f64>] .__drain_alien_cache+0x50/0xb8 stack backtrace: Call Trace: [c0000000e8ccafc0] [c0000000000101e4] .show_stack+0x70/0x184 (unreliable) [c0000000e8ccb070] [c0000000000afebc] .validate_chain+0x6ec/0xf58 [c0000000e8ccb180] [c0000000000b0ff0] .__lock_acquire+0x8c8/0x974 [c0000000e8ccb280] [c0000000000b2290] .lock_acquire+0x140/0x18c [c0000000e8ccb350] [c000000000468df0] ._spin_lock+0x48/0x70 [c0000000e8ccb3e0] [c0000000001407f4] .kmem_cache_free+0x118/0x2d4 [c0000000e8ccb4a0] [c000000000140b90] .free_block+0x130/0x1a8 [c0000000e8ccb540] [c000000000140f94] .__drain_alien_cache+0x80/0xb8 [c0000000e8ccb5e0] [c0000000001411e0] .kfree+0x214/0x324 [c0000000e8ccb6a0] [c0000000003ca860] .skb_release_data+0xe8/0x104 [c0000000e8ccb730] [c0000000003ca2ec] .__kfree_skb+0x20/0xd4 [c0000000e8ccb7b0] [c0000000003cf2c8] .skb_free_datagram+0x1c/0x5c [c0000000e8ccb830] [c00000000045597c] .unix_dgram_recvmsg+0x2f4/0x338 [c0000000e8ccb920] [c0000000003c0f14] .sock_recvmsg+0xf4/0x13c [c0000000e8ccbb30] [c0000000003c28ec] .SyS_recvfrom+0xb4/0x130 [c0000000e8ccbcb0] [c0000000003bfb78] .sys_recv+0x18/0x2c [c0000000e8ccbd20] [c0000000003ed388] .compat_sys_recv+0x14/0x28 [c0000000e8ccbd90] [c0000000003ee1bc] .compat_sys_socketcall+0x178/0x220 [c0000000e8ccbe30] [c0000000000085d4] syscall_exit+0x0/0x40 This patch fixes the issue by setting up lockdep annotations during CPU hotplug. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-10-29percpu: make percpu symbols under kernel/ and mm/ uniqueTejun Heo
This patch updates percpu related symbols under kernel/ and mm/ such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * kernel/lockdep.c: s/lock_stats/cpu_lock_stats/ * kernel/sched.c: s/init_rq_rt/init_rt_rq_var/ (any better idea?) s/sched_group_cpus/sched_groups/ * kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a * kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/ s/watchdog_task/softlockup_watchdog/ s/timestamp/ts/ for local variables * kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/ * mm/slab.c: s/reap_work/slab_reap_work/ s/reap_node/slab_reap_node/ * mm/vmstat.c: local variable changed to avoid collision with vmstat_work Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: (slab/vmstat) Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de>
2009-10-28kmemleak: Simplify the kmemleak_scan_area() function prototypeCatalin Marinas
This function was taking non-necessary arguments which can be determined by kmemleak. The patch also modifies the calling sites. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-10-28kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACECatalin Marinas
With the slab allocator, if off-slab management is enabled for the kmem_caches used by kmemleak, it leads to recursive calls into kmemleak_alloc(). Off-slab management can be triggered by other config options increasing the slab size, e.g. DEBUG_PAGEALLOC. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2009-09-22mm: replace various uses of num_physpages by totalram_pagesJan Beulich
Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what actually is usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-29SLAB: Fix lockdep annotationsPekka Enberg
Commit 8429db5... ("slab: setup cpu caches later on when interrupts are enabled") broke mm/slab.c lockdep annotations: [ 11.554715] ============================================= [ 11.555249] [ INFO: possible recursive locking detected ] [ 11.555560] 2.6.31-rc1 #896 [ 11.555861] --------------------------------------------- [ 11.556127] udevd/1899 is trying to acquire lock: [ 11.556436] (&nc->lock){-.-...}, at: [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b [ 11.557101] [ 11.557102] but task is already holding lock: [ 11.557706] (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292 [ 11.558109] [ 11.558109] other info that might help us debug this: [ 11.558720] 2 locks held by udevd/1899: [ 11.558983] #0: (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292 [ 11.559734] #1: (&parent->list_lock){-.-...}, at: [<ffffffff810c36c7>] __drain_alien_cache+0x3b/0xbd [ 11.560442] [ 11.560443] stack backtrace: [ 11.561009] Pid: 1899, comm: udevd Not tainted 2.6.31-rc1 #896 [ 11.561276] Call Trace: [ 11.561632] [<ffffffff81065ed6>] __lock_acquire+0x15ec/0x168f [ 11.561901] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f [ 11.562171] [<ffffffff81063c52>] ? trace_hardirqs_on_caller+0x113/0x13e [ 11.562490] [<ffffffff8150c337>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 11.562807] [<ffffffff8106603a>] lock_acquire+0xc1/0xe5 [ 11.563073] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b [ 11.563385] [<ffffffff8150c8fc>] _spin_lock+0x31/0x66 [ 11.563696] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b [ 11.563964] [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b [ 11.564235] [<ffffffff8109bf8c>] ? __free_pages+0x1b/0x24 [ 11.564551] [<ffffffff810c3564>] slab_destroy+0x57/0x5c [ 11.564860] [<ffffffff810c3641>] free_block+0xd8/0x123 [ 11.565126] [<ffffffff810c372e>] __drain_alien_cache+0xa2/0xbd [ 11.565441] [<ffffffff810c3ce5>] kfree+0x14c/0x292 [ 11.565752] [<ffffffff8144a007>] skb_release_data+0xc6/0xcb [ 11.566020] [<ffffffff81449cf0>] __kfree_skb+0x19/0x86 [ 11.566286] [<ffffffff81449d88>] consume_skb+0x2b/0x2d [ 11.566631] [<ffffffff8144cbe0>] skb_free_datagram+0x14/0x3a [ 11.566901] [<ffffffff81462eef>] netlink_recvmsg+0x164/0x258 [ 11.567170] [<ffffffff81443461>] sock_recvmsg+0xe5/0xfe [ 11.567486] [<ffffffff810ab063>] ? might_fault+0xaf/0xb1 [ 11.567802] [<ffffffff81053a78>] ? autoremove_wake_function+0x0/0x38 [ 11.568073] [<ffffffff810d84ca>] ? core_sys_select+0x3d/0x2b4 [ 11.568378] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f [ 11.568693] [<ffffffff81442dc1>] ? sockfd_lookup_light+0x1b/0x54 [ 11.568961] [<ffffffff81444416>] sys_recvfrom+0xa3/0xf8 [ 11.569228] [<ffffffff81063c8a>] ? trace_hardirqs_on+0xd/0xf [ 11.569546] [<ffffffff8100af2b>] system_call_fastpath+0x16/0x1b# Fix that up. Closes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13654 Tested-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-26fix RCU-callback-after-kmem_cache_destroy problem in sl[aou]bPaul E. McKenney
Jesper noted that kmem_cache_destroy() invokes synchronize_rcu() rather than rcu_barrier() in the SLAB_DESTROY_BY_RCU case, which could result in RCU callbacks accessing a kmem_cache after it had been destroyed. Cc: <stable@kernel.org> Acked-by: Matt Mackall <mpm@selenic.com> Reported-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-18mm: Extend gfp masking to the page allocatorBenjamin Herrenschmidt
The page allocator also needs the masking of gfp flags during boot, so this moves it out of slab/slub and uses it with the page allocator as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-17Merge branches 'slab/documentation', 'slab/fixes', 'slob/cleanups' and ↵Pekka Enberg
'slub/fixes' into for-linus
2009-06-16Merge branch 'akpm'Linus Torvalds
* akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: *bfin*: fix __dev{init,exit} markings fbdev: *bfin*: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16page allocator: slab: use nr_online_nodes to check for a NUMA platformMel Gorman
SLAB currently avoids checking a bitmap repeatedly by checking once and storing a flag. When the addition of nr_online_nodes as a cheaper version of num_online_nodes(), this check can be replaced by nr_online_nodes. (Christoph did a patch that this is lifted almost verbatim from) Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16page allocator: do not check NUMA node ID when the caller knows the node is ↵Mel Gorman
valid Callers of alloc_pages_node() can optionally specify -1 as a node to mean "allocate from the current node". However, a number of the callers in fast paths know for a fact their node is valid. To avoid a comparison and branch, this patch adds alloc_pages_exact_node() that only checks the nid with VM_BUG_ON(). Callers that know their node is valid are then converted. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-15Merge commit 'linus/master' into HEADVegard Nossum
Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15kmemcheck: add hooks for the page allocatorVegard Nossum
This adds support for tracking the initializedness of memory that was allocated with the page allocator. Highmem requests are not tracked. Cc: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> [build fix for !CONFIG_KMEMCHECK] Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15slab: add hooks for kmemcheckPekka Enberg
We now have SLAB support for kmemcheck! This means that it doesn't matter whether one chooses SLAB or SLUB, or indeed whether Linus chooses to chuck SLAB or SLUB.. ;-) Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-13slab: move struct kmem_cache to headersPekka Enberg
Move the SLAB struct kmem_cache definition to <linux/slab_def.h> like with SLUB so kmemcheck can access ->ctor and ->flags. Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-12slab: setup cpu caches later on when interrupts are enabledPekka Enberg
Fixes the following boot-time warning: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c [ 0.000000] [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff8108f2be>] smp_call_function+0x3d/0x68 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c [ 0.000000] [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454 [ 0.000000] [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b [ 0.000000] [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593 [ 0.000000] [<ffffffff810f69cf>] enable_cpucache+0x68/0xad [ 0.000000] [<ffffffff818133c3>] kmem_cache_init+0x434/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12slab,slub: don't enable interrupts during early bootPekka Enberg
As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators in early boot code to avoid enabling interrupts. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12slab: fix gfp flag in setup_cpu_cache()Pekka Enberg
Fixes the following warning during bootup when compiling with CONFIG_SLAB: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0x91/0xb9() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #491 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff81087d84>] ? lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff81087d84>] lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff810f5b03>] kmem_cache_alloc_node_notrace+0x26/0xdf [ 0.000000] [<ffffffff81487f4e>] ? setup_cpu_cache+0x7e/0x210 [ 0.000000] [<ffffffff81487fe3>] setup_cpu_cache+0x113/0x210 [ 0.000000] [<ffffffff810f73ff>] kmem_cache_create+0x409/0x486 [ 0.000000] [<ffffffff818131c1>] kmem_cache_init+0x232/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11Merge branch 'for-linus' of git://linux-arm.org/linux-2.6Linus Torvalds
* 'for-linus' of git://linux-arm.org/linux-2.6: kmemleak: Add the corresponding MAINTAINERS entry kmemleak: Simple testing module for kmemleak kmemleak: Enable the building of the memory leak detector kmemleak: Remove some of the kmemleak false positives kmemleak: Add modules support kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash kmemleak: Add the vmalloc memory allocation/freeing hooks kmemleak: Add the slub memory allocation/freeing hooks kmemleak: Add the slob memory allocation/freeing hooks kmemleak: Add the slab memory allocation/freeing hooks kmemleak: Add documentation on the memory leak detector kmemleak: Add the base support Manual conflict resolution (with the slab/earlyboot changes) in: drivers/char/vt.c init/main.c mm/slab.c