summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-10-05Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh generic-ipi: Fix deadlock in __smp_call_function_single
2010-10-05modules: Fix module_bug_list list corruption raceLinus Torvalds
With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01kfifo: fix scatterlist usageIra W. Snyder
The kfifo_dma family of functions use sg_mark_end() on the last element in their scatterlist. This forces use of a fresh scatterlist for each DMA operation, which makes recycling a single scatterlist impossible. Change the behavior of the kfifo_dma functions to match the usage of the dma_map_sg function. This means that users must respect the returned nents value. The sample code is updated to reflect the change. This bug is trivial to cause: call kfifo_dma_in_prepare() such that it prepares a scatterlist with a single entry comprising the whole fifo. This is the case when you map the entirety of a newly created empty fifo. This causes the setup_sgl() function to mark the first scatterlist entry as the end of the chain, no matter what comes after it. Afterwards, add and remove some data from the fifo such that another call to kfifo_dma_in_prepare() will create two scatterlist entries. It returns nents=2. However, due to the previous sg_mark_end() call, sg_is_last() will now return true for the first scatterlist element. This causes the sample code to print a single scatterlist element when it should print two. By removing the call to sg_mark_end(), we make the API as similar as possible to the DMA mapping API. All users are required to respect the returned nents. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-22rmap: fix walk during forkAndrea Arcangeli
The below bug in fork led to the rmap walk finding the parent huge-pmd twice instead of just once, because the anon_vma_chain objects of the child vma still point to the vma->vm_mm of the parent. The patch fixes it by making the rmap walk accurate during fork. It's not a big deal normally but it worth being accurate considering the cost is the same. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <jweiner@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-21Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix nohz balance kick sched: Fix user time incorrectly accounted as system time on 32-bit
2010-09-21Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hw breakpoints: Fix pid namespace bug x86: Fix instruction breakpoint encoding oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540) kprobes: Fix Kconfig dependency
2010-09-21sched: Fix nohz balance kickSuresh Siddha
There's a situation where the nohz balancer will try to wake itself: cpu-x is idle which is also ilb_cpu got a scheduler tick during idle and the nohz_kick_needed() in trigger_load_balance() checks for rq_x->nr_running which might not be zero (because of someone waking a task on this rq etc) and this leads to the situation of the cpu-x sending a kick to itself. And this can cause a lockup. Avoid this by not marking ourself eligible for kicking. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1284400941.2684.19.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17hw breakpoints: Fix pid namespace bugMatt Helsley
Hardware breakpoints can't be registered within pid namespaces because tsk->pid is passed rather than the pid in the current namespace. (See https://bugzilla.kernel.org/show_bug.cgi?id=17281 ) This is a quick fix demonstrating the problem but is not the best method of solving the problem since passing pids internally is not the best way to avoid pid namespace bugs. Subsequent patches will show a better solution. Much thanks to Frederic Weisbecker <fweisbec@gmail.com> for doing the bulk of the work finding this bug. Reported-by: Robin Green <greenrd@greenrd.org> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: 2.6.33-2.6.35 <stable@kernel.org> LKML-Reference: <f63454af09fb1915717251570423eb9ddd338340.1284407762.git.matthltc@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-09-16Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: add documentation
2010-09-15sched: Fix user time incorrectly accounted as system time on 32-bitStanislaw Gruszka
We have 32-bit variable overflow possibility when multiply in task_times() and thread_group_times() functions. When the overflow happens then the scaled utime value becomes erroneously small and the scaled stime becomes i erroneously big. Reported here: https://bugzilla.redhat.com/show_bug.cgi?id=633037 https://bugzilla.kernel.org/show_bug.cgi?id=16559 Reported-by: Michael Chapman <redhat-bugzilla@very.puzzling.org> Reported-by: Ciriaco Garcia de Celis <sysman@etherpilot.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: <stable@kernel.org> # 2.6.32.19+ (partially) and 2.6.33+ LKML-Reference: <20100914143513.GB8415@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-14compat: Make compat_alloc_user_space() incorporate the access_ok()H. Peter Anvin
compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>
2010-09-13sched: Improve latencies under load by decreasing minimum scheduling granularityIngo Molnar
Mathieu reported bad latencies with make -j10 kind of kbuild workloads - which is mostly caused by us scheduling with a too coarse granularity. Reduce the minimum granularity some more, to make sure we can meet the latency target. I got the following results (make -j10 kbuild load, average of 3 runs): vanilla: maximum latency: 38278.9 µs average latency: 7730.1 µs patched: maximum latency: 22702.1 µs average latency: 6684.8 µs Mathieu also measured it: | | * wakeup-latency.c (SIGEV_THREAD) with make -j10 | | - Mainline 2.6.35.2 kernel | | maximum latency: 45762.1 µs | average latency: 7348.6 µs | | - With only Peter's smaller min_gran (shown below): | | maximum latency: 29100.6 µs | average latency: 6684.1 µs | Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <AANLkTi=8m4g01wZPacySoF7U0PevTNVgJoZZrHiUD-pN@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-13workqueue: add documentationTejun Heo
Update copyright notice and add Documentation/workqueue.txt. Randy Dunlap, Dave Chinner: misc fixes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-By: Florian Mickler <florian@mickler.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Dave Chinner <david@fromorbit.com>
2010-09-11Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: Avoid hitting OOM during preallocation of memory PM QoS: Correct pr_debug() misuse and improve parameter checks PM: Prevent waiting forever on asynchronous resume after failing suspend
2010-09-11PM / Hibernate: Avoid hitting OOM during preallocation of memoryRafael J. Wysocki
There is a problem in hibernate_preallocate_memory() that it calls preallocate_image_memory() with an argument that may be greater than the total number of available non-highmem memory pages. If that's the case, the OOM condition is guaranteed to trigger, which in turn can cause significant slowdown to occur during hibernation. To avoid that, make preallocate_image_memory() adjust its argument before calling preallocate_image_pages(), so that the total number of saveable non-highem pages left is not less than the minimum size of a hibernation image. Change hibernate_preallocate_memory() to try to allocate from highmem if the number of pages allocated by preallocate_image_memory() is too low. Modify free_unnecessary_pages() to take all possible memory allocation patterns into account. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: M. Vefa Bicakci <bicave@superonline.com>
2010-09-11Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc: Fix a preemption leak in restore_sched_clock_state() sched: Move sched_avg_update() to update_cpu_load()
2010-09-11PM QoS: Correct pr_debug() misuse and improve parameter checksmark gross
Correct some pr_debug() misuse and add a stronger parameter check to pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter for pointing out the problem! Signed-off-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-09-10generic-ipi: Fix deadlock in __smp_call_function_singleHeiko Carstens
Just got my 6 way machine to a state where cpu 0 is in an endless loop within __smp_call_function_single. All other cpus are idle. The call trace on cpu 0 looks like this: __smp_call_function_single scheduler_tick update_process_times tick_sched_timer __run_hrtimer hrtimer_interrupt clock_comparator_work do_extint ext_int_handler ----> timer irq cpu_idle __smp_call_function_single() got called from nohz_balancer_kick() (inlined) with the remote cpu being 1, wait being 0 and the per cpu variable remote_sched_softirq_cb (call_single_data) of the current cpu (0). Then it loops forever when it tries to grab the lock of the call_single_data, since it is already locked and enqueued on cpu 0. My theory how this could have happened: for some reason the scheduler decided to call __smp_call_function_single() on it's own cpu, and sends an IPI to itself. The interrupt stays pending since IRQs are disabled. If then the hypervisor schedules the cpu away it might happen that upon rescheduling both the IPI and the timer IRQ are pending. If then interrupts are enabled again it depends which one gets scheduled first. If the timer interrupt gets delivered first we end up with the local deadlock as seen in the calltrace above. Let's make __smp_call_function_single() check if the target cpu is the current cpu and execute the function immediately just like smp_call_function_single does. That should prevent at least the scenario described here. It might also be that the scheduler is not supposed to call __smp_call_function_single with the remote cpu being the current cpu, but that is a different issue. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <jaxboe@fusionio.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100910114729.GB2827@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-10Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread perf symbols: Fix multiple initialization of symbol system perf: Fix CPU hotplug perf, trace: Fix module leak tracing/kprobe: Fix handling of C-unlike argument names tracing/kprobes: Fix handling of argument names perf probe: Fix handling of arguments names perf probe: Fix return probe support tracing/kprobe: Fix a memory leak in error case tracing: Do not allow llseek to set_ftrace_filter
2010-09-09tracing: t_start: reset FTRACE_ITER_HASH in case of seek/preadChris Wright
Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without having properly started the iterator to iterate the hash. This case is degenerate and, as discovered by Robert Swiecki, can cause t_hash_show() to misuse a pointer. This causes a NULL ptr deref with possible security implications. Tracked as CVE-2010-3079. Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-09swap: revert special hibernation allocationHugh Dickins
Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09kernel/groups.c: fix integer overflow in groups_searchJerome Marchand
gid_t is a unsigned int. If group_info contains a gid greater than MAX_INT, groups_search() function may look on the wrong side of the search tree. This solves some unfair "permission denied" problems. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09cgroups: fix API thinkoMichael S. Tsirkin
Add cgroup_attach_task_all() The existing cgroup_attach_task_current_cg() API is called by a thread to attach another thread to all of its cgroups; this is unsuitable for cases where a privileged task wants to attach itself to the cgroups of a less privileged one, since the call must be made from the context of the target task. This patch adds a more generic cgroup_attach_task_all() API that allows both the source task and to-be-moved task to be specified. cgroup_attach_task_current_cg() becomes a specialization of the more generic new function. [menage@google.com: rewrote changelog] [akpm@linux-foundation.org: address reviewer comments] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Ben Blum <bblum@google.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09gcov: fix null-pointer dereference for certain module typesPeter Oberparleiter
The gcov-kernel infrastructure expects that each object file is loaded only once. This may not be true, e.g. when loading multiple kernel modules which are linked to the same object file. As a result, loading such kernel modules will result in incorrect gcov results while unloading will cause a null-pointer dereference. This patch fixes these problems by changing the gcov-kernel infrastructure so that multiple profiling data sets can be associated with one debugfs entry. It applies to 2.6.36-rc1. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reported-by: Werner Spies <werner.spies@thalesgroup.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09sched: Move sched_avg_update() to update_cpu_load()Suresh Siddha
Currently sched_avg_update() (which updates rt_avg stats in the rq) is getting called from scale_rt_power() (in the load balance context) which doesn't take rq->lock. Fix it by moving the sched_avg_update() to more appropriate update_cpu_load() where the CFS load gets updated as well. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09perf: Fix CPU hotplugPeter Zijlstra
Since we have UP_PREPARE, we should also have UP_CANCELED. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09perf, trace: Fix module leakLi Zefan
Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: gcc-4.6: kernel/*: Fix unused but set warnings mutex: Fix annotations to include it in kernel-locking docbook pid: make setpgid() system call use RCU read-side critical section MAINTAINERS: Add RCU's public git tree
2010-09-08Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Try to handle unknown nmis with an enabled PMU perf, x86: Fix handle_irq return values perf, x86: Fix accidentally ack'ing a second event on intel perf counter oprofile, x86: fix init_sysfs() function stub lockup_detector: Sync touch_*_watchdog back to old semantics tracing: Fix a race in function profile oprofile, x86: fix init_sysfs error handling perf_events: Fix time tracking for events with pid != -1 and cpu != -1 perf: Initialize callchains roots's childen hits oprofile: fix crash when accessing freed task structs
2010-09-08tracing/kprobe: Fix handling of C-unlike argument namesMasami Hiramatsu
Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobes: Fix handling of argument namesMasami Hiramatsu
Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobe: Fix a memory leak in error caseMasami Hiramatsu
Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing: Do not allow llseek to set_ftrace_filterSteven Rostedt
Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask workqueue: fix GCWQ_DISASSOCIATED initialization workqueue: Add a workqueue chapter to the tracepoint docbook workqueue: fix cwq->nr_active underflow workqueue: improve destroy_workqueue() debuggability workqueue: mark lock acquisition on worker_maybe_bind_and_lock() workqueue: annotate lock context change workqueue: free rescuer on destroy_workqueue
2010-09-05gcc-4.6: kernel/*: Fix unused but set warningsAndi Kleen
No real bugs I believe, just some dead code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: andi@firstfloor.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03mutex: Fix annotations to include it in kernel-locking docbookRandy Dunlap
Fix kernel-doc notation in linux/mutex.h and kernel/mutex.c, then add these 2 files to the kernel-locking docbook as the Mutex API reference chapter. Add one API function to mutex-design.txt and correct a typo in that file. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20100902154816.6cc2f9ad.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01lockup_detector: Sync touch_*_watchdog back to old semanticsDon Zickus
During my rewrite, the semantics of touch_nmi_watchdog and touch_softlockup_watchdog changed enough to break some drivers (mostly over preemptable regions). These are cases where long delays on one CPU (due to print_delay for example) can cause long delays on other CPUs - so we must 'touch' the nmi_watchdog flag of those other CPUs as well. This change brings those touch_*_watchdog() functions back in line with to how they used to work. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: peterz@infradead.org Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31pid: make setpgid() system call use RCU read-side critical sectionPaul E. McKenney
[ 23.584719] [ 23.584720] =================================================== [ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ] [ 23.585176] --------------------------------------------------- [ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection! [ 23.585176] [ 23.585176] other info that might help us debug this: [ 23.585176] [ 23.585176] [ 23.585176] rcu_scheduler_active = 1, debug_locks = 1 [ 23.585176] 1 lock held by rc.sysinit/728: [ 23.585176] #0: (tasklist_lock){.+.+..}, at: [<ffffffff8104771f>] sys_setpgid+0x5f/0x193 [ 23.585176] [ 23.585176] stack backtrace: [ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2 [ 23.585176] Call Trace: [ 23.585176] [<ffffffff8105b436>] lockdep_rcu_dereference+0x99/0xa2 [ 23.585176] [<ffffffff8104c324>] find_task_by_pid_ns+0x50/0x6a [ 23.585176] [<ffffffff8104c35b>] find_task_by_vpid+0x1d/0x1f [ 23.585176] [<ffffffff81047727>] sys_setpgid+0x67/0x193 [ 23.585176] [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b [ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas It turns out that the setpgid() system call fails to enter an RCU read-side critical section before doing a PID-to-task_struct translation. This commit therefore does rcu_read_lock() before the translation, and also does rcu_read_unlock() after the last use of the returned pointer. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Howells <dhowells@redhat.com>
2010-08-31tracing: Fix a race in function profileLi Zefan
While we are reading trace_stat/functionX and someone just disabled function_profile at that time, we can trigger this: divide error: 0000 [#1] PREEMPT SMP ... EIP is at function_stat_show+0x90/0x230 ... This fix just takes the ftrace_profile_lock and checks if rec->counter is 0. If it's 0, we know the profile buffer has been reset. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: stable@kernel.org LKML-Reference: <4C723644.4040708@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-31workqueue: use zalloc_cpumask_var() for gcwq->mayday_maskTejun Heo
alloc_mayday_mask() was using alloc_cpumask_var() making gcwq->mayday_mask contain garbage after initialization on CONFIG_CPUMASK_OFFSTACK=y configurations. This combined with the previously fixed GCWQ_DISASSOCIATED initialization bug could make rescuers fall into infinite loop trying to bind to an offline cpu. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: CAI Qian <caiqian@redhat.com>
2010-08-31workqueue: fix GCWQ_DISASSOCIATED initializationTejun Heo
init_workqueues() incorrectly marks workqueues for all possible CPUs associated. Combined with mayday_mask initialization bug, this can make rescuers keep trying to bind to an offline gcwq indefinitely. Fix init_workqueues() such that only online CPUs have their gcwqs have GCWQ_DISASSOCIATED cleared. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: CAI Qian <caiqian@redhat.com>
2010-08-30perf_events: Fix time tracking for events with pid != -1 and cpu != -1Stephane Eranian
Per-thread events with a cpu filter, i.e., cpu != -1, were not reporting correct timings when the thread never ran on the monitored cpu. The time enabled was reported as a negative value. This patch fixes the problem by updating tstamp_stopped, tstamp_running in event_sched_out() for events with filters and which are marked as INACTIVE. The function group_sched_out() is modified to systematically call into event_sched_out() to avoid duplicating the timing adjustment code twice. With the patch, I now get: $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles noploop 2 noploop for 2 seconds CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) Signed-off-by: Stephane Eranian <eranian@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@google.com LKML-Reference: <4c76802d.aae9d80a.115d.70fe@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-28Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM QoS: Fix inline documentation. PM QoS: Fix kzalloc() parameters swapped in pm_qos_power_open()
2010-08-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: pxa27x_keypad - remove input_free_device() in pxa27x_keypad_remove() Input: mousedev - fix regression of inverting axes Input: uinput - add devname alias to allow module on-demand load Input: hil_kbd - fix compile error USB: drop tty argument from usb_serial_handle_sysrq_char() Input: sysrq - drop tty argument form handle_sysrq() Input: sysrq - drop tty argument from sysrq ops handlers
2010-08-26PM QoS: Fix inline documentation.Saravana Kannan
Fix the pm_qos_add_request() kerneldoc comment that doesn't reflect the behavior of the function after the last PM QoS update. Signed-off-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-08-25Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag tracing/trace_stack: Fix stack trace on ppc64
2010-08-25Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks
2010-08-25tracing/trace_stack: Fix stack trace on ppc64Anton Blanchard
save_stack_trace() stores the instruction pointer, not the function descriptor. On ppc64 the trace stack code currently dereferences the instruction pointer and shows 8 bytes of instructions in our backtraces: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 0x6000000048000004 1) 5312 160 0x60000000ebad01b0 2) 5152 160 0x2c23000041c20030 3) 4992 240 0x600000007c781b79 4) 4752 160 0xe84100284800000c 5) 4592 192 0x600000002fa30000 6) 4400 256 0x7f1800347b7407e0 7) 4144 208 0xe89f0108f87f0070 8) 3936 272 0xe84100282fa30000 Since we aren't dealing with function descriptors, use %pS instead of %pF to fix it: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 ftrace_call+0x4/0x8 1) 5312 160 .current_io_context+0x28/0x74 2) 5152 160 .get_io_context+0x48/0xa0 3) 4992 240 .cfq_set_request+0x94/0x4c4 4) 4752 160 .elv_set_request+0x60/0x84 5) 4592 192 .get_request+0x2d4/0x468 6) 4400 256 .get_request_wait+0x7c/0x258 7) 4144 208 .__make_request+0x49c/0x610 8) 3936 272 .generic_make_request+0x390/0x434 Signed-off-by: Anton Blanchard <anton@samba.org> Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com LKML-Reference: <20100825013238.GE28360@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25workqueue: fix cwq->nr_active underflowTejun Heo
cwq->nr_active is used to keep track of how many work items are active for the cpu workqueue, where 'active' is defined as either pending on global worklist or executing. This is used to implement the max_active limit and workqueue freezing. If a work item is queued after nr_active has already reached max_active, the work item doesn't increment nr_active and is put on the delayed queue and gets activated later as previous active work items retire. try_to_grab_pending() which is used in the cancellation path unconditionally decremented nr_active whether the work item being cancelled is currently active or delayed, so cancelling a delayed work item makes nr_active underflow. This breaks max_active enforcement and triggers BUG_ON() in destroy_workqueue() later on. This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which is set while a work item in on the delayed list and making try_to_grab_pending() decrement nr_active iff the work item is currently active. The addition of the flag enlarges cwq alignment to 256 bytes which is getting a bit too large. It's scheduled to be reduced back to 128 bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next devel cycle. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Berg <johannes@sipsolutions.net>
2010-08-24Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: watchdog: Don't throttle the watchdog tracing: Fix timer tracing