path: root/kernel
Age | Commit message | Author
2012-07-16 | splice: fix racy pipe->buffers uses | Eric Dumazet
commit 047fe3605235888f3ebcda0c728cb31937eadfe6 upstream. Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 3.2: - Adjust context in vmsplice_to_pipe() - Update one more call to splice_shrink_spd(), from skb_splice_bits()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
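The idea of the fix above can be sketched in user-space C. This is only an illustration, not the kernel code: `pipe_sketch`, `spd_sketch`, `spd_grow()`, `spd_shrink()` and `INLINE_PAGES` are hypothetical stand-ins, and `calloc()`/`free()` replace the kernel allocators. The point is that the "was the array heap-allocated?" decision is made from a snapshot stored in the descriptor, so a concurrent resize of the pipe cannot cause a mismatched free.

```c
#include <assert.h>
#include <stdlib.h>

#define INLINE_PAGES 16              /* hypothetical stand-in for the default pipe size */

struct pipe_sketch {                 /* stand-in for struct pipe_inode_info */
	unsigned int buffers;        /* may be changed by another task at any time */
};

struct spd_sketch {                  /* stand-in for struct splice_pipe_desc */
	void **pages;
	unsigned int nr_pages_max;   /* snapshot of pipe->buffers, taken once */
};

/* Size the page array from a single read of pipe->buffers. Returns 0 or -1. */
int spd_grow(struct spd_sketch *spd, const struct pipe_sketch *pipe,
	     void **inline_pages)
{
	unsigned int buffers;

	assert(spd && pipe && inline_pages);
	buffers = pipe->buffers;     /* read exactly once */
	spd->nr_pages_max = buffers;
	if (buffers <= INLINE_PAGES) {
		spd->pages = inline_pages;   /* caller-provided storage */
		return 0;
	}
	spd->pages = calloc(buffers, sizeof(*spd->pages));
	return spd->pages ? 0 : -1;
}

/*
 * Release heap storage if spd_grow() allocated any. The decision uses the
 * snapshot, never the live pipe, which is why no pipe argument is needed.
 * Returns 1 if heap storage was freed, 0 otherwise.
 */
int spd_shrink(struct spd_sketch *spd)
{
	assert(spd);
	if (spd->nr_pages_max <= INLINE_PAGES)
		return 0;
	free(spd->pages);
	spd->pages = NULL;
	return 1;
}
```

If `spd_shrink()` consulted the live `buffers` value instead, a pipe that grew between the two calls would make it free the caller-provided array.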
2012-07-16 | tracing: change CPU ring buffer state from tracing_cpumask | Vaibhav Nagarnaik
commit 71babb2705e2203a64c27ede13ae3508a0d2c16c upstream. According to Documentation/trace/ftrace.txt: tracing_cpumask: This is a mask that lets the user only trace on specified CPUS. The format is a hex string representing the CPUS. The tracing_cpumask currently doesn't affect the tracing state of per-CPU ring buffers. This patch enables/disables CPU recording as its corresponding bit in tracing_cpumask is set/unset. Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laurent Chavey <chavey@google.com> Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
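A minimal sketch of the behaviour described above, assuming an 8-CPU machine and a per-CPU "disabled" counter standing in for the ring-buffer recording state. The names `trace_sketch`, `set_tracing_cpumask()` and `NR_CPUS_SKETCH` are hypothetical and are not the kernel's.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8

struct trace_sketch {
	unsigned long cpumask;               /* current tracing_cpumask */
	int disabled[NR_CPUS_SKETCH];        /* > 0 means recording is off */
};

/*
 * Apply a new mask: stop recording on CPUs whose bit was cleared and
 * resume it on CPUs whose bit was set. Returns how many CPUs changed.
 */
int set_tracing_cpumask(struct trace_sketch *tr, unsigned long new_mask)
{
	int cpu, changed = 0;

	assert(tr);
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		int was = (int)((tr->cpumask >> cpu) & 1UL);
		int now = (int)((new_mask >> cpu) & 1UL);

		if (was && !now) {
			tr->disabled[cpu]++;     /* bit cleared: disable */
			changed++;
		} else if (!was && now) {
			tr->disabled[cpu]--;     /* bit set: enable */
			changed++;
		}
	}
	tr->cpumask = new_mask;
	return changed;
}
```

Only CPUs whose bit actually flips are touched, so writing the same mask twice is a no-op.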
2012-07-16 | mm: correctly synchronize rss-counters at exit/exec | Konstantin Khlebnikov
commit 4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091 upstream. do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does put_user(clear_child_tid) which can update task->rss_stat and thus make mm->rss_stat inconsistent. This triggers the "BUG:" printk in check_mm(). Let's fix this bug in the safest way, and optimize/cleanup this later. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 | ntp: Correct TAI offset during leap second | Richard Cochran
commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream. When repeating a UTC time value during a leap second (when the UTC time should be 23:59:60), the TAI timescale should not stop. The kernel NTP code increments the TAI offset one second too late. This patch fixes the issue by incrementing the offset during the leap second itself. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 | kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() | Seiji Aguchi
commit 62be73eafaa045d3233337303fb140f7f8a61135 upstream. This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(), to serialize the crash-logging process via smp_send_stop() and to thus retrieve a more stable crash image of all CPUs stopped. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: dle-develop@lists.sourceforge.net <dle-develop@lists.sourceforge.net> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Tony Luck <tony.luck@intel.com> Cc: a.p.zijlstra@chello.nl <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5C4C569E8A4B9B42A84A977CF070A35B2E4D7A5CE2@USINDEVS01.corp.hds.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 | tracing: Have tracing_off() actually turn tracing off | Steven Rostedt
commit f2bf1f6f5f89d031245067512449fc889b2f4bb2 upstream. A recent update to have tracing_on/off() only affect the ftrace ring buffers instead of all ring buffers had a cut and paste error. The tracing_off() did the exact same thing as tracing_on() and would not actually turn off tracing. Unfortunately, tracing_off() is more important to be working than tracing_on() as this is a key development tool, as it lets the developer turn off tracing as soon as a problem is discovered. It is also used by panic and oops code. This bug also breaks the 'echo func:traceoff > set_ftrace_filter' Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17 | sched: Fix the relax_domain_level boot parameter | Dimitri Sivanich
commit a841f8cef4bb124f0f5563314d0beaf2e1249d72 upstream. It does not get processed because sched_domain_level_max is 0 at the time that setup_relax_domain_level() is run. Simply accept the value as it is, as we don't know the value of sched_domain_level_max until sched domain construction is completed. Fix sched_relax_domain_level in cpuset. The build_sched_domain() routine calls the set_domain_attribute() routine prior to setting the sd->level, however, the set_domain_attribute() routine relies on the sd->level to decide whether idle load balancing will be off/on. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17 | timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond | John Stultz
commit fad0c66c4bb836d57a5f125ecd38bed653ca863a upstream. Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC. Adjust wall_to_monotonic when NTP inserted a leapsecond. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
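The invariant behind this fix can be shown with a small arithmetic sketch. It assumes the usual relation "monotonic = wall time + wall_to_monotonic" and whole seconds only. `timekeeper_sketch`, `monotonic_sec()` and `insert_leap_second()` are hypothetical names for illustration.

```c
#include <assert.h>

struct timekeeper_sketch {
	long long xtime_sec;            /* wall-clock (UTC) seconds */
	long long wall_to_monotonic;    /* offset so that monotonic = wall + offset */
};

long long monotonic_sec(const struct timekeeper_sketch *tk)
{
	assert(tk);
	return tk->xtime_sec + tk->wall_to_monotonic;
}

/*
 * Inserting a leap second repeats one UTC second, i.e. the wall clock
 * steps back by one. The offset must move by the same amount in the
 * opposite direction, otherwise the monotonic clock jumps backwards.
 */
void insert_leap_second(struct timekeeper_sketch *tk)
{
	assert(tk);
	tk->xtime_sec -= 1;
	tk->wall_to_monotonic += 1;     /* the update that went missing */
}
```

Dropping the second assignment in `insert_leap_second()` reproduces the reported discontinuity: `monotonic_sec()` would return a value one second smaller than before the leap.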
2012-06-10 | mm/fork: fix overflow in vma length when copying mmap on clone | Siddhesh Poyarekar
commit 7edc8b0ac16cbaed7cb4ea4c6b95ce98d2997e84 upstream. The vma length in dup_mmap is calculated and stored in an unsigned int, which is insufficient and hence overflows for very large maps (beyond 16TB). The following program demonstrates this:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define GIG 1024 * 1024 * 1024L
    #define EXTENT 16393

    int main(void)
    {
        int i, r;
        void *m;
        char buf[1024];

        /* Map 1GB at a time; fork after each mapping to exercise dup_mmap(). */
        for (i = 0; i < EXTENT; i++) {
            m = mmap(NULL, (size_t) 1 * 1024 * 1024 * 1024L,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

            if (m == (void *)-1)
                printf("MMAP Failed: %p\n", m);
            else
                printf("%d : MMAP returned %p\n", i, m);

            r = fork();

            if (r == 0) {
                printf("%d: successed\n", i);
                return 0;
            } else if (r < 0)
                printf("FORK Failed: %d\n", r);
            else if (r > 0)
                wait(NULL);
        }
        return 0;
    }

Increase the storage size of the result to unsigned long, which is sufficient for storing the difference between addresses. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
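The arithmetic behind the 16TB limit can be checked without mapping any memory. The sketch below assumes 4KiB pages (a shift of 12) and uses fixed-width types so the result is the same on any host. `vma_pages_u32()` and `vma_pages_u64()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12    /* assumption: 4KiB pages */

/* Old behaviour: the page count is truncated to 32 bits. */
uint32_t vma_pages_u32(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return (uint32_t)((end - start) >> PAGE_SHIFT_SKETCH);
}

/* Fixed behaviour: the page count keeps the full address-sized width. */
uint64_t vma_pages_u64(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return (end - start) >> PAGE_SHIFT_SKETCH;
}
```

A mapping of exactly 16TB is 2^44 bytes, or 2^32 pages of 4KiB, which is one more than a 32-bit counter can hold, so the truncated count wraps to zero.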
2012-06-01 | workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active | Tejun Heo
commit 544ecf310f0e7f51fa057ac2a295fc1b3b35a9d3 upstream. worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running isn't zero when every worker is idle. This can trigger spuriously while a cpu is going down due to the way trustee sets %WORKER_ROGUE and zaps nr_running. It first sets %WORKER_ROGUE on all workers without updating nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and then zaps nr_running. If the last running worker enters idle in between, it would see stale nr_running which hasn't been zapped yet and trigger the WARN_ON_ONCE(). Fix it by performing the sanity check iff the trustee is idle. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-17 | Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf, x86 and scheduler updates from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing: Do not enable function event with enable
  perf stat: handle ENXIO error for perf_event_open
  perf: Turn off compiler warnings for flex and bison generated files
  perf stat: Fix case where guest/host monitoring is not supported by kernel
  perf build-id: Fix filename size calculation
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
  x86: Fix section annotation of acpi_map_cpu2node()
  x86/microcode: Ensure that module is only loaded on supported Intel CPUs
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-15 | genirq: export handle_edge_irq() and irq_to_desc() | Jiri Kosina
Export handle_edge_irq() and irq_to_desc() to modules to allow them to do things such as __irq_set_handler_locked(...., handle_edge_irq); This fixes ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined! ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined! when gpio-pch is being built as a module. This was introduced by commit df9541a60af0 ("gpio: pch9: Use proper flow type handlers") that added __irq_set_handler_locked(d->irq, handle_edge_irq); but handle_edge_irq() was not exported for modules (and inlined __irq_set_handler_locked() requires irq_to_desc() exported as well) Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 | namespaces, pid_ns: fix leakage on fork() failure | Mike Galbraith
Fork() failure post namespace creation for a child cloned with CLONE_NEWPID leaks pid_namespace/mnt_cache due to proc being mounted during creation, but not unmounted during cleanup. Call pid_ns_release_proc() during cleanup. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 | tracing: Do not enable function event with enable | Steven Rostedt
Adding the function tracing event to perf had a side effect that produces the following warning when enabling all events in ftrace: # echo 1 > /sys/kernel/debug/tracing/events/enable [console] event trace: Could not enable event function This is because when enabling all events via the debugfs system it ignores events that do not have a ->reg() function assigned. This was to skip over the ftrace internal events (as they are not TRACE_EVENTs). But as the ftrace function event now has a ->reg() function attached to it for use with perf, it is no longer ignored. Worse yet, this ->reg() function is being called when it should not be. It returns an error and causes the above warning to be printed. By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE) and having all ftrace internal event structures set it, setting events/enable will no longer try to incorrectly enable the function event and does not warn. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-10 | compat: Fix RT signal mask corruption via sigprocmask | Jan Kiszka
compat_sys_sigprocmask reads a smaller signal mask from userspace than sigprocmask accepts for setting. So the high word of blocked.sig[0] will be cleared, releasing any potentially blocked RT signal. This was discovered via userspace code that relies on get/setcontext. glibc's i386 versions of those functions use sigprocmask instead of rt_sigprocmask to save/restore the signal mask and caused RT signal unblocking this way. As suggested by Linus, this replaces the sys_sigprocmask based compat version with one that open-codes the required logic, including the merge of the existing blocked set with the new one provided on SIG_SETMASK. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
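The merge step described above can be sketched with plain integers. This is a simplified model under stated assumptions: the blocked set is a single 64-bit word, the compat caller supplies only the low 32 bits, and special handling of unblockable signals is left out. `compat_apply_mask()` and the `HOW_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum how_sketch { HOW_BLOCK, HOW_UNBLOCK, HOW_SETMASK };

/*
 * Apply a 32-bit compat mask to a 64-bit blocked set without disturbing
 * the upper word, where the RT signals live in this model.
 */
uint64_t compat_apply_mask(uint64_t blocked, uint32_t set32, enum how_sketch how)
{
	uint64_t low = set32;    /* zero-extended: upper 32 bits are 0 */

	switch (how) {
	case HOW_BLOCK:
		return blocked | low;
	case HOW_UNBLOCK:
		return blocked & ~low;   /* ~low keeps every upper bit set */
	case HOW_SETMASK:
		/* Replace only the low word; merge with the existing high word. */
		return (blocked & ~(uint64_t)UINT32_MAX) | low;
	}
	assert(!"unknown how value");
	return blocked;
}
```

The buggy behaviour corresponds to returning just `low` for `HOW_SETMASK`, which clears the high word and unblocks every RT signal.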
2012-05-09 | sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption | Igor Mammedov
If we have one cpu that failed to boot and boot cpu gave up on waiting for it and then another cpu is being booted, kernel might crash with following OOPS: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff812c3630>] __bitmap_weight+0x30/0x80 Call Trace: [<ffffffff8108b9b6>] build_sched_domains+0x7b6/0xa50 The crash happens in init_sched_groups_power() that expects sched_groups to be circular linked list. However it is not always true, since sched_groups preallocated in __sdt_alloc are initialized in build_sched_groups and it may exit early if (cpu != cpumask_first(sched_domain_span(sd))) return 0; without initializing sd->groups->next field. Fix bug by initializing next field right after sched_group was allocated. Also-Reported-by: Jiang Liu <liuj97@gmail.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Cc: a.p.zijlstra@chello.nl Cc: pjt@google.com Cc: seto.hidetoshi@jp.fujitsu.com Link: http://lkml.kernel.org/r/1336559908-32533-1-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
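Why initializing the next field at allocation time is enough can be seen in a small user-space sketch. `group_sketch`, `group_alloc()` and `ring_total_weight()` are hypothetical stand-ins for the scheduler structures, and `calloc()` replaces the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

struct group_sketch {
	struct group_sketch *next;   /* circular: the last element points to the first */
	unsigned int weight;
};

/* Allocate a group that is already a valid one-element ring. */
struct group_sketch *group_alloc(unsigned int weight)
{
	struct group_sketch *sg = calloc(1, sizeof(*sg));

	if (!sg)
		return NULL;
	sg->next = sg;               /* never leave ->next as NULL */
	sg->weight = weight;
	return sg;
}

/* Walk the ring once. This is only safe because ->next is never NULL. */
unsigned int ring_total_weight(const struct group_sketch *first)
{
	const struct group_sketch *sg = first;
	unsigned int total = 0;

	assert(first);
	do {
		total += sg->weight;
		sg = sg->next;
	} while (sg != first);
	return total;
}
```

With a zero-filled allocation and no self-link, the first `sg = sg->next` in the walker would yield NULL and the next loop iteration would dereference it, which matches the shape of the reported oops.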
2012-04-29 | Merge tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds
Pull power management fixes from Rafael J. Wysocki: "Fix for an issue causing hibernation to hang on systems with highmem (that practically means i386) due to broken memory management (bug introduced in 3.2, so -stable material) and PM documentation update making the freezer documentation follow the code again after some recent updates."
* tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Freezer / Docs: Update documentation about freezing of tasks
  PM / Hibernate: fix the number of pages used for hibernate/thaw buffering
2012-04-27 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull RCU fix from Ingo Molnar.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Permit call_rcu() from CPU_DYING notifiers
2012-04-27 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull scheduler fixes from Ingo Molnar.
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix OOPS when build_sched_domains() percpu allocation fails
  sched: Fix more load-balancing fallout
2012-04-27 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_event_for_each() to use sibling
  perf symbols: Read plt symbols from proper symtab_type binary
  tracing: Fix stacktrace of latency tracers (irqsoff and friends)
  perf tools: Add 'G' and 'H' modifiers to event parsing
  tracing: Fix regression with tracing_on
  perf tools: Drop CROSS_COMPILE from flex and bison calls
  perf report: Fix crash showing warning related to kernel maps
  tracing: Fix build breakage without CONFIG_PERF_EVENTS (again)
2012-04-27 | Merge branch 'for-v3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux | Linus Torvalds
Pull build fixes for less mainstream architectures from Paul Gortmaker: "These are fixes for frv(1), blackfin(2), powerpc(1) and xtensa(4). Fortunately the touches are nearly all specific to files just used by the arch in question. The two touches to shared/common files [kernel/irq/debug.h and drivers/pci/Makefile] are trivial to assess as no risk to anyone. Half of them relate to xtensa directly. It was only when I fixed the last xtensa issue that I realized that the arch has been broken for a significant time, and isn't a specific v3.4 regression. So if you wanted, we could leave xtensa lying bleeding in the street for a couple more weeks and queue those for 3.5. But given they are no risk to anyone outside of xtensa, I figured to just leave them in. If you are OK with taking the xtensa fixes, then please pull to get:
 - one last implicit include uncovered by system.h that is in a file specific to just one powerpc defconfig. (I'd sync'd with BenH).
 - fix an oversight in the PCI makefile where shared code wasn't being compiled for ARCH=frv
 - fix a missing include for GPIO in blackfin framebuffer.
 - audit and tag endif in blackfin ezkit board file, in order to find and fix the misplaced endif masking a block of code.
 - fix irq/debug.h choice of temporary macro names to be more internal so they don't conflict with names used by xtensa.
 - fix a reference to an undeclared local var in xtensa's signal.c
 - fix an implicit bug.h usage in xtensa's asm/io.h uncovered by my removing bug.h from kernel.h
 - fix xtensa to properly indicate it is using asm-generic/hardirq.h in order to resolve the link error - undefined ack_bad_irq
The xtensa still fails final link as my latest binutils does something evil when ld forward-relocates unlikely() blocks, but in theory people who have older/valid toolchains could now use the thing."
* 'for-v3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  xtensa: fix build fail on undefined ack_bad_irq
  blackfin: fix ifdef fustercluck in mach-bf538/boards/ezkit.c
  blackfin: fix compile error in bfin-lq035q1-fb.c
  pci: frv architecture needs generic setup-bus infrastructure
  irq: hide debug macros so they don't collide with others.
  xtensa: fix build error in xtensa/include/asm/io.h
  xtensa: fix build failure in xtensa/kernel/signal.c
  powerpc: fix system.h fallout in sysdev/scom.c [chroma_defconfig]
2012-04-26 | perf: Fix perf_event_for_each() to use sibling | Michael Ellerman
In perf_event_for_each() we call a function on an event, and then iterate over the siblings of the event. However we don't call the function on the siblings, we call it repeatedly on the original event - it seems "obvious" that we should be calling it with sibling as the argument. It looks like this broke in commit 75f937f24bd9 ("Fix ctx->mutex vs counter->mutex inversion"). The only effect of the bug is that the PERF_IOC_FLAG_GROUP parameter to the ioctls doesn't work. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1334109253-31329-1-git-send-email-michael@ellerman.id.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
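The shape of the bug and its fix can be sketched with a simple singly linked sibling list. `event_sketch`, `count_hit()` and `for_each_in_group()` are hypothetical names, and the real kernel list handling is more involved than this.

```c
#include <assert.h>
#include <stddef.h>

struct event_sketch {
	int hits;                            /* how often func was applied */
	struct event_sketch *next_sibling;   /* NULL terminates the group */
};

/* Example callback: record that the event was visited. */
void count_hit(struct event_sketch *e)
{
	e->hits++;
}

/* Apply func to the group leader and then to each of its siblings. */
void for_each_in_group(struct event_sketch *leader,
		       void (*func)(struct event_sketch *))
{
	struct event_sketch *sibling;

	assert(leader && func);
	func(leader);
	for (sibling = leader->next_sibling; sibling;
	     sibling = sibling->next_sibling)
		func(sibling);   /* the buggy version passed `leader` here */
}
```

With the buggy argument, a group of three events would leave the leader with three hits and both siblings with none, which is why group-wide ioctls appeared to do nothing for the siblings.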
2012-04-26 | sched: Fix OOPS when build_sched_domains() percpu allocation fails | he, bo
Under extreme memory used up situations, percpu allocation might fail. We hit it when system goes to suspend-to-ram, causing a kworker panic: EIP: [<c124411a>] build_sched_domains+0x23a/0xad0 Kernel panic - not syncing: Fatal exception Pid: 3026, comm: kworker/u:3 3.0.8-137473-gf42fbef #1 Call Trace: [<c18cc4f2>] panic+0x66/0x16c [...] [<c1244c37>] partition_sched_domains+0x287/0x4b0 [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210 [<c123712d>] cpuset_cpu_inactive+0x1d/0x30 [...] With this fix applied build_sched_domains() will return -ENOMEM and the suspend attempt fails. Signed-off-by: he, bo <bo.he@intel.com> Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo [ So, we fail to deallocate a CPU because we cannot allocate RAM :-/ I don't like that kind of sad behavior but nevertheless it should not crash under high memory load. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 | sched: Fix more load-balancing fallout | Peter Zijlstra
Commits 367456c756a6 ("sched: Ditch per cgroup task lists for load-balancing") and 5d6523ebd ("sched: Fix load-balance wreckage") left some more wreckage. By setting loop_max unconditionally to ->nr_running load-balancing could take a lot of time on very long runqueues (hackbench!). So keep the sysctl as max limit of the amount of tasks we'll iterate. Furthermore, the min load filter for migration completely fails with cgroups since inequality in per-cpu state can easily lead to such small loads :/ Furthermore the change to add new tasks to the tail of the queue instead of the head seems to have some effect.. not quite sure I understand why. Combined these fixes solve the huge hackbench regression reported by Tim when hackbench is run in a cgroup. Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1335365763.28150.267.camel@twins [ got rid of the CONFIG_PREEMPT tuning and made small readability edits ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-25 | Merge branch 'tip/perf/urgent-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent | Ingo Molnar
2012-04-25 | Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent | Ingo Molnar
2012-04-24 | PM / Hibernate: fix the number of pages used for hibernate/thaw buffering | Bojan Smojver
Hibernation regression fix, since 3.2. Calculate the number of required free pages based on non-high memory pages only, because that is where the buffers will come from. Commit 081a9d043c983f161b78fdc4671324d1342b86bc introduced a new buffer page allocation logic during hibernation, in order to improve the performance. The amount of pages allocated was calculated based on total amount of pages available, although only non-high memory pages are usable for this purpose. This caused hibernation code to attempt to over allocate pages on platforms that have high memory, which led to hangs. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
2012-04-23 | irq: hide debug macros so they don't collide with others. | Paul Gortmaker
The file kernel/irq/debug.h temporarily defines P, PS, PD and then undefines them. However these names aren't really "internal" enough, and collide with other more legit users such as the ones in the xtensa arch, causing: In file included from kernel/irq/internals.h:58:0, from kernel/irq/irqdesc.c:18: kernel/irq/debug.h:8:0: warning: "PS" redefined [enabled by default] arch/xtensa/include/asm/regs.h:59:0: note: this is the location of the previous definition Add a handful of underscores to do a better job of hiding these temporary macros. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 | tracing: Fix stacktrace of latency tracers (irqsoff and friends) | Steven Rostedt
While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC), we discovered that the stacktrace output of the latency tracers (preemptirqsoff) was empty. This bug was caused by the creation of the dynamic length stack trace again (like commit 12b5da3 "tracing: Fix ent_size in trace output" was). This bug is caused by the latency tracers requiring the next event to determine the time between the current event and the next. But by grabbing the next event, the iter->ent_size is set to the next event instead of the current one. As the stacktrace event is the last event, this makes the ent_size zero and causes nothing to be printed for the stack trace. The dynamic stacktrace uses the ent_size to determine how much of the stack can be printed. The ent_size of zero means no stack. The simple fix is to save the iter->ent_size before finding the next event. Note, mirage335 asked to remain anonymous from LKML and git, so I will not add the Reported-by and Tested-by tags, even though he did report the issue and tested the fix. Cc: stable@vger.kernel.org # 3.1+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
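The "save ent_size before finding the next event" fix can be modelled with a small iterator. This is a sketch under assumptions: `iter_sketch`, `iter_lookup()` and `iter_peek_next()` are hypothetical, and the only property modelled is that every lookup overwrites the iterator's `ent_size`.

```c
#include <assert.h>
#include <stddef.h>

struct entry_sketch {
	size_t size;
};

struct iter_sketch {
	const struct entry_sketch *entries;
	size_t count;
	size_t pos;         /* index of the current entry */
	size_t ent_size;    /* size of the most recently looked-up entry */
};

/* Look up an entry. Side effect: overwrites ent_size (0 past the end). */
const struct entry_sketch *iter_lookup(struct iter_sketch *it, size_t index)
{
	assert(it);
	if (index >= it->count) {
		it->ent_size = 0;
		return NULL;
	}
	it->ent_size = it->entries[index].size;
	return &it->entries[index];
}

/*
 * Peek at the next entry (needed to compute a time delta) while keeping
 * ent_size describing the current entry.
 */
const struct entry_sketch *iter_peek_next(struct iter_sketch *it)
{
	size_t saved = it->ent_size;         /* save before the lookup */
	const struct entry_sketch *next = iter_lookup(it, it->pos + 1);

	it->ent_size = saved;                /* restore for the printer */
	return next;
}
```

Without the save and restore, peeking past the last entry leaves `ent_size` at zero, and a printer that sizes its output from `ent_size` would emit an empty stack trace.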
2012-04-19 | tick: Fix the spurious broadcast timer ticks after resume | Suresh Siddha
During resume, tick_resume_broadcast() programs the broadcast timer in oneshot mode unconditionally. On the platforms where broadcast timer is not really required, this will generate spurious broadcast timer ticks upon resume. For example, on the always running apic timer platforms with HPET, I see spurious hpet tick once every ~5minutes (which is the 32-bit hpet counter wraparound time). Similar to boot time, during resume make the oneshot mode setting of the broadcast clock event device conditional on the state of active broadcast users. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: svenjoac@gmx.de Cc: torvalds@linux-foundation.org Cc: rjw@sisk.pl Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-19 | tick: Ensure that the broadcast device is initialized | Thomas Gleixner
Santosh found another trap when we avoid to initialize the broadcast device in the switch_to_oneshot code. The broadcast device might be still in SHUTDOWN state when we actually need to use it. That obviously breaks, as set_next_event() is called on a shutdown device. This did not break on x86, but Suresh analyzed it: From the review, most likely on Sven's system we are force enabling the hpet using the pci quirk's method very late. And in this case, hpet_clockevent (which will be global_clock_event) handler can be null, specifically as this platform might not be using deeper c-states and using the reliable APIC timer. Prior to commit 'fa4da365bc7772c', that handler will be set to 'tick_handle_oneshot_broadcast' when we switch the broadcast timer to oneshot mode, even though we don't use it. Post commit 'fa4da365bc7772c', we stopped switching the broadcast mode to oneshot as this is not really needed and his platform's global_clock_event's handler will remain null. While on my SNB laptop, same is set to 'clockevents_handle_noop' because hpet gets enabled very early. (noop handler on my platform set when the early enabled hpet timer gets replaced by the lapic timer). But the commit 'fa4da365bc7772c' tracked the broadcast timer mode in the SW as oneshot, even though it didn't touch the HW timer. During resume however, tick_resume_broadcast() saw the SW broadcast mode as oneshot and actually programmed the broadcast device also into oneshot mode. So this triggered the null pointer de-reference after the hpet wraps around and depending on what the hpet counter is set to. On the normal platforms where hpet gets enabled early we should be seeing a spurious interrupt (in my SNB laptop I see one spurious interrupt after around 5 minutes ;) which is 32-bit hpet counter wraparound time), but that's a separate issue. Enforce the mode setting when trying to set an event. 
Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: torvalds@linux-foundation.org Cc: svenjoac@gmx.de Cc: rjw@sisk.pl Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos
2012-04-18 | tick: Fix oneshot broadcast setup really | Thomas Gleixner
Sven Joachim reported that suspend/resume on rc3 trips over a NULL pointer dereference. Linus spotted the clockevent handler being NULL. commit fa4da365b (clockevents: Track broadcast device mode change in tick_broadcast_switch_to_oneshot()) tried to fix a problem with the broadcast device setup, which was introduced in commit 77b0d60c5 (clockevents: Leave the broadcast device in shutdown mode when not needed). The initial commit avoided setting up the broadcast device when no broadcast request bits were set, but that left the broadcast device dysfunctional. In consequence deep idle states which need the broadcast device were not woken up. commit fa4da365b tried to fix that by initializing the state of the broadcast facility, but that missed the fact that nothing initializes the event handler and some other state of the underlying clock event device. The fix is to revert both commits and make only the mode setting of the clock event device conditional on the state of active broadcast users. That initializes everything except the low level device mode, but this happens when the broadcast functionality is invoked by deep idle. Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos
2012-04-17 | rcu: Permit call_rcu() from CPU_DYING notifiers | Paul E. McKenney
As of: 29494be71afe ("rcu,cleanup: simplify the code when cpu is dying") RCU adopts callbacks from the dying CPU in its CPU_DYING notifier, which means that any callbacks posted by later CPU_DYING notifiers are ignored until the CPU comes back online. A WARN_ON_ONCE() was added to __call_rcu() by: e56014000816 ("rcu: Simplify offline processing") to check for this condition. Although this condition did not trigger (at least as far as I know) during -next testing, it did recently trigger in mainline: https://lkml.org/lkml/2012/4/2/34 What is needed longer term is for RCU's CPU_DEAD notifier to adopt any callbacks that were posted by CPU_DYING notifiers, however, the Linux kernel has been running with this sort of thing happening for quite some time. So the only thing that qualifies as a regression is the WARN_ON_ONCE(), which this commit removes. Making RCU's CPU_DEAD notifier adopt callbacks posted by CPU_DYING notifiers is a topic for the 3.5 release of the Linux kernel. Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-16 | tracing: Fix regression with tracing_on | Steven Rostedt
The change to make tracing_on affect only the ftrace ring buffer caused a bug where it won't affect any ring buffer. The problem was that the buffer of the trace_array was passed to the write function and not the trace array itself. The trace_array can change the buffer when running a latency tracer. If this happens, then the buffer being disabled may not be the buffer currently used by ftrace. This will cause the tracing_on file to become useless. The simple fix is to pass the trace_array to the write function instead of the buffer. Then the actual buffer may be changed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-13Merge branch 'systemh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linuxLinus Torvalds
Pull system.h fixups for less common arch's from Paul Gortmaker: "Here is what is hopefully the last of the system.h related fixups. The fixes for Alpha and ia64 are code relocations consistent with what was done for the more mainstream architectures. Note that the diffstat lines removed vs lines added are not the same since I've fixed some of the whitespace issues in the relocated code blocks. However they are functionally the same. Compile tested locally, plus these two have been in linux-next for a while. There is also a trivial one line system.h related fix for the Tilera arch from Chris Metcalf to fix an implicit include." * 'systemh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: irq_work: fix compile failure on tile from missing include ia64: populate the cmpxchg header with appropriate code alpha: fix build failures from system.h dismemberment
2012-04-13tracing: Fix build breakage without CONFIG_PERF_EVENTS (again)Mark Brown
Today's -next fails to link for me: kernel/built-in.o:(.data+0x178e50): undefined reference to `perf_ftrace_event_register' It looks like multiple fixes have been merged for the issue fixed by commit fa73dc9 (tracing: Fix build breakage without CONFIG_PERF_EVENTS) though I can't identify the other changes that have gone in at the minute, it's possible that the changes which caused the breakage fixed by the previous commit got dropped but the fix made it in. Link: http://lkml.kernel.org/r/1334307179-21255-1-git-send-email-broonie@opensource.wolfsonmicro.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-13irq_work: fix compile failure on tile from missing includeChris Metcalf
Building with IRQ_WORK configured results in kernel/irq_work.c: In function ‘irq_work_run’: kernel/irq_work.c:110: error: implicit declaration of function ‘irqs_disabled’ The appropriate header just needs to be included. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-12Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull a fix for the recent irqdomain bug fixes from Grant Likely: "I flubbed one patch in the last pull request which broke a format string on 64 bit platforms. Here's the fix." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: irq_domain: fix type mismatch in debugfs output format
2012-04-12irq_domain: fix type mismatch in debugfs output formatGrant Likely
sizeof(void*) returns an unsigned long, but it was being used as a width parameter to a "%-*s" format string which requires an int. On 64 bit platforms this causes a type mismatch: linux/kernel/irq/irqdomain.c:575: warning: field width should have type 'int', but argument 6 has type 'long unsigned int' This change casts the size to an int so printf gets the right data type. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: David Daney <david.daney@cavium.com>
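The type mismatch above is easy to reproduce outside the kernel. The sketch below is a minimal user-space illustration, not the irqdomain code itself: `format_padded_name()` is a hypothetical helper, and the column width (two hex digits per pointer byte plus "0x") is an assumption chosen for the example.

```c
#include <stdio.h>

/*
 * Write `name` left-justified into a column sized for a pointer printed
 * as "0x" plus two hex digits per byte, followed by a '|' delimiter.
 * sizeof yields a size_t (unsigned long on 64-bit Linux), but the '*'
 * field width in a printf format consumes an int, hence the cast.
 */
static int format_padded_name(char *out, size_t outlen, const char *name)
{
	int width = (int)(sizeof(void *) * 2 + 2);

	return snprintf(out, outlen, "%-*s|", width, name);
}
```

Passing the `sizeof` expression directly as the width would compile with a warning on 64-bit targets and is undefined behaviour there, since the variadic argument would be read with the wrong type.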
2012-04-12Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull timer fixes from Thomas Gleixner: "The itimer removal one is not strictly a fix, but I really wanted to avoid a rebase of the urgent ones." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "clocksource: Load the ACPI PM clocksource asynchronously" clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot() itimer: Use printk_once instead of WARN_ONCE nohz: Fix stale jiffies update in tick_nohz_restart() tick: Document TICK_ONESHOT config option proc: stats: Use arch_idle_time for idle and iowait times if available itimer: Schedule silent NULL pointer fixup in setitimer() for removal
2012-04-12Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge fixes from Andrew Morton. * emailed from Andrew Morton <akpm@linux-foundation.org>: (14 patches) panic: fix stack dump print on direct call to panic() drivers/rtc/rtc-pl031.c: enable clock on all ST variants Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()" hugetlb: fix race condition in hugetlb_fault() drivers/rtc/rtc-twl.c: use static register while reading time drivers/rtc/rtc-s3c.c: add placeholder for driver private data drivers/rtc/rtc-s3c.c: fix compilation error MAINTAINERS: add PCDP console maintainer memcg: do not open code accesses to res_counter members drivers/rtc/rtc-efi.c: fix section mismatch warning drivers/rtc/rtc-r9701.c: reset registers if invalid values are detected drivers/char/random.c: fix boot id uniqueness race memcg: fix broken boolen expression memcg: fix up documentation on global LRU
2012-04-12panic: fix stack dump print on direct call to panic()Jason Wessel
Commit 6e6f0a1f0fa6 ("panic: don't print redundant backtraces on oops") causes a regression where no stack trace will be printed at all for the case where kernel code calls panic() directly while not processing an oops, and of course there are 100's of instances of this type of call. The original commit executed the check (!oops_in_progress), but this will always be false because just before the dump_stack() there is a call to bust_spinlocks(1), which does the following: void __attribute__((weak)) bust_spinlocks(int yes) { if (yes) { ++oops_in_progress; The proper way to resolve the problem that original commit tried to solve is to avoid printing a stack dump from panic() when either of the following conditions is true: 1) TAINT_DIE has been set (this is done by oops_end()) This indicates an oops has already been printed. 2) oops_in_progress > 1 This guards against the rare case where panic() is invoked a second time, or in between oops_begin() and oops_end() Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
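The two conditions listed above can be modelled as a small decision function. This is a user-space sketch under assumptions: `struct panic_state` and the `model_*` functions are hypothetical names for illustration, and only the increment in `model_bust_spinlocks()` mirrors the snippet quoted in the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the commit refers to. */
struct panic_state {
	bool taint_die;        /* set once an oops has been printed */
	int oops_in_progress;  /* counter bumped by bust_spinlocks(1) */
};

/* Models the quoted bust_spinlocks(1): it increments the counter. */
static void model_bust_spinlocks(struct panic_state *s)
{
	++s->oops_in_progress;
}

/* The original check: always false after the increment above. */
static bool model_old_check(const struct panic_state *s)
{
	return !s->oops_in_progress;
}

/*
 * The described rule: skip the dump if an oops was already printed
 * (taint_die) or if the counter shows nesting (oops_in_progress > 1).
 */
static bool model_should_dump_stack(const struct panic_state *s)
{
	return !s->taint_die && s->oops_in_progress <= 1;
}
```

For a direct panic() call the counter is 1 after the increment, so the old check suppresses the dump while the new rule prints it.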
2012-04-12Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull irqdomain bug fixes from Grant Likely: "This branch fixes a bug in irq_create_mapping() where an error return from irq_alloc_desc_from() gets ignored. It also removes irq_virq_count to fix a bug on powerpc where the irqdomain code does not find irqs allocated above the CONFIG_NR_IRQS boundary. The remaining patches get rid of a completely pointless export and fix some minor bugs in the irqdomain debug output." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: irq_domain: Move irq_virq_count into NOMAP revmap irqdomain: Fix debugfs formatting irq_domain: correct the debugfs file name irq: Kill pointless irqd_to_hw export irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from().
2012-04-12irq_domain: Move irq_virq_count into NOMAP revmapGrant Likely
This patch replaces the old global setting of irq_virq_count that is only used by the NOMAP mapping and instead uses a revmap_data property so that the maximum NOMAP allocation can be set per NOMAP irq_domain. There is exactly one user of irq_virq_count in-tree right now: PS3. Also, irq_virq_count is only useful for the NOMAP mapping. So, instead of having a single global irq_virq_count value, this change drops it entirely and adds a max_irq argument to irq_domain_add_nomap(). That makes it a property of an individual nomap irq domain instead of a global system setting. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Milton Miller <miltonm@bga.com>
2012-04-11cred: copy_process() should clear child->replacement_session_keyringOleg Nesterov
keyctl_session_to_parent(task) sets ->replacement_session_keyring; it should be processed and cleared by key_replace_session_keyring(). However, this task can fork before it notices TIF_NOTIFY_RESUME and the new child gets the bogus ->replacement_session_keyring copied by dup_task_struct(). This is obviously wrong and, if nothing else, this leads to put_cred(already_freed_cred). Change copy_creds() to clear this member. If copy_process() fails before this point, the wrong ->replacement_session_keyring doesn't matter; exit_creds() won't be called. Cc: <stable@vger.kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
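The copy-then-clear pattern described above can be sketched as follows. It is a simplified model, assuming a task structure that is duplicated by plain struct copy; the `model_*` names are hypothetical and only the member name is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for a credentials object. */
struct model_cred {
	int usage;
};

/* Hypothetical stand-in for the task structure: the pending pointer
 * belongs to the parent only and must not leak into a child. */
struct model_task {
	struct model_cred *replacement_session_keyring;
};

/* Models dup_task_struct(): a plain copy duplicates every member,
 * including the parent's pending pointer. */
static struct model_task model_dup_task(const struct model_task *parent)
{
	return *parent;
}

/* Models the fix in copy_creds(): the child starts with no pending
 * replacement, so it never drops a reference it does not own. */
static void model_copy_creds(struct model_task *child)
{
	child->replacement_session_keyring = NULL;
}
```

Without the clearing step, parent and child would both believe they hold the same pending credential, and each would release it once.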
2012-04-11irqdomain: Fix debugfs formattingGrant Likely
This patch fixes the irq_domain_mapping debugfs output to pad pointer values with leading zeros so that pointer values are displayed correctly. Otherwise you get output similar to "0x 5e0000000000000". Also, when the irq_domain is set to 'null' Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: David Daney <david.daney@cavium.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
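The padding difference can be shown with ordinary printf formatting. The sketch below is an illustration only and does not reproduce the debugfs code: `model_format_ptr()` is a hypothetical helper that prints an integer-sized pointer value with either space padding or zero padding.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print `value` as "0x" followed by a fixed-width hex field of two
 * digits per pointer byte. Without the '0' flag the field is padded
 * with spaces, which yields output of the "0x   5e..." kind; with the
 * flag it is padded with leading zeros.
 */
static int model_format_ptr(char *out, size_t outlen, uintptr_t value,
			    int zero_pad)
{
	int width = (int)(sizeof(void *) * 2);

	if (zero_pad)
		return snprintf(out, outlen, "0x%0*" PRIxPTR, width, value);
	return snprintf(out, outlen, "0x%*" PRIxPTR, width, value);
}
```

Both variants produce the same length, so column alignment is unaffected; only the fill character between "0x" and the digits changes.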
2012-04-10irq_domain: correct the debugfs file nameMika Westerberg
The actual name of the irq_domain mapping debugfs file is "irq_domain_mapping" not "virq_mapping". Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-04-10irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from().David Daney
In commit 4bbdd45a (irq_domain/powerpc: eliminate irq_map; use irq_alloc_desc() instead) code was added that ignores error returns from irq_alloc_desc_from() by (silently) casting the return value to unsigned. The negative error return value now suddenly looks like a valid irq number. Commits cc79ca69 (irq_domain: Move irq_domain code from powerpc to kernel/irq) and 1bc04f2c (irq_domain: Add support for base irq and hwirq in legacy mappings) move this code to its current location in irqdomain.c. The result of all of this is a null pointer dereference OOPS if one of the error cases is hit. The fix: Don't cast away the negativeness of the return value and then check for errors. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Rob Herring <rob.herring@calxeda.com> [grant.likely: dropped addition of new 'irq' variable] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
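The signedness bug described above can be reproduced in a few lines. This is a user-space sketch with hypothetical names: `model_alloc_desc()` stands in for an allocator that returns a negative errno on failure, and returning 0 to mean "no irq" from the fixed variant is an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical allocator: a positive number on success, a negative
 * errno value on failure. */
static int model_alloc_desc(int fail)
{
	return fail ? -ENOMEM : 42;
}

/* Buggy pattern: storing the result in an unsigned variable turns the
 * negative error into a huge value that looks like a valid number. */
static unsigned int model_map_buggy(int fail)
{
	unsigned int virq = model_alloc_desc(fail);

	return virq;
}

/* Fixed pattern: keep the result signed, check it, and only then
 * convert. 0 is used here as the "no irq" value (an assumption). */
static unsigned int model_map_fixed(int fail)
{
	int virq = model_alloc_desc(fail);

	if (virq <= 0)
		return 0;
	return (unsigned int)virq;
}
```

A caller testing the buggy result against 0 cannot detect the failure, which is how an invalid number ends up being used as an index later on.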
2012-04-10clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()Suresh Siddha
In the commit 77b0d60c5adf39c74039e2142a1d3cd1e4d53799, "clockevents: Leave the broadcast device in shutdown mode when not needed", we were bailing out too quickly in tick_broadcast_switch_to_oneshot(), without tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'. This breaks the platforms which need broadcast device oneshot services during deep idle states. tick_broadcast_oneshot_control() thinks that it is in periodic mode and fails to take proper decisions based on the CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep idle entry/exit. Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT', before leaving the broadcast HW device in shutdown mode if there are no active requests for the moment. Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: johnstul@us.ibm.com Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-10itimer: Use printk_once instead of WARN_ONCEThomas Gleixner
David pointed out that using WARN_ONCE() to report usage of a deprecated misfeature makes folks unhappy. Use printk_once() instead. Andrew told me to stop grumbling and to remove the silly typecast while touching the file. Reported-by: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>