summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-10-29perf: Fix perf ring buffer memory orderingPeter Zijlstra
The PPC64 people noticed a missing memory barrier and crufty old comments in the perf ring buffer code. So update all the comments and add the missing barrier. When the architecture implements local_t using atomic_long_t there will be double barriers issued; but short of introducing more conditional barrier primitives this is the best we can do. Reported-by: Victor Kaplansky <victork@il.ibm.com> Tested-by: Victor Kaplansky <victork@il.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: michael@ellerman.id.au Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: anton@samba.org Cc: benh@kernel.crashing.org Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-27Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "This tree contains a clockevents regression fix for certain ARM subarchitectures" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Sanitize ticks to nsec conversion
2013-10-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The tree contains three fixes: - Two tooling fixes - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in this merge window. (Patches were proposed to fix it but it was all a bit late and we felt it's safer to just delay the ABI one more kernel release and do it right)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Disable PERF_RECORD_MMAP2 support perf scripting perl: Fix build error on Fedora 12 perf probe: Fix to initialize fname always before use it
2013-10-27Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on kernels built with GCC 3.x (there are still such distros)" Side note: it's not just a fix for old gcc versions, it's also removing an incredibly broken/subtle check that LLVM had issues with, and that made no sense. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-26Merge tag 'pm+acpi-3.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from "These fix two bugs in the intel_pstate driver, a hibernate bug leading to nasty resume failures sometimes and acpi-cpufreq initialization bug that causes problems to happen during module unload when intel_pstate is in use. Specifics: - Fix for rounding errors in intel_pstate causing CPU utilization to be underestimated from Brennan Shacklett. - intel_pstate fix to always use the correct max pstate value when computing the min pstate from Dirk Brandewie. - Hibernation fix for deadlocking resume in cases when the probing of the device containing the image is deferred from Russ Dill. - acpi-cpufreq fix to prevent the module from staying in memory when the driver cannot be registered and then attempting to unregister things that have never been registered on exit" * tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: acpi-cpufreq: Fail initialization if driver cannot be registered PM / hibernate: Move software_resume to late_initcall_sync intel_pstate: Correct calculation of min pstate value intel_pstate: Improve accuracy by not truncating until final result
2013-10-25PM / hibernate: Move software_resume to late_initcall_syncRuss Dill
software_resume is being called after deferred_probe_initcall in drivers base. If the probing of the device that contains the resume image is deferred, and the system has been instructed to wait for it to show up, this wait will occur in software_resume. This causes a deadlock. Move software_resume into late_initcall_sync so that it happens after all the other late_initcalls. Signed-off-by: Russ Dill <Russ.Dill@ti.com> Acked-by: Pavel Machek <Pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-23clockevents: Sanitize ticks to nsec conversionThomas Gleixner
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use clockevents_config_and_register() where possible" caused a regression for some of the converted subarchs. The reason is, that the clockevents core code converts the minimal hardware tick delta to a nanosecond value for core internal usage. This conversion is affected by integer math rounding loss, so the backwards conversion to hardware ticks will likely result in a value which is less than the configured hardware limitation. The affected subarchs used their own workaround (SIGH!) which got lost in the conversion. The solution for the issue at hand is simple: adding evt->mult - 1 to the shifted value before the integer divison in the core conversion function takes care of it. But this only works for the case where for the scaled math mult/shift pair "mult <= 1 << shift" is true. For the case where "mult > 1 << shift" we can apply the rounding add only for the minimum delta value to make sure that the backward conversion is not less than the given hardware limit. For the upper bound we need to omit the rounding add, because the backwards conversion is always larger than the original latch value. That would violate the upper bound of the hardware device. Though looking closer at the details of that function reveals another bogosity: The upper bounds check is broken as well. Checking for a resulting "clc" value greater than KTIME_MAX after the conversion is pointless. The conversion does: u64 clc = (latch << evt->shift) / evt->mult; So there is no sanity check for (latch << evt->shift) exceeding the 64bit boundary. The latch argument is "unsigned long", so on a 64bit arch the handed in argument could easily lead to an unnoticed shift overflow. With the above rounding fix applied the calculation before the divison is: u64 clc = (latch << evt->shift) + evt->mult - 1; So we need to make sure, that neither the shift nor the rounding add is overflowing the u64 boundary. [ukl: move assignment to rnd after eventually changing mult, fix build issue and correct comment with the right math] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: nicolas.ferre@atmel.com Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: john.stultz@linaro.org Cc: kernel@pengutronix.de Cc: Ronald Wahl <ronald.wahl@raritan.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2013-10-22Merge branch 'for-3.12-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two late fixes for cgroup. One fixes descendant walk introduced during this rc1 cycle. The other fixes a post 3.9 bug during task attach which can lead to hang. Both fixes are critical and the fixes are relatively straight-forward" * 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix to break the while loop in cgroup_attach_task() correctly cgroup: fix cgroup post-order descendant walk of empty subtree
2013-10-20Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: " * Fix build error on Fedora 12. * Fix to initialize fname always before use it, bug introduced during this merge window, from Masami Hiramatsu. * Disable PERF_RECORD_MMAP2 support, from Stephane Eranian. " Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-18mutex: Avoid gcc version dependent __builtin_constant_p() usageTetsuo Handa
Commit 040a0a37 ("mutex: Add support for wound/wait style locks") used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot handle such expression correctly, leading to boot failure when built with CONFIG_DEBUG_MUTEXES=y. Fix it by explicitly passing a bool which tells whether p != NULL or not. [ PeterZ: This is a sad patch, but provided it actually generates similar code I suppose its the best we can do bar whole sale deprecating gcc-3. ] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: peterz@infradead.org Cc: imirkin@alum.mit.edu Cc: daniel.vetter@ffwll.ch Cc: robdclark@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-17perf: Disable PERF_RECORD_MMAP2 supportStephane Eranian
For now, we disable the extended MMAP record support (MMAP2). We have identified cases where it would not report the correct mapping information, clone(VM_CLONE) but with separate pids. We will revisit the support once we find a solution for this case. The patch changes the kernel to return EINVAL if attr->mmap2 is set. The patch also modifies the perf tool to use regular PERF_RECORD_MMAP for synthetic events and it also prevents the tool from requesting attr->mmap2 mode because the kernel would reject it. The support will be revisited once the kenrel interface is updated. In V2, we reduce the patch to the strict minimum. In V3, we avoid calling perf_event_open() with mmap2 set because we know it will fail and require fallback retry. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-13cgroup: fix to break the while loop in cgroup_attach_task() correctlyAnjana V Kumar
Both Anjana and Eunki reported a stall in the while_each_thread loop in cgroup_attach_task(). It's because, when we attach a single thread to a cgroup, if the cgroup is exiting or is already in that cgroup, we won't break the loop. If the task is already in the cgroup, the bug can lead to another thread being attached to the cgroup unexpectedly: # echo 5207 > tasks # cat tasks 5207 # echo 5207 > tasks # cat tasks 5207 5215 What's worse, if the task to be attached isn't the leader of the thread group, we might never exit the loop, hence cpu stall. Thanks for Oleg's analysis. This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79 ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") [ lizf: - fixed the first continue, pointed out by Oleg, - rewrote changelog. ] Cc: <stable@vger.kernel.org> # 3.9+ Reported-by: Eunki Kim <eunki_kim@samsung.com> Reported-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-10-08Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixlets: On the kernel side: - fix a race - fix a bug in the handling of the perf ring-buffer data page On the tooling side: - fix the handling of certain corrupted perf.data files - fix a bug in 'perf probe' - fix a bug in 'perf record + perf sched' - fix a bug in 'make install' - fix a bug in libaudit feature-detection on certain distros" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf session: Fix infinite loop on invalid perf.data file perf tools: Fix installation of libexec components perf probe: Fix to find line information for probe list perf tools: Fix libaudit test perf stat: Set child_pid after perf_evlist__prepare_workload() perf tools: Add default handler for mmap2 events perf/x86: Clean up cap_user_time* setting perf: Fix perf_pmu_migrate_context
2013-10-04Merge tag 'pm+acpi-3.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - The resume part of user space driven hibernation (s2disk) is now broken after the change that moved the creation of memory bitmaps to after the freezing of tasks, because I forgot that the resume utility loaded the image before freezing tasks and needed the bitmaps for that. The fix adds special handling for that case. - One of recent commits changed the export of acpi_bus_get_device() to EXPORT_SYMBOL_GPL(), which was technically correct but broke existing binary modules using that function including one in particularly widespread use. Change it back to EXPORT_SYMBOL(). - The intel_pstate driver sometimes fails to disable turbo if its no_turbo sysfs attribute is set. Fix from Srinivas Pandruvada. - One of recent cpufreq fixes forgot to update a check in cpufreq-cpu0 which still (incorrectly) treats non-NULL as non-error. Fix from Philipp Zabel. - The SPEAr cpufreq driver uses a wrong variable type in one place preventing it from catching errors returned by one of the functions called by it. Fix from Sachin Kamat. * tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: Use EXPORT_SYMBOL() for acpi_bus_get_device() intel_pstate: fix no_turbo cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2 cpufreq: SPEAr: Fix incorrect variable type PM / hibernate: Fix user space driven resume regression
2013-10-04perf: Fix perf_pmu_migrate_contextPeter Zijlstra
While auditing the list_entry usage due to a trinity bug I found that perf_pmu_migrate_context violates the rules for perf_event::event_entry. The problem is that perf_event::event_entry is a RCU list element, and hence we must wait for a full RCU grace period before re-using the element after deletion. Therefore the usage in perf_pmu_migrate_context() which re-uses the entry immediately is broken. For now introduce another list_head into perf_event for this specific usage. This doesn't actually fix the trinity report because that never goes through this code. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-mkj72lxagw1z8fvjm648iznw@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02Merge branch 'irq/urgent-v2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/urgent Pull a hardirq-nesting fix from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-01irq: Force hardirq exit's softirq processing on its own stackFrederic Weisbecker
The commit facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b ("irq: Sanitize invoke_softirq") converted irq exit calls of do_softirq() to __do_softirq() on all architectures, assuming it was only used there for its irq disablement properties. But as a side effect, the softirqs processed in the end of the hardirq are always called on the inline current stack that is used by irq_exit() instead of the softirq stack provided by the archs that override do_softirq(). The result is mostly safe if the architecture runs irq_exit() on a separate irq stack because then softirqs are processed on that same stack that is near empty at this stage (assuming hardirq aren't nesting). Otherwise irq_exit() runs in the task stack and so does the softirq too. The interrupted call stack can be randomly deep already and the softirq can dig through it even further. To add insult to the injury, this softirq can be interrupted by a new hardirq, maximizing the chances for a stack overrun as reported in powerpc for example: do_IRQ: stack overflow: 1920 CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1 Call Trace: [c0000000050a8740] .show_stack+0x130/0x200 (unreliable) [c0000000050a8810] .dump_stack+0x28/0x3c [c0000000050a8880] .do_IRQ+0x2b8/0x2c0 [c0000000050a8930] hardware_interrupt_common+0x154/0x180 --- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp] LR = .cp_start_xmit+0x390/0x820 [8139cp] [c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640 [c0000000050a8e00] .sch_direct_xmit+0x110/0x260 [c0000000050a8ea0] .dev_queue_xmit+0x260/0x630 [c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge] [c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge] [c0000000050a9070] .dev_hard_start_xmit+0x394/0x640 [c0000000050a9130] .dev_queue_xmit+0x428/0x630 [c0000000050a91d0] .ip_finish_output+0x2a4/0x550 [c0000000050a9290] .ip_local_out+0x50/0x70 [c0000000050a9310] .ip_queue_xmit+0x148/0x420 [c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0 [c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0 [c0000000050a9520] .tcp_rcv_established+0x1e8/0x930 [c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570 [c0000000050a96c0] .tcp_v4_rcv+0x734/0x930 [c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360 [c0000000050a9840] .ip_rcv_finish+0x148/0x400 [c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00 [c0000000050a99d0] .netif_receive_skb+0x44/0x110 [c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge] [c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge] [c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge] [c0000000050a9c70] .nf_iterate+0x114/0x130 [c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0 [c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge] [c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00 [c0000000050a9fa0] .netif_receive_skb+0x44/0x110 [c0000000050aa040] .napi_gro_receive+0xe8/0x120 [c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp] [c0000000050aa1d0] .net_rx_action+0x1dc/0x310 [c0000000050aa2b0] .__do_softirq+0x158/0x330 [c0000000050aa3b0] .irq_exit+0xc8/0x110 [c0000000050aa430] .do_IRQ+0xdc/0x2c0 [c0000000050aa4e0] hardware_interrupt_common+0x154/0x180 --- Exception: 501 at .bad_range+0x1c/0x110 LR = .get_page_from_freelist+0x908/0xbb0 [c0000000050aa7d0] .list_del+0x18/0x50 (unreliable) [c0000000050aa850] .get_page_from_freelist+0x908/0xbb0 [c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0 [c0000000050aaba0] .alloc_pages_vma+0xd0/0x210 [c0000000050aac60] .handle_pte_fault+0x814/0xb70 [c0000000050aad50] .__get_user_pages+0x1a4/0x640 [c0000000050aae60] .get_user_pages_fast+0xec/0x160 [c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm] [c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm] [c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm] [c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm] [c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm] [c0000000050ab320] kvm_start_lightweight+0x1ec/0x1fc [kvm] [c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm] [c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm] [c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm] [c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm] [c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0 [c0000000050abd80] .SyS_ioctl+0xd4/0xf0 [c0000000050abe30] syscall_exit+0x0/0x98 Since this is a regression, this patch proposes a minimalistic and low-risk solution by blindly forcing the hardirq exit processing of softirqs on the softirq stack. This way we should reduce significantly the opportunities for task stack overflow dug by softirqs. Longer term solutions may involve extending the hardirq stack coverage to irq_exit(), etc... Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: #3.9.. <stable@vger.kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org>
2013-09-30pidns: fix free_pid() to handle the first fork failureOleg Nesterov
"case 0" in free_pid() assumes that disable_pid_allocation() should clear PIDNS_HASH_ADDING before the last pid goes away. However this doesn't happen if the first fork() fails to create the child reaper which should call disable_pid_allocation(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30kernel/kmod.c: check for NULL in call_usermodehelper_exec()Tetsuo Handa
If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer dereference happens upon core dump because argv_split("") returns argv[0] == NULL. This bug was once fixed by commit 264b83c07a84 ("usermodehelper: check subprocess_info->path != NULL") but was by error reintroduced by commit 7f57cfa4e2aa ("usermodehelper: kill the sub_info->path[0] check"). This bug seems to exist since 2.6.19 (the version which core dump to pipe was added). Depending on kernel version and config, some side effect might happen immediately after this oops (e.g. kernel panic with 2.6.32-358.18.1.el6). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30PM / hibernate: Fix user space driven resume regressionRafael J. Wysocki
Recent commit 8fd37a4 (PM / hibernate: Create memory bitmaps after freezing user space) broke the resume part of the user space driven hibernation (s2disk), because I forgot that the resume utility loaded the image into memory without freezing user space (it still freezes tasks after loading the image). This means that during user space driven resume we need to create the memory bitmaps at the "device open" time rather than at the "freeze tasks" time, so make that happen (that's a special case anyway, so it needs to be treated in a special way). Reported-and-tested-by: Ronald <ronald645@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-28Merge branches 'sched-urgent-for-linus', 'timers-urgent-for-linus' and ↵Linus Torvalds
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler, timer and x86 fixes from Ingo Molnar: - A context tracking ARM build and functional fix - A handful of ARM clocksource/clockevent driver fixes - An AMD microcode patch level sysfs reporting fixlet * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm: Fix build error with context tracking calls * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast clocksource: of: Respect device tree node status clocksource: exynos_mct: Set IRQ affinity when the CPU goes online arm: clocksource: mvebu: Use the main timer as clock source from DT * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix patch level reporting for family 15h
2013-09-28kernel/params: fix handling of signed integer typesJean Delvare
Commit 6072ddc8520b ("kernel: replace strict_strto*() with kstrto*()") broke the handling of signed integer types, fix it. Signed-off-by: Jean Delvare <khali@linux-fr.org> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-28Merge branch 'context_tracking/fixes' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/urgent Pull context tracking ARM fix from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-27arm: Fix build error with context tracking callsFrederic Weisbecker
ad65782fba50 (context_tracking: Optimize main APIs off case with static key) converted context tracking main APIs to inline function and left ARM asm callers behind. This can be easily fixed by making ARM calling the post static keys context tracking function. We just need to replicate the static key checks there. We'll remove these later when ARM will support the context tracking static keys. Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Kevin Hilman <khilman@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Anil Kumar <anilk4.v@gmail.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Benoit Cousson <b-cousson@ti.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Kevin Hilman <khilman@linaro.org>
2013-09-25Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three small fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/balancing: Fix cfs_rq->task_h_load calculation sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance() sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()
2013-09-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Assorted standalone fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add model number for Avoton Silvermont perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group() perf: Update ABI comment tools lib lk: Uninclude linux/magic.h in debugfs.c perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms() perf probe: Fix finder to find lines of given function perf session: Check for SIGINT in more loops perf tools: Fix compile with libelf without get_phdrnum perf tools: Fix buildid cache handling of kallsyms with kcore perf annotate: Fix objdump line parsing offset validation perf tools: Fill in new definitions for madvise()/mmap() flags perf tools: Sharpen the libaudit dependencies test
2013-09-24kernel/reboot.c: re-enable the function of variable reboot_defaultChuansheng Liu
Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") did some cleanup for reboot= command line, but it made the reboot_default inoperative. The default value of variable reboot_default should be 1, and if command line reboot= is not set, system will use the default reboot mode. [akpm@linux-foundation.org: fix comment layout] Signed-off-by: Li Fei <fei.li@intel.com> Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Acked-by: Robin Holt <robinmholt@linux.com> Cc: <stable@vger.kernel.org> [3.11.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24audit: fix endless wait in audit_log_start()Konstantin Khlebnikov
After commit 829199197a43 ("kernel/audit.c: avoid negative sleep durations") audit emitters will block forever if userspace daemon cannot handle backlog. After the timeout the waiting loop turns into busy loop and runs until daemon dies or returns back to work. This is a minimal patch for that bug. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Chuck Anderson <chuck.anderson@oracle.com> Cc: Dan Duval <dan.duval@oracle.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24watchdog: update watchdog_thresh properlyMichal Hocko
watchdog_tresh controls how often nmi perf event counter checks per-cpu hrtimer_interrupts counter and blows up if the counter hasn't changed since the last check. The counter is updated by per-cpu watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh period which guarantees that hrtimer is scheduled 2 times per the main period. Both hrtimer and perf event are started together when the watchdog is enabled. So far so good. But... But what happens when watchdog_thresh is updated from sysctl handler? proc_dowatchdog will set a new sampling period and hrtimer callback (watchdog_timer_fn) will use the new value in the next round. The problem, however, is that nobody tells the perf event that the sampling period has changed so it is ticking with the period configured when it has been set up. This might result in an ear ripping dissonance between perf and hrtimer parts if the watchdog_thresh is increased. And even worse it might lead to KABOOM if the watchdog is configured to panic on such a spurious lockup. This patch fixes the issue by updating both nmi perf even counter and hrtimers if the threshold value has changed. The nmi one is disabled and then reinitialized from scratch. This has an unpleasant side effect that the allocation of the new event might fail theoretically so the hard lockup detector would be disabled for such cpus. On the other hand such a memory allocation failure is very unlikely because the original event is deallocated right before. It would be much nicer if we just changed perf event period but there doesn't seem to be any API to do that right now. It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we cannot use on_each_cpu() and do the same thing from the per-cpu context. The update from the current CPU should be safe because perf_event_disable removes the event atomically before it clears the per-cpu watchdog_ev so it cannot change anything under running handler feet. The hrtimer is simply restarted (thanks to Don Zickus who has pointed this out) if it is queued because we cannot rely it will fire&adopt to the new sampling period before a new nmi event triggers (when the treshold is decreased). [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place] Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Fabio Estevam <festevam@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24watchdog: update watchdog attributes atomicallyMichal Hocko
proc_dowatchdog doesn't synchronize multiple callers which might lead to confusion when two parallel callers might confuse watchdog_enable_all_cpus resp watchdog_disable_all_cpus (eg watchdog gets enabled even if watchdog_thresh was set to 0 already). This patch adds a local mutex which synchronizes callers to the sysctl handler. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-20sched/balancing: Fix cfs_rq->task_h_load calculationVladimir Davydov
Patch a003a2 (sched: Consider runnable load average in move_tasks()) sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is always 0. This mistype leads to all tasks having weight 0 when load balancing in a cpu-cgroup enabled setup. There obviously should be sum of weights of all runnable tasks there instead. Fix it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1379173186-11944-1-git-send-email-vdavydov@parallels.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in ↵Vladimir Davydov
fix_small_imbalance() In busiest->group_imb case we can come to fix_small_imbalance() with local->avg_load > busiest->avg_load. This can result in wrong imbalance fix-up, because there is the following check there where all the members are unsigned: if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >= (scaled_busy_load_per_task * imbn)) { env->imbalance = busiest->load_per_task; return; } As a result we can end up constantly bouncing tasks from one cpu to another if there are pinned tasks. Fix it by substituting the subtraction with an equivalent addition in the check. [ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus belonging to different cores on an HT-enabled machine with N logical cpus: just look at se.nr_migrations growth. ] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ef167822e5c5b2d96cf5b0e3e4f4bdff3f0414a2.1379252740.git.vdavydov@parallels.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20sched/balancing: Fix 'local->avg_load > sds->avg_load' case in ↵Vladimir Davydov
calculate_imbalance() In busiest->group_imb case we can come to calculate_imbalance() with local->avg_load >= busiest->avg_load >= sds->avg_load. This can result in imbalance overflow, because it is calculated as follows env->imbalance = min( max_pull * busiest->group_power, (sds->avg_load - local->avg_load) * local->group_power) / SCHED_POWER_SCALE; As a result we can end up constantly bouncing tasks from one cpu to another if there are pinned tasks. Fix this by skipping the assignment and assuming imbalance=0 in case local->avg_load > sds->avg_load. [ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus belonging to different cores on an HT-enabled machine with N logical cpus: just look at se.nr_migrations growth. ] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8f596cc6bc0e5e655119dc892c9bfcad26e971f4.1379252740.git.vdavydov@parallels.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'Peter Zijlstra
Solve the problems around the broken definition of perf_event_mmap_page:: cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially fixed by: 860f085b74e9 ("perf: Fix broken union in 'struct perf_event_mmap_page'") The problem with the fix (merged in v3.12-rc1 and not yet released officially), noticed by Vince Weaver is that the new behavior is not detectable by new user-space, and that due to the reuse of the field names it's easy to mis-compile a binary if old headers are used on a new kernel or new headers are used on an old kernel. To solve all that make this change explicit, detectable and self-contained, by iterating the ABI the following way: - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not confuse old user-space binaries. RDPMC will be marked as unavailable to old binaries but that's within the ABI, this is a capability bit. - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new libraries can reliably detect that bit 0 is deprecated and perma-zero without having to check the kernel version. - Use bits 2, 3, 4 for the newly defined, correct functionality: cap_user_rdpmc : 1, /* The RDPMC instruction can be used to read counts */ cap_user_time : 1, /* The time_* fields are used */ cap_user_time_zero : 1, /* The time_zero field is used */ - Rename all the bitfield names in perf_event.h to be different from the old names, to make sure it's not possible to mis-compile it accidentally with old assumptions. The 'size' field can then be used in the future to add new fields and it will act as a natural ABI version indicator as well. Also adjust tools/perf/ userspace for the new definitions, noticed by Adrian Hunter. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-18Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "An NTP related lockup fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix HRTICK related deadlock from ntp lock changes
2013-09-18Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix comment for sched_info_depart sched/Documentation: Update sched-design-CFS.txt documentation sched/debug: Take PID namespace into account sched/fair: Fix small race where child->se.parent,cfs_rq might point to invalid ones
2013-09-16sched: Fix comment for sched_info_departMichael S. Tsirkin
sched_info_depart seems to be only called from sched_info_switch(), so only on involuntary task switch. Fix the comment to match. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Link: http://lkml.kernel.org/r/20130916083036.GA1113@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-13Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio changes from Ben LaHaise: "First off, sorry for this pull request being late in the merge window. Al had raised a couple of concerns about 2 items in the series below. I addressed the first issue (the race introduced by Gu's use of mm_populate()), but he has not provided any further details on how he wants to rework the anon_inode.c changes (which were sent out months ago but have yet to be commented on). The bulk of the changes have been sitting in the -next tree for a few months, with all the issues raised being addressed" * git://git.kvack.org/~bcrl/aio-next: (22 commits) aio: rcu_read_lock protection for new rcu_dereference calls aio: fix race in ring buffer page lookup introduced by page migration support aio: fix rcu sparse warnings introduced by ioctx table lookup patch aio: remove unnecessary debugging from aio_free_ring() aio: table lookup: verify ctx pointer staging/lustre: kiocb->ki_left is removed aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" aio: be defensive to ensure request batching is non-zero instead of BUG_ON() aio: convert the ioctx list to table lookup v3 aio: double aio_max_nr in calculations aio: Kill ki_dtor aio: Kill ki_users aio: Kill unneeded kiocb members aio: Kill aio_rw_vect_retry() aio: Don't use ctx->tail unnecessarily aio: io_cancel() no longer returns the io_event aio: percpu ioctx refcount aio: percpu reqs_available aio: reqs_active -> reqs_available aio: fix build when migration is disabled ...
2013-09-13Remove GENERIC_HARDIRQ config optionMartin Schwidefsky
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-12Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge more patches from Andrew Morton: "The rest of MM. Plus one misc cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) mm/Kconfig: add MMU dependency for MIGRATION. kernel: replace strict_strto*() with kstrto*() mm, thp: count thp_fault_fallback anytime thp fault fails thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() thp: do_huge_pmd_anonymous_page() cleanup thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() mm: cleanup add_to_page_cache_locked() thp: account anon transparent huge pages into NR_ANON_PAGES truncate: drop 'oldsize' truncate_pagecache() parameter mm: make lru_add_drain_all() selective memcg: document cgroup dirty/writeback memory statistics memcg: add per cgroup writeback pages accounting memcg: check for proper lock held in mem_cgroup_update_page_stat memcg: remove MEMCG_NR_FILE_MAPPED memcg: reduce function dereference memcg: avoid overflow caused by PAGE_ALIGN memcg: rename RESOURCE_MAX to RES_COUNTER_MAX memcg: correct RESOURCE_MAX to ULLONG_MAX mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup ...
2013-09-12kernel: replace strict_strto*() with kstrto*()Jingoo Han
The usage of strict_strto*() is not preferred, because strict_strto*() is obsolete. Thus, kstrto*() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12memcg: reduce function dereferenceSha Zhengju
This function dereferences res far too often, so optimize it. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12memcg: avoid overflow caused by PAGE_ALIGNSha Zhengju
Since PAGE_ALIGN is aligning up(the next page boundary), so after PAGE_ALIGN, the value might be overflow, such as write the MAX value to *.limit_in_bytes. $ cat /cgroup/memory/memory.limit_in_bytes 18446744073709551615 # echo 18446744073709551615 > /cgroup/memory/memory.limit_in_bytes bash: echo: write error: Invalid argument Some user programs might depend on such behaviours(like libcg, we read the value in snapshot, then use the value to reset cgroup later), and that will cause confusion. So we need to fix it. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12memcg: rename RESOURCE_MAX to RES_COUNTER_MAXSha Zhengju
RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...
2013-09-12Merge tag 'pm+acpi-fixes-3.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "All of these commits are fixes that have emerged recently and some of them fix bugs introduced during this merge window. Specifics: 1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly. 2) Memory hotplug and hibernation mutual exclusion rework This was maent to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug. 3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable. 4) ACPICA fix related to Store->ArgX operation AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore. 5) ACPI core fix for PCI ACPI device objects lookup There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases. 6) Fix to prevent cpuidle from crashing in some rare cases If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer and the Daniel Fu's fix prevents that from happening. 7) cpufreq fixes related to CPU hotplug Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar. 8) cpufreq fix for time conversion in time_in_state attribute Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state won't work correcty if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab. 9) Revert of a troublesome cpufreq commit Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way. 10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle. 11) cpufreq fixes related to system suspend/resume The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues. 12) cpufreq locking fix cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu" * tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) cpufreq: Acquire the lock in cpufreq_policy_restore() for reading cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu cpufreq: Restructure if/else block to avoid unintended behavior cpufreq: Fix crash in cpufreq-stats during suspend/resume intel_pstate: Add Haswell CPU models Revert "cpufreq: make sure frequency transitions are serialized" cpufreq: Use signed type for 'ret' variable, to store negative error values cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock cpufreq: Split __cpufreq_remove_dev() into two parts cpufreq: Fix wrong time unit conversion cpufreq: serialize calls to __cpufreq_governor() cpufreq: don't allow governor limits to be changed when it is disabled ACPI / bind: Prefer device objects with _STA to those without it ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks ACPI / hotplug / PCI: Use _OST to notify firmware about notify status ACPI / hotplug / PCI: Avoid doing too much for spurious notifies ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field. ACPI / hotplug / PCI: Don't trim devices before scanning the namespace ...
2013-09-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes. The -g perf report lockup you reported is only partially addressed, patches that fix the excessive runtime are still being worked on" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix uncore PCI fixed counter handling uprobes: Fix utask->depth accounting in handle_trampoline() perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING perf: Fix up MMAP2 buffer space reservation perf tools: Add attr->mmap2 support perf kvm: Fix sample_type manipulation perf evlist: Fix id pos in perf_evlist__open() perf trace: Handle perf.data files with no tracepoints perf session: Separate progress bar update when processing events perf trace: Check if MAP_32BIT is defined perf hists: Fix formatting of long symbol names perf evlist: Fix parsing with no sample_id_all bit set perf tools: Add test for parsing with no sample_id_all bit perf trace: Check control+C more often
2013-09-12Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Performance regression fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix load balancing performance regression in should_we_balance()
2013-09-12sched/debug: Take PID namespace into accountPeter Zijlstra
Emmanuel reported that /proc/sched_debug didn't report the right PIDs when using namespaces, cure this. Reported-by: Emmanuel Deloget <emmanuel.deloget@efixo.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130909110141.GM31370@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12sched/fair: Fix small race where child->se.parent,cfs_rq might point to ↵Daisuke Nishimura
invalid ones There is a small race between copy_process() and cgroup_attach_task() where child->se.parent,cfs_rq points to invalid (old) ones. parent doing fork() | someone moving the parent to another cgroup -------------------------------+--------------------------------------------- copy_process() + dup_task_struct() -> parent->se is copied to child->se. se.parent,cfs_rq of them point to old ones. cgroup_attach_task() + cgroup_task_migrate() -> parent->cgroup is updated. + cpu_cgroup_attach() + sched_move_task() + task_move_group_fair() +- set_task_rq() -> se.parent,cfs_rq of parent are updated. + cgroup_fork() -> parent->cgroup is copied to child->cgroup. (*1) + sched_fork() + task_fork_fair() -> se.parent,cfs_rq of child are accessed while they point to old ones. (*2) In the worst case, this bug can lead to "use-after-free" and cause a panic, because it's new cgroup's refcount that is incremented at (*1), so the old cgroup(and related data) can be freed before (*2). In fact, a panic caused by this bug was originally caught in RHEL6.4. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0 [...] Call Trace: [<ffffffff81051f25>] place_entity+0x75/0xa0 [<ffffffff81056a3a>] task_fork_fair+0xaa/0x160 [<ffffffff81063c0b>] sched_fork+0x6b/0x140 [<ffffffff8106c3c2>] copy_process+0x5b2/0x1450 [<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130 [<ffffffff8106d2f4>] do_fork+0x94/0x460 [<ffffffff81072a9e>] ? sys_wait4+0xae/0x100 [<ffffffff81009598>] sys_clone+0x28/0x30 [<ffffffff8100b393>] stub_clone+0x13/0x20 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>