summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-10-22x86/kexec: Fix kexec crash in syscall kexec_file_load()Lee, Chun-Yi
commit e3c41e37b0f4b18cbd4dac76cbeece5a7558b909 upstream. The original bug is a page fault crash that sometimes happens on big machines when preparing ELF headers: BUG: unable to handle kernel paging request at ffffc90613fc9000 IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260 The bug is caused by us under-counting the number of memory ranges and subsequently not allocating enough ELF header space for them. The bug is typically masked on smaller systems, because the ELF header allocation is rounded up to the next page. This patch modifies the code in fill_up_crash_elf_data() by using walk_system_ram_res() instead of walk_system_ram_range() to correctly count the max number of crash memory ranges. That's because the walk_system_ram_range() filters out small memory regions that reside in the same page, but walk_system_ram_res() does not. Here's how I found the bug: After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(), the code uses walk_system_ram_res() to fill-in crash memory regions information to the program header, so it counts those small memory regions that reside in a page area. But, when the kernel was using walk_system_ram_range() in fill_up_crash_elf_data() to count the number of crash memory regions, it filters out small regions. I printed those small memory regions, for example: kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0 Based on the code in walk_system_ram_range(), this memory region will be filtered out: pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593 end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593 end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE So, the max_nr_ranges that's counted by the kernel doesn't include small memory regions - causing us to under-allocate the required space. That causes the page fault crash that happens in a later code path when preparing ELF headers. This bug is not easy to reproduce on small machines that have few CPUs, because the allocated page aligned ELF buffer has more free space to cover those small memory regions' PT_LOAD headers. Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, ↵Matt Fleming
instead of top-down commit a5caa209ba9c29c6421292e7879d2387a2ef39c9 upstream. Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, BUG: unable to handle kernel paging request at fffffffefe6086dd IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd Call Trace: [<ffffffff8104c90e>] efi_call+0x7e/0x100 [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90 [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70 [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392 [<ffffffff81f37e1b>] start_kernel+0x38a/0x417 [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef Here 0xfffffffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chun-Yi <jlee@suse.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22Use WARN_ON_ONCE for missing X86_FEATURE_NRIPSDirk Müller
commit d2922422c48df93f3edff7d872ee4f3191fefb08 upstream. The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI codeAndy Lutomirski
commit 83c133cf11fb0e68a51681447e372489f052d40e upstream. The NMI entry code that switches to the normal kernel stack needs to be very careful not to clobber any extra stack slots on the NMI stack. The code is fine under the assumption that SWAPGS is just a normal instruction, but that assumption isn't really true. Use SWAPGS_UNSAFE_STACK instead. This is part of a fix for some random crashes that Sasha saw. Fixes: 9b6e6a8334d5 ("x86/nmi/64: Switch stacks on userspace NMI entry") Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/paravirt: Replace the paravirt nop with a bona fide empty functionAndy Lutomirski
commit fc57a7c68020dcf954428869eafd934c0ab1536f upstream. PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an example, trimmed for readability): ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6> 2792: R_X86_64_PC32 pv_irq_ops+0x2c That's a call through a function pointer to regular C function that does nothing on native boots, but that function isn't protected against kprobes, isn't marked notrace, and is certainly not guaranteed to preserve any registers if the compiler is feeling perverse. This is bad news for a CLBR_NONE operation. Of course, if everything works correctly, once paravirt ops are patched, it gets nopped out, but what if we hit this code before paravirt ops are patched in? This can potentially cause breakage that is very difficult to debug. A more subtle failure is possible here, too: if _paravirt_nop uses the stack at all (even just to push RBP), it will overwrite the "NMI executing" variable if it's called in the NMI prologue. The Xen case, perhaps surprisingly, is fine, because it's already written in asm. Fix all of the cases that default to paravirt_nop (including adjust_exception_frame) with a big hammer: replace paravirt_nop with an asm function that is just a ret instruction. The Xen case may have other problems, so document them. This is part of a fix for some random crashes that Sasha saw. Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/platform: Fix Geode LX timekeeping in the generic x86 buildDavid Woodhouse
commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream. In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable") bypassed verification of the TSC on Geode LX. However, this code (now in the check_system_tsc_reliable() function in arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was set. OpenWRT has recently started building its generic Geode target for Geode GX, not LX, to include support for additional platforms. This broke the timekeeping on LX-based devices, because the TSC wasn't marked as reliable: https://dev.openwrt.org/ticket/20531 By adding a runtime check on is_geode_lx(), we can also include the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus fixing the problem. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Andres Salomon <dilinger@queued.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/alternatives: Make optimize_nops() interrupt safe and syncedThomas Gleixner
commit 66c117d7fa2ae429911e60d84bf31a90b2b96189 upstream. Richard reported the following crash: [ 0.036000] BUG: unable to handle kernel paging request at 55501e06 [ 0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38 [ 0.036000] Call Trace: [ 0.036000] [<c0409c80>] ? add_nops+0x90/0xa0 [ 0.036000] [<c040a054>] apply_alternatives+0x274/0x630 Chuck decoded: " 0: 8d 90 90 83 04 24 lea 0x24048390(%eax),%edx 6: 80 fc 0f cmp $0xf,%ah 9: a8 0f test $0xf,%al >> b: a0 06 1e 50 55 mov 0x55501e06,%al 10: 57 push %edi 11: 56 push %esi Interrupt 0x30 occurred while the alternatives code was replacing the initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the optimized version, 0x8d,0x76,0x00. Only the first byte has been replaced so far, and it makes a mess out of the insn decoding." optimize_nops() is buggy in two aspects: - It's not disabling interrupts across the modification - It's lacking a sync_core() call Add both. Fixes: 4fd4b6e5537c 'x86/alternatives: Use optimized NOPs for padding' Reported-and-tested-by: "Richard W.M. Jones" <rjones@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/apic: Serialize LVTT and TSC_DEADLINE writesShaohua Li
commit 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 upstream. The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not guaranteed that the write to LVTT has reached the APIC before the TSC_DEADLINE MSR is written. In such a case the write to the MSR is ignored and as a consequence the local timer interrupt never fires. The SDM decribes this issue for xAPIC and x2APIC modes. The serialization methods recommended by the SDM differ. xAPIC: "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. 2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter. 3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2. 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline." x2APIC: "To allow for efficient access to the APIC registers in x2APIC mode, the serializing semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system software should not use 'WRMSR to APIC registers in x2APIC mode' as a serializing instruction. Read and write accesses to the APIC registers will occur in program order. A WRMSR to an APIC register may complete before all preceding stores are globally visible; software can prevent this by inserting a serializing instruction, an SFENCE, or an MFENCE before the WRMSR." The xAPIC method is to just wait for the memory mapped write to hit the LVTT by checking whether the MSR write has reached the hardware. There is no reason why a proper MFENCE after the memory mapped write would not do the same. Andi Kleen confirmed that MFENCE is sufficient for the xAPIC case as well. Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE support. [ tglx: Massaged the changelog ] Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: <Kernel-team@fb.com> Cc: <lenb@kernel.org> Cc: <fenghua.yu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22dmaengine: dw: properly read DWC_PARAMS registerAndy Shevchenko
commit 6bea0f6d1c47b07be88dfd93f013ae05fcb3d8bf upstream. In case we have less than maximum allowed channels (8) and autoconfiguration is enabled the DWC_PARAMS read is wrong because it uses different arithmetic to what is needed for channel priority setup. Re-do the caclulations properly. This now works on AVR32 board well. Fixes: fed2574b3c9f (dw_dmac: introduce software emulation of LLP transfers) Cc: yitian.bu@tangramtek.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22blockdev: don't set S_DAX for misaligned partitionsJeff Moyer
commit f0b2e563bc419df7c1b3d2f494574c25125f6aed upstream. The dax code doesn't currently support misaligned partitions, so disable O_DIRECT via dax until such time as that support materializes. Suggested-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: dts: fix usb pin control for imx-rex dtsFelipe F. Tonello
commit 0af822110871400908d5b6f83a8908c45f881d8f upstream. This fixes a duplicated pin control causing this error: imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_GPIO_1 already requested by regulators:regulator@2; cannot claim for 2184000.usb imx6q-pinctrl 20e0000.iomuxc: pin-137 (2184000.usb) status -22 imx6q-pinctrl 20e0000.iomuxc: could not request pin 137 (MX6Q_PAD_GPIO_1) from group usbotggrp on device 20e0000.iomuxc imx_usb 2184000.usb: Error applying setting, reverse things back imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_EIM_D31 already requested by regulators:regulator@1; cannot claim for 2184200.usb imx6q-pinctrl 20e0000.iomuxc: pin-52 (2184200.usb) status -22 imx6q-pinctrl 20e0000.iomuxc: could not request pin 52 (MX6Q_PAD_EIM_D31) from group usbh1grp on device 20e0000.iomuxc imx_usb 2184200.usb: Error applying setting, reverse things back Signed-off-by: Felipe F. Tonello <eu@felipetonello.com> Fixes: e2047e33f2bd ("ARM: dts: add initial Rex Pro board support") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: EXYNOS: reset Little cores when cpu is upChanho Park
commit 833b5794e3303cc97a0d2d4ba97f26cc9d9b4b79 upstream. The cpu booting of exynos5422 has been still broken since we discussed it in last year[1]. This patch is inspired from Odroid XU3 code (Actually, it was from samsung exynos vendor kernel)[2]. This weird reset code was founded exynos5420 octa cores series SoCs and only required for the first boot core is the Little core (Cortex A7). Some of the exynos5420 boards and all of the exynos5422 boards will require this code. There is two ways to check the little core is the first cpu. One is checking GPG2CON[1] GPIO value and the other is checking the cluster number of the first cpu. I selected the latter because it's more easier than the former. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-June/350632.html [2] https://patchwork.kernel.org/patch/6782891/ Cc: Kevin Hilman <khilman@kernel.org> Cc: Javier Martinez Canillas <javier@osg.samsung.com> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Chanho Park <parkch98@gmail.com> [k.kozlowski: Adding stable for v4.1+, reformat comment] Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: dts: omap3-beagle: make i2c3, ddc and tfp410 gpio work againCarl Frederik Werner
commit 3a2fa775bd1d0579113666c1a2e37654a34018a0 upstream. Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio. According to the OMAP35x Technical Reference Manual CONTROL_PADCONF_I2C3_SDA[15:0] 0x480021C4 mode0: i2c3_sda CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170 the pinmux address of gpio 170 must be 0x480021C6. The former wrong address broke i2c3 (used by hdmi ddc), resulting in kernel message: omap_i2c 48060000.i2c: controller timed out Fixes: 8cecf52befd7 ("ARM: omap3-beagle.dts: add display information") Signed-off-by: Carl Frederik Werner <frederik@cfbw.eu> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: dts: omap5-uevm.dts: fix i2c5 pinctrl offsetsGrazvydas Ignotas
commit 1dbdad75074d16c3e3005180f81a01cdc04a7872 upstream. The i2c5 pinctrl offsets are wrong. If the bootloader doesn't set the pins up, communication with tca6424a doesn't work (controller timeouts) and it is not possible to enable HDMI. Fixes: 9be495c42609 ("ARM: dts: omap5-evm: Add I2c pinctrl data") Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpointsDoug Anderson
commit 7ae85dc7687c7e7119053d83d02c560ea217b772 upstream. In (23a4e40 arm: kgdb: Handle read-only text / modules) we moved to using patch_text() to set breakpoints so that we could handle the case when we had CONFIG_DEBUG_RODATA. That patch used patch_text(). Unfortunately, patch_text() assumes that we're not in atomic context when it runs since it needs to grab a mutex and also wait for other CPUs to stop (which it does with a completion). This would result in a stack crawl if you had CONFIG_DEBUG_ATOMIC_SLEEP and tried to set a breakpoint in kgdb. The crawl looked something like: BUG: scheduling while atomic: swapper/0/0/0x00010007 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc7-00133-geb63b34 #1073 Hardware name: Rockchip (Device Tree) (unwind_backtrace) from [<c00133d4>] (show_stack+0x20/0x24) (show_stack) from [<c05400e8>] (dump_stack+0x84/0xb8) (dump_stack) from [<c004913c>] (__schedule_bug+0x54/0x6c) (__schedule_bug) from [<c054065c>] (__schedule+0x80/0x668) (__schedule) from [<c0540cfc>] (schedule+0xb8/0xd4) (schedule) from [<c0543a3c>] (schedule_timeout+0x2c/0x234) (schedule_timeout) from [<c05417c0>] (wait_for_common+0xf4/0x188) (wait_for_common) from [<c0541874>] (wait_for_completion+0x20/0x24) (wait_for_completion) from [<c00a0104>] (__stop_cpus+0x58/0x70) (__stop_cpus) from [<c00a0580>] (stop_cpus+0x3c/0x54) (stop_cpus) from [<c00a06c4>] (__stop_machine+0xcc/0xe8) (__stop_machine) from [<c00a0714>] (stop_machine+0x34/0x44) (stop_machine) from [<c00173e8>] (patch_text+0x28/0x34) (patch_text) from [<c001733c>] (kgdb_arch_set_breakpoint+0x40/0x4c) (kgdb_arch_set_breakpoint) from [<c00a0d68>] (kgdb_validate_break_address+0x2c/0x60) (kgdb_validate_break_address) from [<c00a0e90>] (dbg_set_sw_break+0x1c/0xdc) (dbg_set_sw_break) from [<c00a2e88>] (gdb_serial_stub+0x9c4/0xba4) (gdb_serial_stub) from [<c00a11cc>] (kgdb_cpu_enter+0x1f8/0x60c) (kgdb_cpu_enter) from [<c00a18cc>] (kgdb_handle_exception+0x19c/0x1d0) (kgdb_handle_exception) from [<c0016f7c>] (kgdb_compiled_brk_fn+0x30/0x3c) (kgdb_compiled_brk_fn) from [<c00091a4>] (do_undefinstr+0x1a4/0x20c) (do_undefinstr) from [<c001400c>] (__und_svc_finish+0x0/0x34) It turns out that when we're in kgdb all the CPUs are stopped anyway so there's no reason we should be calling patch_text(). We can instead directly call __patch_text() which assumes that CPUs have already been stopped. Fixes: 23a4e4050ba9 ("arm: kgdb: Handle read-only text / modules") Reported-by: Aapo Vienamo <avienamo@nvidia.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22windfarm: decrement client count when unregisteringPaul Bolle
commit fe2b592173ff0274e70dc44d1d28c19bb995aa7c upstream. wf_unregister_client() increments the client count when a client unregisters. That is obviously incorrect. Decrement that client count instead. Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: 8429/1: disable GCC SRA optimizationArd Biesheuvel
commit a077224fd35b2f7fbc93f14cf67074fc792fbac2 upstream. While working on the 32-bit ARM port of UEFI, I noticed a strange corruption in the kernel log. The following snprintf() statement (in drivers/firmware/efi/efi.c:efi_md_typeattr_format()) snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", was producing the following output in the log: | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | | | | |UC] |RUN| | | | | | | |UC] As it turns out, this is caused by incorrect code being emitted for the string() function in lib/vsprintf.c. The following code if (!(spec.flags & LEFT)) { while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } } for (i = 0; i < len; ++i) { if (buf < end) *buf = *s; ++buf; ++s; } while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } when called with len == 0, triggers an issue in the GCC SRA optimization pass (Scalar Replacement of Aggregates), which handles promotion of signed struct members incorrectly. This is a known but as yet unresolved issue. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular case, it is causing the second while loop to be executed erroneously a single time, causing the additional space characters to be printed. So disable the optimization by passing -fno-ipa-sra. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: fix Thumb2 signal handling when ARMv6 is enabledRussell King
commit 9b55613f42e8d40d5c9ccb8970bde6af4764b2ab upstream. When a kernel is built covering ARMv6 to ARMv7, we omit to clear the IT state when entering a signal handler. This can cause the first few instructions to be conditionally executed depending on the parent context. In any case, the original test for >= ARMv7 is broken - ARMv6 can have Thumb-2 support as well, and an ARMv6T2 specific build would omit this code too. Relax the test back to ARMv6 or greater. This results in us always clearing the IT state bits in the PSR, even on CPUs where these bits are reserved. However, they're reserved for the IT state, so this should cause no harm. Fixes: d71e1352e240 ("Clear the IT state when invoking a Thumb-2 signal handler") Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Tested-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22hwmon: (nct6775) Swap STEP_UP_TIME and STEP_DOWN_TIME registers for most chipsGuenter Roeck
commit 728d29400488d54974d3317fe8a232b45fdb42ee upstream. The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but NCT6775. Reported-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22sched: access local runqueue directly in single_task_runningDominik Dingel
commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream. Commit 2ee507c47293 ("sched: Add function single_task_running to let a task check if it is the only task running on a cpu") referenced the current runqueue with the smp_processor_id. When CONFIG_DEBUG_PREEMPT is enabled, that is only allowed if preemption is disabled or the currrent task is bound to the local cpu (e.g. kernel worker). With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that generates a lot of kernel messages. To avoid adding preemption in that cases, as it would limit the usefulness, we change single_task_running to access directly the cpu local runqueue. Cc: Tim Chen <tim.c.chen@linux.intel.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8 Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22watchdog: sunxi: fix activation of system resetFrancesco Lavra
commit 0919e4445190da18496d31aac08b90828a47d45f upstream. Commit f2147de33470 ("watchdog: sunxi: support parameterized compatible strings") introduced a regression in sunxi_wdt_start(), by which the system reset function of the watchdog is not enabled upon starting the watchdog. As a result, the system is not reset when the watchdog expires. Fix it. Fixes: f2147de33470 ("watchdog: sunxi: support parameterized compatible strings") Signed-off-by: Francesco Lavra <francescolavra.fl@gmail.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf: Fix AUX buffer refcountingPeter Zijlstra
commit 57ffc5ca679f499f4704fd9b6a372916f59930ee upstream. Its currently possible to drop the last refcount to the aux buffer from NMI context, which results in the expected fireworks. The refcounting needs a bigger overhaul, but to cure the immediate problem, delay the freeing by using an irq_work. Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf header: Fixup reading of HEADER_NRCPUS featureArnaldo Carvalho de Melo
commit caa470475d9b59eeff093ae650800d34612c4379 upstream. The original patch introducing this header wrote the number of CPUs available and online in one order and then swapped those values when reading, fix it. Before: # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 4 # echo 0 > /sys/devices/system/cpu/cpu2/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 3 # echo 0 > /sys/devices/system/cpu/cpu1/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 2 After the fix, bringing back the CPUs online: # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 2 # nrcpus avail : 4 # echo 1 > /sys/devices/system/cpu/cpu2/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 3 # nrcpus avail : 4 # echo 1 > /sys/devices/system/cpu/cpu1/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 4 Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)") Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf tools: Add empty Build files for architectures lacking themBen Hutchings
commit 93df8a1ed6231727c5db94a80b1a6bd5ee67cec3 upstream. perf currently fails to build on MIPS as there is no tools/perf/arch/mips/Build file. Adding an empty file fixes this as there are no MIPS-specific sources to build. It looks like the same is needed for Alpha and PA-RISC, though I haven't been able to test those. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 5e8c0fb6a957 ("perf build: Add arch x86 objects building") Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.uk Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf stat: Get correct cpu id for print_aggrKan Liang
commit 601083cffb7cabdcc55b8195d732f0f7028570fa upstream. print_aggr() fails to print per-core/per-socket statistics after commit 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") if events have differnt cpus. Because in print_aggr(), aggr_get_id needs index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be used to get aggregated id. Here is an example: Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has cpumask 0,18) $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2 Without this patch, it failes to get CPU 18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 7526851 cycles S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/ S1-C0 0 <not counted> cycles S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/ With this patch, it can get both CPU0 and CPU18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 6327768 cycles S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/ S1-C0 1 330228 cycles S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf hists: Update the column width for the "srcline" sort keyArnaldo Carvalho de Melo
commit e8e6d37e73e6b950c891c780745460b87f4755b6 upstream. When we introduce a new sort key, we need to update the hists__calc_col_len() function accordingly, otherwise the width will be limited to strlen(header). We can't update it when obtaining a line value for a column (for instance, in sort__srcline_cmp()), because we reset it all when doing a resort (see hists__output_recalc_col_len()), so we need to, from what is in the hist_entry fields, set each of the column widths. Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Fixes: 409a8be61560 ("perf tools: Add sort by src line/number") Link: http://lkml.kernel.org/n/tip-jgbe0yx8v1gs89cslr93pvz2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf tools: Fix copying of /proc/kcoreAdrian Hunter
commit b5cabbcbd157a4bf5a92dfc85134999a3b55342d upstream. A copy of /proc/kcore containing the kernel text can be made to the buildid cache. e.g. perf buildid-cache -v -k /proc/kcore To workaround objdump limitations, a copy is also made when annotating against /proc/kcore. The copying process stops working from libelf about v1.62 onwards (the problem was found with v1.63). The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails because additional validation has been added to gelf_getphdr(). The use of gelf_getphdr() is a misguided attempt to get default initialization of the Gelf_Phdr structure. That should not be necessary because every member of the Gelf_Phdr structure is subsequently assigned. So just remove the call to gelf_getphdr(). Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be removed also. Committer notes: Note to stable@kernel.org, from Adrian in the cover letter for this patchkit: The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think it is important enough for stable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22perf/x86/intel: Fix constraint accessPeter Zijlstra
commit ebfb4988f0378e2ac3b4a0aa1ea20d724293f392 upstream. Sasha reported that we can get here with .idx==-1, and cpuc->event_constraints unallocated. Suggested-by: Stephane Eranian <eranian@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b371b5943178 ("perf/x86: Fix event/group validation") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22toshiba_acpi: Fix hotkeys registration on some toshiba modelsAzael Avalos
commit 53147b6cabee5e8d1997b5682fcc0c3b72ddf9c2 upstream. Commit a2b3471b5b13 ("toshiba_acpi: Use the Hotkey Event Type function for keymap choosing") changed the *setup_keyboard function to query for the Hotkey Event Type to help choose the correct keymap, but turns out that here are certain Toshiba models out there not implementing this feature, and thus, failing to continue the input device registration and leaving such laptops without hotkey support. This patch changes such check, and instead of returning an error if the Hotkey Event Type is not present, we simply inform userspace about it, changing the message printed from err to notice, making the function responsible for registering the input device to continue. This issue was found on a Toshiba Portege Z30-B, but there might be some other models out there affected by this regression as well. Signed-off-by: Azael Avalos <coproscefalo@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22target: Fix v4.1 UNIT_ATTENTION se_node_acl->device_list[] NULL pointerNicholas Bellinger
This patch fixes a v4.1 only regression bug as reported by Martin where UNIT_ATTENTION checking for pre v4.2-rc1 RCU conversion code legacy se_node_acl->device_list[] was hitting a NULL pointer dereference in: [ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G I 4.1.6-fixxcopy+ #1 [ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012 [ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod] [ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000 [ 1858.639822] RIP: 0010:[<ffffffffa01d3774>] [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod] [ 1858.639884] RSP: 0018:ffff880317943ce0 EFLAGS: 00010282 [ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000 [ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408 [ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001 [ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000 [ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408 [ 1858.640100] FS: 0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000 [ 1858.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0 [ 1858.640210] Stack: [ 1858.640223] ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000 [ 1858.640267] ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400 [ 1858.640310] ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400 [ 1858.640354] Call Trace: [ 1858.640379] [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod] [ 1858.640429] [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod] [ 1858.640479] [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod] [ 1858.640526] [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0 [ 1858.640562] [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430 [ 1858.640595] [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560 [ 1858.640627] [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390 [ 1858.640661] [<ffffffff810913b3>] ? kthread+0xd3/0xf0 [ 1858.640689] [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180 Also, check for the same se_node_acl->device_list[] during EXTENDED_COPY operation as a non-holding persistent reservation port. Reported-by: Martin Svec <martin,svec@zoner.cz> Tested-by: Martin Svec <martin,svec@zoner.cz> Cc: Martin Svec <martin,svec@zoner.cz> Cc: Alex Gorbachev <ag@iss-integration.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22iser-target: Put the reference on commands waiting for unsol dataJenny Derzhavetz
commit 3e03c4b01da3e6a5f3081eb0aa252490fe83e352 upstream. The iscsi target core teardown sequence calls wait_conn for all active commands to finish gracefully by: - move the queue-pair to error state - drain all the completions - wait for the core to finish handling all session commands However, when tearing down a session while there are sequenced commands that are still waiting for unsolicited data outs, we can block forever as these are missing an extra reference put. We basically need the equivalent of iscsit_free_queue_reqs_for_conn() which is called after wait_conn has returned. Address this by an explicit walk on conn_cmd_list and put the extra reference. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22iser-target: remove command with state ISTATE_REMOVEJenny Derzhavetz
commit a4c15cd957cbd728f685645de7a150df5912591a upstream. As documented in iscsit_sequence_cmd: /* * Existing callers for iscsit_sequence_cmd() will silently * ignore commands with CMDSN_LOWER_THAN_EXP, so force this * return for CMDSN_MAXCMDSN_OVERRUN as well.. */ We need to silently finish a command when it's in ISTATE_REMOVE. This fixes an teardown hang we were seeing where a mis-behaved initiator (triggered by allocation error injections) sent us a cmdsn which was lower than expected. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sessNicholas Bellinger
commit 4416f89b8cfcb794d040fc3b68e5fb159b7d8d02 upstream. This patch is a >= v4.1 regression bug-fix where control CDB emulation logic in commit 38b57f82 now expects a se_cmd->se_sess pointer to exist when determining T10-PI support is to be exposed for initiator host ports. To address this bug, go ahead and add locally generated se_cmd descriptors for copy-offload block-copy to it's own stand-alone se_session nexus, while the parent EXTENDED_COPY se_cmd descriptor remains associated with it's originating se_cmd->se_sess nexus. Note a valid se_cmd->se_sess is also required for future support of WRITE_INSERT and READ_STRIP software emulation when submitting backend I/O to se_device that exposes T10-PI suport. Reported-by: Alex Gorbachev <ag@iss-integration.com> Tested-by: Alex Gorbachev <ag@iss-integration.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22scsi: fix scsi_error_handler vs. scsi_host_dev_release raceMichal Hocko
commit 537b604c8b3aa8b96fe35f87dd085816552e294c upstream. b9d5c6b7ef57 ("[SCSI] cleanup setting task state in scsi_error_handler()") has introduced a race between scsi_error_handler and scsi_host_dev_release resulting in the hang when the device goes away because scsi_error_handler might miss a wake up: CPU0 CPU1 scsi_error_handler scsi_host_dev_release kthread_stop() kthread_should_stop() test_bit(KTHREAD_SHOULD_STOP) set_bit(KTHREAD_SHOULD_STOP) wake_up_process() wait_for_completion() set_current_state(TASK_INTERRUPTIBLE) schedule() The most straightforward solution seems to be to invert the ordering of the set_current_state and kthread_should_stop. The issue has been noticed during reboot test on a 3.0 based kernel but the current code seems to be affected in the same way. [jejb: additional comment added] Reported-and-debugged-by: Mike Mayer <Mike.Meyer@teradata.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22target/iscsi: Fix np_ip bracket issue by removing np_ipAndy Grover
commit 76c28f1fcfeb42b47f798fe498351ee1d60086ae upstream. Revert commit 1997e6259, which causes double brackets on ipv6 inaddr_any addresses. Since we have np_sockaddr, if we need a textual representation we can use "%pISc". Change iscsit_add_network_portal() and iscsit_add_np() signatures to remove *ip_str parameter. Fix and extend some comments earlier in the function. Tested to work for :: and ::1 via iscsiadm, previously :: failed, see https://bugzilla.redhat.com/show_bug.cgi?id=1249107 . Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()John Stultz
commit 2619d7e9c92d524cb155ec89fd72875321512e5b upstream. The internal clocksteering done for fine-grained error correction uses a logarithmic approximation, so any time adjtimex() adjusts the clock steering, timekeeping_freqadjust() quickly approximates the correct clock frequency over a series of ticks. Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit: dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz") used the abs() function with a s64 error value to calculate the size of the approximated adjustment to be made. Per include/linux/kernel.h: "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()". Thus on 32-bit platforms, this resulted in the clocksteering to take a quite dampended random walk trying to converge on the proper frequency, which caused the adjustments to be made much slower then intended (most easily observed when large adjustments are made). This patch fixes the issue by using abs64() instead. Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Tested-by: Nuno Goncalves <nunojpg@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exitGautham R. Shenoy
commit 7e022e717f54897e396504306d0c9b61452adf4e upstream. In guest_exit_cont we call kvmhv_commence_exit which expects the trap number as the argument. However r3 doesn't contain the trap number at this point and as a result we would be calling the function with a spurious trap number. Fix this by copying r12 into r3 before calling kvmhv_commence_exit as r12 contains the trap number. Fixes: eddb60fb1443 Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()Thomas Huth
commit 3eb4ee68254235e4f47bc0410538fcdaede39589 upstream. Access to the kvm->buses (like with the kvm_io_bus_read() and -write() functions) has to be protected via the kvm->srcu lock. The kvmppc_h_logical_ci_load() and -store() functions are missing this lock so far, so let's add it there, too. This fixes the problem that the kernel reports "suspicious RCU usage" when lock debugging is enabled. Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22arm: KVM: Disable virtual timer even if the guest is not using itMarc Zyngier
commit 688bc577ac42ae3d07c889a1f0a72f0b23763d58 upstream. When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22kvm: fix double free for fast mmio eventfdJason Wang
commit eefd6b06b17c5478e7c24bea6f64beaa2c431ca6 upstream. We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS and once on KVM_FAST_MMIO_BUS but with a single iodev instance. This will lead to an issue: kvm_io_bus_destroy() knows nothing about the devices on two buses pointing to a single dev. Which will lead to double free[1] during exit. Fix this by allocating two instances of iodevs then registering one on KVM_MMIO_BUS and another on KVM_FAST_MMIO_BUS. CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000 RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292 RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900 RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000 R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880 FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0 Stack: ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622 ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8 Call Trace: [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm] [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm] [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff811f69f7>] __fput+0xe7/0x250 [<ffffffff811f6bae>] ____fput+0xe/0x10 [<ffffffff81093f04>] task_work_run+0xd4/0xf0 [<ffffffff81079358>] do_exit+0x368/0xa50 [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60 [<ffffffff81079ad5>] do_group_exit+0x45/0xb0 [<ffffffff81085c71>] get_signal+0x291/0x750 [<ffffffff810144d8>] do_signal+0x28/0xab0 [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0 [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20 [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170 [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0 [<ffffffff817cb9af>] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP <ffff88020e7f3bc8> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22kvm: factor out core eventfd assign/deassign logicJason Wang
commit 85da11ca587c8eb73993a1b503052391a73586f9 upstream. This patch factors out core eventfd assign/deassign logic and leaves the argument checking and bus index selection to callers. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22kvm: fix zero length mmio searchingJason Wang
commit 8f4216c7d28976f7ec1b2bcbfa0a9f787133c45e upstream. Currently, if we had a zero length mmio eventfd assigned on KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it always compares the kvm_io_range() with the length that guest wrote. This will cause e.g for vhost, kick will be trapped by qemu userspace instead of vhost. Fixing this by using zero length if an iodevice is zero length. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfdJason Wang
commit 8453fecbecae26edb3f278627376caab05d9a88d upstream. We only want zero length mmio eventfd to be registered on KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to make sure this. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22KVM: vmx: fix VPID is 0000H in non-root operationWanpeng Li
commit 04bb92e4b4cf06a66889d37b892b78f926faa9d4 upstream. Reference SDM 28.1: The current VPID is 0000H in the following situations: - Outside VMX operation. (This includes operation in system-management mode under the default treatment of SMIs and SMM with VMX operation; see Section 34.14.) - In VMX root operation. - In VMX non-root operation when the “enable VPID” VM-execution control is 0. The VPID should never be 0000H in non-root operation when "enable VPID" VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()") remove the codes which reserve 0000H for VMX root operation. This patch fix it by again reserving 0000H for VMX root operation. Fixes: 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4 Reported-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22arm: KVM: Fix incorrect device to IPA mappingMarek Majtyka
commit ca09f02f122b2ecb0f5ddfc5fd47b29ed657d4fd upstream. A critical bug has been found in device memory stage1 translation for VMs with more then 4GB of address space. Once vm_pgoff size is smaller then pa (which is true for LPAE case, u32 and u64 respectively) some more significant bits of pa may be lost as a shift operation is performed on u32 and later cast onto u64. Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12 expected pa(u64): 0x0000002010030000 produced pa(u64): 0x0000000010030000 The fix is to change the order of operations (casting first onto phys_addr_t and then shifting). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed changelog and patch formatting] Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03Linux 4.1.10v4.1.10Greg Kroah-Hartman
2015-10-03hp-wmi: limit hotkey enableKyle Evans
commit 8a1513b49321e503fd6c8b6793e3b1f9a8a3285b upstream. Do not write initialize magic on systems that do not have feature query 0xb. Fixes Bug #82451. Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd for code clearity. Add a new test function, hp_wmi_bios_2008_later() & simplify hp_wmi_bios_2009_later(), which fixes a bug in cases where an improper value is returned. Probably also fixes Bug #69131. Add missing __init tag. Signed-off-by: Kyle Evans <kvans32@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03zram: fix possible use after free in zcomp_create()Luis Henriques
commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream. zcomp_create() verifies the success of zcomp_strm_{multi,single}_create() through comp->stream, which can potentially be pointing to memory that was freed if these functions returned an error. While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic 'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create() could return other error codes. Function documentation updated accordingly. Fixes: beca3ec71fe5 ("zram: add multi stream functionality") Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03netlink: Replace rhash_portid with boundHerbert Xu
[ Upstream commit da314c9923fed553a007785a901fd395b7eb6c19 ] On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > barriers and can't be paired this way. You have to pair store_release > and load_acquire. Besides, it isn't a particularly good idea to OK I've decided to drop the acquire/release helpers as they don't help us at all and simply pessimises the code by using full memory barriers (on some architectures) where only a write or read barrier is needed. > depend on memory barriers embedded in other data structures like the > above. Here, especially, rhashtable_insert() would have write barrier > *before* the entry is hashed not necessarily *after*, which means that > in the above case, a socket which appears to have set bound to a > reader might not visible when the reader tries to look up the socket > on the hashtable. But you are right we do need an explicit write barrier here to ensure that the hashing is visible. > There's no reason to be overly smart here. This isn't a crazy hot > path, write barriers tend to be very cheap, store_release more so. > Please just do smp_store_release() and note what it's paired with. It's not about being overly smart. It's about actually understanding what's going on with the code. I've seen too many instances of people simply sprinkling synchronisation primitives around without any knowledge of what is happening underneath, which is just a recipe for creating hard-to-debug races. > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, > > } > > } > > > > - if (!nlk->portid) { > > + if (!nlk->bound) { > > I don't think you can skip load_acquire here just because this is the > second deref of the variable. That doesn't change anything. Race > condition could still happen between the first and second tests and > skipping the second would lead to the same kind of bug. The reason this one is OK is because we do not use nlk->portid or try to get nlk from the hash table before we return to user-space. However, there is a real bug here that none of these acquire/release helpers discovered. The two bound tests here used to be a single one. Now that they are separate it is entirely possible for another thread to come in the middle and bind the socket. So we need to repeat the portid check in order to maintain consistency. > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, > > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND)) > > return -EPERM; > > > > - if (!nlk->portid) > > + if (!nlk->bound) > > Don't we need load_acquire here too? Is this path holding a lock > which makes that unnecessary? Ditto. ---8<--- The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink: Fix autobind race condition that leads to zero port ID") created some new races that can occur due to inconcsistencies between the two port IDs. Tejun is right that a barrier is unavoidable. Therefore I am reverting to the original patch that used a boolean to indicate that a user netlink socket has been bound. Barriers have been added where necessary to ensure that a valid portid and the hashed socket is visible. I have also changed netlink_insert to only return EBUSY if the socket is bound to a portid different to the requested one. This combined with only reading nlk->bound once in netlink_bind fixes a race where two threads that bind the socket at the same time with different port IDs may both succeed. Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Nacked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03netlink: Fix autobind race condition that leads to zero port IDHerbert Xu
[ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ] The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink: Reset portid after netlink_insert failure") introduced a race condition where if two threads try to autobind the same socket one of them may end up with a zero port ID. This led to kernel deadlocks that were observed by multiple people. This patch reverts that commit and instead fixes it by introducing a separte rhash_portid variable so that the real portid is only set after the socket has been successfully hashed. Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>