Age | Commit message (Collapse) | Author |
|
commit e3c41e37b0f4b18cbd4dac76cbeece5a7558b909 upstream.
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:
BUG: unable to handle kernel paging request at ffffc90613fc9000
IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.
This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.
Here's how I found the bug:
After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.
But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.
I printed those small memory regions, for example:
kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
Based on the code in walk_system_ram_range(), this memory region
will be filtered out:
pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE
So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.
This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
instead of top-down
commit a5caa209ba9c29c6421292e7879d2387a2ef39c9 upstream.
Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
that signals that the firmware PE/COFF loader supports splitting
code and data sections of PE/COFF images into separate EFI
memory map entries. This allows the kernel to map those regions
with strict memory protections, e.g. EFI_MEMORY_RO for code,
EFI_MEMORY_XP for data, etc.
Unfortunately, an unwritten requirement of this new feature is
that the regions need to be mapped with the same offsets
relative to each other as observed in the EFI memory map. If
this is not done crashes like this may occur,
BUG: unable to handle kernel paging request at fffffffefe6086dd
IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
Call Trace:
[<ffffffff8104c90e>] efi_call+0x7e/0x100
[<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
[<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
[<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
[<ffffffff81f37e1b>] start_kernel+0x38a/0x417
[<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
Here 0xfffffffefe6086dd refers to an address the firmware
expects to be mapped but which the OS never claimed was mapped.
The issue is that included in these regions are relative
addresses to other regions which were emitted by the firmware
toolchain before the "splitting" of sections occurred at
runtime.
Needless to say, we don't satisfy this unwritten requirement on
x86_64 and instead map the EFI memory map entries in reverse
order. The above crash is almost certainly triggerable with any
kernel newer than v3.13 because that's when we rewrote the EFI
runtime region mapping code, in commit d2f7cbe7b26a ("x86/efi:
Runtime services virtual mapping"). For kernel versions before
v3.13 things may work by pure luck depending on the
fragmentation of the kernel virtual address space at the time we
map the EFI regions.
Instead of mapping the EFI memory map entries in reverse order,
where entry N has a higher virtual address than entry N+1, map
them in the same order as they appear in the EFI memory map to
preserve this relative offset between regions.
This patch has been kept as small as possible with the intention
that it should be applied aggressively to stable and
distribution kernels. It is very much a bugfix rather than
support for a new feature, since when EFI_PROPERTIES_TABLE is
enabled we must map things as outlined above to even boot - we
have no way of asking the firmware not to split the code/data
regions.
In fact, this patch doesn't even make use of the more strict
memory protections available in UEFI v2.5. That will come later.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chun-Yi <jlee@suse.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d2922422c48df93f3edff7d872ee4f3191fefb08 upstream.
The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).
The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug. This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 83c133cf11fb0e68a51681447e372489f052d40e upstream.
The NMI entry code that switches to the normal kernel stack needs to
be very careful not to clobber any extra stack slots on the NMI
stack. The code is fine under the assumption that SWAPGS is just a
normal instruction, but that assumption isn't really true. Use
SWAPGS_UNSAFE_STACK instead.
This is part of a fix for some random crashes that Sasha saw.
Fixes: 9b6e6a8334d5 ("x86/nmi/64: Switch stacks on userspace NMI entry")
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc57a7c68020dcf954428869eafd934c0ab1536f upstream.
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):
ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6>
2792: R_X86_64_PC32 pv_irq_ops+0x2c
That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse. This is bad news for a CLBR_NONE operation.
Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in? This can potentially cause breakage
that is very difficult to debug.
A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.
The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.
Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.
The Xen case may have other problems, so document them.
This is part of a fix for some random crashes that Sasha saw.
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream.
In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable")
bypassed verification of the TSC on Geode LX. However, this code
(now in the check_system_tsc_reliable() function in
arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
set.
OpenWRT has recently started building its generic Geode target
for Geode GX, not LX, to include support for additional
platforms. This broke the timekeeping on LX-based devices,
because the TSC wasn't marked as reliable:
https://dev.openwrt.org/ticket/20531
By adding a runtime check on is_geode_lx(), we can also include
the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
fixing the problem.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 66c117d7fa2ae429911e60d84bf31a90b2b96189 upstream.
Richard reported the following crash:
[ 0.036000] BUG: unable to handle kernel paging request at 55501e06
[ 0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
[ 0.036000] Call Trace:
[ 0.036000] [<c0409c80>] ? add_nops+0x90/0xa0
[ 0.036000] [<c040a054>] apply_alternatives+0x274/0x630
Chuck decoded:
" 0: 8d 90 90 83 04 24 lea 0x24048390(%eax),%edx
6: 80 fc 0f cmp $0xf,%ah
9: a8 0f test $0xf,%al
>> b: a0 06 1e 50 55 mov 0x55501e06,%al
10: 57 push %edi
11: 56 push %esi
Interrupt 0x30 occurred while the alternatives code was replacing the
initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
optimized version, 0x8d,0x76,0x00. Only the first byte has been
replaced so far, and it makes a mess out of the insn decoding."
optimize_nops() is buggy in two aspects:
- It's not disabling interrupts across the modification
- It's lacking a sync_core() call
Add both.
Fixes: 4fd4b6e5537c 'x86/alternatives: Use optimized NOPs for padding'
Reported-and-tested-by: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 upstream.
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.
The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.
xAPIC:
"1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
x2APIC:
"To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the
APIC registers. Thus, system software should not use 'WRMSR to APIC
registers in x2APIC mode' as a serializing instruction. Read and write
accesses to the APIC registers will occur in program order. A WRMSR to
an APIC register may complete before all preceding stores are globally
visible; software can prevent this by inserting a serializing
instruction, an SFENCE, or an MFENCE before the WRMSR."
The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.
Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.
[ tglx: Massaged the changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6bea0f6d1c47b07be88dfd93f013ae05fcb3d8bf upstream.
In case we have less than maximum allowed channels (8) and autoconfiguration is
enabled the DWC_PARAMS read is wrong because it uses different arithmetic to
what is needed for channel priority setup.
Re-do the caclulations properly. This now works on AVR32 board well.
Fixes: fed2574b3c9f (dw_dmac: introduce software emulation of LLP transfers)
Cc: yitian.bu@tangramtek.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f0b2e563bc419df7c1b3d2f494574c25125f6aed upstream.
The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.
Suggested-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0af822110871400908d5b6f83a8908c45f881d8f upstream.
This fixes a duplicated pin control causing this error:
imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_GPIO_1 already
requested by regulators:regulator@2; cannot claim for 2184000.usb
imx6q-pinctrl 20e0000.iomuxc: pin-137 (2184000.usb) status -22
imx6q-pinctrl 20e0000.iomuxc: could not request pin 137
(MX6Q_PAD_GPIO_1) from group usbotggrp on device 20e0000.iomuxc
imx_usb 2184000.usb: Error applying setting, reverse things
back
imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_EIM_D31 already
requested by regulators:regulator@1; cannot claim for 2184200.usb
imx6q-pinctrl 20e0000.iomuxc: pin-52 (2184200.usb) status -22
imx6q-pinctrl 20e0000.iomuxc: could not request pin 52 (MX6Q_PAD_EIM_D31)
from group usbh1grp on device 20e0000.iomuxc
imx_usb 2184200.usb: Error applying setting, reverse things
back
Signed-off-by: Felipe F. Tonello <eu@felipetonello.com>
Fixes: e2047e33f2bd ("ARM: dts: add initial Rex Pro board support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 833b5794e3303cc97a0d2d4ba97f26cc9d9b4b79 upstream.
The cpu booting of exynos5422 has been still broken since we discussed
it in last year[1]. This patch is inspired from Odroid XU3
code (Actually, it was from samsung exynos vendor kernel)[2]. This weird
reset code was founded exynos5420 octa cores series SoCs and only
required for the first boot core is the Little core (Cortex A7).
Some of the exynos5420 boards and all of the exynos5422 boards will require
this code.
There is two ways to check the little core is the first cpu. One is
checking GPG2CON[1] GPIO value and the other is checking the cluster
number of the first cpu. I selected the latter because it's more easier
than the former.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-June/350632.html
[2] https://patchwork.kernel.org/patch/6782891/
Cc: Kevin Hilman <khilman@kernel.org>
Cc: Javier Martinez Canillas <javier@osg.samsung.com>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Chanho Park <parkch98@gmail.com>
[k.kozlowski: Adding stable for v4.1+, reformat comment]
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3a2fa775bd1d0579113666c1a2e37654a34018a0 upstream.
Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio.
According to the OMAP35x Technical Reference Manual
CONTROL_PADCONF_I2C3_SDA[15:0] 0x480021C4 mode0: i2c3_sda
CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170
the pinmux address of gpio 170 must be 0x480021C6.
The former wrong address broke i2c3 (used by hdmi ddc), resulting in
kernel message:
omap_i2c 48060000.i2c: controller timed out
Fixes: 8cecf52befd7 ("ARM: omap3-beagle.dts: add display information")
Signed-off-by: Carl Frederik Werner <frederik@cfbw.eu>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1dbdad75074d16c3e3005180f81a01cdc04a7872 upstream.
The i2c5 pinctrl offsets are wrong. If the bootloader doesn't set the
pins up, communication with tca6424a doesn't work (controller timeouts)
and it is not possible to enable HDMI.
Fixes: 9be495c42609 ("ARM: dts: omap5-evm: Add I2c pinctrl data")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ae85dc7687c7e7119053d83d02c560ea217b772 upstream.
In (23a4e40 arm: kgdb: Handle read-only text / modules) we moved to
using patch_text() to set breakpoints so that we could handle the case
when we had CONFIG_DEBUG_RODATA. That patch used patch_text().
Unfortunately, patch_text() assumes that we're not in atomic context
when it runs since it needs to grab a mutex and also wait for other
CPUs to stop (which it does with a completion).
This would result in a stack crawl if you had
CONFIG_DEBUG_ATOMIC_SLEEP and tried to set a breakpoint in kgdb. The
crawl looked something like:
BUG: scheduling while atomic: swapper/0/0/0x00010007
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc7-00133-geb63b34 #1073
Hardware name: Rockchip (Device Tree)
(unwind_backtrace) from [<c00133d4>] (show_stack+0x20/0x24)
(show_stack) from [<c05400e8>] (dump_stack+0x84/0xb8)
(dump_stack) from [<c004913c>] (__schedule_bug+0x54/0x6c)
(__schedule_bug) from [<c054065c>] (__schedule+0x80/0x668)
(__schedule) from [<c0540cfc>] (schedule+0xb8/0xd4)
(schedule) from [<c0543a3c>] (schedule_timeout+0x2c/0x234)
(schedule_timeout) from [<c05417c0>] (wait_for_common+0xf4/0x188)
(wait_for_common) from [<c0541874>] (wait_for_completion+0x20/0x24)
(wait_for_completion) from [<c00a0104>] (__stop_cpus+0x58/0x70)
(__stop_cpus) from [<c00a0580>] (stop_cpus+0x3c/0x54)
(stop_cpus) from [<c00a06c4>] (__stop_machine+0xcc/0xe8)
(__stop_machine) from [<c00a0714>] (stop_machine+0x34/0x44)
(stop_machine) from [<c00173e8>] (patch_text+0x28/0x34)
(patch_text) from [<c001733c>] (kgdb_arch_set_breakpoint+0x40/0x4c)
(kgdb_arch_set_breakpoint) from [<c00a0d68>] (kgdb_validate_break_address+0x2c/0x60)
(kgdb_validate_break_address) from [<c00a0e90>] (dbg_set_sw_break+0x1c/0xdc)
(dbg_set_sw_break) from [<c00a2e88>] (gdb_serial_stub+0x9c4/0xba4)
(gdb_serial_stub) from [<c00a11cc>] (kgdb_cpu_enter+0x1f8/0x60c)
(kgdb_cpu_enter) from [<c00a18cc>] (kgdb_handle_exception+0x19c/0x1d0)
(kgdb_handle_exception) from [<c0016f7c>] (kgdb_compiled_brk_fn+0x30/0x3c)
(kgdb_compiled_brk_fn) from [<c00091a4>] (do_undefinstr+0x1a4/0x20c)
(do_undefinstr) from [<c001400c>] (__und_svc_finish+0x0/0x34)
It turns out that when we're in kgdb all the CPUs are stopped anyway
so there's no reason we should be calling patch_text(). We can
instead directly call __patch_text() which assumes that CPUs have
already been stopped.
Fixes: 23a4e4050ba9 ("arm: kgdb: Handle read-only text / modules")
Reported-by: Aapo Vienamo <avienamo@nvidia.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fe2b592173ff0274e70dc44d1d28c19bb995aa7c upstream.
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.
Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a077224fd35b2f7fbc93f14cf67074fc792fbac2 upstream.
While working on the 32-bit ARM port of UEFI, I noticed a strange
corruption in the kernel log. The following snprintf() statement
(in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
was producing the following output in the log:
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | | | | |UC]
|RUN| | | | | | | |UC]
As it turns out, this is caused by incorrect code being emitted for
the string() function in lib/vsprintf.c. The following code
if (!(spec.flags & LEFT)) {
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
}
for (i = 0; i < len; ++i) {
if (buf < end)
*buf = *s;
++buf; ++s;
}
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
when called with len == 0, triggers an issue in the GCC SRA optimization
pass (Scalar Replacement of Aggregates), which handles promotion of signed
struct members incorrectly. This is a known but as yet unresolved issue.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
case, it is causing the second while loop to be executed erroneously a
single time, causing the additional space characters to be printed.
So disable the optimization by passing -fno-ipa-sra.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b55613f42e8d40d5c9ccb8970bde6af4764b2ab upstream.
When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
IT state when entering a signal handler. This can cause the first
few instructions to be conditionally executed depending on the parent
context.
In any case, the original test for >= ARMv7 is broken - ARMv6 can have
Thumb-2 support as well, and an ARMv6T2 specific build would omit this
code too.
Relax the test back to ARMv6 or greater. This results in us always
clearing the IT state bits in the PSR, even on CPUs where these bits
are reserved. However, they're reserved for the IT state, so this
should cause no harm.
Fixes: d71e1352e240 ("Clear the IT state when invoking a Thumb-2 signal handler")
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 728d29400488d54974d3317fe8a232b45fdb42ee upstream.
The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but
NCT6775.
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream.
Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id. When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).
With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.
To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0919e4445190da18496d31aac08b90828a47d45f upstream.
Commit f2147de33470 ("watchdog: sunxi: support parameterized compatible
strings") introduced a regression in sunxi_wdt_start(), by which
the system reset function of the watchdog is not enabled upon
starting the watchdog. As a result, the system is not reset when the
watchdog expires. Fix it.
Fixes: f2147de33470 ("watchdog: sunxi: support parameterized compatible strings")
Signed-off-by: Francesco Lavra <francescolavra.fl@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 57ffc5ca679f499f4704fd9b6a372916f59930ee upstream.
Its currently possible to drop the last refcount to the aux buffer
from NMI context, which results in the expected fireworks.
The refcounting needs a bigger overhaul, but to cure the immediate
problem, delay the freeing by using an irq_work.
Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit caa470475d9b59eeff093ae650800d34612c4379 upstream.
The original patch introducing this header wrote the number of CPUs available
and online in one order and then swapped those values when reading, fix it.
Before:
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 4
# echo 0 > /sys/devices/system/cpu/cpu2/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 3
# echo 0 > /sys/devices/system/cpu/cpu1/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 2
After the fix, bringing back the CPUs online:
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 2
# nrcpus avail : 4
# echo 1 > /sys/devices/system/cpu/cpu2/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 3
# nrcpus avail : 4
# echo 1 > /sys/devices/system/cpu/cpu1/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 4
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)")
Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 93df8a1ed6231727c5db94a80b1a6bd5ee67cec3 upstream.
perf currently fails to build on MIPS as there is no
tools/perf/arch/mips/Build file. Adding an empty file fixes this as
there are no MIPS-specific sources to build.
It looks like the same is needed for Alpha and PA-RISC, though I
haven't been able to test those.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 5e8c0fb6a957 ("perf build: Add arch x86 objects building")
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 601083cffb7cabdcc55b8195d732f0f7028570fa upstream.
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e8e6d37e73e6b950c891c780745460b87f4755b6 upstream.
When we introduce a new sort key, we need to update the
hists__calc_col_len() function accordingly, otherwise the width
will be limited to strlen(header).
We can't update it when obtaining a line value for a column (for
instance, in sort__srcline_cmp()), because we reset it all when doing a
resort (see hists__output_recalc_col_len()), so we need to, from what is
in the hist_entry fields, set each of the column widths.
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 409a8be61560 ("perf tools: Add sort by src line/number")
Link: http://lkml.kernel.org/n/tip-jgbe0yx8v1gs89cslr93pvz2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b5cabbcbd157a4bf5a92dfc85134999a3b55342d upstream.
A copy of /proc/kcore containing the kernel text can be made to the
buildid cache. e.g.
perf buildid-cache -v -k /proc/kcore
To workaround objdump limitations, a copy is also made when annotating
against /proc/kcore.
The copying process stops working from libelf about v1.62 onwards (the
problem was found with v1.63).
The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
because additional validation has been added to gelf_getphdr().
The use of gelf_getphdr() is a misguided attempt to get default
initialization of the Gelf_Phdr structure. That should not be
necessary because every member of the Gelf_Phdr structure is
subsequently assigned. So just remove the call to gelf_getphdr().
Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be
removed also.
Committer notes:
Note to stable@kernel.org, from Adrian in the cover letter for this
patchkit:
The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
it is important enough for stable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ebfb4988f0378e2ac3b4a0aa1ea20d724293f392 upstream.
Sasha reported that we can get here with .idx==-1, and
cpuc->event_constraints unallocated.
Suggested-by: Stephane Eranian <eranian@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b371b5943178 ("perf/x86: Fix event/group validation")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 53147b6cabee5e8d1997b5682fcc0c3b72ddf9c2 upstream.
Commit a2b3471b5b13 ("toshiba_acpi: Use the Hotkey Event Type function
for keymap choosing") changed the *setup_keyboard function to query for
the Hotkey Event Type to help choose the correct keymap, but turns out
that here are certain Toshiba models out there not implementing this
feature, and thus, failing to continue the input device registration and
leaving such laptops without hotkey support.
This patch changes such check, and instead of returning an error if
the Hotkey Event Type is not present, we simply inform userspace about it,
changing the message printed from err to notice, making the function
responsible for registering the input device to continue.
This issue was found on a Toshiba Portege Z30-B, but there might be
some other models out there affected by this regression as well.
Signed-off-by: Azael Avalos <coproscefalo@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch fixes a v4.1 only regression bug as reported by Martin
where UNIT_ATTENTION checking for pre v4.2-rc1 RCU conversion code
legacy se_node_acl->device_list[] was hitting a NULL pointer
dereference in:
[ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G I 4.1.6-fixxcopy+ #1
[ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
[ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
[ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
[ 1858.639822] RIP: 0010:[<ffffffffa01d3774>] [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
[ 1858.639884] RSP: 0018:ffff880317943ce0 EFLAGS: 00010282
[ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
[ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
[ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
[ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
[ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
[ 1858.640100] FS: 0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
[ 1858.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
[ 1858.640210] Stack:
[ 1858.640223] ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
[ 1858.640267] ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
[ 1858.640310] ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
[ 1858.640354] Call Trace:
[ 1858.640379] [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
[ 1858.640429] [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
[ 1858.640479] [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
[ 1858.640526] [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
[ 1858.640562] [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
[ 1858.640595] [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
[ 1858.640627] [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
[ 1858.640661] [<ffffffff810913b3>] ? kthread+0xd3/0xf0
[ 1858.640689] [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180
Also, check for the same se_node_acl->device_list[] during EXTENDED_COPY
operation as a non-holding persistent reservation port.
Reported-by: Martin Svec <martin,svec@zoner.cz>
Tested-by: Martin Svec <martin,svec@zoner.cz>
Cc: Martin Svec <martin,svec@zoner.cz>
Cc: Alex Gorbachev <ag@iss-integration.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3e03c4b01da3e6a5f3081eb0aa252490fe83e352 upstream.
The iscsi target core teardown sequence calls wait_conn for
all active commands to finish gracefully by:
- move the queue-pair to error state
- drain all the completions
- wait for the core to finish handling all session commands
However, when tearing down a session while there are sequenced
commands that are still waiting for unsolicited data outs, we can
block forever as these are missing an extra reference put.
We basically need the equivalent of iscsit_free_queue_reqs_for_conn()
which is called after wait_conn has returned. Address this by an
explicit walk on conn_cmd_list and put the extra reference.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a4c15cd957cbd728f685645de7a150df5912591a upstream.
As documented in iscsit_sequence_cmd:
/*
* Existing callers for iscsit_sequence_cmd() will silently
* ignore commands with CMDSN_LOWER_THAN_EXP, so force this
* return for CMDSN_MAXCMDSN_OVERRUN as well..
*/
We need to silently finish a command when it's in ISTATE_REMOVE.
This fixes an teardown hang we were seeing where a mis-behaved
initiator (triggered by allocation error injections) sent us a
cmdsn which was lower than expected.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4416f89b8cfcb794d040fc3b68e5fb159b7d8d02 upstream.
This patch is a >= v4.1 regression bug-fix where control CDB
emulation logic in commit 38b57f82 now expects a se_cmd->se_sess
pointer to exist when determining T10-PI support is to be exposed
for initiator host ports.
To address this bug, go ahead and add locally generated se_cmd
descriptors for copy-offload block-copy to it's own stand-alone
se_session nexus, while the parent EXTENDED_COPY se_cmd descriptor
remains associated with it's originating se_cmd->se_sess nexus.
Note a valid se_cmd->se_sess is also required for future support
of WRITE_INSERT and READ_STRIP software emulation when submitting
backend I/O to se_device that exposes T10-PI suport.
Reported-by: Alex Gorbachev <ag@iss-integration.com>
Tested-by: Alex Gorbachev <ag@iss-integration.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 537b604c8b3aa8b96fe35f87dd085816552e294c upstream.
b9d5c6b7ef57 ("[SCSI] cleanup setting task state in
scsi_error_handler()") has introduced a race between scsi_error_handler
and scsi_host_dev_release resulting in the hang when the device goes
away because scsi_error_handler might miss a wake up:
CPU0 CPU1
scsi_error_handler scsi_host_dev_release
kthread_stop()
kthread_should_stop()
test_bit(KTHREAD_SHOULD_STOP)
set_bit(KTHREAD_SHOULD_STOP)
wake_up_process()
wait_for_completion()
set_current_state(TASK_INTERRUPTIBLE)
schedule()
The most straightforward solution seems to be to invert the ordering of
the set_current_state and kthread_should_stop.
The issue has been noticed during reboot test on a 3.0 based kernel but
the current code seems to be affected in the same way.
[jejb: additional comment added]
Reported-and-debugged-by: Mike Mayer <Mike.Meyer@teradata.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 76c28f1fcfeb42b47f798fe498351ee1d60086ae upstream.
Revert commit 1997e6259, which causes double brackets on ipv6
inaddr_any addresses.
Since we have np_sockaddr, if we need a textual representation we can
use "%pISc".
Change iscsit_add_network_portal() and iscsit_add_np() signatures to remove
*ip_str parameter.
Fix and extend some comments earlier in the function.
Tested to work for :: and ::1 via iscsiadm, previously :: failed, see
https://bugzilla.redhat.com/show_bug.cgi?id=1249107 .
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2619d7e9c92d524cb155ec89fd72875321512e5b upstream.
The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.
Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:
dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.
Per include/linux/kernel.h:
"abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).
This patch fixes the issue by using abs64() instead.
Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Tested-by: Nuno Goncalves <nunojpg@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e022e717f54897e396504306d0c9b61452adf4e upstream.
In guest_exit_cont we call kvmhv_commence_exit which expects the trap
number as the argument. However r3 doesn't contain the trap number at
this point and as a result we would be calling the function with a
spurious trap number.
Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
r12 contains the trap number.
Fixes: eddb60fb1443
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3eb4ee68254235e4f47bc0410538fcdaede39589 upstream.
Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.
Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 688bc577ac42ae3d07c889a1f0a72f0b23763d58 upstream.
When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.
The fix is to unconditionally turn off the virtual timer on guest
exit.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eefd6b06b17c5478e7c24bea6f64beaa2c431ca6 upstream.
We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS
and once on KVM_FAST_MMIO_BUS but with a single iodev
instance. This will lead to an issue: kvm_io_bus_destroy() knows
nothing about the devices on two buses pointing to a single dev. Which
will lead to double free[1] during exit. Fix this by allocating two
instances of iodevs then registering one on KVM_MMIO_BUS and another
on KVM_FAST_MMIO_BUS.
CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292
RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
Stack:
ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
Call Trace:
[<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
[<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
[<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
[<ffffffff811f69f7>] __fput+0xe7/0x250
[<ffffffff811f6bae>] ____fput+0xe/0x10
[<ffffffff81093f04>] task_work_run+0xd4/0xf0
[<ffffffff81079358>] do_exit+0x368/0xa50
[<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
[<ffffffff81079ad5>] do_group_exit+0x45/0xb0
[<ffffffff81085c71>] get_signal+0x291/0x750
[<ffffffff810144d8>] do_signal+0x28/0xab0
[<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
[<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
[<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
[<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
[<ffffffff817cb9af>] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
RSP <ffff88020e7f3bc8>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 85da11ca587c8eb73993a1b503052391a73586f9 upstream.
This patch factors out core eventfd assign/deassign logic and leaves
the argument checking and bus index selection to callers.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8f4216c7d28976f7ec1b2bcbfa0a9f787133c45e upstream.
Currently, if we had a zero length mmio eventfd assigned on
KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
always compares the kvm_io_range() with the length that guest
wrote. This will cause e.g for vhost, kick will be trapped by qemu
userspace instead of vhost. Fixing this by using zero length if an
iodevice is zero length.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8453fecbecae26edb3f278627376caab05d9a88d upstream.
We only want zero length mmio eventfd to be registered on
KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
make sure this.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04bb92e4b4cf06a66889d37b892b78f926faa9d4 upstream.
Reference SDM 28.1:
The current VPID is 0000H in the following situations:
- Outside VMX operation. (This includes operation in system-management
mode under the default treatment of SMIs and SMM with VMX operation;
see Section 34.14.)
- In VMX root operation.
- In VMX non-root operation when the “enable VPID” VM-execution control
is 0.
The VPID should never be 0000H in non-root operation when "enable VPID"
VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx:
move some vmx setting from vmx_init() to hardware_setup()") remove the
codes which reserve 0000H for VMX root operation.
This patch fix it by again reserving 0000H for VMX root operation.
Fixes: 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4
Reported-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca09f02f122b2ecb0f5ddfc5fd47b29ed657d4fd upstream.
A critical bug has been found in device memory stage1 translation for
VMs with more then 4GB of address space. Once vm_pgoff size is smaller
then pa (which is true for LPAE case, u32 and u64 respectively) some
more significant bits of pa may be lost as a shift operation is performed
on u32 and later cast onto u64.
Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12
expected pa(u64): 0x0000002010030000
produced pa(u64): 0x0000000010030000
The fix is to change the order of operations (casting first onto phys_addr_t
and then shifting).
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: fixed changelog and patch formatting]
Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
commit 8a1513b49321e503fd6c8b6793e3b1f9a8a3285b upstream.
Do not write initialize magic on systems that do not have
feature query 0xb. Fixes Bug #82451.
Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
for code clearity.
Add a new test function, hp_wmi_bios_2008_later() & simplify
hp_wmi_bios_2009_later(), which fixes a bug in cases where
an improper value is returned. Probably also fixes Bug #69131.
Add missing __init tag.
Signed-off-by: Kyle Evans <kvans32@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream.
zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.
While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
could return other error codes. Function documentation updated
accordingly.
Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit da314c9923fed553a007785a901fd395b7eb6c19 ]
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way. You have to pair store_release
> and load_acquire. Besides, it isn't a particularly good idea to
OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimises the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.
> depend on memory barriers embedded in other data structures like the
> above. Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not visible when the reader tries to look up the socket
> on the hashtable.
But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.
> There's no reason to be overly smart here. This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.
It's not about being overly smart. It's about actually understanding
what's going on with the code. I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.
> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
> > }
> > }
> >
> > - if (!nlk->portid) {
> > + if (!nlk->bound) {
>
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable. That doesn't change anything. Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.
The reason this one is OK is because we do not use nlk->portid or
try to get nlk from the hash table before we return to user-space.
However, there is a real bug here that none of these acquire/release
helpers discovered. The two bound tests here used to be a single
one. Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket. So we need to
repeat the portid check in order to maintain consistency.
> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
> > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> > return -EPERM;
> >
> > - if (!nlk->portid)
> > + if (!nlk->bound)
>
> Don't we need load_acquire here too? Is this path holding a lock
> which makes that unnecessary?
Ditto.
---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconcsistencies between the
two port IDs.
Tejun is right that a barrier is unavoidable. Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.
Barriers have been added where necessary to ensure that a valid
portid and the hashed socket is visible.
I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one. This
combined with only reading nlk->bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.
Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nacked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]
The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
Reset portid after netlink_insert failure") introduced a race
condition where if two threads try to autobind the same socket
one of them may end up with a zero port ID. This led to kernel
deadlocks that were observed by multiple people.
This patch reverts that commit and instead fixes it by introducing
a separte rhash_portid variable so that the real portid is only set
after the socket has been successfully hashed.
Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|