From d2f642b05832f54e46bb7cc601068038c27901ea Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Wed, 17 Oct 2018 17:54:00 -0700 Subject: ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup [ Upstream commit eef3dc34a1e0b01d53328b88c25237bcc7323777 ] When building the kernel with Clang, the following section mismatch warning appears: WARNING: vmlinux.o(.text+0x38b3c): Section mismatch in reference from the function omap44xx_prm_late_init() to the function .init.text:omap44xx_prm_enable_io_wakeup() The function omap44xx_prm_late_init() references the function __init omap44xx_prm_enable_io_wakeup(). This is often because omap44xx_prm_late_init lacks a __init annotation or the annotation of omap44xx_prm_enable_io_wakeup is wrong. Remove the __init annotation from omap44xx_prm_enable_io_wakeup so there is no more mismatch. Signed-off-by: Nathan Chancellor Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin --- arch/arm/mach-omap2/prm44xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/prm44xx.c b/arch/arm/mach-omap2/prm44xx.c index 30768003f854..8c505284bc0c 100644 --- a/arch/arm/mach-omap2/prm44xx.c +++ b/arch/arm/mach-omap2/prm44xx.c @@ -344,7 +344,7 @@ static void omap44xx_prm_reconfigure_io_chain(void) * to occur, WAKEUPENABLE bits must be set in the pad mux registers, and * omap44xx_prm_reconfigure_io_chain() must be called. No return value. */ -static void __init omap44xx_prm_enable_io_wakeup(void) +static void omap44xx_prm_enable_io_wakeup(void) { s32 inst = omap4_prmst_get_prm_dev_inst(); -- cgit v1.2.3 From d7589b83d628eeb0850d61c354dbf47e15befc10 Mon Sep 17 00:00:00 2001 From: Janusz Krzysztofik Date: Wed, 7 Nov 2018 22:30:31 +0100 Subject: ARM: OMAP1: ams-delta: Fix possible use of uninitialized field [ Upstream commit cec83ff1241ec98113a19385ea9e9cfa9aa4125b ] While playing with initialization order of modem device, it has been discovered that under some circumstances (early console init, I believe) its .pm() callback may be called before the uart_port->private_data pointer is initialized from plat_serial8250_port->private_data, resulting in NULL pointer dereference. Fix it by checking for uninitialized pointer before using it in modem_pm(). Fixes: aabf31737a6a ("ARM: OMAP1: ams-delta: update the modem to use regulator API") Signed-off-by: Janusz Krzysztofik Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin --- arch/arm/mach-omap1/board-ams-delta.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm/mach-omap1/board-ams-delta.c b/arch/arm/mach-omap1/board-ams-delta.c index a95499ea8706..fa1d41edce68 100644 --- a/arch/arm/mach-omap1/board-ams-delta.c +++ b/arch/arm/mach-omap1/board-ams-delta.c @@ -511,6 +511,9 @@ static void modem_pm(struct uart_port *port, unsigned int state, unsigned old) { struct modem_private_data *priv = port->private_data; + if (!priv) + return; + if (IS_ERR(priv->regulator)) return; -- cgit v1.2.3 From 9ba7a303ff290d21bade27317337505640e556c0 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Tue, 13 Nov 2018 15:38:22 +0000 Subject: s390/cpum_cf: Reject request for sampling in event initialization [ Upstream commit 613a41b0d16e617f46776a93b975a1eeea96417c ] On s390 command perf top fails [root@s35lp76 perf] # ./perf top -F100000 --stdio Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' [root@s35lp76 perf] # Using event -e rb0000 works as designed. Event rb0000 is the event number of the sampling facility for basic sampling. During system start up the following PMUs are installed in the kernel's PMU list (from head to tail): cpum_cf --> s390 PMU counter facility device driver cpum_sf --> s390 PMU sampling facility device driver uprobe kprobe tracepoint task_clock cpu_clock Perf top executes following functions and calls perf_event_open(2) system call with different parameters many times: cmd_top --> __cmd_top --> perf_evlist__add_default --> __perf_evlist__add_default --> perf_evlist__new_cycles (creates event type:0 (HW) config 0 (CPU_CYCLES) --> perf_event_attr__set_max_precise_ip Uses perf_event_open(2) to detect correct precise_ip level. Fails 3 times on s390 which is ok. Then functions cmd_top --> __cmd_top --> perf_top__start_counters -->perf_evlist__config --> perf_can_comm_exec --> perf_probe_api This functions test support for the following events: "cycles:u", "instructions:u", "cpu-clock:u" using --> perf_do_probe_api --> perf_event_open_cloexec Test the close on exec flag support with perf_event_open(2). perf_do_probe_api returns true if the event is supported. The function returns true because event cpu-clock is supported by the PMU cpu_clock. This is achieved by many calls to perf_event_open(2). Function perf_top__start_counters now calls perf_evsel__open() for every event, which is the default event cpu_cycles (config:0) and type HARDWARE (type:0) which a predfined frequence of 4000. Given the above order of the PMU list, the PMU cpum_cf gets called first and returns 0, which indicates support for this sampling. The event is fully allocated in the function perf_event_open (file kernel/event/core.c near line 10521 and the following check fails: event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL, NULL, cgroup_fd); if (IS_ERR(event)) { err = PTR_ERR(event); goto err_cred; } if (is_sampling_event(event)) { if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) { err = -EOPNOTSUPP; goto err_alloc; } } The check for the interrupt capabilities fails and the system call perf_event_open() returns -EOPNOTSUPP (-95). Add a check to return -ENODEV when sampling is requested in PMU cpum_cf. This allows common kernel code in the perf_event_open() system call to test the next PMU in above list. Fixes: 97b1198fece0 (" "s390, perf: Use common PMU interrupt disabled code") Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin --- arch/s390/kernel/perf_cpum_cf.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c index 929c147e07b4..1b69bfdf59f9 100644 --- a/arch/s390/kernel/perf_cpum_cf.c +++ b/arch/s390/kernel/perf_cpum_cf.c @@ -344,6 +344,8 @@ static int __hw_perf_event_init(struct perf_event *event) break; case PERF_TYPE_HARDWARE: + if (is_sampling_event(event)) /* No sampling support */ + return -ENOENT; ev = attr->config; /* Count user space (problem-state) only */ if (!attr->exclude_user && attr->exclude_kernel) { -- cgit v1.2.3 From afca87e81681a55a1db525e1ea66cbf601a6fee9 Mon Sep 17 00:00:00 2001 From: Yi Wang Date: Thu, 8 Nov 2018 16:48:36 +0800 Subject: KVM: x86: fix empty-body warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 354cb410d87314e2eda344feea84809e4261570a ] We get the following warnings about empty statements when building with 'W=1': arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] Rework the debug helper macro to get rid of these warnings. Signed-off-by: Yi Wang Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin --- arch/x86/kvm/lapic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index a1afd80a68aa..3c70f6c76d3a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -56,7 +56,7 @@ #define APIC_BUS_CYCLE_NS 1 /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */ -#define apic_debug(fmt, arg...) +#define apic_debug(fmt, arg...) do {} while (0) #define APIC_LVT_NUM 6 /* 14 is the version for Xeon and Pentium 8.4.8*/ -- cgit v1.2.3 From e90c6ad207bcb7a599c259596c9a9e1bb15eb7bb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Radim=20Kr=C4=8Dm=C3=A1=C5=99?= Date: Mon, 8 Aug 2016 20:16:22 +0200 Subject: KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d048c098218e91ed0e10dfa1f0f80e2567fe4ef7 upstream. msr bitmap can be used to avoid a VM exit (interception) on guest MSR accesses. In some configurations of VMX controls, the guest can even directly access host's x2APIC MSRs. See SDM 29.5 VIRTUALIZING MSR-BASED APIC ACCESSES. L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI. To do so, L1 would first trick KVM to disable all possible interceptions by enabling APICv features and then would turn those features off; nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would not intercept previously enabled MSRs even though they were not safe with the new configuration. Correctly re-enabling interceptions is not enough as a second bug would still allow L1+L2 to access host's MSRs: msr bitmap was shared for all VMCSs, so L1 could trigger a race to get the desired combination of msr bitmap and VMX controls. This fix allocates a msr bitmap for every L1 VCPU, allows only safe x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would have to intercept everything anyway. Fixes: 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap") Reported-by: Jim Mattson Suggested-by: Wincy Van Reviewed-by: Wanpeng Li Signed-off-by: Radim Krčmář [bwh: Backported to 4.4: - handle_vmon() doesn't allocate a cached vmcs12 - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 96 +++++++++++++++++++++--------------------------------- 1 file changed, 38 insertions(+), 58 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c5a4b1978cbf..3df636400ec8 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -431,6 +431,8 @@ struct nested_vmx { u16 posted_intr_nv; u64 msr_ia32_feature_control; + unsigned long *msr_bitmap; + struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -912,7 +914,6 @@ static unsigned long *vmx_msr_bitmap_legacy; static unsigned long *vmx_msr_bitmap_longmode; static unsigned long *vmx_msr_bitmap_legacy_x2apic; static unsigned long *vmx_msr_bitmap_longmode_x2apic; -static unsigned long *vmx_msr_bitmap_nested; static unsigned long *vmx_vmread_bitmap; static unsigned long *vmx_vmwrite_bitmap; @@ -2358,7 +2359,7 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) unsigned long *msr_bitmap; if (is_guest_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_nested; + msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; else if (vcpu->arch.apic_base & X2APIC_ENABLE) { if (is_long_mode(vcpu)) msr_bitmap = vmx_msr_bitmap_longmode_x2apic; @@ -6192,13 +6193,6 @@ static __init int hardware_setup(void) if (!vmx_msr_bitmap_longmode_x2apic) goto out4; - if (nested) { - vmx_msr_bitmap_nested = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx_msr_bitmap_nested) - goto out5; - } - vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); if (!vmx_vmread_bitmap) goto out6; @@ -6216,8 +6210,6 @@ static __init int hardware_setup(void) memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - if (nested) - memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE); if (setup_vmcs_config(&vmcs_config) < 0) { r = -EIO; @@ -6354,9 +6346,6 @@ out8: out7: free_page((unsigned long)vmx_vmread_bitmap); out6: - if (nested) - free_page((unsigned long)vmx_msr_bitmap_nested); -out5: free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic); out4: free_page((unsigned long)vmx_msr_bitmap_longmode); @@ -6382,8 +6371,6 @@ static __exit void hardware_unsetup(void) free_page((unsigned long)vmx_io_bitmap_a); free_page((unsigned long)vmx_vmwrite_bitmap); free_page((unsigned long)vmx_vmread_bitmap); - if (nested) - free_page((unsigned long)vmx_msr_bitmap_nested); free_kvm_area(); } @@ -6825,10 +6812,17 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } + if (cpu_has_vmx_msr_bitmap()) { + vmx->nested.msr_bitmap = + (unsigned long *)__get_free_page(GFP_KERNEL); + if (!vmx->nested.msr_bitmap) + goto out_msr_bitmap; + } + if (enable_shadow_vmcs) { shadow_vmcs = alloc_vmcs(); if (!shadow_vmcs) - return -ENOMEM; + goto out_shadow_vmcs; /* mark vmcs as shadow */ shadow_vmcs->revision_id |= (1u << 31); /* init shadow vmcs */ @@ -6850,6 +6844,12 @@ static int handle_vmon(struct kvm_vcpu *vcpu) skip_emulated_instruction(vcpu); nested_vmx_succeed(vcpu); return 1; + +out_shadow_vmcs: + free_page((unsigned long)vmx->nested.msr_bitmap); + +out_msr_bitmap: + return -ENOMEM; } /* @@ -6919,6 +6919,10 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->nested.vmxon = false; free_vpid(vmx->nested.vpid02); nested_release_vmcs12(vmx); + if (vmx->nested.msr_bitmap) { + free_page((unsigned long)vmx->nested.msr_bitmap); + vmx->nested.msr_bitmap = NULL; + } if (enable_shadow_vmcs) free_vmcs(vmx->nested.current_shadow_vmcs); /* Unpin physical memory we referred to in current vmcs02 */ @@ -9248,8 +9252,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, { int msr; struct page *page; - unsigned long *msr_bitmap; + unsigned long *msr_bitmap_l1; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; + /* This shortcut is ok because we support only x2APIC MSRs so far. */ if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) return false; @@ -9258,58 +9264,32 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, WARN_ON(1); return false; } - msr_bitmap = (unsigned long *)kmap(page); + msr_bitmap_l1 = (unsigned long *)kmap(page); + + memset(msr_bitmap_l0, 0xff, PAGE_SIZE); if (nested_cpu_has_virt_x2apic_mode(vmcs12)) { if (nested_cpu_has_apic_reg_virt(vmcs12)) for (msr = 0x800; msr <= 0x8ff; msr++) nested_vmx_disable_intercept_for_msr( - msr_bitmap, - vmx_msr_bitmap_nested, + msr_bitmap_l1, msr_bitmap_l0, msr, MSR_TYPE_R); - /* TPR is allowed */ - nested_vmx_disable_intercept_for_msr(msr_bitmap, - vmx_msr_bitmap_nested, + + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, APIC_BASE_MSR + (APIC_TASKPRI >> 4), MSR_TYPE_R | MSR_TYPE_W); + if (nested_cpu_has_vid(vmcs12)) { - /* EOI and self-IPI are allowed */ nested_vmx_disable_intercept_for_msr( - msr_bitmap, - vmx_msr_bitmap_nested, + msr_bitmap_l1, msr_bitmap_l0, APIC_BASE_MSR + (APIC_EOI >> 4), MSR_TYPE_W); nested_vmx_disable_intercept_for_msr( - msr_bitmap, - vmx_msr_bitmap_nested, + msr_bitmap_l1, msr_bitmap_l0, APIC_BASE_MSR + (APIC_SELF_IPI >> 4), MSR_TYPE_W); } - } else { - /* - * Enable reading intercept of all the x2apic - * MSRs. We should not rely on vmcs12 to do any - * optimizations here, it may have been modified - * by L1. - */ - for (msr = 0x800; msr <= 0x8ff; msr++) - __vmx_enable_intercept_for_msr( - vmx_msr_bitmap_nested, - msr, - MSR_TYPE_R); - - __vmx_enable_intercept_for_msr( - vmx_msr_bitmap_nested, - APIC_BASE_MSR + (APIC_TASKPRI >> 4), - MSR_TYPE_W); - __vmx_enable_intercept_for_msr( - vmx_msr_bitmap_nested, - APIC_BASE_MSR + (APIC_EOI >> 4), - MSR_TYPE_W); - __vmx_enable_intercept_for_msr( - vmx_msr_bitmap_nested, - APIC_BASE_MSR + (APIC_SELF_IPI >> 4), - MSR_TYPE_W); } kunmap(page); nested_release_page_clean(page); @@ -9729,10 +9709,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) } if (cpu_has_vmx_msr_bitmap() && - exec_control & CPU_BASED_USE_MSR_BITMAPS) { - nested_vmx_merge_msr_bitmap(vcpu, vmcs12); - /* MSR_BITMAP will be set by following vmx_set_efer. */ - } else + exec_control & CPU_BASED_USE_MSR_BITMAPS && + nested_vmx_merge_msr_bitmap(vcpu, vmcs12)) + ; /* MSR_BITMAP will be set by following vmx_set_efer. */ + else exec_control &= ~CPU_BASED_USE_MSR_BITMAPS; /* -- cgit v1.2.3 From 1bec1a14bb080e86af254984135cd83e76f1f91d Mon Sep 17 00:00:00 2001 From: David Matlack Date: Tue, 1 Aug 2017 14:00:40 -0700 Subject: KVM: nVMX: mark vmcs12 pages dirty on L2 exit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit c9f04407f2e0b3fc9ff7913c65fcfcb0a4b61570 upstream. The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking. Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing. Signed-off-by: David Matlack Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 53 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3df636400ec8..b886a7c9ed4b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4527,6 +4527,28 @@ static int vmx_cpu_uses_apicv(struct kvm_vcpu *vcpu) return enable_apicv && lapic_in_kernel(vcpu); } +static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu) +{ + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + gfn_t gfn; + + /* + * Don't need to mark the APIC access page dirty; it is never + * written to by the CPU during APIC virtualization. + */ + + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) { + gfn = vmcs12->virtual_apic_page_addr >> PAGE_SHIFT; + kvm_vcpu_mark_page_dirty(vcpu, gfn); + } + + if (nested_cpu_has_posted_intr(vmcs12)) { + gfn = vmcs12->posted_intr_desc_addr >> PAGE_SHIFT; + kvm_vcpu_mark_page_dirty(vcpu, gfn); + } +} + + static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -4534,18 +4556,15 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) void *vapic_page; u16 status; - if (vmx->nested.pi_desc && - vmx->nested.pi_pending) { - vmx->nested.pi_pending = false; - if (!pi_test_and_clear_on(vmx->nested.pi_desc)) - return; - - max_irr = find_last_bit( - (unsigned long *)vmx->nested.pi_desc->pir, 256); + if (!vmx->nested.pi_desc || !vmx->nested.pi_pending) + return; - if (max_irr == 256) - return; + vmx->nested.pi_pending = false; + if (!pi_test_and_clear_on(vmx->nested.pi_desc)) + return; + max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256); + if (max_irr != 256) { vapic_page = kmap(vmx->nested.virtual_apic_page); __kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page); kunmap(vmx->nested.virtual_apic_page); @@ -4557,6 +4576,8 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) vmcs_write16(GUEST_INTR_STATUS, status); } } + + nested_mark_vmcs12_pages_dirty(vcpu); } static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu) @@ -7761,6 +7782,18 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu) vmcs_read32(VM_EXIT_INTR_ERROR_CODE), KVM_ISA_VMX); + /* + * The host physical addresses of some pages of guest memory + * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU + * may write to these pages via their host physical address while + * L2 is running, bypassing any address-translation-based dirty + * tracking (e.g. EPT write protection). + * + * Mark them dirty on every exit from L2 to prevent them from + * getting out of sync with dirty tracking. + */ + nested_mark_vmcs12_pages_dirty(vcpu); + if (vmx->nested.nested_run_pending) return false; -- cgit v1.2.3 From 81cd492667c69020b3f55bed8eb5bfa4bebf7895 Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Mon, 27 Nov 2017 17:22:25 -0600 Subject: KVM: nVMX: Eliminate vmcs02 pool MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit de3a0021a60635de96aa92713c1a31a96747d72c upstream. The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Signed-off-by: Jim Mattson Signed-off-by: Mark Kanda Reviewed-by: Ameya More Reviewed-by: David Hildenbrand Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: - No loaded_vmcs::shadow_vmcs field to initialise - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 144 ++++++++--------------------------------------------- 1 file changed, 22 insertions(+), 122 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b886a7c9ed4b..cf131f41d4b9 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -172,7 +172,6 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; #define NR_AUTOLOAD_MSRS 8 -#define VMCS02_POOL_SIZE 1 struct vmcs { u32 revision_id; @@ -205,7 +204,7 @@ struct shared_msr_entry { * stored in guest memory specified by VMPTRLD, but is opaque to the guest, * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. * More than one of these structures may exist, if L1 runs multiple L2 guests. - * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the + * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the * underlying hardware which will be used to run L2. * This structure is packed to ensure that its layout is identical across * machines (necessary for live migration). @@ -384,13 +383,6 @@ struct __packed vmcs12 { */ #define VMCS12_SIZE 0x1000 -/* Used to remember the last vmcs02 used for some recently used vmcs12s */ -struct vmcs02_list { - struct list_head list; - gpa_t vmptr; - struct loaded_vmcs vmcs02; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. @@ -412,16 +404,16 @@ struct nested_vmx { */ bool sync_shadow_vmcs; - /* vmcs02_list cache of VMCSs recently used to run L2 guests */ - struct list_head vmcs02_pool; - int vmcs02_num; u64 vmcs01_tsc_offset; bool change_vmcs01_virtual_x2apic_mode; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; + + struct loaded_vmcs vmcs02; + /* - * Guest pages referred to in vmcs02 with host-physical pointers, so - * we must keep them pinned while L2 runs. + * Guest pages referred to in the vmcs02 with host-physical + * pointers, so we must keep them pinned while L2 runs. */ struct page *apic_access_page; struct page *virtual_apic_page; @@ -6434,93 +6426,6 @@ static int handle_monitor(struct kvm_vcpu *vcpu) return handle_nop(vcpu); } -/* - * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. - * We could reuse a single VMCS for all the L2 guests, but we also want the - * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this - * allows keeping them loaded on the processor, and in the future will allow - * optimizations where prepare_vmcs02 doesn't need to set all the fields on - * every entry if they never change. - * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE - * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first. - * - * The following functions allocate and free a vmcs02 in this pool. - */ - -/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */ -static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmx->nested.current_vmptr) { - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { - /* Recycle the least recently used VMCS. */ - item = list_entry(vmx->nested.vmcs02_pool.prev, - struct vmcs02_list, list); - item->vmptr = vmx->nested.current_vmptr; - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - /* Create a new VMCS */ - item = kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL); - if (!item) - return NULL; - item->vmcs02.vmcs = alloc_vmcs(); - if (!item->vmcs02.vmcs) { - kfree(item); - return NULL; - } - loaded_vmcs_init(&item->vmcs02); - item->vmptr = vmx->nested.current_vmptr; - list_add(&(item->list), &(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num++; - return &item->vmcs02; -} - -/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */ -static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmptr) { - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - return; - } -} - -/* - * Free all VMCSs saved for this vcpu, except the one pointed by - * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs - * must be &vmx->vmcs01. - */ -static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item, *n; - - WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01); - list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) { - /* - * Something will leak if the above WARN triggers. Better than - * a use-after-free. - */ - if (vmx->loaded_vmcs == &item->vmcs02) - continue; - - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - } -} - /* * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(), * set the success or error code of an emulated VMX instruction, as specified @@ -6833,6 +6738,11 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } + vmx->nested.vmcs02.vmcs = alloc_vmcs(); + if (!vmx->nested.vmcs02.vmcs) + goto out_vmcs02; + loaded_vmcs_init(&vmx->nested.vmcs02); + if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); @@ -6851,9 +6761,6 @@ static int handle_vmon(struct kvm_vcpu *vcpu) vmx->nested.current_shadow_vmcs = shadow_vmcs; } - INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num = 0; - hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; @@ -6870,6 +6777,9 @@ out_shadow_vmcs: free_page((unsigned long)vmx->nested.msr_bitmap); out_msr_bitmap: + free_loaded_vmcs(&vmx->nested.vmcs02); + +out_vmcs02: return -ENOMEM; } @@ -6946,7 +6856,7 @@ static void free_nested(struct vcpu_vmx *vmx) } if (enable_shadow_vmcs) free_vmcs(vmx->nested.current_shadow_vmcs); - /* Unpin physical memory we referred to in current vmcs02 */ + /* Unpin physical memory we referred to in the vmcs02 */ if (vmx->nested.apic_access_page) { nested_release_page(vmx->nested.apic_access_page); vmx->nested.apic_access_page = NULL; @@ -6962,7 +6872,7 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->nested.pi_desc = NULL; } - nested_free_all_saved_vmcss(vmx); + free_loaded_vmcs(&vmx->nested.vmcs02); } /* Emulate the VMXOFF instruction */ @@ -6996,8 +6906,6 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) vmptr + offsetof(struct vmcs12, launch_state), &zero, sizeof(zero)); - nested_free_vmcs02(vmx, vmptr); - skip_emulated_instruction(vcpu); nested_vmx_succeed(vcpu); return 1; @@ -7784,10 +7692,11 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu) /* * The host physical addresses of some pages of guest memory - * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU - * may write to these pages via their host physical address while - * L2 is running, bypassing any address-translation-based dirty - * tracking (e.g. EPT write protection). + * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC + * Page). The CPU may write to these pages via their host + * physical address while L2 is running, bypassing any + * address-translation-based dirty tracking (e.g. EPT write + * protection). * * Mark them dirty on every exit from L2 to prevent them from * getting out of sync with dirty tracking. @@ -9889,7 +9798,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) struct vmcs12 *vmcs12; struct vcpu_vmx *vmx = to_vmx(vcpu); int cpu; - struct loaded_vmcs *vmcs02; bool ia32e; u32 msr_entry_idx; @@ -10029,10 +9937,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) * the nested entry. */ - vmcs02 = nested_get_current_vmcs02(vmx); - if (!vmcs02) - return -ENOMEM; - enter_guest_mode(vcpu); vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET); @@ -10041,7 +9945,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); cpu = get_cpu(); - vmx->loaded_vmcs = vmcs02; + vmx->loaded_vmcs = &vmx->nested.vmcs02; vmx_vcpu_put(vcpu); vmx_vcpu_load(vcpu, cpu); vcpu->cpu = cpu; @@ -10553,10 +10457,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, vm_exit_controls_init(vmx, vmcs_read32(VM_EXIT_CONTROLS)); vmx_segment_cache_clear(vmx); - /* if no vmcs02 cache requested, remove the one we used */ - if (VMCS02_POOL_SIZE == 0) - nested_free_vmcs02(vmx, vmx->nested.current_vmptr); - load_vmcs12_host_state(vcpu, vmcs12); /* Update TSC_OFFSET if TSC was changed while L2 ran */ -- cgit v1.2.3 From 8f54df9756caed1d499bc8f412ab736a8928dc39 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 11 Jan 2018 12:16:15 +0100 Subject: KVM: VMX: introduce alloc_loaded_vmcs commit f21f165ef922c2146cc5bdc620f542953c41714b upstream. Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Signed-off-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: - No loaded_vmcs::shadow_vmcs field to initialise - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cf131f41d4b9..5ffc2731e14d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3345,11 +3345,6 @@ static struct vmcs *alloc_vmcs_cpu(int cpu) return vmcs; } -static struct vmcs *alloc_vmcs(void) -{ - return alloc_vmcs_cpu(raw_smp_processor_id()); -} - static void free_vmcs(struct vmcs *vmcs) { free_pages((unsigned long)vmcs, vmcs_config.order); @@ -3367,6 +3362,21 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs->vmcs = NULL; } +static struct vmcs *alloc_vmcs(void) +{ + return alloc_vmcs_cpu(raw_smp_processor_id()); +} + +static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) +{ + loaded_vmcs->vmcs = alloc_vmcs(); + if (!loaded_vmcs->vmcs) + return -ENOMEM; + + loaded_vmcs_init(loaded_vmcs); + return 0; +} + static void free_kvm_area(void) { int cpu; @@ -6699,6 +6709,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu) struct vmcs *shadow_vmcs; const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX; + int r; /* The Intel VMX Instruction Reference lists a bunch of bits that * are prerequisite to running VMXON, most notably cr4.VMXE must be @@ -6738,10 +6749,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } - vmx->nested.vmcs02.vmcs = alloc_vmcs(); - if (!vmx->nested.vmcs02.vmcs) + r = alloc_loaded_vmcs(&vmx->nested.vmcs02); + if (r < 0) goto out_vmcs02; - loaded_vmcs_init(&vmx->nested.vmcs02); if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = @@ -8802,16 +8812,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (!vmx->guest_msrs) goto free_pml; - vmx->loaded_vmcs = &vmx->vmcs01; - vmx->loaded_vmcs->vmcs = alloc_vmcs(); - if (!vmx->loaded_vmcs->vmcs) - goto free_msrs; if (!vmm_exclusive) kvm_cpu_vmxon(__pa(per_cpu(vmxarea, raw_smp_processor_id()))); - loaded_vmcs_init(vmx->loaded_vmcs); + err = alloc_loaded_vmcs(&vmx->vmcs01); if (!vmm_exclusive) kvm_cpu_vmxoff(); + if (err < 0) + goto free_msrs; + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); vmx->vcpu.cpu = cpu; -- cgit v1.2.3 From 321fbb1fad297ccbac0efd28e58851a085ac29fa Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 16 Jan 2018 16:51:18 +0100 Subject: KVM: VMX: make MSR bitmaps per-VCPU commit 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6 upstream. Place the MSR bitmap in struct loaded_vmcs, and update it in place every time the x2apic or APICv state can change. This is rare and the loop can handle 64 MSRs per iteration, in a similar fashion as nested_vmx_prepare_msr_bitmap. This prepares for choosing, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs. Suggested-by: Jim Mattson Signed-off-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: - APICv support looked different - We still need to intercept the APIC_ID MSR - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 254 +++++++++++++++++++++++------------------------------ 1 file changed, 112 insertions(+), 142 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5ffc2731e14d..e0064855fbdb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -109,6 +109,14 @@ static u64 __read_mostly host_xss; static bool __read_mostly enable_pml = 1; module_param_named(pml, enable_pml, bool, S_IRUGO); +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +#define MSR_TYPE_RW 3 + +#define MSR_BITMAP_MODE_X2APIC 1 +#define MSR_BITMAP_MODE_X2APIC_APICV 2 +#define MSR_BITMAP_MODE_LM 4 + #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD) @@ -188,6 +196,7 @@ struct loaded_vmcs { struct vmcs *vmcs; int cpu; int launched; + unsigned long *msr_bitmap; struct list_head loaded_vmcss_on_cpu_link; }; @@ -423,8 +432,6 @@ struct nested_vmx { u16 posted_intr_nv; u64 msr_ia32_feature_control; - unsigned long *msr_bitmap; - struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -525,6 +532,7 @@ struct vcpu_vmx { unsigned long host_rsp; u8 fail; bool nmi_known_unmasked; + u8 msr_bitmap_mode; u32 exit_intr_info; u32 idt_vectoring_info; ulong rflags; @@ -883,6 +891,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu); static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx); static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx); static int alloc_identity_pagetable(struct kvm *kvm); +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -902,10 +911,6 @@ static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock); static unsigned long *vmx_io_bitmap_a; static unsigned long *vmx_io_bitmap_b; -static unsigned long *vmx_msr_bitmap_legacy; -static unsigned long *vmx_msr_bitmap_longmode; -static unsigned long *vmx_msr_bitmap_legacy_x2apic; -static unsigned long *vmx_msr_bitmap_longmode_x2apic; static unsigned long *vmx_vmread_bitmap; static unsigned long *vmx_vmwrite_bitmap; @@ -2346,27 +2351,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to) vmx->guest_msrs[from] = tmp; } -static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) -{ - unsigned long *msr_bitmap; - - if (is_guest_mode(vcpu)) - msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; - else if (vcpu->arch.apic_base & X2APIC_ENABLE) { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode; - else - msr_bitmap = vmx_msr_bitmap_legacy; - } - - vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); -} - /* * Set up the vmcs to automatically save and restore system * msrs. Don't touch the 64-bit msrs if the guest is in legacy @@ -2407,7 +2391,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) vmx->save_nmsrs = save_nmsrs; if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(&vmx->vcpu); + vmx_update_msr_bitmap(&vmx->vcpu); } /* @@ -3360,6 +3344,8 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs_clear(loaded_vmcs); free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; + if (loaded_vmcs->msr_bitmap) + free_page((unsigned long)loaded_vmcs->msr_bitmap); } static struct vmcs *alloc_vmcs(void) @@ -3374,7 +3360,18 @@ static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) return -ENOMEM; loaded_vmcs_init(loaded_vmcs); + + if (cpu_has_vmx_msr_bitmap()) { + loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); + if (!loaded_vmcs->msr_bitmap) + goto out_vmcs; + memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); + } return 0; + +out_vmcs: + free_loaded_vmcs(loaded_vmcs); + return -ENOMEM; } static void free_kvm_area(void) @@ -4373,10 +4370,8 @@ static void free_vpid(int vpid) spin_unlock(&vmx_vpid_lock); } -#define MSR_TYPE_R 1 -#define MSR_TYPE_W 2 -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type) +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -4410,8 +4405,8 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, } } -static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type) +static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -4491,37 +4486,76 @@ static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1, } } -static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) +static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type, bool value) { - if (!longmode_only) - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, - msr, MSR_TYPE_R | MSR_TYPE_W); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, - msr, MSR_TYPE_R | MSR_TYPE_W); + if (value) + vmx_enable_intercept_for_msr(msr_bitmap, msr, type); + else + vmx_disable_intercept_for_msr(msr_bitmap, msr, type); } -static void vmx_enable_intercept_msr_read_x2apic(u32 msr) +static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) { - __vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, MSR_TYPE_R); - __vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, MSR_TYPE_R); + u8 mode = 0; + + if (irqchip_in_kernel(vcpu->kvm) && apic_x2apic_mode(vcpu->arch.apic)) { + mode |= MSR_BITMAP_MODE_X2APIC; + if (enable_apicv) + mode |= MSR_BITMAP_MODE_X2APIC_APICV; + } + + if (is_long_mode(vcpu)) + mode |= MSR_BITMAP_MODE_LM; + + return mode; } -static void vmx_disable_intercept_msr_read_x2apic(u32 msr) +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) + +static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, + u8 mode) { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, MSR_TYPE_R); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, MSR_TYPE_R); + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; + } + + if (mode & MSR_BITMAP_MODE_X2APIC) { + /* + * TPR reads and writes can be virtualized even if virtual interrupt + * delivery is not in use. + */ + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_ID), MSR_TYPE_R); + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); + } + } } -static void vmx_disable_intercept_msr_write_x2apic(u32 msr) +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, MSR_TYPE_W); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, MSR_TYPE_W); + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + u8 mode = vmx_msr_bitmap_mode(vcpu); + u8 changed = mode ^ vmx->msr_bitmap_mode; + + if (!changed) + return; + + vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, + !(mode & MSR_BITMAP_MODE_LM)); + + if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) + vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); + + vmx->msr_bitmap_mode = mode; } static int vmx_cpu_uses_apicv(struct kvm_vcpu *vcpu) @@ -4842,7 +4876,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); } if (cpu_has_vmx_msr_bitmap()) - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); + vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ @@ -6183,7 +6217,7 @@ static void wakeup_handler(void) static __init int hardware_setup(void) { - int r = -ENOMEM, i, msr; + int r = -ENOMEM, i; rdmsrl_safe(MSR_EFER, &host_efer); @@ -6198,31 +6232,13 @@ static __init int hardware_setup(void) if (!vmx_io_bitmap_b) goto out; - vmx_msr_bitmap_legacy = (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx_msr_bitmap_legacy) - goto out1; - - vmx_msr_bitmap_legacy_x2apic = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx_msr_bitmap_legacy_x2apic) - goto out2; - - vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx_msr_bitmap_longmode) - goto out3; - - vmx_msr_bitmap_longmode_x2apic = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx_msr_bitmap_longmode_x2apic) - goto out4; - vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); if (!vmx_vmread_bitmap) - goto out6; + goto out1; vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); if (!vmx_vmwrite_bitmap) - goto out7; + goto out2; memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE); memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE); @@ -6231,12 +6247,9 @@ static __init int hardware_setup(void) memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - if (setup_vmcs_config(&vmcs_config) < 0) { r = -EIO; - goto out8; + goto out3; } if (boot_cpu_has(X86_FEATURE_NX)) @@ -6302,38 +6315,8 @@ static __init int hardware_setup(void) kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy; } - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); - - memcpy(vmx_msr_bitmap_legacy_x2apic, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic, - vmx_msr_bitmap_longmode, PAGE_SIZE); - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - if (enable_apicv) { - for (msr = 0x800; msr <= 0x8ff; msr++) - vmx_disable_intercept_msr_read_x2apic(msr); - - /* According SDM, in x2apic mode, the whole id reg is used. - * But in KVM, it only use the highest eight bits. Need to - * intercept it */ - vmx_enable_intercept_msr_read_x2apic(0x802); - /* TMCCT */ - vmx_enable_intercept_msr_read_x2apic(0x839); - /* TPR */ - vmx_disable_intercept_msr_write_x2apic(0x808); - /* EOI */ - vmx_disable_intercept_msr_write_x2apic(0x80b); - /* SELF-IPI */ - vmx_disable_intercept_msr_write_x2apic(0x83f); - } - if (enable_ept) { kvm_mmu_set_mask_ptes(0ull, (enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull, @@ -6364,18 +6347,10 @@ static __init int hardware_setup(void) return alloc_kvm_area(); -out8: - free_page((unsigned long)vmx_vmwrite_bitmap); -out7: - free_page((unsigned long)vmx_vmread_bitmap); -out6: - free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic); -out4: - free_page((unsigned long)vmx_msr_bitmap_longmode); out3: - free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic); + free_page((unsigned long)vmx_vmwrite_bitmap); out2: - free_page((unsigned long)vmx_msr_bitmap_legacy); + free_page((unsigned long)vmx_vmread_bitmap); out1: free_page((unsigned long)vmx_io_bitmap_b); out: @@ -6386,10 +6361,6 @@ out: static __exit void hardware_unsetup(void) { - free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic); - free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic); - free_page((unsigned long)vmx_msr_bitmap_legacy); - free_page((unsigned long)vmx_msr_bitmap_longmode); free_page((unsigned long)vmx_io_bitmap_b); free_page((unsigned long)vmx_io_bitmap_a); free_page((unsigned long)vmx_vmwrite_bitmap); @@ -6753,13 +6724,6 @@ static int handle_vmon(struct kvm_vcpu *vcpu) if (r < 0) goto out_vmcs02; - if (cpu_has_vmx_msr_bitmap()) { - vmx->nested.msr_bitmap = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx->nested.msr_bitmap) - goto out_msr_bitmap; - } - if (enable_shadow_vmcs) { shadow_vmcs = alloc_vmcs(); if (!shadow_vmcs) @@ -6784,9 +6748,6 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; out_shadow_vmcs: - free_page((unsigned long)vmx->nested.msr_bitmap); - -out_msr_bitmap: free_loaded_vmcs(&vmx->nested.vmcs02); out_vmcs02: @@ -6860,10 +6821,6 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->nested.vmxon = false; free_vpid(vmx->nested.vpid02); nested_release_vmcs12(vmx); - if (vmx->nested.msr_bitmap) { - free_page((unsigned long)vmx->nested.msr_bitmap); - vmx->nested.msr_bitmap = NULL; - } if (enable_shadow_vmcs) free_vmcs(vmx->nested.current_shadow_vmcs); /* Unpin physical memory we referred to in the vmcs02 */ @@ -8200,7 +8157,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) } vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) @@ -8780,6 +8737,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) { int err; struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); + unsigned long *msr_bitmap; int cpu; if (!vmx) @@ -8820,6 +8778,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (err < 0) goto free_msrs; + msr_bitmap = vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); + vmx->msr_bitmap_mode = 0; + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); @@ -9204,7 +9171,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, int msr; struct page *page; unsigned long *msr_bitmap_l1; - unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* This shortcut is ok because we support only x2APIC MSRs so far. */ if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) @@ -9715,6 +9682,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) else vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset); + if (cpu_has_vmx_msr_bitmap()) + vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap)); + if (enable_vpid) { /* * There is no direct mapping between vpid02 and vpid12, the @@ -10415,7 +10385,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, vmcs_write64(GUEST_IA32_DEBUGCTL, 0); if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) -- cgit v1.2.3 From 4b3870c343a82cd2df7192cc5149c87205dcc611 Mon Sep 17 00:00:00 2001 From: Ashok Raj Date: Thu, 1 Feb 2018 22:59:43 +0100 Subject: KVM/x86: Add IBPB support commit 15d45071523d89b3fb7372e2135fbd72f6af9506 upstream. The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Cc: Andrea Arcangeli Cc: Andi Kleen Cc: kvm@vger.kernel.org Cc: Asit Mallick Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Greg KH Cc: Jun Nakajima Cc: Paolo Bonzini Cc: Dan Williams Cc: Tim Chen Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/cpuid.c | 11 +++++++- arch/x86/kvm/cpuid.h | 12 ++++++++ arch/x86/kvm/svm.c | 28 +++++++++++++++++++ arch/x86/kvm/vmx.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 4 files changed, 127 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 338d13d4fd2f..35196f8e1ba6 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -341,6 +341,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x80000008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -583,7 +587,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x80000019: diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index d1534feefcfe..213102389795 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -159,6 +159,18 @@ static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu) return best && (best->edx & bit(X86_FEATURE_RDTSCP)); } +static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + if (best && (best->ebx & bit(X86_FEATURE_IBPB))) + return true; + best = kvm_find_cpuid_entry(vcpu, 7, 0); + return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); +} + + /* * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3 */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index df7827a981dd..d489836da6f5 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -182,6 +182,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_CSTAR, .always = true }, { .index = MSR_SYSCALL_MASK, .always = true }, #endif + { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, @@ -411,6 +412,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -1210,11 +1212,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* + * The vmcb page can be recycled, causing a false negative in + * svm_vcpu_load(). So do a full IBPB now. + */ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1239,6 +1247,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio); } } + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } } static void svm_vcpu_put(struct kvm_vcpu *vcpu) @@ -3125,6 +3137,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_PRED_CMD: + if (!msr->host_initiated && + !guest_cpuid_has_ibpb(vcpu)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1); + break; case MSR_STAR: svm->vmcb->save.star = data; break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e0064855fbdb..a19116fad680 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -544,6 +544,7 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; /* @@ -892,6 +893,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx); static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx); static int alloc_identity_pagetable(struct kvm *kvm); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -1687,6 +1690,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(EXCEPTION_BITMAP, eb); } +/* + * Check if MSR is intercepted for L01 MSR bitmap. + */ +static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap; + + if (msr <= 0x1fff) { + return !!test_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + return !!test_bit(msr, msr_bitmap + 0xc00 / f); + } + + return true; +} + static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx, unsigned long entry, unsigned long exit) { @@ -2072,6 +2098,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (vmx->loaded_vmcs->cpu != cpu) { @@ -2904,6 +2931,33 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_PRED_CMD: + if (!msr_info->host_initiated && + !guest_cpuid_has_ibpb(vcpu)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_vmx_merge_msr_bitmap. We should not touch the + * vmcs02.msr_bitmap here since it gets completely overwritten + * in the merging. + */ + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, + MSR_TYPE_W); + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -9172,9 +9226,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, struct page *page; unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; + /* + * pred_cmd is trying to verify two things: + * + * 1. L0 gave a permission to L1 to actually passthrough the MSR. This + * ensures that we do not accidentally generate an L02 MSR bitmap + * from the L12 MSR bitmap that is too permissive. + * 2. That L1 or L2s have actually used the MSR. This avoids + * unnecessarily merging of the bitmap if the MSR is unused. This + * works properly because we only update the L01 MSR bitmap lazily. + * So even if L0 should pass L1 these MSRs, the L01 bitmap is only + * updated to reflect this when L1 (or its L2s) actually write to + * the MSR. + */ + bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); - /* This shortcut is ok because we support only x2APIC MSRs so far. */ - if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) + if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && + !pred_cmd) return false; page = nested_get_page(vcpu, vmcs12->msr_bitmap); @@ -9209,6 +9277,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, MSR_TYPE_W); } } + + if (pred_cmd) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PRED_CMD, + MSR_TYPE_W); + kunmap(page); nested_release_page_clean(page); -- cgit v1.2.3 From 337c26f50a7189f114fce87e45eadd8d6dd9560b Mon Sep 17 00:00:00 2001 From: KarimAllah Ahmed Date: Thu, 1 Feb 2018 22:59:44 +0100 Subject: KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd upstream. Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Reviewed-by: Darren Kenny Reviewed-by: Jim Mattson Reviewed-by: Konrad Rzeszutek Wilk Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Jun Nakajima Cc: kvm@vger.kernel.org Cc: Dave Hansen Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Greg KH Cc: Dan Williams Cc: Tim Chen Cc: Ashok Raj Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/cpuid.c | 11 +++++++++-- arch/x86/kvm/cpuid.h | 8 ++++++++ arch/x86/kvm/vmx.c | 15 +++++++++++++++ arch/x86/kvm/x86.c | 1 + 4 files changed, 33 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 35196f8e1ba6..2f3483e395bf 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -362,6 +362,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, const u32 kvm_supported_word10_x86_features = F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves; + /* cpuid 7.0.edx*/ + const u32 kvm_cpuid_7_0_edx_x86_features = + F(ARCH_CAPABILITIES); + /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -439,11 +443,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, cpuid_mask(&entry->ebx, 9); // TSC_ADJUST is emulated entry->ebx |= F(TSC_ADJUST); - } else + entry->edx &= kvm_cpuid_7_0_edx_x86_features; + cpuid_mask(&entry->edx, CPUID_7_EDX); + } else { entry->ebx = 0; + entry->edx = 0; + } entry->eax = 0; entry->ecx = 0; - entry->edx = 0; break; } case 9: diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 213102389795..67c35486d8d0 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -170,6 +170,14 @@ static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu) return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); } +static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 7, 0); + return best && (best->edx & bit(X86_FEATURE_ARCH_CAPABILITIES)); +} + /* * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a19116fad680..3a513997b1cb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -545,6 +545,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; /* @@ -2832,6 +2834,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has_arch_capabilities(vcpu)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -2958,6 +2966,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5002,6 +5015,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e6ab034f0bc7..276f978efeed 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -961,6 +961,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- cgit v1.2.3 From fc6aae9f407810cb153a9133c28735871f9f0a16 Mon Sep 17 00:00:00 2001 From: KarimAllah Ahmed Date: Thu, 1 Feb 2018 22:59:45 +0100 Subject: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL commit d28b387fb74da95d69d2615732f50cceb38e9a4d upstream. [ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Reviewed-by: Darren Kenny Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Jim Mattson Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Jun Nakajima Cc: kvm@vger.kernel.org Cc: Dave Hansen Cc: Tim Chen Cc: Andy Lutomirski Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Greg KH Cc: Paolo Bonzini Cc: Dan Williams Cc: Linus Torvalds Cc: Ashok Raj Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/cpuid.c | 8 ++-- arch/x86/kvm/cpuid.h | 11 ++++++ arch/x86/kvm/vmx.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 2 +- 4 files changed, 118 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 2f3483e395bf..0ab72a8387d4 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features = @@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(ARCH_CAPABILITIES); + F(SPEC_CTRL) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -595,9 +595,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 67c35486d8d0..7f74d7e18a01 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -170,6 +170,17 @@ static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu) return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); } +static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + if (best && (best->ebx & bit(X86_FEATURE_IBRS))) + return true; + best = kvm_find_cpuid_entry(vcpu, 7, 0); + return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); +} + static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3a513997b1cb..b118d415ca08 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -546,6 +546,7 @@ struct vcpu_vmx { #endif u64 arch_capabilities; + u64 spec_ctrl; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -1692,6 +1693,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(EXCEPTION_BITMAP, eb); } +/* + * Check if MSR is intercepted for currently loaded MSR bitmap. + */ +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap; + + if (msr <= 0x1fff) { + return !!test_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + return !!test_bit(msr, msr_bitmap + 0xc00 / f); + } + + return true; +} + /* * Check if MSR is intercepted for L01 MSR bitmap. */ @@ -2834,6 +2858,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has_ibrs(vcpu)) + return 1; + + msr_info->data = to_vmx(vcpu)->spec_ctrl; + break; case MSR_IA32_ARCH_CAPABILITIES: if (!msr_info->host_initiated && !guest_cpuid_has_arch_capabilities(vcpu)) @@ -2939,6 +2970,36 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has_ibrs(vcpu)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + vmx->spec_ctrl = data; + + if (!data) + break; + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_vmx_merge_msr_bitmap. We should not touch the + * vmcs02.msr_bitmap here since it gets completely overwritten + * in the merging. We update the vmcs01 here for L1 as well + * since it will end up touching the MSR anyway now. + */ + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_RW); + break; case MSR_IA32_PRED_CMD: if (!msr_info->host_initiated && !guest_cpuid_has_ibpb(vcpu)) @@ -5045,6 +5106,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u64 cr0; vmx->rmode.vm86_active = 0; + vmx->spec_ctrl = 0; vmx->soft_vnmi_blocked = 0; @@ -8589,6 +8651,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); debugctlmsr = get_debugctlmsr(); + /* + * If this vCPU has touched SPEC_CTRL, restore the guest's value if + * it's non-zero. Since vmentry is serialising on affected CPUs, there + * is no need to worry about the conditional branch over the wrmsr + * being speculatively taken. + */ + if (vmx->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + vmx->__launched = vmx->loaded_vmcs->launched; asm( /* Store host registers */ @@ -8707,6 +8778,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* + * We do not use IBRS in the kernel. If this vCPU has used the + * SPEC_CTRL MSR it may have left it on; save the value and + * turn it off. This is much more efficient than blindly adding + * it to the atomic save/restore list. Especially as the former + * (Saving guest MSRs on vmexit) doesn't even exist in KVM. + * + * For non-nested case: + * If the L01 MSR bitmap does not intercept the MSR, then we need to + * save it. + * + * For nested case: + * If the L02 MSR bitmap does not intercept the MSR, then we need to + * save it. + */ + if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) + rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + + if (vmx->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); @@ -9242,7 +9334,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* - * pred_cmd is trying to verify two things: + * pred_cmd & spec_ctrl are trying to verify two things: * * 1. L0 gave a permission to L1 to actually passthrough the MSR. This * ensures that we do not accidentally generate an L02 MSR bitmap @@ -9255,9 +9347,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, * the MSR. */ bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); + bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL); if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && - !pred_cmd) + !pred_cmd && !spec_ctrl) return false; page = nested_get_page(vcpu, vmcs12->msr_bitmap); @@ -9293,6 +9386,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, } } + if (spec_ctrl) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_R | MSR_TYPE_W); + if (pred_cmd) nested_vmx_disable_intercept_for_msr( msr_bitmap_l1, msr_bitmap_l0, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 276f978efeed..12a91ea85d3a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -961,7 +961,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, - MSR_IA32_ARCH_CAPABILITIES + MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- cgit v1.2.3 From 89be8950bab799ddb9cc3777345e3c21bcb32dba Mon Sep 17 00:00:00 2001 From: KarimAllah Ahmed Date: Sat, 3 Feb 2018 15:56:23 +0100 Subject: KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL commit b2ac58f90540e39324e7a29a7ad471407ae0bf48 upstream. [ Based on a patch from Paolo Bonzini ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Reviewed-by: Darren Kenny Reviewed-by: Konrad Rzeszutek Wilk Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Jun Nakajima Cc: kvm@vger.kernel.org Cc: Dave Hansen Cc: Tim Chen Cc: Andy Lutomirski Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Greg KH Cc: Paolo Bonzini Cc: Dan Williams Cc: Linus Torvalds Cc: Ashok Raj Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d489836da6f5..9a390fb95b4b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -147,6 +147,8 @@ struct vcpu_svm { u64 gs_base; } host; + u64 spec_ctrl; + u32 *msrpm; ulong nmi_iret_rip; @@ -182,6 +184,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_CSTAR, .always = true }, { .index = MSR_SYSCALL_MASK, .always = true }, #endif + { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, @@ -764,6 +767,25 @@ static bool valid_msr_intercept(u32 index) return false; } +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr) +{ + u8 bit_write; + unsigned long tmp; + u32 offset; + u32 *msrpm; + + msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm: + to_svm(vcpu)->msrpm; + + offset = svm_msrpm_offset(msr); + bit_write = 2 * (msr & 0x0f) + 1; + tmp = msrpm[offset]; + + BUG_ON(offset == MSR_INVALID); + + return !!test_bit(bit_write, &tmp); +} + static void set_msr_interception(u32 *msrpm, unsigned msr, int read, int write) { @@ -1122,6 +1144,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 dummy; u32 eax = 1; + svm->spec_ctrl = 0; + if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -3063,6 +3087,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_VM_CR: msr_info->data = svm->nested.vm_cr_msr; break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has_ibrs(vcpu)) + return 1; + + msr_info->data = svm->spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x01000065; break; @@ -3137,6 +3168,33 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has_ibrs(vcpu)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + svm->spec_ctrl = data; + + if (!data) + break; + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_svm_vmrun_msrpm. + * We update the L1 MSR bit as well since it will end up + * touching the MSR anyway now. + */ + set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + break; case MSR_IA32_PRED_CMD: if (!msr->host_initiated && !guest_cpuid_has_ibpb(vcpu)) @@ -3839,6 +3897,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) local_irq_enable(); + /* + * If this vCPU has touched SPEC_CTRL, restore the guest's value if + * it's non-zero. Since vmentry is serialising on affected CPUs, there + * is no need to worry about the conditional branch over the wrmsr + * being speculatively taken. + */ + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -3931,6 +3998,27 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* + * We do not use IBRS in the kernel. If this vCPU has used the + * SPEC_CTRL MSR it may have left it on; save the value and + * turn it off. This is much more efficient than blindly adding + * it to the atomic save/restore list. Especially as the former + * (Saving guest MSRs on vmexit) doesn't even exist in KVM. + * + * For non-nested case: + * If the L01 MSR bitmap does not intercept the MSR, then we need to + * save it. + * + * For nested case: + * If the L02 MSR bitmap does not intercept the MSR, then we need to + * save it. + */ + if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) + rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); -- cgit v1.2.3 From d0169c04fee013922a272a19f7950439a5e07230 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 22 Feb 2018 16:43:17 +0100 Subject: KVM/x86: Remove indirect MSR op calls from SPEC_CTRL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ecb586bd29c99fb4de599dec388658e74388daad upstream. Having a paravirt indirect call in the IBRS restore path is not a good idea, since we are trying to protect from speculative execution of bogus indirect branch targets. It is also slower, so use native_wrmsrl() on the vmentry path too. Signed-off-by: Paolo Bonzini Reviewed-by: Jim Mattson Cc: David Woodhouse Cc: KarimAllah Ahmed Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Fixes: d28b387fb74da95d69d2615732f50cceb38e9a4d Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 7 ++++--- arch/x86/kvm/vmx.c | 7 ++++--- 2 files changed, 8 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9a390fb95b4b..e1f20e0d62c2 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -3904,7 +3905,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) * being speculatively taken. */ if (svm->spec_ctrl) - wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + native_wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); asm volatile ( "push %%" _ASM_BP "; \n\t" @@ -4014,10 +4015,10 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) * save it. */ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) - rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); if (svm->spec_ctrl) - wrmsrl(MSR_IA32_SPEC_CTRL, 0); + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0); /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b118d415ca08..f7b5c009859e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include "trace.h" @@ -8658,7 +8659,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) * being speculatively taken. */ if (vmx->spec_ctrl) - wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + native_wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); vmx->__launched = vmx->loaded_vmcs->launched; asm( @@ -8794,10 +8795,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) * save it. */ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) - rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); if (vmx->spec_ctrl) - wrmsrl(MSR_IA32_SPEC_CTRL, 0); + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0); /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); -- cgit v1.2.3 From cc8c5450b6d7ac0cbc36d133ffa21a01dd21b3d8 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 17 Dec 2015 09:45:09 -0800 Subject: x86: reorganize SMAP handling in user space accesses commit 11f1a4b9755f5dbc3e822a96502ebe9b044b14d8 upstream. This reorganizes how we do the stac/clac instructions in the user access code. Instead of adding the instructions directly to the same inline asm that does the actual user level access and exception handling, add them at a higher level. This is mainly preparation for the next step, where we will expose an interface to allow users to mark several accesses together as being user space accesses, but it does already clean up some code: - the inlined trivial cases of copy_in_user() now do stac/clac just once over the accesses: they used to do one pair around the user space read, and another pair around the write-back. - the {get,put}_user_ex() macros that are used with the catch/try handling don't do any stac/clac at all, because that happens in the try/catch surrounding them. Other than those two cleanups that happened naturally from the re-organization, this should not make any difference. Yet. Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uaccess.h | 53 ++++++++++++++-------- arch/x86/include/asm/uaccess_64.h | 94 +++++++++++++++++++++++++++------------ 2 files changed, 101 insertions(+), 46 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index d788b0cdc0ad..e93a69f9a225 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -144,6 +144,9 @@ extern int __get_user_4(void); extern int __get_user_8(void); extern int __get_user_bad(void); +#define __uaccess_begin() stac() +#define __uaccess_end() clac() + /* * This is a type: either unsigned long, if the argument fits into * that type, or otherwise unsigned long long. @@ -203,10 +206,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) #ifdef CONFIG_X86_32 #define __put_user_asm_u64(x, addr, err, errret) \ - asm volatile(ASM_STAC "\n" \ + asm volatile("\n" \ "1: movl %%eax,0(%2)\n" \ "2: movl %%edx,4(%2)\n" \ - "3: " ASM_CLAC "\n" \ + "3:" \ ".section .fixup,\"ax\"\n" \ "4: movl %3,%0\n" \ " jmp 3b\n" \ @@ -217,10 +220,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) : "A" (x), "r" (addr), "i" (errret), "0" (err)) #define __put_user_asm_ex_u64(x, addr) \ - asm volatile(ASM_STAC "\n" \ + asm volatile("\n" \ "1: movl %%eax,0(%1)\n" \ "2: movl %%edx,4(%1)\n" \ - "3: " ASM_CLAC "\n" \ + "3:" \ _ASM_EXTABLE_EX(1b, 2b) \ _ASM_EXTABLE_EX(2b, 3b) \ : : "A" (x), "r" (addr)) @@ -314,6 +317,10 @@ do { \ } \ } while (0) +/* + * This doesn't do __uaccess_begin/end - the exception handling + * around it must do that. + */ #define __put_user_size_ex(x, ptr, size) \ do { \ __chk_user_ptr(ptr); \ @@ -368,9 +375,9 @@ do { \ } while (0) #define __get_user_asm(x, addr, err, itype, rtype, ltype, errret) \ - asm volatile(ASM_STAC "\n" \ + asm volatile("\n" \ "1: mov"itype" %2,%"rtype"1\n" \ - "2: " ASM_CLAC "\n" \ + "2:\n" \ ".section .fixup,\"ax\"\n" \ "3: mov %3,%0\n" \ " xor"itype" %"rtype"1,%"rtype"1\n" \ @@ -380,6 +387,10 @@ do { \ : "=r" (err), ltype(x) \ : "m" (__m(addr)), "i" (errret), "0" (err)) +/* + * This doesn't do __uaccess_begin/end - the exception handling + * around it must do that. + */ #define __get_user_size_ex(x, ptr, size) \ do { \ __chk_user_ptr(ptr); \ @@ -410,7 +421,9 @@ do { \ #define __put_user_nocheck(x, ptr, size) \ ({ \ int __pu_err; \ + __uaccess_begin(); \ __put_user_size((x), (ptr), (size), __pu_err, -EFAULT); \ + __uaccess_end(); \ __builtin_expect(__pu_err, 0); \ }) @@ -418,7 +431,9 @@ do { \ ({ \ int __gu_err; \ unsigned long __gu_val; \ + __uaccess_begin(); \ __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \ + __uaccess_end(); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __builtin_expect(__gu_err, 0); \ }) @@ -433,9 +448,9 @@ struct __large_struct { unsigned long buf[100]; }; * aliasing issues. */ #define __put_user_asm(x, addr, err, itype, rtype, ltype, errret) \ - asm volatile(ASM_STAC "\n" \ + asm volatile("\n" \ "1: mov"itype" %"rtype"1,%2\n" \ - "2: " ASM_CLAC "\n" \ + "2:\n" \ ".section .fixup,\"ax\"\n" \ "3: mov %3,%0\n" \ " jmp 2b\n" \ @@ -455,11 +470,11 @@ struct __large_struct { unsigned long buf[100]; }; */ #define uaccess_try do { \ current_thread_info()->uaccess_err = 0; \ - stac(); \ + __uaccess_begin(); \ barrier(); #define uaccess_catch(err) \ - clac(); \ + __uaccess_end(); \ (err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0); \ } while (0) @@ -557,12 +572,13 @@ extern void __cmpxchg_wrong_size(void) __typeof__(ptr) __uval = (uval); \ __typeof__(*(ptr)) __old = (old); \ __typeof__(*(ptr)) __new = (new); \ + __uaccess_begin(); \ switch (size) { \ case 1: \ { \ - asm volatile("\t" ASM_STAC "\n" \ + asm volatile("\n" \ "1:\t" LOCK_PREFIX "cmpxchgb %4, %2\n" \ - "2:\t" ASM_CLAC "\n" \ + "2:\n" \ "\t.section .fixup, \"ax\"\n" \ "3:\tmov %3, %0\n" \ "\tjmp 2b\n" \ @@ -576,9 +592,9 @@ extern void __cmpxchg_wrong_size(void) } \ case 2: \ { \ - asm volatile("\t" ASM_STAC "\n" \ + asm volatile("\n" \ "1:\t" LOCK_PREFIX "cmpxchgw %4, %2\n" \ - "2:\t" ASM_CLAC "\n" \ + "2:\n" \ "\t.section .fixup, \"ax\"\n" \ "3:\tmov %3, %0\n" \ "\tjmp 2b\n" \ @@ -592,9 +608,9 @@ extern void __cmpxchg_wrong_size(void) } \ case 4: \ { \ - asm volatile("\t" ASM_STAC "\n" \ + asm volatile("\n" \ "1:\t" LOCK_PREFIX "cmpxchgl %4, %2\n" \ - "2:\t" ASM_CLAC "\n" \ + "2:\n" \ "\t.section .fixup, \"ax\"\n" \ "3:\tmov %3, %0\n" \ "\tjmp 2b\n" \ @@ -611,9 +627,9 @@ extern void __cmpxchg_wrong_size(void) if (!IS_ENABLED(CONFIG_X86_64)) \ __cmpxchg_wrong_size(); \ \ - asm volatile("\t" ASM_STAC "\n" \ + asm volatile("\n" \ "1:\t" LOCK_PREFIX "cmpxchgq %4, %2\n" \ - "2:\t" ASM_CLAC "\n" \ + "2:\n" \ "\t.section .fixup, \"ax\"\n" \ "3:\tmov %3, %0\n" \ "\tjmp 2b\n" \ @@ -628,6 +644,7 @@ extern void __cmpxchg_wrong_size(void) default: \ __cmpxchg_wrong_size(); \ } \ + __uaccess_end(); \ *__uval = __old; \ __ret; \ }) diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index d83a55b95a48..307698688fa1 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -56,35 +56,49 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size) if (!__builtin_constant_p(size)) return copy_user_generic(dst, (__force void *)src, size); switch (size) { - case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src, + case 1: + __uaccess_begin(); + __get_user_asm(*(u8 *)dst, (u8 __user *)src, ret, "b", "b", "=q", 1); + __uaccess_end(); return ret; - case 2:__get_user_asm(*(u16 *)dst, (u16 __user *)src, + case 2: + __uaccess_begin(); + __get_user_asm(*(u16 *)dst, (u16 __user *)src, ret, "w", "w", "=r", 2); + __uaccess_end(); return ret; - case 4:__get_user_asm(*(u32 *)dst, (u32 __user *)src, + case 4: + __uaccess_begin(); + __get_user_asm(*(u32 *)dst, (u32 __user *)src, ret, "l", "k", "=r", 4); + __uaccess_end(); return ret; - case 8:__get_user_asm(*(u64 *)dst, (u64 __user *)src, + case 8: + __uaccess_begin(); + __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 8); + __uaccess_end(); return ret; case 10: + __uaccess_begin(); __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 10); - if (unlikely(ret)) - return ret; - __get_user_asm(*(u16 *)(8 + (char *)dst), - (u16 __user *)(8 + (char __user *)src), - ret, "w", "w", "=r", 2); + if (likely(!ret)) + __get_user_asm(*(u16 *)(8 + (char *)dst), + (u16 __user *)(8 + (char __user *)src), + ret, "w", "w", "=r", 2); + __uaccess_end(); return ret; case 16: + __uaccess_begin(); __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 16); - if (unlikely(ret)) - return ret; - __get_user_asm(*(u64 *)(8 + (char *)dst), - (u64 __user *)(8 + (char __user *)src), - ret, "q", "", "=r", 8); + if (likely(!ret)) + __get_user_asm(*(u64 *)(8 + (char *)dst), + (u64 __user *)(8 + (char __user *)src), + ret, "q", "", "=r", 8); + __uaccess_end(); return ret; default: return copy_user_generic(dst, (__force void *)src, size); @@ -106,35 +120,51 @@ int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size) if (!__builtin_constant_p(size)) return copy_user_generic((__force void *)dst, src, size); switch (size) { - case 1:__put_user_asm(*(u8 *)src, (u8 __user *)dst, + case 1: + __uaccess_begin(); + __put_user_asm(*(u8 *)src, (u8 __user *)dst, ret, "b", "b", "iq", 1); + __uaccess_end(); return ret; - case 2:__put_user_asm(*(u16 *)src, (u16 __user *)dst, + case 2: + __uaccess_begin(); + __put_user_asm(*(u16 *)src, (u16 __user *)dst, ret, "w", "w", "ir", 2); + __uaccess_end(); return ret; - case 4:__put_user_asm(*(u32 *)src, (u32 __user *)dst, + case 4: + __uaccess_begin(); + __put_user_asm(*(u32 *)src, (u32 __user *)dst, ret, "l", "k", "ir", 4); + __uaccess_end(); return ret; - case 8:__put_user_asm(*(u64 *)src, (u64 __user *)dst, + case 8: + __uaccess_begin(); + __put_user_asm(*(u64 *)src, (u64 __user *)dst, ret, "q", "", "er", 8); + __uaccess_end(); return ret; case 10: + __uaccess_begin(); __put_user_asm(*(u64 *)src, (u64 __user *)dst, ret, "q", "", "er", 10); - if (unlikely(ret)) - return ret; - asm("":::"memory"); - __put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst, - ret, "w", "w", "ir", 2); + if (likely(!ret)) { + asm("":::"memory"); + __put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst, + ret, "w", "w", "ir", 2); + } + __uaccess_end(); return ret; case 16: + __uaccess_begin(); __put_user_asm(*(u64 *)src, (u64 __user *)dst, ret, "q", "", "er", 16); - if (unlikely(ret)) - return ret; - asm("":::"memory"); - __put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst, - ret, "q", "", "er", 8); + if (likely(!ret)) { + asm("":::"memory"); + __put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst, + ret, "q", "", "er", 8); + } + __uaccess_end(); return ret; default: return copy_user_generic((__force void *)dst, src, size); @@ -160,39 +190,47 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size) switch (size) { case 1: { u8 tmp; + __uaccess_begin(); __get_user_asm(tmp, (u8 __user *)src, ret, "b", "b", "=q", 1); if (likely(!ret)) __put_user_asm(tmp, (u8 __user *)dst, ret, "b", "b", "iq", 1); + __uaccess_end(); return ret; } case 2: { u16 tmp; + __uaccess_begin(); __get_user_asm(tmp, (u16 __user *)src, ret, "w", "w", "=r", 2); if (likely(!ret)) __put_user_asm(tmp, (u16 __user *)dst, ret, "w", "w", "ir", 2); + __uaccess_end(); return ret; } case 4: { u32 tmp; + __uaccess_begin(); __get_user_asm(tmp, (u32 __user *)src, ret, "l", "k", "=r", 4); if (likely(!ret)) __put_user_asm(tmp, (u32 __user *)dst, ret, "l", "k", "ir", 4); + __uaccess_end(); return ret; } case 8: { u64 tmp; + __uaccess_begin(); __get_user_asm(tmp, (u64 __user *)src, ret, "q", "", "=r", 8); if (likely(!ret)) __put_user_asm(tmp, (u64 __user *)dst, ret, "q", "", "er", 8); + __uaccess_end(); return ret; } default: -- cgit v1.2.3 From 45b871531cb6d2973d0c68083376719c50779a96 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 23 Feb 2016 14:58:52 -0800 Subject: x86: fix SMAP in 32-bit environments commit de9e478b9d49f3a0214310d921450cf5bb4a21e6 upstream. In commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses") I changed how the stac/clac instructions were generated around the user space accesses, which then made it possible to do batched accesses efficiently for user string copies etc. However, in doing so, I completely spaced out, and didn't even think about the 32-bit case. And nobody really even seemed to notice, because SMAP doesn't even exist until modern Skylake processors, and you'd have to be crazy to run 32-bit kernels on a modern CPU. Which brings us to Andy Lutomirski. He actually tested the 32-bit kernel on new hardware, and noticed that it doesn't work. My bad. The trivial fix is to add the required uaccess begin/end markers around the raw accesses in . I feel a bit bad about this patch, just because that header file really should be cleaned up to avoid all the duplicated code in it, and this commit just expands on the problem. But this just fixes the bug without any bigger cleanup surgery. Reported-and-tested-by: Andy Lutomirski Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uaccess_32.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'arch') diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h index f5dcb5204dcd..3fe0eac59462 100644 --- a/arch/x86/include/asm/uaccess_32.h +++ b/arch/x86/include/asm/uaccess_32.h @@ -48,20 +48,28 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n) switch (n) { case 1: + __uaccess_begin(); __put_user_size(*(u8 *)from, (u8 __user *)to, 1, ret, 1); + __uaccess_end(); return ret; case 2: + __uaccess_begin(); __put_user_size(*(u16 *)from, (u16 __user *)to, 2, ret, 2); + __uaccess_end(); return ret; case 4: + __uaccess_begin(); __put_user_size(*(u32 *)from, (u32 __user *)to, 4, ret, 4); + __uaccess_end(); return ret; case 8: + __uaccess_begin(); __put_user_size(*(u64 *)from, (u64 __user *)to, 8, ret, 8); + __uaccess_end(); return ret; } } @@ -103,13 +111,19 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n) switch (n) { case 1: + __uaccess_begin(); __get_user_size(*(u8 *)to, from, 1, ret, 1); + __uaccess_end(); return ret; case 2: + __uaccess_begin(); __get_user_size(*(u16 *)to, from, 2, ret, 2); + __uaccess_end(); return ret; case 4: + __uaccess_begin(); __get_user_size(*(u32 *)to, from, 4, ret, 4); + __uaccess_end(); return ret; } } @@ -148,13 +162,19 @@ __copy_from_user(void *to, const void __user *from, unsigned long n) switch (n) { case 1: + __uaccess_begin(); __get_user_size(*(u8 *)to, from, 1, ret, 1); + __uaccess_end(); return ret; case 2: + __uaccess_begin(); __get_user_size(*(u16 *)to, from, 2, ret, 2); + __uaccess_end(); return ret; case 4: + __uaccess_begin(); __get_user_size(*(u32 *)to, from, 4, ret, 4); + __uaccess_end(); return ret; } } @@ -170,13 +190,19 @@ static __always_inline unsigned long __copy_from_user_nocache(void *to, switch (n) { case 1: + __uaccess_begin(); __get_user_size(*(u8 *)to, from, 1, ret, 1); + __uaccess_end(); return ret; case 2: + __uaccess_begin(); __get_user_size(*(u16 *)to, from, 2, ret, 2); + __uaccess_end(); return ret; case 4: + __uaccess_begin(); __get_user_size(*(u32 *)to, from, 4, ret, 4); + __uaccess_end(); return ret; } } -- cgit v1.2.3 From 6d1d4fc34287da617b50bd7139e536a8d69c24ea Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 29 Jan 2018 17:02:39 -0800 Subject: x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec commit b3bbfb3fb5d25776b8e3f361d2eedaabb0b496cd upstream. For __get_user() paths, do not allow the kernel to speculate on the value of a user controlled pointer. In addition to the 'stac' instruction for Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the access_ok() result to resolve in the pipeline before the CPU might take any speculative action on the pointer value. Given the cost of 'stac' the speculation barrier is placed after 'stac' to hopefully overlap the cost of disabling SMAP with the cost of flushing the instruction pipeline. Since __get_user is a major kernel interface that deals with user controlled pointers, the __uaccess_begin_nospec() mechanism will prevent speculative execution past an access_ok() permission check. While speculative execution past access_ok() is not enough to lead to a kernel memory leak, it is a necessary precondition. To be clear, __uaccess_begin_nospec() is addressing a class of potential problems near __get_user() usages. Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used to protect __get_user(), pointer masking similar to array_index_nospec() will be used for get_user() since it incorporates a bounds check near the usage. uaccess_try_nospec provides the same mechanism for get_user_try. No functional changes. Suggested-by: Linus Torvalds Suggested-by: Andi Kleen Suggested-by: Ingo Molnar Signed-off-by: Dan Williams Signed-off-by: Thomas Gleixner Cc: linux-arch@vger.kernel.org Cc: Tom Lendacky Cc: Kees Cook Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com [bwh: Backported to 4.4: use current_thread_info()] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uaccess.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'arch') diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index e93a69f9a225..4b50bc52ea3e 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -146,6 +146,11 @@ extern int __get_user_bad(void); #define __uaccess_begin() stac() #define __uaccess_end() clac() +#define __uaccess_begin_nospec() \ +({ \ + stac(); \ + barrier_nospec(); \ +}) /* * This is a type: either unsigned long, if the argument fits into @@ -473,6 +478,10 @@ struct __large_struct { unsigned long buf[100]; }; __uaccess_begin(); \ barrier(); +#define uaccess_try_nospec do { \ + current_thread_info()->uaccess_err = 0; \ + __uaccess_begin_nospec(); \ + #define uaccess_catch(err) \ __uaccess_end(); \ (err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0); \ -- cgit v1.2.3 From 557cd0d20ec971f52e4b9482d551b41503bb3e55 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 29 Jan 2018 17:02:44 -0800 Subject: x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} commit b5c4ae4f35325d520b230bab6eb3310613b72ac1 upstream. In preparation for converting some __uaccess_begin() instances to __uacess_begin_nospec(), make sure all 'from user' uaccess paths are using the _begin(), _end() helpers rather than open-coded stac() and clac(). No functional changes. Suggested-by: Ingo Molnar Signed-off-by: Dan Williams Signed-off-by: Thomas Gleixner Cc: linux-arch@vger.kernel.org Cc: Tom Lendacky Cc: Kees Cook Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com [bwh: Backported to 4.4: - Convert several more functions to use __uaccess_begin_nospec(), that are just wrappers in mainline - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/lib/usercopy_32.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c index 91d93b95bd86..5755942f5eb2 100644 --- a/arch/x86/lib/usercopy_32.c +++ b/arch/x86/lib/usercopy_32.c @@ -570,12 +570,12 @@ do { \ unsigned long __copy_to_user_ll(void __user *to, const void *from, unsigned long n) { - stac(); + __uaccess_begin(); if (movsl_is_ok(to, from, n)) __copy_user(to, from, n); else n = __copy_user_intel(to, from, n); - clac(); + __uaccess_end(); return n; } EXPORT_SYMBOL(__copy_to_user_ll); @@ -583,12 +583,12 @@ EXPORT_SYMBOL(__copy_to_user_ll); unsigned long __copy_from_user_ll(void *to, const void __user *from, unsigned long n) { - stac(); + __uaccess_begin(); if (movsl_is_ok(to, from, n)) __copy_user_zeroing(to, from, n); else n = __copy_user_zeroing_intel(to, from, n); - clac(); + __uaccess_end(); return n; } EXPORT_SYMBOL(__copy_from_user_ll); @@ -596,13 +596,13 @@ EXPORT_SYMBOL(__copy_from_user_ll); unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from, unsigned long n) { - stac(); + __uaccess_begin(); if (movsl_is_ok(to, from, n)) __copy_user(to, from, n); else n = __copy_user_intel((void __user *)to, (const void *)from, n); - clac(); + __uaccess_end(); return n; } EXPORT_SYMBOL(__copy_from_user_ll_nozero); @@ -610,7 +610,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero); unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from, unsigned long n) { - stac(); + __uaccess_begin(); #ifdef CONFIG_X86_INTEL_USERCOPY if (n > 64 && cpu_has_xmm2) n = __copy_user_zeroing_intel_nocache(to, from, n); @@ -619,7 +619,7 @@ unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from, #else __copy_user_zeroing(to, from, n); #endif - clac(); + __uaccess_end(); return n; } EXPORT_SYMBOL(__copy_from_user_ll_nocache); @@ -627,7 +627,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache); unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from, unsigned long n) { - stac(); + __uaccess_begin(); #ifdef CONFIG_X86_INTEL_USERCOPY if (n > 64 && cpu_has_xmm2) n = __copy_user_intel_nocache(to, from, n); @@ -636,7 +636,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr #else __copy_user(to, from, n); #endif - clac(); + __uaccess_end(); return n; } EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero); -- cgit v1.2.3 From 67e326e034383857f0cd0a2bc92c6b525fc710e6 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 29 Jan 2018 17:02:49 -0800 Subject: x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec commit 304ec1b050310548db33063e567123fae8fd0301 upstream. Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the limit check is far away from the user pointer de-reference. In those cases a barrier_nospec() prevents speculation with a potential pointer to privileged memory. uaccess_try_nospec covers get_user_try. Suggested-by: Linus Torvalds Suggested-by: Andi Kleen Signed-off-by: Dan Williams Signed-off-by: Thomas Gleixner Cc: linux-arch@vger.kernel.org Cc: Kees Cook Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com [bwh: Backported to 4.4: - Convert several more functions to use __uaccess_begin_nospec(), that are just wrappers in mainline - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uaccess.h | 6 +++--- arch/x86/include/asm/uaccess_32.h | 26 +++++++++++++------------- arch/x86/include/asm/uaccess_64.h | 20 ++++++++++---------- arch/x86/lib/usercopy_32.c | 10 +++++----- 4 files changed, 31 insertions(+), 31 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 4b50bc52ea3e..6f8eadf0681f 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -436,7 +436,7 @@ do { \ ({ \ int __gu_err; \ unsigned long __gu_val; \ - __uaccess_begin(); \ + __uaccess_begin_nospec(); \ __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \ __uaccess_end(); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ @@ -546,7 +546,7 @@ struct __large_struct { unsigned long buf[100]; }; * get_user_ex(...); * } get_user_catch(err) */ -#define get_user_try uaccess_try +#define get_user_try uaccess_try_nospec #define get_user_catch(err) uaccess_catch(err) #define get_user_ex(x, ptr) do { \ @@ -581,7 +581,7 @@ extern void __cmpxchg_wrong_size(void) __typeof__(ptr) __uval = (uval); \ __typeof__(*(ptr)) __old = (old); \ __typeof__(*(ptr)) __new = (new); \ - __uaccess_begin(); \ + __uaccess_begin_nospec(); \ switch (size) { \ case 1: \ { \ diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h index 3fe0eac59462..f575ee3aea5c 100644 --- a/arch/x86/include/asm/uaccess_32.h +++ b/arch/x86/include/asm/uaccess_32.h @@ -48,25 +48,25 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n) switch (n) { case 1: - __uaccess_begin(); + __uaccess_begin_nospec(); __put_user_size(*(u8 *)from, (u8 __user *)to, 1, ret, 1); __uaccess_end(); return ret; case 2: - __uaccess_begin(); + __uaccess_begin_nospec(); __put_user_size(*(u16 *)from, (u16 __user *)to, 2, ret, 2); __uaccess_end(); return ret; case 4: - __uaccess_begin(); + __uaccess_begin_nospec(); __put_user_size(*(u32 *)from, (u32 __user *)to, 4, ret, 4); __uaccess_end(); return ret; case 8: - __uaccess_begin(); + __uaccess_begin_nospec(); __put_user_size(*(u64 *)from, (u64 __user *)to, 8, ret, 8); __uaccess_end(); @@ -111,17 +111,17 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n) switch (n) { case 1: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u8 *)to, from, 1, ret, 1); __uaccess_end(); return ret; case 2: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u16 *)to, from, 2, ret, 2); __uaccess_end(); return ret; case 4: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u32 *)to, from, 4, ret, 4); __uaccess_end(); return ret; @@ -162,17 +162,17 @@ __copy_from_user(void *to, const void __user *from, unsigned long n) switch (n) { case 1: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u8 *)to, from, 1, ret, 1); __uaccess_end(); return ret; case 2: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u16 *)to, from, 2, ret, 2); __uaccess_end(); return ret; case 4: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u32 *)to, from, 4, ret, 4); __uaccess_end(); return ret; @@ -190,17 +190,17 @@ static __always_inline unsigned long __copy_from_user_nocache(void *to, switch (n) { case 1: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u8 *)to, from, 1, ret, 1); __uaccess_end(); return ret; case 2: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u16 *)to, from, 2, ret, 2); __uaccess_end(); return ret; case 4: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_size(*(u32 *)to, from, 4, ret, 4); __uaccess_end(); return ret; diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index 307698688fa1..dc2d00e7ced3 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -57,31 +57,31 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size) return copy_user_generic(dst, (__force void *)src, size); switch (size) { case 1: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u8 *)dst, (u8 __user *)src, ret, "b", "b", "=q", 1); __uaccess_end(); return ret; case 2: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u16 *)dst, (u16 __user *)src, ret, "w", "w", "=r", 2); __uaccess_end(); return ret; case 4: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u32 *)dst, (u32 __user *)src, ret, "l", "k", "=r", 4); __uaccess_end(); return ret; case 8: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 8); __uaccess_end(); return ret; case 10: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 10); if (likely(!ret)) @@ -91,7 +91,7 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size) __uaccess_end(); return ret; case 16: - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 16); if (likely(!ret)) @@ -190,7 +190,7 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size) switch (size) { case 1: { u8 tmp; - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(tmp, (u8 __user *)src, ret, "b", "b", "=q", 1); if (likely(!ret)) @@ -201,7 +201,7 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size) } case 2: { u16 tmp; - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(tmp, (u16 __user *)src, ret, "w", "w", "=r", 2); if (likely(!ret)) @@ -213,7 +213,7 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size) case 4: { u32 tmp; - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(tmp, (u32 __user *)src, ret, "l", "k", "=r", 4); if (likely(!ret)) @@ -224,7 +224,7 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size) } case 8: { u64 tmp; - __uaccess_begin(); + __uaccess_begin_nospec(); __get_user_asm(tmp, (u64 __user *)src, ret, "q", "", "=r", 8); if (likely(!ret)) diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c index 5755942f5eb2..0a6fcae404f8 100644 --- a/arch/x86/lib/usercopy_32.c +++ b/arch/x86/lib/usercopy_32.c @@ -570,7 +570,7 @@ do { \ unsigned long __copy_to_user_ll(void __user *to, const void *from, unsigned long n) { - __uaccess_begin(); + __uaccess_begin_nospec(); if (movsl_is_ok(to, from, n)) __copy_user(to, from, n); else @@ -583,7 +583,7 @@ EXPORT_SYMBOL(__copy_to_user_ll); unsigned long __copy_from_user_ll(void *to, const void __user *from, unsigned long n) { - __uaccess_begin(); + __uaccess_begin_nospec(); if (movsl_is_ok(to, from, n)) __copy_user_zeroing(to, from, n); else @@ -596,7 +596,7 @@ EXPORT_SYMBOL(__copy_from_user_ll); unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from, unsigned long n) { - __uaccess_begin(); + __uaccess_begin_nospec(); if (movsl_is_ok(to, from, n)) __copy_user(to, from, n); else @@ -610,7 +610,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero); unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from, unsigned long n) { - __uaccess_begin(); + __uaccess_begin_nospec(); #ifdef CONFIG_X86_INTEL_USERCOPY if (n > 64 && cpu_has_xmm2) n = __copy_user_zeroing_intel_nocache(to, from, n); @@ -627,7 +627,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache); unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from, unsigned long n) { - __uaccess_begin(); + __uaccess_begin_nospec(); #ifdef CONFIG_X86_INTEL_USERCOPY if (n > 64 && cpu_has_xmm2) n = __copy_user_intel_nocache(to, from, n); -- cgit v1.2.3 From 2658e4d66deca4c1fc6eb59514bded62dd0a7812 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 25 Apr 2018 22:04:19 -0400 Subject: x86/bugs, KVM: Support the combination of guest and host IBRS commit 5cf687548705412da47c9cec342fd952d71ed3d5 upstream. A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This allows to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 4.4: This was partly applied before; apply just the missing bits] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 6 ++---- arch/x86/kvm/vmx.c | 6 ++---- 2 files changed, 4 insertions(+), 8 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index e1f20e0d62c2..f86303592768 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3904,8 +3904,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) * is no need to worry about the conditional branch over the wrmsr * being speculatively taken. */ - if (svm->spec_ctrl) - native_wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + x86_spec_ctrl_set_guest(svm->spec_ctrl); asm volatile ( "push %%" _ASM_BP "; \n\t" @@ -4017,8 +4016,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); - if (svm->spec_ctrl) - native_wrmsrl(MSR_IA32_SPEC_CTRL, 0); + x86_spec_ctrl_restore_host(svm->spec_ctrl); /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f7b5c009859e..0fffd247037b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8658,8 +8658,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) * is no need to worry about the conditional branch over the wrmsr * being speculatively taken. */ - if (vmx->spec_ctrl) - native_wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + x86_spec_ctrl_set_guest(vmx->spec_ctrl); vmx->__launched = vmx->loaded_vmcs->launched; asm( @@ -8797,8 +8796,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); - if (vmx->spec_ctrl) - native_wrmsrl(MSR_IA32_SPEC_CTRL, 0); + x86_spec_ctrl_restore_host(vmx->spec_ctrl); /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); -- cgit v1.2.3 From 0109a1b0a5cababd514671b517722585302c0d4f Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 25 Apr 2018 22:04:25 -0400 Subject: x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit da39556f66f5cfe8f9c989206974f1cb16ca5d7c upstream. Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar [dwmw2: Handle 4.9 guest CPUID differences, rename guest_cpu_has_ibrs() → guest_cpu_has_spec_ctrl()] Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: Update feature bit name] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/cpuid.h | 4 ++-- arch/x86/kvm/svm.c | 4 ++-- arch/x86/kvm/vmx.c | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0ab72a8387d4..6b20e0a823da 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(SPEC_CTRL) | F(ARCH_CAPABILITIES); + F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 7f74d7e18a01..31ff5d2d0536 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -170,7 +170,7 @@ static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu) return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); } -static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu) +static inline bool guest_cpuid_has_spec_ctrl(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; @@ -178,7 +178,7 @@ static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu) if (best && (best->ebx & bit(X86_FEATURE_IBRS))) return true; best = kvm_find_cpuid_entry(vcpu, 7, 0); - return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); + return best && (best->edx & (bit(X86_FEATURE_SPEC_CTRL) | bit(X86_FEATURE_SPEC_CTRL_SSBD))); } static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f86303592768..9b3ac8c54a59 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3090,7 +3090,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && - !guest_cpuid_has_ibrs(vcpu)) + !guest_cpuid_has_spec_ctrl(vcpu)) return 1; msr_info->data = svm->spec_ctrl; @@ -3171,7 +3171,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) break; case MSR_IA32_SPEC_CTRL: if (!msr->host_initiated && - !guest_cpuid_has_ibrs(vcpu)) + !guest_cpuid_has_spec_ctrl(vcpu)) return 1; /* The STIBP bit doesn't fault even if it's not advertised */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0fffd247037b..ea2e36d85569 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2861,7 +2861,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && - !guest_cpuid_has_ibrs(vcpu)) + !guest_cpuid_has_spec_ctrl(vcpu)) return 1; msr_info->data = to_vmx(vcpu)->spec_ctrl; @@ -2973,11 +2973,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && - !guest_cpuid_has_ibrs(vcpu)) + !guest_cpuid_has_spec_ctrl(vcpu)) return 1; /* The STIBP bit doesn't fault even if it's not advertised */ - if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD)) return 1; vmx->spec_ctrl = data; -- cgit v1.2.3 From 7f77d36ab3f3d3dc09af0afbc7b58198382e9941 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 11 May 2018 15:21:01 +0200 Subject: KVM: SVM: Move spec control call after restore of GS commit 15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765 upstream. svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS based, but host GS is not yet restored and the access causes a triple fault. Move the call after the host GS restore. Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9b3ac8c54a59..49d5543ebc98 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3998,6 +3998,18 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + +#ifdef CONFIG_X86_64 + wrmsrl(MSR_GS_BASE, svm->host.gs_base); +#else + loadsegment(fs, svm->host.fs); +#ifndef CONFIG_X86_32_LAZY_GS + loadsegment(gs, svm->host.gs); +#endif +#endif + /* * We do not use IBRS in the kernel. If this vCPU has used the * SPEC_CTRL MSR it may have left it on; save the value and @@ -4018,18 +4030,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) x86_spec_ctrl_restore_host(svm->spec_ctrl); - /* Eliminate branch target predictions from guest mode */ - vmexit_fill_RSB(); - -#ifdef CONFIG_X86_64 - wrmsrl(MSR_GS_BASE, svm->host.gs_base); -#else - loadsegment(fs, svm->host.fs); -#ifndef CONFIG_X86_32_LAZY_GS - loadsegment(gs, svm->host.gs); -#endif -#endif - reload_tss(vcpu); local_irq_disable(); -- cgit v1.2.3 From b5ec2b3f11993d843f75c2d2954ece20af96dc88 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 9 May 2018 23:01:01 +0200 Subject: x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream. AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: This was partly applied before; apply just the missing bits] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 11 +++++++++-- arch/x86/kvm/vmx.c | 5 +++-- 2 files changed, 12 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 49d5543ebc98..9abcc08f4e93 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -149,6 +149,12 @@ struct vcpu_svm { } host; u64 spec_ctrl; + /* + * Contains guest-controlled bits of VIRT_SPEC_CTRL, which will be + * translated into the appropriate L2_CFG bits on the host to + * perform speculative control. + */ + u64 virt_spec_ctrl; u32 *msrpm; @@ -1146,6 +1152,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 eax = 1; svm->spec_ctrl = 0; + svm->virt_spec_ctrl = 0; if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | @@ -3904,7 +3911,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) * is no need to worry about the conditional branch over the wrmsr * being speculatively taken. */ - x86_spec_ctrl_set_guest(svm->spec_ctrl); + x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl); asm volatile ( "push %%" _ASM_BP "; \n\t" @@ -4028,7 +4035,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); - x86_spec_ctrl_restore_host(svm->spec_ctrl); + x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); reload_tss(vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ea2e36d85569..e99994cc1266 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8658,9 +8658,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) * is no need to worry about the conditional branch over the wrmsr * being speculatively taken. */ - x86_spec_ctrl_set_guest(vmx->spec_ctrl); + x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0); vmx->__launched = vmx->loaded_vmcs->launched; + asm( /* Store host registers */ "push %%" _ASM_DX "; push %%" _ASM_BP ";" @@ -8796,7 +8797,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); - x86_spec_ctrl_restore_host(vmx->spec_ctrl); + x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0); /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); -- cgit v1.2.3 From 3e3a1c2ee031cd3d1a8fe9a990b61c8f17a6dd83 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Wed, 2 May 2018 18:15:14 +0200 Subject: x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e7c587da125291db39ddf1f49b18e5970adbac17 upstream. Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Jörg Otte Cc: Linus Torvalds Cc: "Kirill A. Shutemov" Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 4.4: This was partly applied before; apply just the missing bits] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/cpuid.c | 10 +++++----- arch/x86/kvm/cpuid.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 6b20e0a823da..b0371a77cbc8 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB) | F(IBRS); + F(AMD_IBPB) | F(AMD_IBRS); /* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features = @@ -596,10 +596,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; /* IBRS and IBPB aren't necessarily present in hardware cpuid */ - if (boot_cpu_has(X86_FEATURE_IBPB)) - entry->ebx |= F(IBPB); - if (boot_cpu_has(X86_FEATURE_IBRS)) - entry->ebx |= F(IBRS); + if (boot_cpu_has(X86_FEATURE_AMD_IBPB)) + entry->ebx |= F(AMD_IBPB); + if (boot_cpu_has(X86_FEATURE_AMD_IBRS)) + entry->ebx |= F(AMD_IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 31ff5d2d0536..ba8988707e9d 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -164,7 +164,7 @@ static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu) struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); - if (best && (best->ebx & bit(X86_FEATURE_IBPB))) + if (best && (best->ebx & bit(X86_FEATURE_AMD_IBPB))) return true; best = kvm_find_cpuid_entry(vcpu, 7, 0); return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL)); @@ -175,7 +175,7 @@ static inline bool guest_cpuid_has_spec_ctrl(struct kvm_vcpu *vcpu) struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); - if (best && (best->ebx & bit(X86_FEATURE_IBRS))) + if (best && (best->ebx & bit(X86_FEATURE_AMD_IBRS))) return true; best = kvm_find_cpuid_entry(vcpu, 7, 0); return best && (best->edx & (bit(X86_FEATURE_SPEC_CTRL) | bit(X86_FEATURE_SPEC_CTRL_SSBD))); -- cgit v1.2.3 From ff3c3b181c5ee5930b9cc6ca59c4c985a3d93220 Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Thu, 10 May 2018 22:06:39 +0200 Subject: KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD commit bc226f07dcd3c9ef0b7f6236fe356ea4a9cb4769 upstream. Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kernel/cpu/common.c | 3 ++- arch/x86/kvm/cpuid.c | 11 +++++++++-- arch/x86/kvm/cpuid.h | 9 +++++++++ arch/x86/kvm/svm.c | 21 +++++++++++++++++++-- arch/x86/kvm/vmx.c | 18 +++++++++++++++--- arch/x86/kvm/x86.c | 13 ++++--------- 7 files changed, 59 insertions(+), 18 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3a37cdbdfbaa..c048d0d70cc4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -765,7 +765,7 @@ struct kvm_x86_ops { int (*hardware_setup)(void); /* __init */ void (*hardware_unsetup)(void); /* __exit */ bool (*cpu_has_accelerated_tpr)(void); - bool (*cpu_has_high_real_mode_segbase)(void); + bool (*has_emulated_msr)(int index); void (*cpuid_update)(struct kvm_vcpu *vcpu); /* Create, but do not attach this VCPU */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index b12c0287d6cf..e8b46f575306 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -693,7 +693,8 @@ static void init_speculation_control(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_INTEL_STIBP)) set_cpu_cap(c, X86_FEATURE_STIBP); - if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD)) + if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD) || + cpu_has(c, X86_FEATURE_VIRT_SSBD)) set_cpu_cap(c, X86_FEATURE_SSBD); if (cpu_has(c, X86_FEATURE_AMD_IBRS)) { diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index b0371a77cbc8..b857bb9f6f23 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(AMD_IBPB) | F(AMD_IBRS); + F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD); /* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features = @@ -595,13 +595,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBRS and IBPB aren't necessarily present in hardware cpuid */ + /* + * IBRS, IBPB and VIRT_SSBD aren't necessarily present in + * hardware cpuid + */ if (boot_cpu_has(X86_FEATURE_AMD_IBPB)) entry->ebx |= F(AMD_IBPB); if (boot_cpu_has(X86_FEATURE_AMD_IBRS)) entry->ebx |= F(AMD_IBRS); + if (boot_cpu_has(X86_FEATURE_VIRT_SSBD)) + entry->ebx |= F(VIRT_SSBD); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); + if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD)) + entry->ebx |= F(VIRT_SSBD); break; } case 0x80000019: diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index ba8988707e9d..72f159f4d456 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -189,6 +189,15 @@ static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu) return best && (best->edx & bit(X86_FEATURE_ARCH_CAPABILITIES)); } +static inline bool guest_cpuid_has_virt_ssbd(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + return best && (best->ebx & bit(X86_FEATURE_VIRT_SSBD)); +} + + /* * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9abcc08f4e93..ecdf724da371 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3102,6 +3102,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = svm->spec_ctrl; break; + case MSR_AMD64_VIRT_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has_virt_ssbd(vcpu)) + return 1; + + msr_info->data = svm->virt_spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x01000065; break; @@ -3219,6 +3226,16 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) break; set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1); break; + case MSR_AMD64_VIRT_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has_virt_ssbd(vcpu)) + return 1; + + if (data & ~SPEC_CTRL_SSBD) + return 1; + + svm->virt_spec_ctrl = data; + break; case MSR_STAR: svm->vmcb->save.star = data; break; @@ -4137,7 +4154,7 @@ static bool svm_cpu_has_accelerated_tpr(void) return false; } -static bool svm_has_high_real_mode_segbase(void) +static bool svm_has_emulated_msr(int index) { return true; } @@ -4421,7 +4438,7 @@ static struct kvm_x86_ops svm_x86_ops = { .hardware_enable = svm_hardware_enable, .hardware_disable = svm_hardware_disable, .cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr, - .cpu_has_high_real_mode_segbase = svm_has_high_real_mode_segbase, + .has_emulated_msr = svm_has_emulated_msr, .vcpu_create = svm_create_vcpu, .vcpu_free = svm_free_vcpu, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e99994cc1266..e4b5fd72ca24 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8458,9 +8458,21 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) local_irq_enable(); } -static bool vmx_has_high_real_mode_segbase(void) +static bool vmx_has_emulated_msr(int index) { - return enable_unrestricted_guest || emulate_invalid_guest_state; + switch (index) { + case MSR_IA32_SMBASE: + /* + * We cannot do SMM unless we can run the guest in big + * real mode. + */ + return enable_unrestricted_guest || emulate_invalid_guest_state; + case MSR_AMD64_VIRT_SPEC_CTRL: + /* This is AMD only. */ + return false; + default: + return true; + } } static bool vmx_mpx_supported(void) @@ -10952,7 +10964,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .hardware_enable = hardware_enable, .hardware_disable = hardware_disable, .cpu_has_accelerated_tpr = report_flexpriority, - .cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase, + .has_emulated_msr = vmx_has_emulated_msr, .vcpu_create = vmx_create_vcpu, .vcpu_free = vmx_free_vcpu, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 12a91ea85d3a..aa1a0277a678 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -985,6 +985,7 @@ static u32 emulated_msrs[] = { MSR_IA32_MCG_STATUS, MSR_IA32_MCG_CTL, MSR_IA32_SMBASE, + MSR_AMD64_VIRT_SPEC_CTRL, }; static unsigned num_emulated_msrs; @@ -2584,7 +2585,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) * fringe case that is not enabled except via specific settings * of the module parameters. */ - r = kvm_x86_ops->cpu_has_high_real_mode_segbase(); + r = kvm_x86_ops->has_emulated_msr(MSR_IA32_SMBASE); break; case KVM_CAP_COALESCED_MMIO: r = KVM_COALESCED_MMIO_PAGE_OFFSET; @@ -4073,14 +4074,8 @@ static void kvm_init_msr_list(void) num_msrs_to_save = j; for (i = j = 0; i < ARRAY_SIZE(emulated_msrs); i++) { - switch (emulated_msrs[i]) { - case MSR_IA32_SMBASE: - if (!kvm_x86_ops->cpu_has_high_real_mode_segbase()) - continue; - break; - default: - break; - } + if (!kvm_x86_ops->has_emulated_msr(emulated_msrs[i])) + continue; if (j < i) emulated_msrs[j] = emulated_msrs[i]; -- cgit v1.2.3 From 2b29980eb75bc7dcb23ed0436fe805ac6e684542 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Thu, 13 Oct 2016 01:20:13 +0100 Subject: mm: replace get_user_pages_unlocked() write/force parameters with gup_flags commit c164154f66f0c9b02673f07aa4f044f1d9c70274 upstream. This removes the 'write' and 'force' use from get_user_pages_unlocked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes Reviewed-by: Jan Kara Acked-by: Michal Hocko Signed-off-by: Linus Torvalds [bwh: Backported to 4.4: - Also update calls from process_vm_rw_single_vec() and async_pf_execute() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/mips/mm/gup.c | 2 +- arch/s390/mm/gup.c | 2 +- arch/sh/mm/gup.c | 3 ++- arch/sparc/mm/gup.c | 3 ++- arch/x86/mm/gup.c | 2 +- 5 files changed, 7 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c index 349995d19c7f..e596e0a1cecc 100644 --- a/arch/mips/mm/gup.c +++ b/arch/mips/mm/gup.c @@ -303,7 +303,7 @@ slow_irqon: ret = get_user_pages_unlocked(current, mm, start, (end - start) >> PAGE_SHIFT, - write, 0, pages); + pages, write ? FOLL_WRITE : 0); /* Have to be a bit careful with return values */ if (nr > 0) { diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c index 12bbf0e8478f..7ad41be8b373 100644 --- a/arch/s390/mm/gup.c +++ b/arch/s390/mm/gup.c @@ -242,7 +242,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write, start += nr << PAGE_SHIFT; pages += nr; ret = get_user_pages_unlocked(current, mm, start, - nr_pages - nr, write, 0, pages); + nr_pages - nr, pages, write ? FOLL_WRITE : 0); /* Have to be a bit careful with return values */ if (nr > 0) ret = (ret < 0) ? nr : ret + nr; diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c index e7af6a65baab..8c51a0e94854 100644 --- a/arch/sh/mm/gup.c +++ b/arch/sh/mm/gup.c @@ -258,7 +258,8 @@ slow_irqon: pages += nr; ret = get_user_pages_unlocked(current, mm, start, - (end - start) >> PAGE_SHIFT, write, 0, pages); + (end - start) >> PAGE_SHIFT, pages, + write ? FOLL_WRITE : 0); /* Have to be a bit careful with return values */ if (nr > 0) { diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c index 2e5c4fc2daa9..150f48303fb0 100644 --- a/arch/sparc/mm/gup.c +++ b/arch/sparc/mm/gup.c @@ -250,7 +250,8 @@ slow: pages += nr; ret = get_user_pages_unlocked(current, mm, start, - (end - start) >> PAGE_SHIFT, write, 0, pages); + (end - start) >> PAGE_SHIFT, pages, + write ? FOLL_WRITE : 0); /* Have to be a bit careful with return values */ if (nr > 0) { diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index ae9a37bf1371..7d2542ad346a 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -388,7 +388,7 @@ slow_irqon: ret = get_user_pages_unlocked(current, mm, start, (end - start) >> PAGE_SHIFT, - write, 0, pages); + pages, write ? FOLL_WRITE : 0); /* Have to be a bit careful with return values */ if (nr > 0) { -- cgit v1.2.3 From 8e50b8b07f462ab4b91bc1491b1c91bd75e4ad40 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Thu, 13 Oct 2016 01:20:16 +0100 Subject: mm: replace get_user_pages() write/force parameters with gup_flags MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 768ae309a96103ed02eb1e111e838c87854d8b51 upstream. This removes the 'write' and 'force' from get_user_pages() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes Acked-by: Christian König Acked-by: Jesper Nilsson Acked-by: Michal Hocko Reviewed-by: Jan Kara Signed-off-by: Linus Torvalds [bwh: Backported to 4.4: - Drop changes in rapidio, vchiq, goldfish - Keep the "write" variable in amdgpu_ttm_tt_pin_userptr() as it's still needed - Also update calls from various other places that now use get_user_pages_remote() upstream, which were updated there by commit 9beae1ea8930 "mm: replace get_user_pages_remote() write/force ..." - Also update calls from hfi1 and ipath - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/cris/arch-v32/drivers/cryptocop.c | 4 +--- arch/ia64/kernel/err_inject.c | 2 +- arch/x86/mm/mpx.c | 3 +-- 3 files changed, 3 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/cris/arch-v32/drivers/cryptocop.c b/arch/cris/arch-v32/drivers/cryptocop.c index 877da1908234..98e2a5dbcfda 100644 --- a/arch/cris/arch-v32/drivers/cryptocop.c +++ b/arch/cris/arch-v32/drivers/cryptocop.c @@ -2724,7 +2724,6 @@ static int cryptocop_ioctl_process(struct inode *inode, struct file *filp, unsig (unsigned long int)(oper.indata + prev_ix), noinpages, 0, /* read access only for in data */ - 0, /* no force */ inpages, NULL); @@ -2740,8 +2739,7 @@ static int cryptocop_ioctl_process(struct inode *inode, struct file *filp, unsig current->mm, (unsigned long int)oper.cipher_outdata, nooutpages, - 1, /* write access for out data */ - 0, /* no force */ + FOLL_WRITE, /* write access for out data */ outpages, NULL); up_read(¤t->mm->mmap_sem); diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c index 0c161ed6d18e..8205b456de7a 100644 --- a/arch/ia64/kernel/err_inject.c +++ b/arch/ia64/kernel/err_inject.c @@ -143,7 +143,7 @@ store_virtual_to_phys(struct device *dev, struct device_attribute *attr, int ret; ret = get_user_pages(current, current->mm, virt_addr, - 1, VM_READ, 0, NULL, NULL); + 1, FOLL_WRITE, NULL, NULL); if (ret<=0) { #ifdef ERR_INJ_DEBUG printk("Virtual address %lx is not existing.\n",virt_addr); diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 7ed47b1e6f42..7e94fc6f608a 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -536,10 +536,9 @@ static int mpx_resolve_fault(long __user *addr, int write) { long gup_ret; int nr_pages = 1; - int force = 0; gup_ret = get_user_pages(current, current->mm, (unsigned long)addr, - nr_pages, write, force, NULL, NULL); + nr_pages, write ? FOLL_WRITE : 0, NULL, NULL); /* * get_user_pages() returns number of pages gotten. * 0 means we failed to fault in and get anything, -- cgit v1.2.3 From f9b94f823aebb377bdd7e8e8e3de982e9df82606 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sat, 15 Dec 2018 07:30:39 -0800 Subject: powerpc/boot: Fix random libfdt related build errors [ Upstream commit 64c3f648c25d108f346fdc96c15180c6b7d250e9 ] Once in a while I see build errors similar to the following when building images from a clean tree. Building powerpc:virtex-ml507:44x/virtex5_defconfig ... failed ------------ Error log: arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error: libfdt.h: No such file or directory Building powerpc:bamboo:smpdev:44x/bamboo_defconfig ... failed ------------ Error log: arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error: libfdt.h: No such file or directory arch/powerpc/boot/treeboot-currituck.c:35:20: fatal error: libfdt.h: No such file or directory Rebuilds will succeed. Turns out that several source files in arch/powerpc/boot/ include libfdt.h, but Makefile dependencies are incomplete. Let's fix that. Signed-off-by: Guenter Roeck Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman [groeck: Backport to v4.4.y] Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin --- arch/powerpc/boot/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 99e4487248ff..57003d1bd243 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -70,7 +70,8 @@ $(addprefix $(obj)/,$(zlib) cuboot-c2k.o gunzip_util.o main.o): \ libfdt := fdt.c fdt_ro.c fdt_wip.c fdt_sw.c fdt_rw.c fdt_strerror.c libfdtheader := fdt.h libfdt.h libfdt_internal.h -$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o): \ +$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o \ + treeboot-akebono.o treeboot-currituck.o treeboot-iss4xx.o): \ $(addprefix $(obj)/,$(libfdtheader)) src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \ -- cgit v1.2.3 From ee2bc807cbf0dfcf0f4ee522e3ca3d250c269158 Mon Sep 17 00:00:00 2001 From: Radu Rendec Date: Tue, 27 Nov 2018 22:20:48 -0500 Subject: powerpc/msi: Fix NULL pointer access in teardown code commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream. The arch_teardown_msi_irqs() function assumes that controller ops pointers were already checked in arch_setup_msi_irqs(), but this assumption is wrong: arch_teardown_msi_irqs() can be called even when arch_setup_msi_irqs() returns an error (-ENOSYS). This can happen in the following scenario: - msi_capability_init() calls pci_msi_setup_msi_irqs() - pci_msi_setup_msi_irqs() returns -ENOSYS - msi_capability_init() notices the error and calls free_msi_irqs() - free_msi_irqs() calls pci_msi_teardown_msi_irqs() This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs(). The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure seems legit, as it does additional cleanup; e.g. list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs() is called and need to be cleaned up if that fails). Fixes: 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Radu Rendec Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/msi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c index dab616a33b8d..f2197654be07 100644 --- a/arch/powerpc/kernel/msi.c +++ b/arch/powerpc/kernel/msi.c @@ -34,5 +34,10 @@ void arch_teardown_msi_irqs(struct pci_dev *dev) { struct pci_controller *phb = pci_bus_to_host(dev->bus); - phb->controller_ops.teardown_msi_irqs(dev); + /* + * We can be called even when arch_setup_msi_irqs() returns -ENOSYS, + * so check the pointer again. + */ + if (phb->controller_ops.teardown_msi_irqs) + phb->controller_ops.teardown_msi_irqs(dev); } -- cgit v1.2.3 From 5c42212ae2d7261a33e7aba5c40904e129061cef Mon Sep 17 00:00:00 2001 From: YiFei Zhu Date: Thu, 29 Nov 2018 18:12:30 +0100 Subject: x86/earlyprintk/efi: Fix infinite loop on some screen widths [ Upstream commit 79c2206d369b87b19ac29cb47601059b6bf5c291 ] An affected screen resolution is 1366 x 768, which width is not divisible by 8, the default font width. On such screens, when longer lines are earlyprintk'ed, overflow-to-next-line can never trigger, due to the left-most x-coordinate of the next character always less than the screen width. Earlyprintk will infinite loop in trying to print the rest of the string but unable to, due to the line being full. This patch makes the trigger consider the right-most x-coordinate, instead of left-most, as the value to compare against the screen width threshold. Signed-off-by: YiFei Zhu Signed-off-by: Ard Biesheuvel Cc: Andy Lutomirski Cc: Arend van Spriel Cc: Bhupesh Sharma Cc: Borislav Petkov Cc: Dave Hansen Cc: Eric Snowberg Cc: Hans de Goede Cc: Joe Perches Cc: Jon Hunter Cc: Julien Thierry Cc: Linus Torvalds Cc: Marc Zyngier Cc: Matt Fleming Cc: Nathan Chancellor Cc: Peter Zijlstra Cc: Sai Praneeth Prakhya Cc: Sedat Dilek Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181129171230.18699-12-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- arch/x86/platform/efi/early_printk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/platform/efi/early_printk.c b/arch/x86/platform/efi/early_printk.c index 524142117296..82324fc25d5e 100644 --- a/arch/x86/platform/efi/early_printk.c +++ b/arch/x86/platform/efi/early_printk.c @@ -179,7 +179,7 @@ early_efi_write(struct console *con, const char *str, unsigned int num) num--; } - if (efi_x >= si->lfb_width) { + if (efi_x + font->width > si->lfb_width) { efi_x = 0; efi_y += font->height; } -- cgit v1.2.3 From 44b4717e50555c27424ceabcff443f07b1db58ed Mon Sep 17 00:00:00 2001 From: Jose Abreu Date: Fri, 30 Nov 2018 09:47:31 +0000 Subject: ARC: io.h: Implement reads{x}()/writes{x}() [ Upstream commit 10d443431dc2bb733cf7add99b453e3fb9047a2e ] Some ARC CPU's do not support unaligned loads/stores. Currently, generic implementation of reads{b/w/l}()/writes{b/w/l}() is being used with ARC. This can lead to misfunction of some drivers as generic functions do a plain dereference of a pointer that can be unaligned. Let's use {get/put}_unaligned() helpers instead of plain dereference of pointer in order to fix. The helpers allow to get and store data from an unaligned address whilst preserving the CPU internal alignment. According to [1], the use of these helpers are costly in terms of performance so we added an initial check for a buffer already aligned so that the usage of the helpers can be avoided, when possible. [1] Documentation/unaligned-memory-access.txt Cc: Alexey Brodkin Cc: Joao Pinto Cc: David Laight Tested-by: Vitor Soares Signed-off-by: Jose Abreu Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin --- arch/arc/include/asm/io.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) (limited to 'arch') diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h index cb69299a492e..f120d823e8c2 100644 --- a/arch/arc/include/asm/io.h +++ b/arch/arc/include/asm/io.h @@ -12,6 +12,7 @@ #include #include #include +#include #ifdef CONFIG_ISA_ARCV2 #include @@ -85,6 +86,42 @@ static inline u32 __raw_readl(const volatile void __iomem *addr) return w; } +/* + * {read,write}s{b,w,l}() repeatedly access the same IO address in + * native endianness in 8-, 16-, 32-bit chunks {into,from} memory, + * @count times + */ +#define __raw_readsx(t,f) \ +static inline void __raw_reads##f(const volatile void __iomem *addr, \ + void *ptr, unsigned int count) \ +{ \ + bool is_aligned = ((unsigned long)ptr % ((t) / 8)) == 0; \ + u##t *buf = ptr; \ + \ + if (!count) \ + return; \ + \ + /* Some ARC CPU's don't support unaligned accesses */ \ + if (is_aligned) { \ + do { \ + u##t x = __raw_read##f(addr); \ + *buf++ = x; \ + } while (--count); \ + } else { \ + do { \ + u##t x = __raw_read##f(addr); \ + put_unaligned(x, buf++); \ + } while (--count); \ + } \ +} + +#define __raw_readsb __raw_readsb +__raw_readsx(8, b) +#define __raw_readsw __raw_readsw +__raw_readsx(16, w) +#define __raw_readsl __raw_readsl +__raw_readsx(32, l) + #define __raw_writeb __raw_writeb static inline void __raw_writeb(u8 b, volatile void __iomem *addr) { @@ -117,6 +154,35 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr) } +#define __raw_writesx(t,f) \ +static inline void __raw_writes##f(volatile void __iomem *addr, \ + const void *ptr, unsigned int count) \ +{ \ + bool is_aligned = ((unsigned long)ptr % ((t) / 8)) == 0; \ + const u##t *buf = ptr; \ + \ + if (!count) \ + return; \ + \ + /* Some ARC CPU's don't support unaligned accesses */ \ + if (is_aligned) { \ + do { \ + __raw_write##f(*buf++, addr); \ + } while (--count); \ + } else { \ + do { \ + __raw_write##f(get_unaligned(buf++), addr); \ + } while (--count); \ + } \ +} + +#define __raw_writesb __raw_writesb +__raw_writesx(8, b) +#define __raw_writesw __raw_writesw +__raw_writesx(16, w) +#define __raw_writesl __raw_writesl +__raw_writesx(32, l) + /* * MMIO can also get buffered/optimized in micro-arch, so barriers needed * Based on ARM model for the typical use case @@ -132,10 +198,16 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr) #define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; }) #define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; }) #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) +#define readsb(p,d,l) ({ __raw_readsb(p,d,l); __iormb(); }) +#define readsw(p,d,l) ({ __raw_readsw(p,d,l); __iormb(); }) +#define readsl(p,d,l) ({ __raw_readsl(p,d,l); __iormb(); }) #define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); }) #define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); }) #define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); }) +#define writesb(p,d,l) ({ __iowmb(); __raw_writesb(p,d,l); }) +#define writesw(p,d,l) ({ __iowmb(); __raw_writesw(p,d,l); }) +#define writesl(p,d,l) ({ __iowmb(); __raw_writesl(p,d,l); }) /* * Relaxed API for drivers which can handle barrier ordering themselves -- cgit v1.2.3 From fa5d9b585e83232d286188e1e28f433401394948 Mon Sep 17 00:00:00 2001 From: Chris Cole Date: Fri, 23 Nov 2018 12:20:45 +0100 Subject: ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling [ Upstream commit a1208f6a822ac29933e772ef1f637c5d67838da9 ] This patch addresses possible memory corruption when v7_dma_inv_range(start_address, end_address) address parameters are not aligned to whole cache lines. This function issues "invalidate" cache management operations to all cache lines from start_address (inclusive) to end_address (exclusive). When start_address and/or end_address are not aligned, the start and/or end cache lines are first issued "clean & invalidate" operation. The assumption is this is done to ensure that any dirty data addresses outside the address range (but part of the first or last cache lines) are cleaned/flushed so that data is not lost, which could happen if just an invalidate is issued. The problem is that these first/last partial cache lines are issued "clean & invalidate" and then "invalidate". This second "invalidate" is not required and worse can cause "lost" writes to addresses outside the address range but part of the cache line. If another component writes to its part of the cache line between the "clean & invalidate" and "invalidate" operations, the write can get lost. This fix is to remove the extra "invalidate" operation when unaligned addressed are used. A kernel module is available that has a stress test to reproduce the issue and a unit test of the updated v7_dma_inv_range(). It can be downloaded from http://ftp.sageembedded.com/outgoing/linux/cache-test-20181107.tgz. v7_dma_inv_range() is call by dmac_[un]map_area(addr, len, direction) when the direction is DMA_FROM_DEVICE. One can (I believe) successfully argue that DMA from a device to main memory should use buffers aligned to cache line size, because the "clean & invalidate" might overwrite data that the device just wrote using DMA. But if a driver does use unaligned buffers, at least this fix will prevent memory corruption outside the buffer. Signed-off-by: Chris Cole Signed-off-by: Russell King Signed-off-by: Sasha Levin --- arch/arm/mm/cache-v7.S | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index a134d8a13d00..11d699af30ed 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -359,14 +359,16 @@ v7_dma_inv_range: ALT_UP(W(nop)) #endif mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D / U line + addne r0, r0, r2 tst r1, r3 bic r1, r1, r3 mcrne p15, 0, r1, c7, c14, 1 @ clean & invalidate D / U line -1: - mcr p15, 0, r0, c7, c6, 1 @ invalidate D / U line - add r0, r0, r2 cmp r0, r1 +1: + mcrlo p15, 0, r0, c7, c6, 1 @ invalidate D / U line + addlo r0, r0, r2 + cmplo r0, r1 blo 1b dsb st ret lr -- cgit v1.2.3 From 38b1b66e5796cc9089c02720b78d50d0682e66b0 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 18 Dec 2018 17:29:56 +0000 Subject: x86/mtrr: Don't copy uninitialized gentry fields back to userspace commit 32043fa065b51e0b1433e48d118821c71b5cd65d upstream. Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King Signed-off-by: Thomas Gleixner Reviewed-by: Tyler Hicks Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mtrr/if.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c index d76f13d6d8d6..ec894bf5eeb0 100644 --- a/arch/x86/kernel/cpu/mtrr/if.c +++ b/arch/x86/kernel/cpu/mtrr/if.c @@ -173,6 +173,8 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg) struct mtrr_gentry gentry; void __user *arg = (void __user *) __arg; + memset(&gentry, 0, sizeof(gentry)); + switch (cmd) { case MTRRIOC_ADD_ENTRY: case MTRRIOC_SET_ENTRY: -- cgit v1.2.3 From e2f780613e3b0b220c5bffefd94381df9c1c933e Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 20 Dec 2018 14:21:08 -0800 Subject: KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup commit e81434995081fd7efb755fd75576b35dbb0850b1 upstream. ____kvm_handle_fault_on_reboot() provides a generic exception fixup handler that is used to cleanly handle faults on VMX/SVM instructions during reboot (or at least try to). If there isn't a reboot in progress, ____kvm_handle_fault_on_reboot() treats any exception as fatal to KVM and invokes kvm_spurious_fault(), which in turn generates a BUG() to get a stack trace and die. When it was originally added by commit 4ecac3fd6dc2 ("KVM: Handle virtualization instruction #UD faults during reboot"), the "call" to kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value is the RIP of the faulting instructing. The PUSH+JMP trickery is necessary because the exception fixup handler code lies outside of its associated function, e.g. right after the function. An actual CALL from the .fixup code would show a slightly bogus stack trace, e.g. an extra "random" function would be inserted into the trace, as the return RIP on the stack would point to no known function (and the unwinder will likely try to guess who owns the RIP). Unfortunately, the JMP was replaced with a CALL when the macro was reworked to not spin indefinitely during reboot (commit b7c4145ba2eb "KVM: Don't spin on virt instruction faults during reboot"). This causes the aforementioned behavior where a bogus function is inserted into the stack trace, e.g. my builds like to blame free_kvm_area(). Revert the CALL back to a JMP. The changelog for commit b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") contains nothing that indicates the switch to CALL was deliberate. This is backed up by the fact that the PUSH was left intact. Note that an alternative to the PUSH+JMP magic would be to JMP back to the "real" code and CALL from there, but that would require adding a JMP in the non-faulting path to avoid calling kvm_spurious_fault() and would add no value, i.e. the stack trace would be the same. Using CALL: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 Call Trace: free_kvm_area+0x1044/0x43ea [kvm_intel] ? vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace 9775b14b123b1713 ]--- Using JMP: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 Call Trace: vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace f9daedb85ab3ddba ]--- Fixes: b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c048d0d70cc4..2cb49ac1b2b2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1200,7 +1200,7 @@ asmlinkage void kvm_spurious_fault(void); "cmpb $0, kvm_rebooting \n\t" \ "jne 668b \n\t" \ __ASM_SIZE(push) " $666b \n\t" \ - "call kvm_spurious_fault \n\t" \ + "jmp kvm_spurious_fault \n\t" \ ".popsection \n\t" \ _ASM_EXTABLE(666b, 667b) -- cgit v1.2.3 From 79d02189e782873e4ce7fd676b3eec8a040217e6 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Thu, 15 Nov 2018 15:53:54 +0800 Subject: MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() commit 92aa0718c9fa5160ad2f0e7b5bffb52f1ea1e51a upstream. This patch is borrowed from ARM64 to ensure pmd_present() returns false after pmd_mknotpresent(). This is needed for THP. References: 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after pmd_mknotpresent()") Reviewed-by: James Hogan Signed-off-by: Huacai Chen Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/21135/ Cc: Ralf Baechle Cc: James Hogan Cc: Steven J . Hill Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang Cc: Zhangjin Wu Cc: # 3.8+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/pgtable-64.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h index cf661a2fb141..16fade4f49dd 100644 --- a/arch/mips/include/asm/pgtable-64.h +++ b/arch/mips/include/asm/pgtable-64.h @@ -189,6 +189,11 @@ static inline int pmd_bad(pmd_t pmd) static inline int pmd_present(pmd_t pmd) { +#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT + if (unlikely(pmd_val(pmd) & _PAGE_HUGE)) + return pmd_val(pmd) & _PAGE_PRESENT; +#endif + return pmd_val(pmd) != (unsigned long) invalid_pte_table; } -- cgit v1.2.3 From a214fe5560e7129d50e8d1157293e770276d5d73 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Thu, 15 Nov 2018 15:53:56 +0800 Subject: MIPS: Align kernel load address to 64KB commit bec0de4cfad21bd284dbddee016ed1767a5d2823 upstream. KEXEC needs the new kernel's load address to be aligned on a page boundary (see sanity_check_segment_list()), but on MIPS the default vmlinuz load address is only explicitly aligned to 16 bytes. Since the largest PAGE_SIZE supported by MIPS kernels is 64KB, increase the alignment calculated by calc_vmlinuz_load_addr to 64KB. Signed-off-by: Huacai Chen Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/21131/ Cc: Ralf Baechle Cc: James Hogan Cc: Steven J . Hill Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang Cc: Zhangjin Wu Cc: # 2.6.36+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/boot/compressed/calc_vmlinuz_load_addr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c b/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c index 37fe58c19a90..542c3ede9722 100644 --- a/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c +++ b/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c @@ -13,6 +13,7 @@ #include #include #include +#include "../../../../include/linux/sizes.h" int main(int argc, char *argv[]) { @@ -45,11 +46,11 @@ int main(int argc, char *argv[]) vmlinuz_load_addr = vmlinux_load_addr + vmlinux_size; /* - * Align with 16 bytes: "greater than that used for any standard data - * types by a MIPS compiler." -- See MIPS Run Linux (Second Edition). + * Align with 64KB: KEXEC needs load sections to be aligned to PAGE_SIZE, + * which may be as large as 64KB depending on the kernel configuration. */ - vmlinuz_load_addr += (16 - vmlinux_size % 16); + vmlinuz_load_addr += (SZ_64K - vmlinux_size % SZ_64K); printf("0x%llx\n", vmlinuz_load_addr); -- cgit v1.2.3 From 0c53038267a9883e4d0d591dc620fc7f0da4c584 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 25 Jan 2018 16:37:07 +0100 Subject: x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d391f1207067268261add0485f0f34503539c5b0 upstream. I was investigating an issue with seabios >= 1.10 which stopped working for nested KVM on Hyper-V. The problem appears to be in handle_ept_violation() function: when we do fast mmio we need to skip the instruction so we do kvm_skip_emulated_instruction(). This, however, depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS. However, this is not the case. Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when EPT MISCONFIG occurs. While on real hardware it was observed to be set, some hypervisors follow the spec and don't set it; we end up advancing IP with some random value. I checked with Microsoft and they confirmed they don't fill VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG. Fix the issue by doing instruction skip through emulator when running nested. Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae Suggested-by: Radim Krčmář Suggested-by: Paolo Bonzini Signed-off-by: Vitaly Kuznetsov Acked-by: Michael S. Tsirkin Signed-off-by: Radim Krčmář Signed-off-by: Sasha Levin [mhaboustak: backport to 4.9.y] Signed-off-by: Mike Haboustak Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 19 +++++++++++++++++-- arch/x86/kvm/x86.c | 3 ++- 2 files changed, 19 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e4b5fd72ca24..3bdb2e747b89 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6163,9 +6163,24 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu) gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { - skip_emulated_instruction(vcpu); trace_kvm_fast_mmio(gpa); - return 1; + /* + * Doing kvm_skip_emulated_instruction() depends on undefined + * behavior: Intel's manual doesn't mandate + * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG + * occurs and while on real hardware it was observed to be set, + * other hypervisors (namely Hyper-V) don't set it, we end up + * advancing IP with some random value. Disable fast mmio when + * running nested and keep it for real hardware in hope that + * VM_EXIT_INSTRUCTION_LEN will always be set correctly. + */ + if (!static_cpu_has(X86_FEATURE_HYPERVISOR)) { + skip_emulated_instruction(vcpu); + return 1; + } + else + return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP, + NULL, 0) == EMULATE_DONE; } ret = handle_mmio_page_fault(vcpu, gpa, true); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aa1a0277a678..1a934bb8ed1c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5436,7 +5436,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, * handle watchpoints yet, those would be handled in * the emulate_ops. */ - if (kvm_vcpu_check_breakpoint(vcpu, &r)) + if (!(emulation_type & EMULTYPE_SKIP) && + kvm_vcpu_check_breakpoint(vcpu, &r)) return r; ctxt->interruptibility = 0; -- cgit v1.2.3 From c7c77cdf7b494ab4a6e92013c7fc5281d0207a86 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Tue, 27 Nov 2018 09:01:54 +1100 Subject: powerpc: Fix COFF zImage booting on old powermacs [ Upstream commit 5564597d51c8ff5b88d95c76255e18b13b760879 ] Commit 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor at the start of crt0.S to have a hard-coded start address of 0x500000 rather than a reference to _zimage_start, presumably because having a reference to a symbol introduced a relocation which is awkward to handle in a position-independent executable. Unfortunately, what is at 0x500000 in the COFF image is not the first instruction, but the procedure descriptor itself, that is, a word containing 0x500000, which is not a valid instruction. Hence, booting a COFF zImage results in a "DEFAULT CATCH!, code=FFF00700" message from Open Firmware. This fixes the problem by (a) putting the procedure descriptor in the data section and (b) adding a branch to _zimage_start as the first instruction in the program. Fixes: 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN") Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/boot/crt0.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S index 5c2199857aa8..a3550e8f1a77 100644 --- a/arch/powerpc/boot/crt0.S +++ b/arch/powerpc/boot/crt0.S @@ -15,7 +15,7 @@ RELA = 7 RELACOUNT = 0x6ffffff9 - .text + .data /* A procedure descriptor used when booting this as a COFF file. * When making COFF, this comes first in the link and we're * linked at 0x500000. @@ -23,6 +23,8 @@ RELACOUNT = 0x6ffffff9 .globl _zimage_start_opd _zimage_start_opd: .long 0x500000, 0, 0, 0 + .text + b _zimage_start #ifdef __powerpc64__ .balign 8 -- cgit v1.2.3 From 88206b0e304275c64d98bc26c798ee2299bba8c3 Mon Sep 17 00:00:00 2001 From: Anson Huang Date: Tue, 4 Dec 2018 03:17:45 +0000 Subject: ARM: imx: update the cpu power up timing setting on i.mx6sx [ Upstream commit 1e434b703248580b7aaaf8a115d93e682f57d29f ] The sw2iso count should cover ARM LDO ramp-up time, the MAX ARM LDO ramp-up time may be up to more than 100us on some boards, this patch sets sw2iso to 0xf (~384us) which is the reset value, and it is much more safe to cover different boards, since we have observed that some customer boards failed with current setting of 0x2. Fixes: 05136f0897b5 ("ARM: imx: support arm power off in cpuidle for i.mx6sx") Signed-off-by: Anson Huang Reviewed-by: Fabio Estevam Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin --- arch/arm/mach-imx/cpuidle-imx6sx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-imx/cpuidle-imx6sx.c b/arch/arm/mach-imx/cpuidle-imx6sx.c index 3c6672b3796b..7f5df8992008 100644 --- a/arch/arm/mach-imx/cpuidle-imx6sx.c +++ b/arch/arm/mach-imx/cpuidle-imx6sx.c @@ -97,7 +97,7 @@ int __init imx6sx_cpuidle_init(void) * except for power up sw2iso which need to be * larger than LDO ramp up time. */ - imx_gpc_set_arm_power_up_timing(2, 1); + imx_gpc_set_arm_power_up_timing(0xf, 1); imx_gpc_set_arm_power_down_timing(1, 1); return cpuidle_register(&imx6sx_cpuidle_driver, NULL); -- cgit v1.2.3 From 557f16c7fe26a2d16013c2821c5ce5a90a7da97e Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 7 Jan 2019 15:15:59 -0800 Subject: crypto: x86/chacha20 - avoid sleeping with preemption disabled In chacha20-simd, clear the MAY_SLEEP flag in the blkcipher_desc to prevent sleeping with preemption disabled, under kernel_fpu_begin(). This was fixed upstream incidentally by a large refactoring, commit 9ae433bc79f9 ("crypto: chacha20 - convert generic and x86 versions to skcipher"). But syzkaller easily trips over this when running on older kernels, as it's easily reachable via AF_ALG. Therefore, this patch makes the minimal fix for older kernels. Fixes: c9320b6dcb89 ("crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64") Cc: linux-crypto@vger.kernel.org Cc: Martin Willi Signed-off-by: Eric Biggers Acked-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- arch/x86/crypto/chacha20_glue.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c index 8baaff5af0b5..75b9d43069f1 100644 --- a/arch/x86/crypto/chacha20_glue.c +++ b/arch/x86/crypto/chacha20_glue.c @@ -77,6 +77,7 @@ static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst, blkcipher_walk_init(&walk, dst, src, nbytes); err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE); + desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv); -- cgit v1.2.3 From 0b6c2279b7a4a3b4c26781948c7edb08c7dd4488 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 18 Jan 2019 17:55:53 +0000 Subject: arm64/kvm: consistently handle host HCR_EL2 flags [ Backport of upstream commit 4eaed6aa2c628101246bcabc91b203bfac1193f8 ] In KVM we define the configuration of HCR_EL2 for a VHE HOST in HCR_HOST_VHE_FLAGS, but we don't have a similar definition for the non-VHE host flags, and open-code HCR_RW. Further, in head.S we open-code the flags for VHE and non-VHE configurations. In future, we're going to want to configure more flags for the host, so lets add a HCR_HOST_NVHE_FLAGS defintion, and consistently use both HCR_HOST_VHE_FLAGS and HCR_HOST_NVHE_FLAGS in the kvm code and head.S. We now use mov_q to generate the HCR_EL2 value, as we use when configuring other registers in head.S. Reviewed-by: Marc Zyngier Reviewed-by: Richard Henderson Signed-off-by: Mark Rutland Reviewed-by: Christoffer Dall Cc: Catalin Marinas Cc: Marc Zyngier Cc: Will Deacon Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Will Deacon [kristina: backport to 4.4.y: non-VHE only; __deactivate_traps_nvhe in assembly; add #include] Signed-off-by: Kristina Martsenko Signed-off-by: Sasha Levin --- arch/arm64/include/asm/kvm_arm.h | 1 + arch/arm64/kernel/head.S | 3 ++- arch/arm64/kvm/hyp.S | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index ef8e13d379cb..013b7de45ee7 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -81,6 +81,7 @@ HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW) #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF) #define HCR_INT_OVERRIDE (HCR_FMO | HCR_IMO) +#define HCR_HOST_NVHE_FLAGS (HCR_RW) /* Hyp System Control Register (SCTLR_EL2) bits */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index d019c3a58cc2..0382eba4bf7b 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -464,7 +465,7 @@ CPU_LE( bic x0, x0, #(3 << 24) ) // Clear the EE and E0E bits for EL1 ret /* Hyp configuration. */ -2: mov x0, #(1 << 31) // 64-bit EL1 +2: mov_q x0, HCR_HOST_NVHE_FLAGS msr hcr_el2, x0 /* Generic timers. */ diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index 86c289832272..8d3da858c257 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -494,7 +494,7 @@ .endm .macro deactivate_traps - mov x2, #HCR_RW + mov_q x2, HCR_HOST_NVHE_FLAGS msr hcr_el2, x2 msr hstr_el2, xzr -- cgit v1.2.3 From 5b1d8e5d86c29ddc9d2ee13dd82de98a5d2692d7 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 18 Jan 2019 17:55:54 +0000 Subject: arm64: Don't trap host pointer auth use to EL2 [ Backport of upstream commit b3669b1e1c09890d61109a1a8ece2c5b66804714 ] To allow EL0 (and/or EL1) to use pointer authentication functionality, we must ensure that pointer authentication instructions and accesses to pointer authentication keys are not trapped to EL2. This patch ensures that HCR_EL2 is configured appropriately when the kernel is booted at EL2. For non-VHE kernels we set HCR_EL2.{API,APK}, ensuring that EL1 can access keys and permit EL0 use of instructions. For VHE kernels host EL0 (TGE && E2H) is unaffected by these settings, and it doesn't matter how we configure HCR_EL2.{API,APK}, so we don't bother setting them. This does not enable support for KVM guests, since KVM manages HCR_EL2 itself when running VMs. Reviewed-by: Richard Henderson Signed-off-by: Mark Rutland Acked-by: Christoffer Dall Cc: Catalin Marinas Cc: Marc Zyngier Cc: Will Deacon Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Will Deacon [kristina: backport to 4.4.y: adjust context] Signed-off-by: Kristina Martsenko Signed-off-by: Sasha Levin --- arch/arm64/include/asm/kvm_arm.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 013b7de45ee7..d7e7cf56e8d6 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -23,6 +23,8 @@ #include /* Hyp Configuration Register (HCR) bits */ +#define HCR_API (UL(1) << 41) +#define HCR_APK (UL(1) << 40) #define HCR_ID (UL(1) << 33) #define HCR_CD (UL(1) << 32) #define HCR_RW_SHIFT 31 @@ -81,7 +83,7 @@ HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW) #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF) #define HCR_INT_OVERRIDE (HCR_FMO | HCR_IMO) -#define HCR_HOST_NVHE_FLAGS (HCR_RW) +#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK) /* Hyp System Control Register (SCTLR_EL2) bits */ -- cgit v1.2.3 From db58a203792a853c700af7147c3bcbd82adbfe75 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 10 Jan 2019 17:24:31 +0100 Subject: mips: fix n32 compat_ipc_parse_version commit 5a9372f751b5350e0ce3d2ee91832f1feae2c2e5 upstream. While reading through the sysvipc implementation, I noticed that the n32 semctl/shmctl/msgctl system calls behave differently based on whether o32 support is enabled or not: Without o32, the IPC_64 flag passed by user space is rejected but calls without that flag get IPC_64 behavior. As far as I can tell, this was inadvertently changed by a cleanup patch but never noticed by anyone, possibly nobody has tried using sysvipc on n32 after linux-3.19. Change it back to the old behavior now. Fixes: 78aaf956ba3a ("MIPS: Compat: Fix build error if CONFIG_MIPS32_COMPAT but no compat ABI.") Signed-off-by: Arnd Bergmann Signed-off-by: Paul Burton Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 8b0424abc84c..3a908cc81317 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2972,6 +2972,7 @@ config MIPS32_O32 config MIPS32_N32 bool "Kernel support for n32 binaries" depends on 64BIT + select ARCH_WANT_COMPAT_IPC_PARSE_VERSION select COMPAT select MIPS32_COMPAT select SYSVIPC_COMPAT if SYSVIPC -- cgit v1.2.3 From 93e6b2659b167b15207221bc4c05c11d94e6a14c Mon Sep 17 00:00:00 2001 From: YunQiang Su Date: Tue, 8 Jan 2019 13:45:10 +0800 Subject: Disable MSI also when pcie-octeon.pcie_disable on commit a214720cbf50cd8c3f76bbb9c3f5c283910e9d33 upstream. Octeon has an boot-time option to disable pcie. Since MSI depends on PCI-E, we should also disable MSI also with this option is on in order to avoid inadvertently accessing PCIe registers. Signed-off-by: YunQiang Su Signed-off-by: Paul Burton Cc: pburton@wavecomp.com Cc: linux-mips@vger.kernel.org Cc: aaro.koskinen@iki.fi Cc: stable@vger.kernel.org # v3.3+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/pci/msi-octeon.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/mips/pci/msi-octeon.c b/arch/mips/pci/msi-octeon.c index 2a5bb849b10e..288b58b00dc8 100644 --- a/arch/mips/pci/msi-octeon.c +++ b/arch/mips/pci/msi-octeon.c @@ -369,7 +369,9 @@ int __init octeon_msi_initialize(void) int irq; struct irq_chip *msi; - if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_PCIE) { + if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_INVALID) { + return 0; + } else if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_PCIE) { msi_rcv_reg[0] = CVMX_PEXP_NPEI_MSI_RCV0; msi_rcv_reg[1] = CVMX_PEXP_NPEI_MSI_RCV1; msi_rcv_reg[2] = CVMX_PEXP_NPEI_MSI_RCV2; -- cgit v1.2.3 From c890a458e27210d1a749a18941047a9e4209fa93 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Tue, 13 Nov 2018 22:42:44 +0000 Subject: MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur [ Upstream commit e4849aff1e169b86c561738daf8ff020e9de1011 ] The Broadcom SiByte BCM1250, BCM1125, and BCM1125H SOCs have an onchip DRAM controller that supports memory amounts of up to 16GiB, and due to how the address decoder has been wired in the SOC any memory beyond 1GiB is actually mapped starting from 4GiB physical up, that is beyond the 32-bit addressable limit[1]. Consequently if the maximum amount of memory has been installed, then it will span up to 19GiB. Many of the evaluation boards we support that are based on one of these SOCs have their memory soldered and the amount present fits in the 32-bit address range. The BCM91250A SWARM board however has actual DIMM slots and accepts, depending on the peripherals revision of the SOC, up to 4GiB or 8GiB of memory in commercially available JEDEC modules[2]. I believe this is also the case with the BCM91250C2 LittleSur board. This means that up to either 3GiB or 7GiB of memory requires 64-bit addressing to access. I believe the BCM91480B BigSur board, which has the BCM1480 SOC instead, accepts at least as much memory, although I have no documentation or actual hardware available to verify that. Both systems have PCI slots installed for use by any PCI option boards, including ones that only support 32-bit addressing (additionally the 32-bit PCI host bridge of the BCM1250, BCM1125, and BCM1125H SOCs limits addressing to 32-bits), and there is no IOMMU available. Therefore for PCI DMA to work in the presence of memory beyond enable swiotlb for the affected systems. All the other SOC onchip DMA devices use 40-bit addressing and therefore can address the whole memory, so only enable swiotlb if PCI support and support for DMA beyond 4GiB have been both enabled in the configuration of the kernel. This shows up as follows: Broadcom SiByte BCM1250 B2 @ 800 MHz (SB1 rev 2) Board type: SiByte BCM91250A (SWARM) Determined physical RAM map: memory: 000000000fe7fe00 @ 0000000000000000 (usable) memory: 000000001ffffe00 @ 0000000080000000 (usable) memory: 000000000ffffe00 @ 00000000c0000000 (usable) memory: 0000000087fffe00 @ 0000000100000000 (usable) software IO TLB: mapped [mem 0xcbffc000-0xcfffc000] (64MB) in the bootstrap log and removes failures like these: defxx 0000:02:00.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi0: Receive buffer allocation failed fddi0: Adapter open failed! IP-Config: Failed to open fddi0 defxx 0000:09:08.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi1: Receive buffer allocation failed fddi1: Adapter open failed! IP-Config: Failed to open fddi1 when memory beyond 4GiB is handed out to devices that can only do 32-bit addressing. This updates commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32."). References: [1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R, Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview", "Memory Map", pp. 34-38 [2] "BCM91250A User Manual", Revision 91250A-UM100-R, Broadcom Corporation, 18 May 2004, Section 3: "Physical Description", "Supported DRAM", p. 23 Signed-off-by: Maciej W. Rozycki [paul.burton@mips.com: Remove GPL text from dma.c; SPDX tag covers it] Signed-off-by: Paul Burton Reviewed-by: Christoph Hellwig Patchwork: https://patchwork.linux-mips.org/patch/21108/ References: cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin --- arch/mips/Kconfig | 3 +++ arch/mips/sibyte/common/Makefile | 1 + arch/mips/sibyte/common/dma.c | 14 ++++++++++++++ 3 files changed, 18 insertions(+) create mode 100644 arch/mips/sibyte/common/dma.c (limited to 'arch') diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 3a908cc81317..333ea0389adb 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -760,6 +760,7 @@ config SIBYTE_SWARM select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI config SIBYTE_LITTLESUR bool "Sibyte BCM91250C2-LittleSur" @@ -782,6 +783,7 @@ config SIBYTE_SENTOSA select SYS_HAS_CPU_SB1 select SYS_SUPPORTS_BIG_ENDIAN select SYS_SUPPORTS_LITTLE_ENDIAN + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI config SIBYTE_BIGSUR bool "Sibyte BCM91480B-BigSur" @@ -795,6 +797,7 @@ config SIBYTE_BIGSUR select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI config SNI_RM bool "SNI RM200/300/400" diff --git a/arch/mips/sibyte/common/Makefile b/arch/mips/sibyte/common/Makefile index b3d6bf23a662..3ef3fb658136 100644 --- a/arch/mips/sibyte/common/Makefile +++ b/arch/mips/sibyte/common/Makefile @@ -1,4 +1,5 @@ obj-y := cfe.o +obj-$(CONFIG_SWIOTLB) += dma.o obj-$(CONFIG_SIBYTE_BUS_WATCHER) += bus_watcher.o obj-$(CONFIG_SIBYTE_CFE_CONSOLE) += cfe_console.o obj-$(CONFIG_SIBYTE_TBPROF) += sb_tbprof.o diff --git a/arch/mips/sibyte/common/dma.c b/arch/mips/sibyte/common/dma.c new file mode 100644 index 000000000000..eb47a94f3583 --- /dev/null +++ b/arch/mips/sibyte/common/dma.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * DMA support for Broadcom SiByte platforms. + * + * Copyright (c) 2018 Maciej W. Rozycki + */ + +#include +#include + +void __init plat_swiotlb_setup(void) +{ + swiotlb_init(1); +} -- cgit v1.2.3 From 2ec43b2673525149bfeaadfc5b527bf8581afe1c Mon Sep 17 00:00:00 2001 From: Anders Roxell Date: Wed, 17 Oct 2018 17:26:22 +0200 Subject: arm64: perf: set suppress_bind_attrs flag to true [ Upstream commit 81e9fa8bab381f8b6eb04df7cdf0f71994099bd4 ] The armv8_pmuv3 driver doesn't have a remove function, and when the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, the following Call trace can be seen. [ 1.424287] Failed to register pmu: armv8_pmuv3, reason -17 [ 1.424870] WARNING: CPU: 0 PID: 1 at ../kernel/events/core.c:11771 perf_event_sysfs_init+0x98/0xdc [ 1.425220] Modules linked in: [ 1.425531] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.19.0-rc7-next-20181012-00003-ge7a97b1ad77b-dirty #35 [ 1.425951] Hardware name: linux,dummy-virt (DT) [ 1.426212] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 1.426458] pc : perf_event_sysfs_init+0x98/0xdc [ 1.426720] lr : perf_event_sysfs_init+0x98/0xdc [ 1.426908] sp : ffff00000804bd50 [ 1.427077] x29: ffff00000804bd50 x28: ffff00000934e078 [ 1.427429] x27: ffff000009546000 x26: 0000000000000007 [ 1.427757] x25: ffff000009280710 x24: 00000000ffffffef [ 1.428086] x23: ffff000009408000 x22: 0000000000000000 [ 1.428415] x21: ffff000009136008 x20: ffff000009408730 [ 1.428744] x19: ffff80007b20b400 x18: 000000000000000a [ 1.429075] x17: 0000000000000000 x16: 0000000000000000 [ 1.429418] x15: 0000000000000400 x14: 2e79726f74636572 [ 1.429748] x13: 696420656d617320 x12: 656874206e692065 [ 1.430060] x11: 6d616e20656d6173 x10: 2065687420687469 [ 1.430335] x9 : ffff00000804bd50 x8 : 206e6f7361657220 [ 1.430610] x7 : 2c3376756d705f38 x6 : ffff00000954d7ce [ 1.430880] x5 : 0000000000000000 x4 : 0000000000000000 [ 1.431226] x3 : 0000000000000000 x2 : ffffffffffffffff [ 1.431554] x1 : 4d151327adc50b00 x0 : 0000000000000000 [ 1.431868] Call trace: [ 1.432102] perf_event_sysfs_init+0x98/0xdc [ 1.432382] do_one_initcall+0x6c/0x1a8 [ 1.432637] kernel_init_freeable+0x1bc/0x280 [ 1.432905] kernel_init+0x18/0x160 [ 1.433115] ret_from_fork+0x10/0x18 [ 1.433297] ---[ end trace 27fd415390eb9883 ]--- Rework to set suppress_bind_attrs flag to avoid removing the device when CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, since there's no real reason to remove the armv8_pmuv3 driver. Cc: Arnd Bergmann Co-developed-by: Arnd Bergmann Signed-off-by: Anders Roxell Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- arch/arm64/kernel/perf_event.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 62d3dc60ca09..e99a0ed7e66b 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -670,6 +670,7 @@ static struct platform_driver armv8_pmu_driver = { .driver = { .name = "armv8-pmu", .of_match_table = armv8_pmu_of_device_ids, + .suppress_bind_attrs = true, }, .probe = armv8_pmu_device_probe, }; -- cgit v1.2.3 From c25a126d1a651a07dcdddc8f8eb3ef8c01fec932 Mon Sep 17 00:00:00 2001 From: Eugeniy Paltsev Date: Mon, 17 Dec 2018 12:54:23 +0300 Subject: ARC: perf: map generic branches to correct hardware condition commit 3affbf0e154ee351add6fcc254c59c3f3947fa8f upstream. So far we've mapped branches to "ijmp" which also counts conditional branches NOT taken. This makes us different from other architectures such as ARM which seem to be counting only taken branches. So use "ijmptak" hardware condition which only counts (all jump instructions that are taken) 'ijmptak' event is available on both ARCompact and ARCv2 ISA based cores. Signed-off-by: Eugeniy Paltsev Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta [vgupta: reworked changelog] Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h index 5f071762fb1c..6a2ae61748e4 100644 --- a/arch/arc/include/asm/perf_event.h +++ b/arch/arc/include/asm/perf_event.h @@ -103,7 +103,8 @@ static const char * const arc_pmu_ev_hw_map[] = { /* counts condition */ [PERF_COUNT_HW_INSTRUCTIONS] = "iall", - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp", /* Excludes ZOL jumps */ + /* All jump instructions that are taken */ + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmptak", [PERF_COUNT_ARC_BPOK] = "bpok", /* NP-NT, PT-T, PNT-NT */ #ifdef CONFIG_ISA_ARCV2 [PERF_COUNT_HW_BRANCH_MISSES] = "bpmp", -- cgit v1.2.3 From 74be2fcda651e596fe4e96c3c1ae4d76aa683655 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Fri, 9 Nov 2018 09:21:47 +0100 Subject: s390/early: improve machine detection commit 03aa047ef2db4985e444af6ee1c1dd084ad9fb4c upstream. Right now the early machine detection code check stsi 3.2.2 for "KVM" and set MACHINE_IS_VM if this is different. As the console detection uses diagnose 8 if MACHINE_IS_VM returns true this will crash Linux early for any non z/VM system that sets a different value than KVM. So instead of assuming z/VM, do not set any of MACHINE_IS_LPAR, MACHINE_IS_VM, or MACHINE_IS_KVM. CC: stable@vger.kernel.org Reviewed-by: Heiko Carstens Signed-off-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/early.c | 4 ++-- arch/s390/kernel/setup.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c index 8eccead675d4..cc7b450a7766 100644 --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -224,10 +224,10 @@ static noinline __init void detect_machine_type(void) if (stsi(vmms, 3, 2, 2) || !vmms->count) return; - /* Running under KVM? If not we assume z/VM */ + /* Detect known hypervisors */ if (!memcmp(vmms->vm[0].cpi, "\xd2\xe5\xd4", 3)) S390_lowcore.machine_flags |= MACHINE_FLAG_KVM; - else + else if (!memcmp(vmms->vm[0].cpi, "\xa9\x61\xe5\xd4", 4)) S390_lowcore.machine_flags |= MACHINE_FLAG_VM; } diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index e7a43a30e3ff..47692c78d09c 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -833,6 +833,8 @@ void __init setup_arch(char **cmdline_p) pr_info("Linux is running under KVM in 64-bit mode\n"); else if (MACHINE_IS_LPAR) pr_info("Linux is running natively in 64-bit mode\n"); + else + pr_info("Linux is running as a guest in 64-bit mode\n"); /* Have one command line that is parsed and saved in /proc/cmdline */ /* boot_command_line has been already set up in early.c */ -- cgit v1.2.3 From 86dd006cffecc263f8dc2cf2a787e0c53bb03b80 Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Wed, 9 Jan 2019 13:00:03 +0100 Subject: s390/smp: fix CPU hotplug deadlock with CPU rescan commit b7cb707c373094ce4008d4a6ac9b6b366ec52da5 upstream. smp_rescan_cpus() is called without the device_hotplug_lock, which can lead to a dedlock when a new CPU is found and immediately set online by a udev rule. This was observed on an older kernel version, where the cpu_hotplug_begin() loop was still present, and it resulted in hanging chcpu and systemd-udev processes. This specific deadlock will not show on current kernels. However, there may be other possible deadlocks, and since smp_rescan_cpus() can still trigger a CPU hotplug operation, the device_hotplug_lock should be held. For reference, this was the deadlock with the old cpu_hotplug_begin() loop: chcpu (rescan) systemd-udevd echo 1 > /sys/../rescan -> smp_rescan_cpus() -> (*) get_online_cpus() (increases refcount) -> smp_add_present_cpu() (new CPU found) -> register_cpu() -> device_add() -> udev "add" event triggered -----------> udev rule sets CPU online -> echo 1 > /sys/.../online -> lock_device_hotplug_sysfs() (this is missing in rescan path) -> device_online() -> (**) device_lock(new CPU dev) -> cpu_up() -> cpu_hotplug_begin() (loops until refcount == 0) -> deadlock with (*) -> bus_probe_device() -> device_attach() -> device_lock(new CPU dev) -> deadlock with (**) Fix this by taking the device_hotplug_lock in the CPU rescan path. Cc: Signed-off-by: Gerald Schaefer Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/smp.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 77f4f334a465..dbf883e3247f 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -1152,7 +1152,11 @@ static ssize_t __ref rescan_store(struct device *dev, { int rc; + rc = lock_device_hotplug_sysfs(); + if (rc) + return rc; rc = smp_rescan_cpus(); + unlock_device_hotplug(); return rc ? rc : count; } static DEVICE_ATTR(rescan, 0200, NULL, rescan_store); -- cgit v1.2.3 From 43473a6f6734e83ff5d458bedfe2e88b51d161e9 Mon Sep 17 00:00:00 2001 From: Alexander Popov Date: Mon, 21 Jan 2019 15:48:40 +0300 Subject: KVM: x86: Fix single-step debugging commit 5cc244a20b86090c087073c124284381cdf47234 upstream. The single-step debugging of KVM guests on x86 is broken: if we run gdb 'stepi' command at the breakpoint when the guest interrupts are enabled, RIP always jumps to native_apic_mem_write(). Then other nasty effects follow. Long investigation showed that on Jun 7, 2017 the commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can be called without X86_EFLAGS_TF set. Let's fix it. Please consider that for -stable. Signed-off-by: Alexander Popov Cc: stable@vger.kernel.org Fixes: c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1a934bb8ed1c..758e2b39567d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5524,8 +5524,7 @@ restart: toggle_interruptibility(vcpu, ctxt->interruptibility); vcpu->arch.emulate_regs_need_sync_to_vcpu = false; kvm_rip_write(vcpu, ctxt->eip); - if (r == EMULATE_DONE && - (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP))) + if (r == EMULATE_DONE && ctxt->tf) kvm_vcpu_do_singlestep(vcpu, &r); if (!ctxt->have_exception || exception_type(ctxt->exception.vector) == EXCPT_TRAP) -- cgit v1.2.3 From 5efadf3b3e7f7387e48527e8581976968dc2d11c Mon Sep 17 00:00:00 2001 From: Daniel Drake Date: Mon, 7 Jan 2019 11:40:24 +0800 Subject: x86/kaslr: Fix incorrect i8254 outb() parameters commit 7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4 upstream. The outb() function takes parameters value and port, in that order. Fix the parameters used in the kalsr i8254 fallback code. Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions") Signed-off-by: Daniel Drake Signed-off-by: Thomas Gleixner Cc: bp@alien8.de Cc: hpa@zytor.com Cc: linux@endlessm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/boot/compressed/aslr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c index 31dab2135188..21332b431f10 100644 --- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -25,8 +25,8 @@ static inline u16 i8254(void) u16 status, timer; do { - outb(I8254_PORT_CONTROL, - I8254_CMD_READBACK | I8254_SELECT_COUNTER0); + outb(I8254_CMD_READBACK | I8254_SELECT_COUNTER0, + I8254_PORT_CONTROL); status = inb(I8254_PORT_COUNTER0); timer = inb(I8254_PORT_COUNTER0); timer |= inb(I8254_PORT_COUNTER0) << 8; -- cgit v1.2.3 From 029b5be50938e6f13ce76ca747b43a16fd010252 Mon Sep 17 00:00:00 2001 From: Shaokun Zhang Date: Tue, 21 Jun 2016 15:32:57 +0800 Subject: arm64: mm: remove page_mapping check in __sync_icache_dcache commit 20c27a4270c775d7ed661491af8ac03264d60fc6 upstream. __sync_icache_dcache unconditionally skips the cache maintenance for anonymous pages, under the assumption that flushing is only required in the presence of D-side aliases [see 7249b79f6b4cc ("arm64: Do not flush the D-cache for anonymous pages")]. Unfortunately, this breaks migration of anonymous pages holding self-modifying code, where userspace cannot be reasonably expected to reissue maintenance instructions in response to a migration. This patch fixes the problem by removing the broken page_mapping(page) check from the cache syncing code, otherwise we may end up fetching and executing stale instructions from the PoU. Cc: Catalin Marinas Cc: Will Deacon Cc: Mark Rutland Cc: Reviewed-by: Catalin Marinas Signed-off-by: Shaokun Zhang Signed-off-by: Will Deacon Signed-off-by: Sasha Levin Cc: Amanieu d'Antras Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/flush.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'arch') diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index c26b804015e8..a90615baa529 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -70,10 +70,6 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr) { struct page *page = pte_page(pte); - /* no flushing needed for anonymous pages */ - if (!page_mapping(page)) - return; - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) { __flush_dcache_area(page_address(page), PAGE_SIZE << compound_order(page)); -- cgit v1.2.3 From dda201759ece8134d389257b5a762b01b588b28b Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Fri, 11 Jan 2019 15:18:22 +0100 Subject: s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU commit 60f1bf29c0b2519989927cae640cd1f50f59dc7f upstream. When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read from pcpu_devices->lowcore. However, due to prefixing, that will result in reading from absolute address 0 on that CPU. We have to go via the actual lowcore instead. This means that right now, we will read lc->nodat_stack == 0 and therfore work on a very wrong stack. This BUG essentially broke rebooting under QEMU TCG (which will report a low address protection exception). And checking under KVM, it is also broken under KVM. With 1 VCPU it can be easily triggered. :/# echo 1 > /proc/sys/kernel/sysrq :/# echo b > /proc/sysrq-trigger [ 28.476745] sysrq: SysRq : Resetting [ 28.476793] Kernel stack overflow. [ 28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13 [ 28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux) [ 28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140) [ 28.476861] R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000 [ 28.476864] 0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0 [ 28.476864] 000000000010dff8 0000000000000000 0000000000000000 0000000000000000 [ 28.476865] 000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000 [ 28.476887] Krnl Code: 0000000000115bfe: 4170f000 la %r7,0(%r15) [ 28.476887] 0000000000115c02: 41f0a000 la %r15,0(%r10) [ 28.476887] #0000000000115c06: e370f0980024 stg %r7,152(%r15) [ 28.476887] >0000000000115c0c: c0e5fffff86e brasl %r14,114ce8 [ 28.476887] 0000000000115c12: 41f07000 la %r15,0(%r7) [ 28.476887] 0000000000115c16: a7f4ffa8 brc 15,115b66 [ 28.476887] 0000000000115c1a: 0707 bcr 0,%r7 [ 28.476887] 0000000000115c1c: 0707 bcr 0,%r7 [ 28.476901] Call Trace: [ 28.476902] Last Breaking-Event-Address: [ 28.476920] [<0000000000a01c4a>] arch_call_rest_init+0x22/0x80 [ 28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue. [ 28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13 [ 28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux) [ 28.476932] Call Trace: Fixes: 2f859d0dad81 ("s390/smp: reduce size of struct pcpu") Cc: stable@vger.kernel.org # 4.0+ Reported-by: Cornelia Huck Signed-off-by: David Hildenbrand Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/smp.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index dbf883e3247f..29e5409c0d48 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -360,9 +360,13 @@ void smp_call_online_cpu(void (*func)(void *), void *data) */ void smp_call_ipl_cpu(void (*func)(void *), void *data) { + struct _lowcore *lc = pcpu_devices->lowcore; + + if (pcpu_devices[0].address == stap()) + lc = &S390_lowcore; + pcpu_delegate(&pcpu_devices[0], func, data, - pcpu_devices->lowcore->panic_stack - - PANIC_FRAME_OFFSET + PAGE_SIZE); + lc->panic_stack - PANIC_FRAME_OFFSET + PAGE_SIZE); } int smp_find_processor_id(u16 address) -- cgit v1.2.3 From 1472585f511fc3a6e95f1be1d6acf87fbb4cde43 Mon Sep 17 00:00:00 2001 From: Koen Vandeputte Date: Thu, 31 Jan 2019 15:00:01 -0600 Subject: ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment commit 65dbb423cf28232fed1732b779249d6164c5999b upstream. Originally, cns3xxx used its own functions for mapping, reading and writing config registers. Commit 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") removed the internal PCI config write function in favor of the generic one: cns3xxx_pci_write_config() --> pci_generic_config_write() cns3xxx_pci_write_config() expected aligned addresses, being produced by cns3xxx_pci_map_bus() while the generic one pci_generic_config_write() actually expects the real address as both the function and hardware are capable of byte-aligned writes. This currently leads to pci_generic_config_write() writing to the wrong registers. For instance, upon ath9k module loading: - driver ath9k gets loaded - The driver wants to write value 0xA8 to register PCI_LATENCY_TIMER, located at 0x0D - cns3xxx_pci_map_bus() aligns the address to 0x0C - pci_generic_config_write() effectively writes 0xA8 into register 0x0C (CACHE_LINE_SIZE) Fix the bug by removing the alignment in the cns3xxx mapping function. Fixes: 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") Signed-off-by: Koen Vandeputte [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi Acked-by: Krzysztof Halasa Acked-by: Tim Harvey Acked-by: Arnd Bergmann CC: stable@vger.kernel.org # v4.0+ CC: Bjorn Helgaas CC: Olof Johansson CC: Robin Leblon CC: Rob Herring CC: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-cns3xxx/pcie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-cns3xxx/pcie.c b/arch/arm/mach-cns3xxx/pcie.c index 318394ed5c7a..5e11ad3164e0 100644 --- a/arch/arm/mach-cns3xxx/pcie.c +++ b/arch/arm/mach-cns3xxx/pcie.c @@ -83,7 +83,7 @@ static void __iomem *cns3xxx_pci_map_bus(struct pci_bus *bus, } else /* remote PCI bus */ base = cnspci->cfg1_regs + ((busno & 0xf) << 20); - return base + (where & 0xffc) + (devfn << 12); + return base + where + (devfn << 12); } static int cns3xxx_pci_read_config(struct pci_bus *bus, unsigned int devfn, -- cgit v1.2.3 From 7685bb0efb0c9ecec18abe9c6226243014222344 Mon Sep 17 00:00:00 2001 From: James Morse Date: Thu, 24 Jan 2019 16:32:56 +0000 Subject: arm64: hyp-stub: Forbid kprobing of the hyp-stub commit 8fac5cbdfe0f01254d9d265c6aa1a95f94f58595 upstream. The hyp-stub is loaded by the kernel's early startup code at EL2 during boot, before KVM takes ownership later. The hyp-stub's text is part of the regular kernel text, meaning it can be kprobed. A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid. Add it to the __hyp_text. Signed-off-by: James Morse Cc: stable@vger.kernel.org Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/hyp-stub.S | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index a272f335c289..096e957aecb0 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -26,6 +26,8 @@ #include .text + .pushsection .hyp.text, "ax" + .align 11 ENTRY(__hyp_stub_vectors) -- cgit v1.2.3 From aeef84f38288f15d80905513c4eaf5b5cc66badd Mon Sep 17 00:00:00 2001 From: Yufen Wang Date: Fri, 2 Nov 2018 11:51:31 +0100 Subject: ARM: 8808/1: kexec:offline panic_smp_self_stop CPU [ Upstream commit 82c08c3e7f171aa7f579b231d0abbc1d62e91974 ] In case panic() and panic() called at the same time on different CPUS. For example: CPU 0: panic() __crash_kexec machine_crash_shutdown crash_smp_send_stop machine_kexec BUG_ON(num_online_cpus() > 1); CPU 1: panic() local_irq_disable panic_smp_self_stop If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump fails. CPU1 can't receive the ipi irq, CPU1 will be always online. To fix this problem, this patch split out the panic_smp_self_stop() and add set_cpu_online(smp_processor_id(), false). Signed-off-by: Yufen Wang Signed-off-by: Russell King Signed-off-by: Sasha Levin --- arch/arm/kernel/smp.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'arch') diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index b26361355dae..e42be5800f37 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -687,6 +687,21 @@ void smp_send_stop(void) pr_warn("SMP: failed to stop secondary CPUs\n"); } +/* In case panic() and panic() called at the same time on CPU1 and CPU2, + * and CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop() + * CPU1 can't receive the ipi irqs from CPU2, CPU1 will be always online, + * kdump fails. So split out the panic_smp_self_stop() and add + * set_cpu_online(smp_processor_id(), false). + */ +void panic_smp_self_stop(void) +{ + pr_debug("CPU %u will stop doing anything useful since another CPU has paniced\n", + smp_processor_id()); + set_cpu_online(smp_processor_id(), false); + while (1) + cpu_relax(); +} + /* * not supported here */ -- cgit v1.2.3 From 2f85daf84873df51f533894e50d9c0b80137dca0 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 25 Oct 2018 14:52:31 +0100 Subject: x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux) [ Upstream commit 53bb565fc5439f2c8c57a786feea5946804aa3e9 ] In the expression "word1 << 16", word1 starts as u16, but is promoted to a signed int, then sign-extended to resource_size_t, which is probably not what was intended. Cast to resource_size_t to avoid the sign extension. This fixes an identical issue as fixed by commit 0b2d70764bb3 ("x86/PCI: Fix Broadcom CNB20LE unintended sign extension") back in 2014. Detected by CoverityScan, CID#138749, 138750 ("Unintended sign extension") Fixes: 3f6ea84a3035 ("PCI: read memory ranges out of Broadcom CNB20LE host bridge") Signed-off-by: Colin Ian King Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin --- arch/x86/pci/broadcom_bus.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/pci/broadcom_bus.c b/arch/x86/pci/broadcom_bus.c index 526536c81ddc..ca1e8e6dccc8 100644 --- a/arch/x86/pci/broadcom_bus.c +++ b/arch/x86/pci/broadcom_bus.c @@ -50,8 +50,8 @@ static void __init cnb20le_res(u8 bus, u8 slot, u8 func) word1 = read_pci_config_16(bus, slot, func, 0xc0); word2 = read_pci_config_16(bus, slot, func, 0xc2); if (word1 != word2) { - res.start = (word1 << 16) | 0x0000; - res.end = (word2 << 16) | 0xffff; + res.start = ((resource_size_t) word1 << 16) | 0x0000; + res.end = ((resource_size_t) word2 << 16) | 0xffff; res.flags = IORESOURCE_MEM; update_res(info, res.start, res.end, res.flags, 0); } -- cgit v1.2.3 From c9ddfda641c4e868b08c37ed51debee2f539ea0d Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 4 Oct 2018 20:27:16 -0700 Subject: powerpc/pseries: add of_node_put() in dlpar_detach_node() [ Upstream commit 5b3f5c408d8cc59b87e47f1ab9803dbd006e4a91 ] The previous commit, "of: overlay: add missing of_node_get() in __of_attach_node_sysfs" added a missing of_node_get() to __of_attach_node_sysfs(). This results in a refcount imbalance for nodes attached with dlpar_attach_node(). The calling sequence from dlpar_attach_node() to __of_attach_node_sysfs() is: dlpar_attach_node() of_attach_node() __of_attach_node_sysfs() For more detailed description of the node refcount, see commit 68baf692c435 ("powerpc/pseries: Fix of_node_put() underflow during DLPAR remove"). Tested-by: Alan Tull Acked-by: Michael Ellerman Signed-off-by: Frank Rowand Signed-off-by: Sasha Levin --- arch/powerpc/platforms/pseries/dlpar.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 96536c969c9c..a8efed3b4691 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -280,6 +280,8 @@ int dlpar_detach_node(struct device_node *dn) if (rc) return rc; + of_node_put(dn); + return 0; } -- cgit v1.2.3 From c34330e1eccdd47268a9ed8c8e84632a2aa72001 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Wed, 17 Oct 2018 17:52:07 -0700 Subject: ARM: OMAP2+: hwmod: Fix some section annotations [ Upstream commit c10b26abeb53cabc1e6271a167d3f3d396ce0218 ] When building the kernel with Clang, the following section mismatch warnings appears: WARNING: vmlinux.o(.text+0x2d398): Section mismatch in reference from the function _setup() to the function .init.text:_setup_iclk_autoidle() The function _setup() references the function __init _setup_iclk_autoidle(). This is often because _setup lacks a __init annotation or the annotation of _setup_iclk_autoidle is wrong. WARNING: vmlinux.o(.text+0x2d3a0): Section mismatch in reference from the function _setup() to the function .init.text:_setup_reset() The function _setup() references the function __init _setup_reset(). This is often because _setup lacks a __init annotation or the annotation of _setup_reset is wrong. WARNING: vmlinux.o(.text+0x2d408): Section mismatch in reference from the function _setup() to the function .init.text:_setup_postsetup() The function _setup() references the function __init _setup_postsetup(). This is often because _setup lacks a __init annotation or the annotation of _setup_postsetup is wrong. _setup is used in omap_hwmod_allocate_module, which isn't marked __init and looks like it shouldn't be, meaning to fix these warnings, those functions must be moved out of the init section, which this patch does. Signed-off-by: Nathan Chancellor Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin --- arch/arm/mach-omap2/omap_hwmod.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c index 147c90e70b2e..36706d32d656 100644 --- a/arch/arm/mach-omap2/omap_hwmod.c +++ b/arch/arm/mach-omap2/omap_hwmod.c @@ -2526,7 +2526,7 @@ static int __init _init(struct omap_hwmod *oh, void *data) * a stub; implementing this properly requires iclk autoidle usecounting in * the clock code. No return value. */ -static void __init _setup_iclk_autoidle(struct omap_hwmod *oh) +static void _setup_iclk_autoidle(struct omap_hwmod *oh) { struct omap_hwmod_ocp_if *os; struct list_head *p; @@ -2561,7 +2561,7 @@ static void __init _setup_iclk_autoidle(struct omap_hwmod *oh) * reset. Returns 0 upon success or a negative error code upon * failure. */ -static int __init _setup_reset(struct omap_hwmod *oh) +static int _setup_reset(struct omap_hwmod *oh) { int r; @@ -2622,7 +2622,7 @@ static int __init _setup_reset(struct omap_hwmod *oh) * * No return value. */ -static void __init _setup_postsetup(struct omap_hwmod *oh) +static void _setup_postsetup(struct omap_hwmod *oh) { u8 postsetup_state; -- cgit v1.2.3 From 2d6f9a86cd9a6081e0431f19757cc0879a701775 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 15 Nov 2018 22:42:01 +0000 Subject: arm64: ftrace: don't adjust the LR value [ Upstream commit 6e803e2e6e367db9a0d6ecae1bd24bb5752011bd ] The core ftrace code requires that when it is handed the PC of an instrumented function, this PC is the address of the instrumented instruction. This is necessary so that the core ftrace code can identify the specific instrumentation site. Since the instrumented function will be a BL, the address of the instrumented function is LR - 4 at entry to the ftrace code. This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers, which acquire the PC of the instrumented function. The mcount_get_lr helper is used to acquire the LR of the instrumented function, whose value does not require this adjustment, and cannot be adjusted to anything meaningful. No adjustment of this value is made on other architectures, including arm. However, arm64 adjusts this value by 4. This patch brings arm64 in line with other architectures and removes the adjustment of the LR value. Signed-off-by: Mark Rutland Cc: AKASHI Takahiro Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Torsten Duwe Cc: Will Deacon Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- arch/arm64/kernel/entry-ftrace.S | 1 - 1 file changed, 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S index 0f03a8fe2314..d18d15810d19 100644 --- a/arch/arm64/kernel/entry-ftrace.S +++ b/arch/arm64/kernel/entry-ftrace.S @@ -78,7 +78,6 @@ .macro mcount_get_lr reg ldr \reg, [x29] ldr \reg, [\reg, #8] - mcount_adjust_addr \reg, \reg .endm .macro mcount_get_lr_addr reg -- cgit v1.2.3 From 75086100ab60a68b22d9eaea71a45de66102dd1a Mon Sep 17 00:00:00 2001 From: Lubomir Rintel Date: Wed, 28 Nov 2018 18:53:10 +0100 Subject: ARM: dts: mmp2: fix TWSI2 [ Upstream commit 1147e05ac9fc2ef86a3691e7ca5c2db7602d81dd ] Marvell keeps their MMP2 datasheet secret, but there are good clues that TWSI2 is not on 0xd4025000 on that platform, not does it use IRQ 58. In fact, the IRQ 58 on MMP2 seems to be a signal processor: arch/arm/mach-mmp/irqs.h:#define IRQ_MMP2_MSP 58 I'm taking a somewhat educated guess that is probably a copy & paste error from PXA168 or PXA910 and that the real controller in fact hides at address 0xd4031000 and uses an interrupt line multiplexed via IRQ 17. I'm also copying some properties from TWSI1 that were missing or incorrect. Tested on a OLPC XO 1.75 machine, where the RTC is on TWSI2. Signed-off-by: Lubomir Rintel Tested-by: Pavel Machek Signed-off-by: Olof Johansson Signed-off-by: Sasha Levin --- arch/arm/boot/dts/mmp2.dtsi | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/mmp2.dtsi b/arch/arm/boot/dts/mmp2.dtsi index 766bbb8495b6..47e5b63339d1 100644 --- a/arch/arm/boot/dts/mmp2.dtsi +++ b/arch/arm/boot/dts/mmp2.dtsi @@ -220,12 +220,15 @@ status = "disabled"; }; - twsi2: i2c@d4025000 { + twsi2: i2c@d4031000 { compatible = "mrvl,mmp-twsi"; - reg = <0xd4025000 0x1000>; - interrupts = <58>; + reg = <0xd4031000 0x1000>; + interrupt-parent = <&intcmux17>; + interrupts = <0>; clocks = <&soc_clocks MMP2_CLK_TWSI1>; resets = <&soc_clocks MMP2_CLK_TWSI1>; + #address-cells = <1>; + #size-cells = <0>; status = "disabled"; }; -- cgit v1.2.3 From 028f17e00e1aae50465b058f4f985fd17e134887 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 28 Nov 2018 23:20:11 +0100 Subject: x86/fpu: Add might_fault() to user_insn() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6637401c35b2f327a35d27f44bda05e327f2f017 ] Every user of user_insn() passes an user memory pointer to this macro. Add might_fault() to user_insn() so we can spot users which are using this macro in sections where page faulting is not allowed. [ bp: Space it out to make it more visible. ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Borislav Petkov Reviewed-by: Rik van Riel Cc: "H. Peter Anvin" Cc: "Jason A. Donenfeld" Cc: Andy Lutomirski Cc: Dave Hansen Cc: Ingo Molnar Cc: Jann Horn Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: kvm ML Cc: x86-ml Link: https://lkml.kernel.org/r/20181128222035.2996-6-bigeasy@linutronix.de Signed-off-by: Sasha Levin --- arch/x86/include/asm/fpu/internal.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16825dda18dc..66a5e60f60c4 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -94,6 +94,9 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu); #define user_insn(insn, output, input...) \ ({ \ int err; \ + \ + might_fault(); \ + \ asm volatile(ASM_STAC "\n" \ "1:" #insn "\n\t" \ "2: " ASM_CLAC "\n" \ -- cgit v1.2.3 From 54b53c5994528bacf6f3e517d75943c147e82b97 Mon Sep 17 00:00:00 2001 From: Russell King - ARM Linux Date: Fri, 7 Dec 2018 09:17:07 -0800 Subject: ARM: dts: Fix OMAP4430 SDP Ethernet startup [ Upstream commit 84fb6c7feb1494ebb7d1ec8b95cfb7ada0264465 ] It was noticed that unbinding and rebinding the KSZ8851 ethernet resulted in the driver reporting "failed to read device ID" at probe. Probing the reset line with a 'scope while repeatedly attempting to bind the driver in a shell loop revealed that the KSZ8851 RSTN pin is constantly held at zero, meaning the device is held in reset, and does not respond on the SPI bus. Experimentation with the startup delay on the regulator set to 50ms shows that the reset is positively released after 20ms. Schematics for this board are not available, and the traces are buried in the inner layers of the board which makes tracing where the RSTN pin extremely difficult. We can only guess that the RSTN pin is wired to a reset generator chip driven off the ethernet supply, which fits the observed behaviour. Include this delay in the regulator startup delay - effectively treating the reset as a "supply stable" indicator. This can not be modelled as a delay in the KSZ8851 driver since the reset generation is board specific - if the RSTN pin had been wired to a GPIO, reset could be released earlier via the already provided support in the KSZ8851 driver. This also got confirmed by Peter Ujfalusi based on Blaze schematics that should be very close to SDP4430: TPS22902YFPR is used as the regulator switch (gpio48 controlled): Convert arm boot_lock to raw The VOUT is routed to TPS3808G01DBV. (SCH Note: Threshold set at 90%. Vsense: 0.405V). According to the TPS3808 data sheet the RESET delay time when Ct is open (this is the case in the schema): MIN/TYP/MAX: 12/20/28 ms. Signed-off-by: Russell King Reviewed-by: Peter Ujfalusi [tony@atomide.com: updated with notes from schematics from Peter] Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin --- arch/arm/boot/dts/omap4-sdp.dts | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/omap4-sdp.dts b/arch/arm/boot/dts/omap4-sdp.dts index f0bdc41f8eff..235d1493f8aa 100644 --- a/arch/arm/boot/dts/omap4-sdp.dts +++ b/arch/arm/boot/dts/omap4-sdp.dts @@ -33,6 +33,7 @@ gpio = <&gpio2 16 GPIO_ACTIVE_HIGH>; /* gpio line 48 */ enable-active-high; regulator-boot-on; + startup-delay-us = <25000>; }; vbat: fixedregulator-vbat { -- cgit v1.2.3 From cab345f0e9b454422c3ec6578003926105414653 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Mon, 3 Dec 2018 17:27:54 -0500 Subject: mips: bpf: fix encoding bug for mm_srlv32_op [ Upstream commit 17f6c83fb5ebf7db4fcc94a5be4c22d5a7bfe428 ] For micro-mips, srlv inside POOL32A encoding space should use 0x50 sub-opcode, NOT 0x90. Some early version ISA doc describes the encoding as 0x90 for both srlv and srav, this looks to me was a typo. I checked Binutils libopcode implementation which is using 0x50 for srlv and 0x90 for srav. v1->v2: - Keep mm_srlv32_op sorted by value. Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction") Cc: Markos Chandras Cc: Paul Burton Cc: linux-mips@vger.kernel.org Acked-by: Jakub Kicinski Acked-by: Song Liu Signed-off-by: Jiong Wang Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin --- arch/mips/include/uapi/asm/inst.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h index 1b6f2f219298..9db764b51ffe 100644 --- a/arch/mips/include/uapi/asm/inst.h +++ b/arch/mips/include/uapi/asm/inst.h @@ -290,8 +290,8 @@ enum mm_32a_minor_op { mm_ext_op = 0x02c, mm_pool32axf_op = 0x03c, mm_srl32_op = 0x040, + mm_srlv32_op = 0x050, mm_sra_op = 0x080, - mm_srlv32_op = 0x090, mm_rotr_op = 0x0c0, mm_lwxs_op = 0x118, mm_addu32_op = 0x150, -- cgit v1.2.3 From b9037406e1516411dc79985436aa4629c7f5070b Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 10 Dec 2018 22:58:39 +0100 Subject: ARM: pxa: avoid section mismatch warning [ Upstream commit 88af3209aa0881aa5ffd99664b6080a4be5f24e5 ] WARNING: vmlinux.o(.text+0x19f90): Section mismatch in reference from the function littleton_init_lcd() to the function .init.text:pxa_set_fb_info() The function littleton_init_lcd() references the function __init pxa_set_fb_info(). This is often because littleton_init_lcd lacks a __init annotation or the annotation of pxa_set_fb_info is wrong. WARNING: vmlinux.o(.text+0xf824): Section mismatch in reference from the function zeus_register_ohci() to the function .init.text:pxa_set_ohci_info() The function zeus_register_ohci() references the function __init pxa_set_ohci_info(). This is often because zeus_register_ohci lacks a __init annotation or the annotation of pxa_set_ohci_info is wrong. WARNING: vmlinux.o(.text+0xf95c): Section mismatch in reference from the function cm_x300_init_u2d() to the function .init.text:pxa3xx_set_u2d_info() The function cm_x300_init_u2d() references the function __init pxa3xx_set_u2d_info(). This is often because cm_x300_init_u2d lacks a __init annotation or the annotation of pxa3xx_set_u2d_info is wrong. Signed-off-by: Arnd Bergmann Signed-off-by: Olof Johansson Signed-off-by: Sasha Levin --- arch/arm/mach-pxa/cm-x300.c | 2 +- arch/arm/mach-pxa/littleton.c | 2 +- arch/arm/mach-pxa/zeus.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-pxa/cm-x300.c b/arch/arm/mach-pxa/cm-x300.c index a7dae60810e8..307fc18edede 100644 --- a/arch/arm/mach-pxa/cm-x300.c +++ b/arch/arm/mach-pxa/cm-x300.c @@ -547,7 +547,7 @@ static struct pxa3xx_u2d_platform_data cm_x300_u2d_platform_data = { .exit = cm_x300_u2d_exit, }; -static void cm_x300_init_u2d(void) +static void __init cm_x300_init_u2d(void) { pxa3xx_set_u2d_info(&cm_x300_u2d_platform_data); } diff --git a/arch/arm/mach-pxa/littleton.c b/arch/arm/mach-pxa/littleton.c index 5d665588c7eb..05aa7071efd6 100644 --- a/arch/arm/mach-pxa/littleton.c +++ b/arch/arm/mach-pxa/littleton.c @@ -183,7 +183,7 @@ static struct pxafb_mach_info littleton_lcd_info = { .lcd_conn = LCD_COLOR_TFT_16BPP, }; -static void littleton_init_lcd(void) +static void __init littleton_init_lcd(void) { pxa_set_fb_info(NULL, &littleton_lcd_info); } diff --git a/arch/arm/mach-pxa/zeus.c b/arch/arm/mach-pxa/zeus.c index d757cfb5f8a6..4da2458d7f32 100644 --- a/arch/arm/mach-pxa/zeus.c +++ b/arch/arm/mach-pxa/zeus.c @@ -558,7 +558,7 @@ static struct pxaohci_platform_data zeus_ohci_platform_data = { .flags = ENABLE_PORT_ALL | POWER_SENSE_LOW, }; -static void zeus_register_ohci(void) +static void __init zeus_register_ohci(void) { /* Port 2 is shared between host and client interface. */ UP2OCR = UP2OCR_HXOE | UP2OCR_HXS | UP2OCR_DMPDE | UP2OCR_DPPDE; -- cgit v1.2.3 From fac39ee2e52b19f826a2c041a41f534fd35717e0 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 9 Nov 2018 15:07:10 +0000 Subject: arm64: KVM: Skip MMIO insn after emulation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 0d640732dbebed0f10f18526de21652931f0b2f2 ] When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell Reviewed-by: Alex Bennée Reviewed-by: Christoffer Dall Signed-off-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin --- arch/arm/kvm/mmio.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index 387ee2a11e36..885cd0e0015b 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -118,6 +118,12 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data); } + /* + * The MMIO instruction is emulated and should not be re-executed + * in the guest. + */ + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu)); + return 0; } @@ -151,11 +157,6 @@ static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len) vcpu->arch.mmio_decode.sign_extend = sign_extend; vcpu->arch.mmio_decode.rt = rt; - /* - * The MMIO instruction is emulated and should not be re-executed - * in the guest. - */ - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu)); return 0; } -- cgit v1.2.3 From 3df04b50ef3f92f10aca8eb70f6db7cb89146258 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 10 Dec 2018 06:50:09 +0000 Subject: powerpc/uaccess: fix warning/error with access_ok() [ Upstream commit 05a4ab823983d9136a460b7b5e0d49ee709a6f86 ] With the following piece of code, the following compilation warning is encountered: if (_IOC_DIR(ioc) != _IOC_NONE) { int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ; if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) { drivers/platform/test/dev.c: In function 'my_ioctl': drivers/platform/test/dev.c:219:7: warning: unused variable 'verify' [-Wunused-variable] int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ; This patch fixes it by referencing 'type' in the macro allthough doing nothing with it. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index a5ffe0207c16..05f1389228d2 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -59,7 +59,7 @@ #endif #define access_ok(type, addr, size) \ - (__chk_user_ptr(addr), \ + (__chk_user_ptr(addr), (void)(type), \ __access_ok((__force unsigned long)(addr), (size), get_fs())) /* -- cgit v1.2.3 From a8014f2741b434598c5323f7fb2984725fbc5ad5 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 19 Dec 2018 12:06:13 +0100 Subject: KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e87555e550cef4941579cd879759a7c0dee24e68 ] AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: Vitaly Kuznetsov Signed-off-by: Radim Krčmář Signed-off-by: Sasha Levin --- arch/x86/kvm/svm.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ecdf724da371..7ce1a19d9d8b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -4156,6 +4156,13 @@ static bool svm_cpu_has_accelerated_tpr(void) static bool svm_has_emulated_msr(int index) { + switch (index) { + case MSR_IA32_MCG_EXT_CTL: + return false; + default: + break; + } + return true; } -- cgit v1.2.3 From ccc9ed24493bc02b5174184ae5c73fee422cd0bd Mon Sep 17 00:00:00 2001 From: Anton Ivanov Date: Wed, 5 Dec 2018 12:37:41 +0000 Subject: um: Avoid marking pages with "changed protection" [ Upstream commit 8892d8545f2d0342b9c550defbfb165db237044b ] Changing protection is a very high cost operation in UML because in addition to an extra syscall it also interrupts mmap merge sequences generated by the tlb. While the condition is not particularly common it is worth avoiding. Signed-off-by: Anton Ivanov Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- arch/um/include/asm/pgtable.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index 18eb9924dda3..aeb430212947 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -197,12 +197,17 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - pte_clear_bits(pte, _PAGE_RW); + if (likely(pte_get_bits(pte, _PAGE_RW))) + pte_clear_bits(pte, _PAGE_RW); + else + return pte; return(pte_mknewprot(pte)); } static inline pte_t pte_mkread(pte_t pte) { + if (unlikely(pte_get_bits(pte, _PAGE_USER))) + return pte; pte_set_bits(pte, _PAGE_USER); return(pte_mknewprot(pte)); } @@ -221,6 +226,8 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { + if (unlikely(pte_get_bits(pte, _PAGE_RW))) + return pte; pte_set_bits(pte, _PAGE_RW); return(pte_mknewprot(pte)); } -- cgit v1.2.3 From 1b5fd913a4eb07cb13e969bb8e3b1633a40e683f Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 29 Jan 2019 18:41:16 +0100 Subject: KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) commit 353c0956a618a07ba4bbe7ad00ff29fe70e8412a upstream. Bugzilla: 1671930 Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with memory operand, INVEPT, INVVPID) can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Embargoed until Feb 7th 2019. Reported-by: Felix Wilhelm Cc: stable@kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 758e2b39567d..6bd0538d8ebf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4247,6 +4247,13 @@ int kvm_read_guest_virt(struct kvm_vcpu *vcpu, { u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; + /* + * FIXME: this should call handle_emulation_failure if X86EMUL_IO_NEEDED + * is returned, but our callers are not ready for that and they blindly + * call kvm_inject_page_fault. Ensure that they at least do not leak + * uninitialized kernel stack memory into cr2 and error code. + */ + memset(exception, 0, sizeof(*exception)); return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, exception); } -- cgit v1.2.3 From 9872ddae1949b46d5310e0e71ca26bb5c4e52a70 Mon Sep 17 00:00:00 2001 From: Peter Shier Date: Thu, 11 Oct 2018 11:46:46 -0700 Subject: KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) commit ecec76885bcfe3294685dc363fd1273df0d5d65f upstream. Bugzilla: 1671904 There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit where the hrtimer is normally cancelled. Unconditionally cancel in free_nested to cover all cases. Embargoed until Feb 7th 2019. Signed-off-by: Peter Shier Reported-by: Jim Mattson Reviewed-by: Jim Mattson Reported-by: Felix Wilhelm Cc: stable@kernel.org Message-Id: <20181011184646.154065-1-pshier@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3bdb2e747b89..aee2886a387c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6965,6 +6965,7 @@ static void free_nested(struct vcpu_vmx *vmx) if (!vmx->nested.vmxon) return; + hrtimer_cancel(&vmx->nested.preemption_timer); vmx->nested.vmxon = false; free_vpid(vmx->nested.vpid02); nested_release_vmcs12(vmx); -- cgit v1.2.3 From 00025153e202c6040763cce695004dc977823822 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Sun, 27 Jan 2019 06:53:14 -0800 Subject: perf/x86/intel/uncore: Add Node ID mask commit 9e63a7894fd302082cf3627fe90844421a6cbe7f upstream. Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE Superdome Flex). To understand which Socket the PCI uncore PMUs belongs to, perf retrieves the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI configuration space, and the mapping between Socket ID and Node ID from GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly. The local Node ID is only available at bit 2:0, but current code doesn't mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID will be fetched. Filter the Node ID by adding a mask. Reported-by: Song Liu Tested-by: Song Liu Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: # v3.7+ Fixes: 7c94ee2e0917 ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support") Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c index f0f4fcba252e..947579425861 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c @@ -1081,6 +1081,8 @@ static struct pci_driver snbep_uncore_pci_driver = { .id_table = snbep_uncore_pci_ids, }; +#define NODE_ID_MASK 0x7 + /* * build pci bus to socket mapping */ @@ -1102,7 +1104,7 @@ static int snbep_pci2phy_map_init(int devid) err = pci_read_config_dword(ubox_dev, 0x40, &config); if (err) break; - nodeid = config; + nodeid = config & NODE_ID_MASK; /* get the Node ID mapping */ err = pci_read_config_dword(ubox_dev, 0x54, &config); if (err) -- cgit v1.2.3 From 122c0149ad74375d2e740048298aa0b82342004f Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 31 Jan 2019 16:33:41 -0800 Subject: x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out() commit d28af26faa0b1daf3c692603d46bc4687c16f19e upstream. Internal injection testing crashed with a console log that said: mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134 This caused a lot of head scratching because the MCACOD (bits 15:0) of that status is a signature from an L1 data cache error. But Linux says that it found it in "Bank 0", which on this model CPU only reports L1 instruction cache errors. The answer was that Linux doesn't initialize "m->bank" in the case that it finds a fatal error in the mce_no_way_out() pre-scan of banks. If this was a local machine check, then this partially initialized struct mce is being passed to mce_panic(). Fix is simple: just initialize m->bank in the case of a fatal error. Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message") Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Vishal Verma Cc: x86-ml Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 7b8c8c838191..77f7580e22c6 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -670,6 +670,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, } if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) { + m->bank = i; *msg = tmp; ret = 1; } -- cgit v1.2.3 From d93fdf4464a0fedd4d797f27a3945092681f6e11 Mon Sep 17 00:00:00 2001 From: Vladimir Kondratiev Date: Wed, 6 Feb 2019 13:46:17 +0200 Subject: mips: cm: reprime error cause commit 05dc6001af0630e200ad5ea08707187fe5537e6d upstream. Accordingly to the documentation ---cut--- The GCR_ERROR_CAUSE.ERR_TYPE field and the GCR_ERROR_MULT.ERR_TYPE fields can be cleared by either a reset or by writing the current value of GCR_ERROR_CAUSE.ERR_TYPE to the GCR_ERROR_CAUSE.ERR_TYPE register. ---cut--- Do exactly this. Original value of cm_error may be safely written back; it clears error cause and keeps other bits untouched. Fixes: 3885c2b463f6 ("MIPS: CM: Add support for reporting CM cache errors") Signed-off-by: Vladimir Kondratiev Signed-off-by: Paul Burton Cc: Ralf Baechle Cc: James Hogan Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/mips-cm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/mips/kernel/mips-cm.c b/arch/mips/kernel/mips-cm.c index 1448c1f43d4e..76f18c56141c 100644 --- a/arch/mips/kernel/mips-cm.c +++ b/arch/mips/kernel/mips-cm.c @@ -424,5 +424,5 @@ void mips_cm_error_report(void) } /* reprime cause register */ - write_gcr_error_cause(0); + write_gcr_error_cause(cm_error); } -- cgit v1.2.3 From 91aaa0dd778be889777e53a1576c31bf249e0a5f Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Sun, 27 Jan 2019 23:28:33 +0200 Subject: MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled commit dcf300a69ac307053dfb35c2e33972e754a98bce upstream. Don't set octeon_dma_bar_type if PCI is disabled. This avoids creation of the MSI irqchip later on, and saves a bit of memory. Signed-off-by: Aaro Koskinen Signed-off-by: Paul Burton Fixes: a214720cbf50 ("Disable MSI also when pcie-octeon.pcie_disable on") Cc: stable@vger.kernel.org # v3.3+ Cc: linux-mips@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/mips/pci/pci-octeon.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c index c258cd406fbb..b36bbda31058 100644 --- a/arch/mips/pci/pci-octeon.c +++ b/arch/mips/pci/pci-octeon.c @@ -571,6 +571,11 @@ static int __init octeon_pci_setup(void) if (octeon_has_feature(OCTEON_FEATURE_PCIE)) return 0; + if (!octeon_is_pci_host()) { + pr_notice("Not in host mode, PCI Controller not initialized\n"); + return 0; + } + /* Point pcibios_map_irq() to the PCI version of it */ octeon_pcibios_map_irq = octeon_pci_pcibios_map_irq; @@ -582,11 +587,6 @@ static int __init octeon_pci_setup(void) else octeon_dma_bar_type = OCTEON_DMA_BAR_TYPE_BIG; - if (!octeon_is_pci_host()) { - pr_notice("Not in host mode, PCI Controller not initialized\n"); - return 0; - } - /* PCI I/O and PCI MEM values */ set_io_port_base(OCTEON_PCI_IOSPACE_BASE); ioport_resource.start = 0; -- cgit v1.2.3 From dfb3536268fc22706ff33d127557f78033b5b1cd Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Mon, 28 Jan 2019 23:16:22 +0000 Subject: MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds commit 67fc5dc8a541e8f458d7f08bf88ff55933bf9f9d upstream. When generating vdso-o32.lds & vdso-n32.lds for use with programs running as compat ABIs under 64b kernels, we previously haven't included the compiler flags that are supposedly common to all ABIs - ie. those in the ccflags-vdso variable. This is problematic in cases where we need to provide the -m%-float flag in order to ensure that we don't attempt to use a floating point ABI that's incompatible with the target CPU & ABI. For example a toolchain using current gcc trunk configured --with-fp-32=xx fails to build a 64r6el_defconfig kernel with the following error: cc1: error: '-march=mips1' requires '-mfp32' make[2]: *** [arch/mips/vdso/Makefile:135: arch/mips/vdso/vdso-o32.lds] Error 1 Include $(ccflags-vdso) for the compat VDSO .lds builds, just as it is included for the native VDSO .lds & when compiling objects for the compat VDSOs. This ensures we consistently provide the -msoft-float flag amongst others, avoiding the problem by ensuring we're agnostic to the toolchain defaults. Signed-off-by: Paul Burton Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Cc: linux-mips@vger.kernel.org Cc: Kevin Hilman Cc: Guenter Roeck Cc: Maciej W . Rozycki Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/vdso/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile index 6c7d78546eee..886005b1e87d 100644 --- a/arch/mips/vdso/Makefile +++ b/arch/mips/vdso/Makefile @@ -107,7 +107,7 @@ $(obj)/%-o32.o: $(src)/%.c FORCE $(call cmd,force_checksrc) $(call if_changed_rule,cc_o_c) -$(obj)/vdso-o32.lds: KBUILD_CPPFLAGS := -mabi=32 +$(obj)/vdso-o32.lds: KBUILD_CPPFLAGS := $(ccflags-vdso) -mabi=32 $(obj)/vdso-o32.lds: $(src)/vdso.lds.S FORCE $(call if_changed_dep,cpp_lds_S) @@ -143,7 +143,7 @@ $(obj)/%-n32.o: $(src)/%.c FORCE $(call cmd,force_checksrc) $(call if_changed_rule,cc_o_c) -$(obj)/vdso-n32.lds: KBUILD_CPPFLAGS := -mabi=n32 +$(obj)/vdso-n32.lds: KBUILD_CPPFLAGS := $(ccflags-vdso) -mabi=n32 $(obj)/vdso-n32.lds: $(src)/vdso.lds.S FORCE $(call if_changed_dep,cpp_lds_S) -- cgit v1.2.3 From fd9d0553fba3d24fc3ba14012489de148a368e3b Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 25 Jan 2019 20:10:15 +0000 Subject: ARM: iop32x/n2100: fix PCI IRQ mapping commit db4090920ba2d61a5827a23e441447926a02ffee upstream. Booting 4.20 on a TheCUS N2100 results in a kernel oops while probing PCI, due to n2100_pci_map_irq() having been discarded during boot. Signed-off-by: Russell King Cc: stable@vger.kernel.org # 2.6.18+ Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-iop32x/n2100.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-iop32x/n2100.c b/arch/arm/mach-iop32x/n2100.c index c1cd80ecc219..a904244264ce 100644 --- a/arch/arm/mach-iop32x/n2100.c +++ b/arch/arm/mach-iop32x/n2100.c @@ -75,8 +75,7 @@ void __init n2100_map_io(void) /* * N2100 PCI. */ -static int __init -n2100_pci_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) +static int n2100_pci_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) { int irq; -- cgit v1.2.3 From 20ad50464b74dd2fc066f32cc4cbe2a3ac3673df Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Wed, 19 Dec 2018 13:47:24 +0200 Subject: ARM: dts: da850-evm: Correct the sound card name [ Upstream commit 7fca69d4e43fa1ae9cb4f652772c132dc5a659c6 ] To avoid the following error: asoc-simple-card sound: ASoC: Failed to create card debugfs directory Which is because the card name contains '/' character, which can not be used in file or directory names. Signed-off-by: Peter Ujfalusi Signed-off-by: Sekhar Nori Signed-off-by: Sasha Levin --- arch/arm/boot/dts/da850-evm.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts index 6881757b03e8..67369f284b91 100644 --- a/arch/arm/boot/dts/da850-evm.dts +++ b/arch/arm/boot/dts/da850-evm.dts @@ -147,7 +147,7 @@ sound { compatible = "simple-audio-card"; - simple-audio-card,name = "DA850/OMAP-L138 EVM"; + simple-audio-card,name = "DA850-OMAPL138 EVM"; simple-audio-card,widgets = "Line", "Line In", "Line", "Line Out"; -- cgit v1.2.3 From 32f04710885113bc1124380e0764372dceb0144a Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 8 Jan 2019 00:08:18 +0100 Subject: ARM: dts: kirkwood: Fix polarity of GPIO fan lines [ Upstream commit b5f034845e70916fd33e172fad5ad530a29c10ab ] These two lines are active high, not active low. The bug was found when we changed the kernel to respect the polarity defined in the device tree. Fixes: 1b90e06b1429 ("ARM: kirkwood: Use devicetree to define DNS-32[05] fan") Cc: Jamie Lentin Cc: Guenter Roeck Cc: Jason Cooper Cc: Andrew Lunn Cc: Gregory Clement Cc: Sebastian Hesselbarth Cc: Julien D'Ascenzio Reviewed-by: Andrew Lunn Tested-by: Jamie Lentin Reported-by: Julien D'Ascenzio Tested-by: Julien D'Ascenzio Signed-off-by: Linus Walleij Signed-off-by: Gregory CLEMENT Signed-off-by: Sasha Levin --- arch/arm/boot/dts/kirkwood-dnskw.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/kirkwood-dnskw.dtsi b/arch/arm/boot/dts/kirkwood-dnskw.dtsi index 113dcf056dcf..1b2dacfa6132 100644 --- a/arch/arm/boot/dts/kirkwood-dnskw.dtsi +++ b/arch/arm/boot/dts/kirkwood-dnskw.dtsi @@ -35,8 +35,8 @@ compatible = "gpio-fan"; pinctrl-0 = <&pmx_fan_high_speed &pmx_fan_low_speed>; pinctrl-names = "default"; - gpios = <&gpio1 14 GPIO_ACTIVE_LOW - &gpio1 13 GPIO_ACTIVE_LOW>; + gpios = <&gpio1 14 GPIO_ACTIVE_HIGH + &gpio1 13 GPIO_ACTIVE_HIGH>; gpio-fan,speed-map = <0 0 3000 1 6000 2>; -- cgit v1.2.3 From 8347670127511f031d2e3503ae00a25d1956f19b Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Sat, 1 Dec 2018 12:57:18 +0100 Subject: gpio: pl061: handle failed allocations [ Upstream commit df209c43a0e8258e096fb722dfbdae4f0dd13fde ] devm_kzalloc(), devm_kstrdup() and devm_kasprintf() all can fail internal allocation and return NULL. Using any of the assigned objects without checking is not safe. As this is early in the boot phase and these allocations really should not fail, any failure here is probably an indication of a more serious issue so it makes little sense to try and rollback the previous allocated resources or try to continue; but rather the probe function is simply exited with -ENOMEM. Signed-off-by: Nicholas Mc Guire Fixes: 684284b64aae ("ARM: integrator: add MMCI device to IM-PD1") Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- arch/arm/mach-integrator/impd1.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-integrator/impd1.c b/arch/arm/mach-integrator/impd1.c index 38b0da300dd5..423a88ff908c 100644 --- a/arch/arm/mach-integrator/impd1.c +++ b/arch/arm/mach-integrator/impd1.c @@ -394,7 +394,11 @@ static int __init_refok impd1_probe(struct lm_device *dev) sizeof(*lookup) + 3 * sizeof(struct gpiod_lookup), GFP_KERNEL); chipname = devm_kstrdup(&dev->dev, devname, GFP_KERNEL); - mmciname = kasprintf(GFP_KERNEL, "lm%x:00700", dev->id); + mmciname = devm_kasprintf(&dev->dev, GFP_KERNEL, + "lm%x:00700", dev->id); + if (!lookup || !chipname || !mmciname) + return -ENOMEM; + lookup->dev_id = mmciname; /* * Offsets on GPIO block 1: -- cgit v1.2.3 From 1d8b20304de194cc426aa015001340ff54598906 Mon Sep 17 00:00:00 2001 From: Sergei Trofimovich Date: Mon, 31 Dec 2018 11:53:55 +0000 Subject: alpha: fix page fault handling for r16-r18 targets commit 491af60ffb848b59e82f7c9145833222e0bf27a5 upstream. Fix page fault handling code to fixup r16-r18 registers. Before the patch code had off-by-two registers bug. This bug caused overwriting of ps,pc,gp registers instead of fixing intended r16,r17,r18 (see `struct pt_regs`). More details: Initially Dmitry noticed a kernel bug as a failure on strace test suite. Test passes unmapped userspace pointer to io_submit: ```c #include #include #include #include int main(void) { unsigned long ctx = 0; if (syscall(__NR_io_setup, 1, &ctx)) err(1, "io_setup"); const size_t page_size = sysconf(_SC_PAGESIZE); const size_t size = page_size * 2; void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (MAP_FAILED == ptr) err(1, "mmap(%zu)", size); if (munmap(ptr, size)) err(1, "munmap"); syscall(__NR_io_submit, ctx, 1, ptr + page_size); syscall(__NR_io_destroy, ctx); return 0; } ``` Running this test causes kernel to crash when handling page fault: ``` Unable to handle kernel paging request at virtual address ffffffffffff9468 CPU 3 aio(26027): Oops 0 pc = [] ra = [] ps = 0000 Not tainted pc is at sys_io_submit+0x108/0x200 ra is at sys_io_submit+0x6c/0x200 v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000 t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0 t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000 s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000 s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2 s6 = fffffc00c58e6300 a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000 a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001 t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440 t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000 gp = 0000000000000000 sp = 00000000265fd174 Disabling lock debugging due to kernel taint Trace: [] entSys+0xa4/0xc0 ``` Here `gp` has invalid value. `gp is s overwritten by a fixup for the following page fault handler in `io_submit` syscall handler: ``` __se_sys_io_submit ... ldq a1,0(t1) bne t0,4280 <__se_sys_io_submit+0x180> ``` After a page fault `t0` should contain -EFALUT and `a1` is 0. Instead `gp` was overwritten in place of `a1`. This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18` (aka `a0-a2`). I think the bug went unnoticed for a long time as `gp` is one of scratch registers. Any kernel function call would re-calculate `gp`. Dmitry tracked down the bug origin back to 2.1.32 kernel version where trap_a{0,1,2} fields were inserted into struct pt_regs. And even before that `dpf_reg()` contained off-by-one error. Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reported-and-reviewed-by: "Dmitry V. Levin" Cc: stable@vger.kernel.org # v2.1.32+ Bug: https://bugs.gentoo.org/672040 Signed-off-by: Sergei Trofimovich Signed-off-by: Matt Turner Signed-off-by: Greg Kroah-Hartman --- arch/alpha/mm/fault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 4a905bd667e2..0f68f0de9b5e 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -77,7 +77,7 @@ __load_new_mm_context(struct mm_struct *next_mm) /* Macro for exception fixup code to access integer registers. */ #define dpf_reg(r) \ (((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 : \ - (r) <= 18 ? (r)+8 : (r)-10]) + (r) <= 18 ? (r)+10 : (r)-10]) asmlinkage void do_page_fault(unsigned long address, unsigned long mmcsr, -- cgit v1.2.3 From 5fc9518605a99a63ca35d2b3ab61112b9ea45161 Mon Sep 17 00:00:00 2001 From: Meelis Roos Date: Fri, 12 Oct 2018 12:27:51 +0300 Subject: alpha: Fix Eiger NR_IRQS to 128 commit bfc913682464f45bc4d6044084e370f9048de9d5 upstream. Eiger machine vector definition has nr_irqs 128, and working 2.6.26 boot shows SCSI getting IRQ-s 64 and 65. Current kernel boot fails because Symbios SCSI fails to request IRQ-s and does not find the disks. It has been broken at least since 3.18 - the earliest I could test with my gcc-5. The headers have moved around and possibly another order of defines has worked in the past - but since 128 seems to be correct and used, fix arch/alpha/include/asm/irq.h to have NR_IRQS=128 for Eiger. This fixes 4.19-rc7 boot on my Force Flexor A264 (Eiger subarch). Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Meelis Roos Signed-off-by: Matt Turner Signed-off-by: Greg Kroah-Hartman --- arch/alpha/include/asm/irq.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/alpha/include/asm/irq.h b/arch/alpha/include/asm/irq.h index 06377400dc09..469642801a68 100644 --- a/arch/alpha/include/asm/irq.h +++ b/arch/alpha/include/asm/irq.h @@ -55,15 +55,15 @@ #elif defined(CONFIG_ALPHA_DP264) || \ defined(CONFIG_ALPHA_LYNX) || \ - defined(CONFIG_ALPHA_SHARK) || \ - defined(CONFIG_ALPHA_EIGER) + defined(CONFIG_ALPHA_SHARK) # define NR_IRQS 64 #elif defined(CONFIG_ALPHA_TITAN) #define NR_IRQS 80 #elif defined(CONFIG_ALPHA_RAWHIDE) || \ - defined(CONFIG_ALPHA_TAKARA) + defined(CONFIG_ALPHA_TAKARA) || \ + defined(CONFIG_ALPHA_EIGER) # define NR_IRQS 128 #elif defined(CONFIG_ALPHA_WILDFIRE) -- cgit v1.2.3 From 7212e37cbdf99f48e4a6c689a42f4bda1ae69001 Mon Sep 17 00:00:00 2001 From: Hedi Berriche Date: Wed, 13 Feb 2019 19:34:13 +0000 Subject: x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls commit f331e766c4be33f4338574f3c9f7f77e98ab4571 upstream. Calls into UV firmware must be protected against concurrency, expose the efi_runtime_lock to the UV platform, and use it to serialise UV BIOS calls. Signed-off-by: Hedi Berriche Signed-off-by: Borislav Petkov Reviewed-by: Ard Biesheuvel Reviewed-by: Russ Anderson Reviewed-by: Dimitri Sivanich Reviewed-by: Mike Travis Cc: Andy Shevchenko Cc: Bhupesh Sharma Cc: Darren Hart Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-x86@vger.kernel.org Cc: stable@vger.kernel.org # v4.9+ Cc: Steve Wahl Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190213193413.25560-5-hedi.berriche@hpe.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uv/bios.h | 8 +++++++- arch/x86/platform/uv/bios_uv.c | 23 +++++++++++++++++++++-- 2 files changed, 28 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 71605c7d5c5c..8b7594f2d48f 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR, }; /* @@ -111,4 +112,9 @@ extern long system_serial_number; extern struct kobject *sgi_uv_kobj; /* /sys/firmware/sgi_uv */ +/* + * EFI runtime lock; cf. firmware/efi/runtime-wrappers.c for details + */ +extern struct semaphore __efi_uv_runtime_lock; + #endif /* _ASM_X86_UV_BIOS_H */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 1584cbed0dce..a45a1c5aabea 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -28,7 +28,8 @@ static struct uv_systab uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) { struct uv_systab *tab = &uv_systab; s64 ret; @@ -43,6 +44,19 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) a1, a2, a3, a4, a5); return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(&__efi_uv_runtime_lock); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -51,10 +65,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(&__efi_uv_runtime_lock); + return ret; } -- cgit v1.2.3 From 5079b1d1aac92afd534e6a6f6a127817c0a96a52 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Tue, 12 Feb 2019 14:28:03 +0100 Subject: x86/a.out: Clear the dump structure initially commit 10970e1b4be9c74fce8ab6e3c34a7d718f063f2c upstream. dump_thread32() in aout_core_dump() does not clear the user32 structure allocated on the stack as the first thing on function entry. As a result, the dump.u_comm, dump.u_ar0 and dump.signal which get assigned before the clearing, get overwritten. Rename that function to fill_dump() to make it clear what it does and call it first thing. This was caught while staring at a patch by Derek Robson . Signed-off-by: Borislav Petkov Cc: Derek Robson Cc: Linus Torvalds Cc: Michael Matz Cc: x86@kernel.org Cc: Link: https://lkml.kernel.org/r/20190202005512.3144-1-robsonde@gmail.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/ia32/ia32_aout.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c index ae6aad1d24f7..b348c4641312 100644 --- a/arch/x86/ia32/ia32_aout.c +++ b/arch/x86/ia32/ia32_aout.c @@ -50,7 +50,7 @@ static unsigned long get_dr(int n) /* * fill in the user structure for a core dump.. */ -static void dump_thread32(struct pt_regs *regs, struct user32 *dump) +static void fill_dump(struct pt_regs *regs, struct user32 *dump) { u32 fs, gs; memset(dump, 0, sizeof(*dump)); @@ -156,10 +156,12 @@ static int aout_core_dump(struct coredump_params *cprm) fs = get_fs(); set_fs(KERNEL_DS); has_dumped = 1; + + fill_dump(cprm->regs, &dump); + strncpy(dump.u_comm, current->comm, sizeof(current->comm)); dump.u_ar0 = offsetof(struct user32, regs); dump.signal = cprm->siginfo->si_signo; - dump_thread32(cprm->regs, &dump); /* * If the size of the dump file exceeds the rlimit, then see -- cgit v1.2.3 From f5aebe74113f488cad5444ab66f28c08d9c0a39a Mon Sep 17 00:00:00 2001 From: "chenzefeng (A)" Date: Wed, 20 Feb 2019 12:37:54 +0000 Subject: x86: livepatch: Treat R_X86_64_PLT32 as R_X86_64_PC32 Signed-off-by: chenzefeng On x86-64, for 32-bit PC-relacive branches, we can generate PLT32 relocation, instead of PC32 relocation. and R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since linux kernel doesn't use PLT. commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as R_X86_64_PC32") been fixed for the module loading, but not fixed for livepatch relocation, which will fail to load livepatch with the error message as follow: relocation failed for symbol at This issue only effacted the kernel version from 4.0 to 4.6, becauce the function klp_write_module_reloc is introduced by: commit b700e7f03df5 ("livepatch: kernel: add support for live patching") and deleted by: commit 425595a7fc20 ("livepatch: reuse module loader code to write relocations") Signed-off-by: chenzefeng Reviewed-by: Petr Mladek Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/livepatch.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index d1d35ccffed3..579f8f813ce0 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -58,6 +58,7 @@ int klp_write_module_reloc(struct module *mod, unsigned long type, val = (s32)value; break; case R_X86_64_PC32: + case R_X86_64_PLT32: val = (u32)(value - loc); break; default: -- cgit v1.2.3 From 49e1a9d1169adcffee99967130ff6d3b79d5fbe6 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Thu, 21 Feb 2019 14:52:13 +0100 Subject: KVM: VMX: Fix x2apic check in vmx_msr_bitmap_mode() The stable backport of upstream commit 904e14fb7cb96 KVM: VMX: make MSR bitmaps per-VCPU has a bug in vmx_msr_bitmap_mode(). It enables the x2apic MSR-bitmap when the kernel emulates x2apic for the guest in software. The upstream version of the commit checkes whether the hardware has virtualization enabled for x2apic emulation. Since KVM emulates x2apic for guests even when the host does not support x2apic in hardware, this causes the intercept of at least the X2APIC_TASKPRI MSR to be disabled on machines not supporting that MSR. The result is undefined behavior, on some machines (Intel Westmere based) it causes a crash of the guest kernel when it tries to access that MSR. Change the check in vmx_msr_bitmap_mode() to match the upstream code. This fixes the guest crashes observed with stable kernels starting with v4.4.168 through v4.4.175. Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aee2886a387c..14553f6c03a6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4628,7 +4628,9 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) { u8 mode = 0; - if (irqchip_in_kernel(vcpu->kvm) && apic_x2apic_mode(vcpu->arch.apic)) { + if (cpu_has_secondary_exec_ctrls() && + (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { mode |= MSR_BITMAP_MODE_X2APIC; if (enable_apicv) mode |= MSR_BITMAP_MODE_X2APIC_APICV; -- cgit v1.2.3 From 5647975ec2b613f1d4e778df37accb9f469e8c36 Mon Sep 17 00:00:00 2001 From: Alban Bedel Date: Mon, 7 Jan 2019 20:45:15 +0100 Subject: MIPS: ath79: Enable OF serial ports in the default config [ Upstream commit 565dc8a4f55e491935bfb04866068d21784ea9a4 ] CONFIG_SERIAL_OF_PLATFORM is needed to get a working console on the OF boards, enable it in the default config to get a working setup out of the box. Signed-off-by: Alban Bedel Signed-off-by: Paul Burton Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle Cc: James Hogan Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin --- arch/mips/configs/ath79_defconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/mips/configs/ath79_defconfig b/arch/mips/configs/ath79_defconfig index 134879c1310a..4ed369c0ec6a 100644 --- a/arch/mips/configs/ath79_defconfig +++ b/arch/mips/configs/ath79_defconfig @@ -74,6 +74,7 @@ CONFIG_SERIAL_8250_CONSOLE=y # CONFIG_SERIAL_8250_PCI is not set CONFIG_SERIAL_8250_NR_UARTS=1 CONFIG_SERIAL_8250_RUNTIME_UARTS=1 +CONFIG_SERIAL_OF_PLATFORM=y CONFIG_SERIAL_AR933X=y CONFIG_SERIAL_AR933X_CONSOLE=y # CONFIG_HW_RANDOM is not set -- cgit v1.2.3 From 2b285e446056b0dd37bce8058bbdf292a44fe83b Mon Sep 17 00:00:00 2001 From: Thomas Bogendoerfer Date: Wed, 9 Jan 2019 18:12:16 +0100 Subject: MIPS: jazz: fix 64bit build [ Upstream commit 41af167fbc0032f9d7562854f58114eaa9270336 ] 64bit JAZZ builds failed with linux-next/arch/mips/jazz/jazzdma.c: In function `vdma_init`: /linux-next/arch/mips/jazz/jazzdma.c:77:30: error: implicit declaration of function `KSEG1ADDR`; did you mean `CKSEG1ADDR`? [-Werror=implicit-function-declaration] pgtbl = (VDMA_PGTBL_ENTRY *)KSEG1ADDR(pgtbl); ^~~~~~~~~ CKSEG1ADDR /linux-next/arch/mips/jazz/jazzdma.c:77:10: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] pgtbl = (VDMA_PGTBL_ENTRY *)KSEG1ADDR(pgtbl); ^ In file included from /linux-next/arch/mips/include/asm/barrier.h:11:0, from /linux-next/include/linux/compiler.h:248, from /linux-next/include/linux/kernel.h:10, from /linux-next/arch/mips/jazz/jazzdma.c:11: /linux-next/arch/mips/include/asm/addrspace.h:41:29: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] #define _ACAST32_ (_ATYPE_)(_ATYPE32_) /* widen if necessary */ ^ /linux-next/arch/mips/include/asm/addrspace.h:53:25: note: in expansion of macro `_ACAST32_` #define CPHYSADDR(a) ((_ACAST32_(a)) & 0x1fffffff) ^~~~~~~~~ /linux-next/arch/mips/jazz/jazzdma.c:84:44: note: in expansion of macro `CPHYSADDR` r4030_write_reg32(JAZZ_R4030_TRSTBL_BASE, CPHYSADDR(pgtbl)); Using correct casts and CKSEG1ADDR when dealing with the pgtbl setup fixes this. Signed-off-by: Thomas Bogendoerfer Signed-off-by: Paul Burton Cc: Ralf Baechle Cc: James Hogan Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin --- arch/mips/jazz/jazzdma.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c index db6f5afff4ff..ea897912bc71 100644 --- a/arch/mips/jazz/jazzdma.c +++ b/arch/mips/jazz/jazzdma.c @@ -71,14 +71,15 @@ static int __init vdma_init(void) get_order(VDMA_PGTBL_SIZE)); BUG_ON(!pgtbl); dma_cache_wback_inv((unsigned long)pgtbl, VDMA_PGTBL_SIZE); - pgtbl = (VDMA_PGTBL_ENTRY *)KSEG1ADDR(pgtbl); + pgtbl = (VDMA_PGTBL_ENTRY *)CKSEG1ADDR((unsigned long)pgtbl); /* * Clear the R4030 translation table */ vdma_pgtbl_init(); - r4030_write_reg32(JAZZ_R4030_TRSTBL_BASE, CPHYSADDR(pgtbl)); + r4030_write_reg32(JAZZ_R4030_TRSTBL_BASE, + CPHYSADDR((unsigned long)pgtbl)); r4030_write_reg32(JAZZ_R4030_TRSTBL_LIM, VDMA_PGTBL_SIZE); r4030_write_reg32(JAZZ_R4030_TRSTBL_INV, 0); -- cgit v1.2.3 From 2c2433eba19a9762ff8d151777f6fd5774b5bff4 Mon Sep 17 00:00:00 2001 From: Eugeniy Paltsev Date: Wed, 16 Jan 2019 14:29:50 +0300 Subject: ARCv2: Enable unaligned access in early ASM code commit 252f6e8eae909bc075a1b1e3b9efb095ae4c0b56 upstream. It is currently done in arc_init_IRQ() which might be too late considering gcc 7.3.1 onwards (GNU 2018.03) generates unaligned memory accesses by default Cc: stable@vger.kernel.org #4.4+ Signed-off-by: Eugeniy Paltsev Signed-off-by: Vineet Gupta [vgupta: rewrote changelog] Signed-off-by: Greg Kroah-Hartman --- arch/arc/kernel/head.S | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'arch') diff --git a/arch/arc/kernel/head.S b/arch/arc/kernel/head.S index 689dd867fdff..cd64cb4ef7b0 100644 --- a/arch/arc/kernel/head.S +++ b/arch/arc/kernel/head.S @@ -17,6 +17,7 @@ #include #include #include +#include .macro CPU_EARLY_SETUP @@ -47,6 +48,15 @@ sr r5, [ARC_REG_DC_CTRL] 1: + +#ifdef CONFIG_ISA_ARCV2 + ; Unaligned access is disabled at reset, so re-enable early as + ; gcc 7.3.1 (ARC GNU 2018.03) onwards generates unaligned access + ; by default + lr r5, [status32] + bset r5, r5, STATUS_AD_BIT + kflag r5 +#endif .endm .section .init.text, "ax",@progbits -- cgit v1.2.3 From e120f9d8596ea0a458875fdd16e8823adadd7e66 Mon Sep 17 00:00:00 2001 From: Eugeniy Paltsev Date: Thu, 13 Dec 2018 18:42:57 +0300 Subject: ARC: fix __ffs return value to avoid build warnings [ Upstream commit 4e868f8419cb4cb558c5d428e7ab5629cef864c7 ] | CC mm/nobootmem.o |In file included from ./include/asm-generic/bug.h:18:0, | from ./arch/arc/include/asm/bug.h:32, | from ./include/linux/bug.h:5, | from ./include/linux/mmdebug.h:5, | from ./include/linux/gfp.h:5, | from ./include/linux/slab.h:15, | from mm/nobootmem.c:14: |mm/nobootmem.c: In function '__free_pages_memory': |./include/linux/kernel.h:845:29: warning: comparison of distinct pointer types lacks a cast | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^ |./include/linux/kernel.h:859:4: note: in expansion of macro '__typecheck' | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ |./include/linux/kernel.h:869:24: note: in expansion of macro '__safe_cmp' | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ |./include/linux/kernel.h:878:19: note: in expansion of macro '__careful_cmp' | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ |mm/nobootmem.c:104:11: note: in expansion of macro 'min' | order = min(MAX_ORDER - 1UL, __ffs(start)); Change __ffs return value from 'int' to 'unsigned long' as it is done in other implementations (like asm-generic, x86, etc...) to avoid build-time warnings in places where type is strictly checked. As __ffs may return values in [0-31] interval changing return type to unsigned is valid. Signed-off-by: Eugeniy Paltsev Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin --- arch/arc/include/asm/bitops.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h index 0352fb8d21b9..9623ae002f5b 100644 --- a/arch/arc/include/asm/bitops.h +++ b/arch/arc/include/asm/bitops.h @@ -286,7 +286,7 @@ static inline __attribute__ ((const)) int __fls(unsigned long x) /* * __ffs: Similar to ffs, but zero based (0-31) */ -static inline __attribute__ ((const)) int __ffs(unsigned long word) +static inline __attribute__ ((const)) unsigned long __ffs(unsigned long word) { if (!word) return word; @@ -346,9 +346,9 @@ static inline __attribute__ ((const)) int ffs(unsigned long x) /* * __ffs: Similar to ffs, but zero based (0-31) */ -static inline __attribute__ ((const)) int __ffs(unsigned long x) +static inline __attribute__ ((const)) unsigned long __ffs(unsigned long x) { - int n; + unsigned long n; asm volatile( " ffs.f %0, %1 \n" /* 0:31; 31(Z) if src 0 */ -- cgit v1.2.3 From 37131ae9135c048494f1424dea064488ac1ffd04 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 7 Jan 2019 19:44:51 +0100 Subject: KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 [ Upstream commit 619ad846fc3452adaf71ca246c5aa711e2055398 ] kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being delivered to the host (L1) when it's running nested. The problem seems to be: svm_complete_interrupts() raises 'nmi_injected' flag but later we decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI injection upon entry so it got delivered to L1 instead of L2. It seems that VMX code solves the same issue in prepare_vmcs12(), this was introduced with code refactoring in commit 5f3d5799974b ("KVM: nVMX: Rework event injection and recovery"). Signed-off-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin --- arch/x86/kvm/svm.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 7ce1a19d9d8b..acbde1249b6f 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2388,6 +2388,14 @@ static int nested_svm_vmexit(struct vcpu_svm *svm) kvm_mmu_reset_context(&svm->vcpu); kvm_mmu_load(&svm->vcpu); + /* + * Drop what we picked up for L2 via svm_complete_interrupts() so it + * doesn't end up in L1. + */ + svm->vcpu.arch.nmi_injected = false; + kvm_clear_exception_queue(&svm->vcpu); + kvm_clear_interrupt_queue(&svm->vcpu); + return 0; } -- cgit v1.2.3 From be96dcc315c75f371774739a34c43c213f177c80 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 15 Feb 2016 17:04:04 +0000 Subject: arm/arm64: KVM: Feed initialized memory to MMIO accesses commit 1d6a821277aaa0cdd666278aaff93298df313d41 upstream. On an MMIO access, we always copy the on-stack buffer info the shared "run" structure, even if this is a read access. This ends up leaking up to 8 bytes of uninitialized memory into userspace, depending on the size of the access. An obvious fix for this one is to only perform the copy if this is an actual write. Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index 885cd0e0015b..0b9d152b38c8 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -207,7 +207,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, run->mmio.is_write = is_write; run->mmio.phys_addr = fault_ipa; run->mmio.len = len; - memcpy(run->mmio.data, data_buf, len); + if (is_write) + memcpy(run->mmio.data, data_buf, len); if (!ret) { /* We handled the access successfully in the kernel. */ -- cgit v1.2.3 From 05de33f10001bc617e45110a4815273c245ac5b2 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Tue, 29 Mar 2016 14:29:28 +0200 Subject: KVM: arm/arm64: Fix MMIO emulation data handling commit 83091db981e105d97562d3ed3ffe676e21927e3a upstream. When the kernel was handling a guest MMIO read access internally, we need to copy the emulation result into the run->mmio structure in order for the kvm_handle_mmio_return() function to pick it up and inject the result back into the guest. Currently the only user of kvm_io_bus for ARM is the VGIC, which did this copying itself, so this was not causing issues so far. But with the upcoming new vgic implementation we need this done properly. Update the kvm_handle_mmio_return description and cleanup the code to only perform a single copying when needed. Code and commit message inspired by Andre Przywara. Reported-by: Andre Przywara Signed-off-by: Christoffer Dall Signed-off-by: Andre Przywara Reviewed-by: Marc Zyngier Reviewed-by: Andre Przywara Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmio.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index 0b9d152b38c8..ae61e2ea7255 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -87,11 +87,10 @@ static unsigned long mmio_read_buf(char *buf, unsigned int len) /** * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation + * or in-kernel IO emulation + * * @vcpu: The VCPU pointer * @run: The VCPU run struct containing the mmio data - * - * This should only be called after returning from userspace for MMIO load - * emulation. */ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) { @@ -207,15 +206,17 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, run->mmio.is_write = is_write; run->mmio.phys_addr = fault_ipa; run->mmio.len = len; - if (is_write) - memcpy(run->mmio.data, data_buf, len); if (!ret) { /* We handled the access successfully in the kernel. */ + if (!is_write) + memcpy(run->mmio.data, data_buf, len); kvm_handle_mmio_return(vcpu, run); return 1; } + if (is_write) + memcpy(run->mmio.data, data_buf, len); run->exit_reason = KVM_EXIT_MMIO; return 0; } -- cgit v1.2.3 From 1a8ccbf263d63e22bdee411c1ac8a70393c2fe82 Mon Sep 17 00:00:00 2001 From: Seth Forshee Date: Thu, 28 Sep 2017 09:33:39 -0400 Subject: powerpc: Always initialize input array when calling epapr_hypercall() commit 186b8f1587c79c2fa04bfa392fdf084443e398c1 upstream. Several callers to epapr_hypercall() pass an uninitialized stack allocated array for the input arguments, presumably because they have no input arguments. However this can produce errors like this one arch/powerpc/include/asm/epapr_hcalls.h:470:42: error: 'in' may be used uninitialized in this function [-Werror=maybe-uninitialized] unsigned long register r3 asm("r3") = in[0]; ~~^~~ Fix callers to this function to always zero-initialize the input arguments array to prevent this. Signed-off-by: Seth Forshee Signed-off-by: Michael Ellerman Cc: "A. Wilcox" Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/epapr_hcalls.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h index 334459ad145b..90863245df53 100644 --- a/arch/powerpc/include/asm/epapr_hcalls.h +++ b/arch/powerpc/include/asm/epapr_hcalls.h @@ -508,7 +508,7 @@ static unsigned long epapr_hypercall(unsigned long *in, static inline long epapr_hypercall0_1(unsigned int nr, unsigned long *r2) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; unsigned long r; @@ -520,7 +520,7 @@ static inline long epapr_hypercall0_1(unsigned int nr, unsigned long *r2) static inline long epapr_hypercall0(unsigned int nr) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; return epapr_hypercall(in, out, nr); @@ -528,7 +528,7 @@ static inline long epapr_hypercall0(unsigned int nr) static inline long epapr_hypercall1(unsigned int nr, unsigned long p1) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; in[0] = p1; @@ -538,7 +538,7 @@ static inline long epapr_hypercall1(unsigned int nr, unsigned long p1) static inline long epapr_hypercall2(unsigned int nr, unsigned long p1, unsigned long p2) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; in[0] = p1; @@ -549,7 +549,7 @@ static inline long epapr_hypercall2(unsigned int nr, unsigned long p1, static inline long epapr_hypercall3(unsigned int nr, unsigned long p1, unsigned long p2, unsigned long p3) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; in[0] = p1; @@ -562,7 +562,7 @@ static inline long epapr_hypercall4(unsigned int nr, unsigned long p1, unsigned long p2, unsigned long p3, unsigned long p4) { - unsigned long in[8]; + unsigned long in[8] = {0}; unsigned long out[8]; in[0] = p1; -- cgit v1.2.3 From e90171edbef7b43e3eab51906f5f499f131a2887 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Fri, 22 Feb 2019 17:17:04 -0800 Subject: x86/uaccess: Don't leak the AC flag into __put_user() value evaluation commit 2a418cf3f5f1caf911af288e978d61c9844b0695 upstream. When calling __put_user(foo(), ptr), the __put_user() macro would call foo() in between __uaccess_begin() and __uaccess_end(). If that code were buggy, then those bugs would be run without SMAP protection. Fortunately, there seem to be few instances of the problem in the kernel. Nevertheless, __put_user() should be fixed to avoid doing this. Therefore, evaluate __put_user()'s argument before setting AC. This issue was noticed when an objtool hack by Peter Zijlstra complained about genregs_get() and I compared the assembly output to the C source. [ bp: Massage commit message and fixed up whitespace. ] Fixes: 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses") Signed-off-by: Andy Lutomirski Signed-off-by: Borislav Petkov Acked-by: Linus Torvalds Cc: Peter Zijlstra Cc: Brian Gerst Cc: Josh Poimboeuf Cc: Denys Vlasenko Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/uaccess.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 6f8eadf0681f..ac6932bf1a01 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -314,8 +314,7 @@ do { \ __put_user_asm(x, ptr, retval, "l", "k", "ir", errret); \ break; \ case 8: \ - __put_user_asm_u64((__typeof__(*ptr))(x), ptr, retval, \ - errret); \ + __put_user_asm_u64(x, ptr, retval, errret); \ break; \ default: \ __put_user_bad(); \ @@ -426,8 +425,10 @@ do { \ #define __put_user_nocheck(x, ptr, size) \ ({ \ int __pu_err; \ + __typeof__(*(ptr)) __pu_val; \ + __pu_val = x; \ __uaccess_begin(); \ - __put_user_size((x), (ptr), (size), __pu_err, -EFAULT); \ + __put_user_size(__pu_val, (ptr), (size), __pu_err, -EFAULT);\ __uaccess_end(); \ __builtin_expect(__pu_err, 0); \ }) -- cgit v1.2.3 From 5d58d8969037205101f7e4bdd6ed06b525f05b7e Mon Sep 17 00:00:00 2001 From: Jiaxun Yang Date: Tue, 20 Nov 2018 11:00:18 +0800 Subject: x86/CPU/AMD: Set the CPB bit unconditionally on F17h commit 0237199186e7a4aa5310741f0a6498a20c820fd7 upstream. Some F17h models do not have CPB set in CPUID even though the CPU supports it. Set the feature bit unconditionally on all F17h. [ bp: Rewrite commit message and patch. ] Signed-off-by: Jiaxun Yang Signed-off-by: Borislav Petkov Acked-by: Tom Lendacky Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Sherry Hurwitz Cc: Suravee Suthikulpanit Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20181120030018.5185-1-jiaxun.yang@flygoat.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 9f6151884249..e94e6f16172b 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -716,11 +716,9 @@ static void init_amd_bd(struct cpuinfo_x86 *c) static void init_amd_zn(struct cpuinfo_x86 *c) { set_cpu_cap(c, X86_FEATURE_ZEN); - /* - * Fix erratum 1076: CPB feature bit not being set in CPUID. It affects - * all up to and including B1. - */ - if (c->x86_model <= 1 && c->x86_mask <= 1) + + /* Fix erratum 1076: CPB feature bit not being set in CPUID. */ + if (!cpu_has(c, X86_FEATURE_CPB)) set_cpu_cap(c, X86_FEATURE_CPB); } -- cgit v1.2.3 From 5b98f0928666b8da3ff4a5fc412c55fb092e37d5 Mon Sep 17 00:00:00 2001 From: Liu Xiang Date: Sat, 16 Feb 2019 17:12:24 +0800 Subject: MIPS: irq: Allocate accurate order pages for irq stack commit 72faa7a773ca59336f3c889e878de81445c5a85c upstream. The irq_pages is the number of pages for irq stack, but not the order which is needed by __get_free_pages(). We can use get_order() to calculate the accurate order. Signed-off-by: Liu Xiang Signed-off-by: Paul Burton Fixes: fe8bd18ffea5 ("MIPS: Introduce irq_stack") Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/irq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/mips/kernel/irq.c b/arch/mips/kernel/irq.c index dc1180a8bfa1..66736397af9f 100644 --- a/arch/mips/kernel/irq.c +++ b/arch/mips/kernel/irq.c @@ -52,6 +52,7 @@ asmlinkage void spurious_interrupt(void) void __init init_IRQ(void) { int i; + unsigned int order = get_order(IRQ_STACK_SIZE); for (i = 0; i < NR_IRQS; i++) irq_set_noprobe(i); @@ -59,8 +60,7 @@ void __init init_IRQ(void) arch_init_irq(); for_each_possible_cpu(i) { - int irq_pages = IRQ_STACK_SIZE / PAGE_SIZE; - void *s = (void *)__get_free_pages(GFP_KERNEL, irq_pages); + void *s = (void *)__get_free_pages(GFP_KERNEL, order); irq_stack[i] = s; pr_debug("CPU%d IRQ stack at 0x%p - 0x%p\n", i, -- cgit v1.2.3 From f6efc18bbfc3136a829f081388368d75f0b55923 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Mon, 29 Jan 2018 09:09:41 -0800 Subject: xtensa: SMP: fix ccount_timer_shutdown [ Upstream commit 4fe8713b873fc881284722ce4ac47995de7cf62c ] ccount_timer_shutdown is called from the atomic context in the secondary_start_kernel, resulting in the following BUG: BUG: sleeping function called from invalid context in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 Preemption disabled at: secondary_start_kernel+0xa1/0x130 Call Trace: ___might_sleep+0xe7/0xfc __might_sleep+0x41/0x44 synchronize_irq+0x24/0x64 disable_irq+0x11/0x14 ccount_timer_shutdown+0x12/0x20 clockevents_switch_state+0x82/0xb4 clockevents_exchange_device+0x54/0x60 tick_check_new_device+0x46/0x70 clockevents_register_device+0x8c/0xc8 clockevents_config_and_register+0x1d/0x2c local_timer_setup+0x75/0x7c secondary_start_kernel+0xb4/0x130 should_never_return+0x32/0x35 Use disable_irq_nosync instead of disable_irq to avoid it. This is safe because the ccount timer IRQ is per-CPU, and once IRQ is masked the ISR will not be called. Signed-off-by: Max Filippov Signed-off-by: Sasha Levin --- arch/xtensa/kernel/time.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/xtensa/kernel/time.c b/arch/xtensa/kernel/time.c index b9ad9feadc2d..a992cb6a47db 100644 --- a/arch/xtensa/kernel/time.c +++ b/arch/xtensa/kernel/time.c @@ -87,7 +87,7 @@ static int ccount_timer_shutdown(struct clock_event_device *evt) container_of(evt, struct ccount_timer, evt); if (timer->irq_enabled) { - disable_irq(evt->irq); + disable_irq_nosync(evt->irq); timer->irq_enabled = 0; } return 0; -- cgit v1.2.3 From 56b84e4201894b1cf68dc67b88d20f9493bf3072 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Fri, 21 Dec 2018 08:26:20 -0800 Subject: xtensa: SMP: fix secondary CPU initialization [ Upstream commit 32a7726c4f4aadfabdb82440d84f88a5a2c8fe13 ] - add missing memory barriers to the secondary CPU synchronization spin loops; add comment to the matching memory barrier in the boot_secondary and __cpu_die functions; - use READ_ONCE/WRITE_ONCE to access cpu_start_id/cpu_start_ccount instead of reading/writing them directly; - re-initialize cpu_running every time before starting secondary CPU to flush possible previous CPU startup results. Signed-off-by: Max Filippov Signed-off-by: Sasha Levin --- arch/xtensa/kernel/head.S | 5 ++++- arch/xtensa/kernel/smp.c | 34 +++++++++++++++++++++------------- 2 files changed, 25 insertions(+), 14 deletions(-) (limited to 'arch') diff --git a/arch/xtensa/kernel/head.S b/arch/xtensa/kernel/head.S index c7b3bedbfffe..e3823b4f9d08 100644 --- a/arch/xtensa/kernel/head.S +++ b/arch/xtensa/kernel/head.S @@ -286,12 +286,13 @@ should_never_return: movi a2, cpu_start_ccount 1: + memw l32i a3, a2, 0 beqi a3, 0, 1b movi a3, 0 s32i a3, a2, 0 - memw 1: + memw l32i a3, a2, 0 beqi a3, 0, 1b wsr a3, ccount @@ -328,11 +329,13 @@ ENTRY(cpu_restart) rsr a0, prid neg a2, a0 movi a3, cpu_start_id + memw s32i a2, a3, 0 #if XCHAL_DCACHE_IS_WRITEBACK dhwbi a3, 0 #endif 1: + memw l32i a2, a3, 0 dhi a3, 0 bne a2, a0, 1b diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c index 4d02e38514f5..545144d1431d 100644 --- a/arch/xtensa/kernel/smp.c +++ b/arch/xtensa/kernel/smp.c @@ -192,9 +192,11 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts) int i; #ifdef CONFIG_HOTPLUG_CPU - cpu_start_id = cpu; - system_flush_invalidate_dcache_range( - (unsigned long)&cpu_start_id, sizeof(cpu_start_id)); + WRITE_ONCE(cpu_start_id, cpu); + /* Pairs with the third memw in the cpu_restart */ + mb(); + system_flush_invalidate_dcache_range((unsigned long)&cpu_start_id, + sizeof(cpu_start_id)); #endif smp_call_function_single(0, mx_cpu_start, (void *)cpu, 1); @@ -203,18 +205,21 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts) ccount = get_ccount(); while (!ccount); - cpu_start_ccount = ccount; + WRITE_ONCE(cpu_start_ccount, ccount); - while (time_before(jiffies, timeout)) { + do { + /* + * Pairs with the first two memws in the + * .Lboot_secondary. + */ mb(); - if (!cpu_start_ccount) - break; - } + ccount = READ_ONCE(cpu_start_ccount); + } while (ccount && time_before(jiffies, timeout)); - if (cpu_start_ccount) { + if (ccount) { smp_call_function_single(0, mx_cpu_stop, - (void *)cpu, 1); - cpu_start_ccount = 0; + (void *)cpu, 1); + WRITE_ONCE(cpu_start_ccount, 0); return -EIO; } } @@ -234,6 +239,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) pr_debug("%s: Calling wakeup_secondary(cpu:%d, idle:%p, sp: %08lx)\n", __func__, cpu, idle, start_info.stack); + init_completion(&cpu_running); ret = boot_secondary(cpu, idle); if (ret == 0) { wait_for_completion_timeout(&cpu_running, @@ -295,8 +301,10 @@ void __cpu_die(unsigned int cpu) unsigned long timeout = jiffies + msecs_to_jiffies(1000); while (time_before(jiffies, timeout)) { system_invalidate_dcache_range((unsigned long)&cpu_start_id, - sizeof(cpu_start_id)); - if (cpu_start_id == -cpu) { + sizeof(cpu_start_id)); + /* Pairs with the second memw in the cpu_restart */ + mb(); + if (READ_ONCE(cpu_start_id) == -cpu) { platform_cpu_kill(cpu); return; } -- cgit v1.2.3 From ce73d179cf66f6ea37e326716b419948e3df53a6 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Thu, 24 Jan 2019 17:16:11 -0800 Subject: xtensa: smp_lx200_defconfig: fix vectors clash [ Upstream commit 306b38305c0f86de7f17c5b091a95451dcc93d7d ] Secondary CPU reset vector overlaps part of the double exception handler code, resulting in weird crashes and hangups when running user code. Move exception vectors one page up so that they don't clash with the secondary CPU reset vector. Signed-off-by: Max Filippov Signed-off-by: Sasha Levin --- arch/xtensa/configs/smp_lx200_defconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/xtensa/configs/smp_lx200_defconfig b/arch/xtensa/configs/smp_lx200_defconfig index 22eeacba37cc..199e05f85e89 100644 --- a/arch/xtensa/configs/smp_lx200_defconfig +++ b/arch/xtensa/configs/smp_lx200_defconfig @@ -35,6 +35,7 @@ CONFIG_SMP=y CONFIG_HOTPLUG_CPU=y # CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX is not set # CONFIG_PCI is not set +CONFIG_VECTORS_OFFSET=0x00002000 CONFIG_XTENSA_PLATFORM_XTFPGA=y CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="earlycon=uart8250,mmio32,0xfd050020,115200n8 console=ttyS0,115200n8 ip=dhcp root=/dev/nfs rw debug" -- cgit v1.2.3 From 19960e19a7f903062d8191a5773611e313f889b9 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Sat, 19 Jan 2019 00:26:48 -0800 Subject: xtensa: SMP: mark each possible CPU as present [ Upstream commit 8b1c42cdd7181200dc1fff39dcb6ac1a3fac2c25 ] Otherwise it is impossible to enable CPUs after booting with 'maxcpus' parameter. Signed-off-by: Max Filippov Signed-off-by: Sasha Levin --- arch/xtensa/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c index 545144d1431d..0e34c1ed4aa8 100644 --- a/arch/xtensa/kernel/smp.c +++ b/arch/xtensa/kernel/smp.c @@ -80,7 +80,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) { unsigned i; - for (i = 0; i < max_cpus; ++i) + for_each_possible_cpu(i) set_cpu_present(i, true); } -- cgit v1.2.3 From 3afc8b84643848f9d195d9856f12b620fd9a9c95 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Sat, 26 Jan 2019 20:35:18 -0800 Subject: xtensa: SMP: limit number of possible CPUs by NR_CPUS [ Upstream commit 25384ce5f9530def39421597b1457d9462df6455 ] This fixes the following warning at boot when the kernel is booted on a board with more CPU cores than was configured in NR_CPUS: smp_init_cpus: Core Count = 8 smp_init_cpus: Core Id = 0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at include/linux/cpumask.h:121 smp_init_cpus+0x54/0x74 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-00015-g1459333f88a0 #124 Call Trace: __warn$part$3+0x6a/0x7c warn_slowpath_null+0x35/0x3c smp_init_cpus+0x54/0x74 setup_arch+0x1c0/0x1d0 start_kernel+0x44/0x310 _startup+0x107/0x107 Signed-off-by: Max Filippov Signed-off-by: Sasha Levin --- arch/xtensa/kernel/smp.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c index 0e34c1ed4aa8..54bb8e0473a0 100644 --- a/arch/xtensa/kernel/smp.c +++ b/arch/xtensa/kernel/smp.c @@ -93,6 +93,11 @@ void __init smp_init_cpus(void) pr_info("%s: Core Count = %d\n", __func__, ncpus); pr_info("%s: Core Id = %d\n", __func__, core_id); + if (ncpus > NR_CPUS) { + ncpus = NR_CPUS; + pr_info("%s: limiting core count by %d\n", __func__, ncpus); + } + for (i = 0; i < ncpus; ++i) set_cpu_possible(i, true); } -- cgit v1.2.3 From b8c82bd0cc5e4880b478fcda883e5040dd13e50e Mon Sep 17 00:00:00 2001 From: Kairui Song Date: Fri, 18 Jan 2019 19:13:08 +0800 Subject: x86/kexec: Don't setup EFI info if EFI runtime is not enabled [ Upstream commit 2aa958c99c7fd3162b089a1a56a34a0cdb778de1 ] Kexec-ing a kernel with "efi=noruntime" on the first kernel's command line causes the following null pointer dereference: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] Call Trace: efi_runtime_map_copy+0x28/0x30 bzImage64_load+0x688/0x872 arch_kexec_kernel_image_load+0x6d/0x70 kimage_file_alloc_init+0x13e/0x220 __x64_sys_kexec_file_load+0x144/0x290 do_syscall_64+0x55/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Just skip the EFI info setup if EFI runtime services are not enabled. [ bp: Massage commit message. ] Suggested-by: Dave Young Signed-off-by: Kairui Song Signed-off-by: Borislav Petkov Acked-by: Dave Young Cc: AKASHI Takahiro Cc: Andrew Morton Cc: Ard Biesheuvel Cc: bhe@redhat.com Cc: David Howells Cc: erik.schmauss@intel.com Cc: fanc.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: kexec@lists.infradead.org Cc: lenb@kernel.org Cc: linux-acpi@vger.kernel.org Cc: Philipp Rudo Cc: rafael.j.wysocki@intel.com Cc: robert.moore@intel.com Cc: Thomas Gleixner Cc: x86-ml Cc: Yannik Sembritzki Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com Signed-off-by: Sasha Levin --- arch/x86/kernel/kexec-bzimage64.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 0f8a6bbaaa44..0bf17576dd2a 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -168,6 +168,9 @@ setup_efi_state(struct boot_params *params, unsigned long params_load_addr, struct efi_info *current_ei = &boot_params.efi_info; struct efi_info *ei = ¶ms->efi_info; + if (!efi_enabled(EFI_RUNTIME_SERVICES)) + return 0; + if (!current_ei->efi_memmap_size) return 0; -- cgit v1.2.3 From d4cf6d934f72915df1cfd75ee390176bf3a4c99a Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Fri, 1 Feb 2019 14:20:20 -0800 Subject: x86_64: increase stack size for KASAN_EXTRA [ Upstream commit a8e911d13540487942d53137c156bd7707f66e5d ] If the kernel is configured with KASAN_EXTRA, the stack size is increasted significantly because this option sets "-fstack-reuse" to "none" in GCC [1]. As a result, it triggers stack overrun quite often with 32k stack size compiled using GCC 8. For example, this reproducer https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c triggers a "corrupted stack end detected inside scheduler" very reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are just too many functions that could have a large stack with KASAN_EXTRA due to large local variables that have been called over and over again without being able to reuse the stacks. Some noticiable ones are size 7648 shrink_page_list 3584 xfs_rmap_convert 3312 migrate_page_move_mapping 3312 dev_ethtool 3200 migrate_misplaced_transhuge_page 3168 copy_process There are other 49 functions are over 2k in size while compiling kernel with "-Wframe-larger-than=" even with a related minimal config on this machine. Hence, it is too much work to change Makefiles for each object to compile without "-fsanitize-address-use-after-scope" individually. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23 Although there is a patch in GCC 9 to help the situation, GCC 9 probably won't be released in a few months and then it probably take another 6-month to 1-year for all major distros to include it as a default. Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020 when GCC 9 is everywhere. Until then, this patch will help users avoid stack overrun. This has already been fixed for arm64 for the same reason via 6e8830674ea ("arm64: kasan: Increase stack size for KASAN_EXTRA"). Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw Signed-off-by: Qian Cai Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- arch/x86/include/asm/page_64_types.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h index 4928cf0d5af0..fb1251946b45 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -2,7 +2,11 @@ #define _ASM_X86_PAGE_64_DEFS_H #ifdef CONFIG_KASAN +#ifdef CONFIG_KASAN_EXTRA +#define KASAN_STACK_ORDER 2 +#else #define KASAN_STACK_ORDER 1 +#endif #else #define KASAN_STACK_ORDER 0 #endif -- cgit v1.2.3 From e83b1928c838f561b4d86fe8f51af9c4195973d0 Mon Sep 17 00:00:00 2001 From: Peng Hao Date: Sat, 29 Dec 2018 13:10:06 +0800 Subject: ARM: pxa: ssp: unneeded to free devm_ allocated data [ Upstream commit ba16adeb346387eb2d1ada69003588be96f098fa ] devm_ allocated data will be automatically freed. The free of devm_ allocated data is invalid. Fixes: 1c459de1e645 ("ARM: pxa: ssp: use devm_ functions") Signed-off-by: Peng Hao [title's prefix changed] Signed-off-by: Robert Jarzmik Signed-off-by: Sasha Levin --- arch/arm/plat-pxa/ssp.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/plat-pxa/ssp.c b/arch/arm/plat-pxa/ssp.c index daa1a65f2eb7..6748827c2ec8 100644 --- a/arch/arm/plat-pxa/ssp.c +++ b/arch/arm/plat-pxa/ssp.c @@ -238,8 +238,6 @@ static int pxa_ssp_remove(struct platform_device *pdev) if (ssp == NULL) return -ENODEV; - iounmap(ssp->mmio_base); - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); release_mem_region(res->start, resource_size(res)); @@ -249,7 +247,6 @@ static int pxa_ssp_remove(struct platform_device *pdev) list_del(&ssp->node); mutex_unlock(&ssp_lock); - kfree(ssp); return 0; } -- cgit v1.2.3 From 861e9499505441980380703d7affed21f767c8ce Mon Sep 17 00:00:00 2001 From: Jun-Ru Chang Date: Tue, 29 Jan 2019 11:56:07 +0800 Subject: MIPS: Remove function size check in get_frame_info() [ Upstream commit 2b424cfc69728224fcb5fad138ea7260728e0901 ] Patch (b6c7a324df37b "MIPS: Fix get_frame_info() handling of microMIPS function size.") introduces additional function size check for microMIPS by only checking insn between ip and ip + func_size. However, func_size in get_frame_info() is always 0 if KALLSYMS is not enabled. This causes get_frame_info() to return immediately without calculating correct frame_size, which in turn causes "Can't analyze schedule() prologue" warning messages at boot time. This patch removes func_size check, and let the frame_size check run up to 128 insns for both MIPS and microMIPS. Signed-off-by: Jun-Ru Chang Signed-off-by: Tony Wu Signed-off-by: Paul Burton Fixes: b6c7a324df37b ("MIPS: Fix get_frame_info() handling of microMIPS function size.") Cc: Cc: Cc: Cc: Cc: Cc: Cc: Cc: Signed-off-by: Sasha Levin --- arch/mips/kernel/process.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index ebd8a715fe38..e6102775892d 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -339,7 +339,7 @@ static inline int is_sp_move_ins(union mips_instruction *ip) static int get_frame_info(struct mips_frame_info *info) { bool is_mmips = IS_ENABLED(CONFIG_CPU_MICROMIPS); - union mips_instruction insn, *ip, *ip_end; + union mips_instruction insn, *ip; const unsigned int max_insns = 128; unsigned int last_insn_size = 0; unsigned int i; @@ -351,10 +351,9 @@ static int get_frame_info(struct mips_frame_info *info) if (!ip) goto err; - ip_end = (void *)ip + info->func_size; - - for (i = 0; i < max_insns && ip < ip_end; i++) { + for (i = 0; i < max_insns; i++) { ip = (void *)ip + last_insn_size; + if (is_mmips && mm_insn_16bit(ip->halfword[0])) { insn.halfword[0] = 0; insn.halfword[1] = ip->halfword[0]; -- cgit v1.2.3 From a264be2b41088ed423f5610e8d0b8e923b4c32af Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Fri, 15 Feb 2019 11:36:50 +0100 Subject: ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU commit a66352e005488ecb4b534ba1af58a9f671eba9b8 upstream. Add minimal parameters needed by the Exynos CLKOUT driver to Exynos3250 PMU node. This fixes the following warning on boot: exynos_clkout_init: failed to register clkout clock Fixes: d19bb397e19e ("ARM: dts: exynos: Update PMU node with CLKOUT related data") Cc: Signed-off-by: Marek Szyprowski Signed-off-by: Krzysztof Kozlowski Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/exynos3250.dtsi | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/exynos3250.dtsi b/arch/arm/boot/dts/exynos3250.dtsi index 2f30d632f1cc..e81a27214188 100644 --- a/arch/arm/boot/dts/exynos3250.dtsi +++ b/arch/arm/boot/dts/exynos3250.dtsi @@ -150,6 +150,9 @@ interrupt-controller; #interrupt-cells = <3>; interrupt-parent = <&gic>; + clock-names = "clkout8"; + clocks = <&cmu CLK_FIN_PLL>; + #clock-cells = <1>; }; mipi_phy: video-phy@10020710 { -- cgit v1.2.3 From a20168a138368f4d470923f0d446bfcc827fe745 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Mon, 11 Mar 2019 20:28:23 -0400 Subject: Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls" This reverts commit 7212e37cbdf99f48e4a6c689a42f4bda1ae69001. Hedi Berriche notes: > In 4.4-stable efi_runtime_lock as defined in drivers/firmware/efi/runtime-wrappers.c > is a spinlock (given it predates commit dce48e351c0d) and commit > > f331e766c4be x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls > > which 7212e37cbdf9 is a backport of, needs it to be a semaphore. Signed-off-by: Sasha Levin --- arch/x86/include/asm/uv/bios.h | 8 +------- arch/x86/platform/uv/bios_uv.c | 23 ++--------------------- 2 files changed, 3 insertions(+), 28 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 8b7594f2d48f..71605c7d5c5c 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,8 +48,7 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY, - BIOS_STATUS_ABORT = -EINTR, + BIOS_STATUS_UNAVAIL = -EBUSY }; /* @@ -112,9 +111,4 @@ extern long system_serial_number; extern struct kobject *sgi_uv_kobj; /* /sys/firmware/sgi_uv */ -/* - * EFI runtime lock; cf. firmware/efi/runtime-wrappers.c for details - */ -extern struct semaphore __efi_uv_runtime_lock; - #endif /* _ASM_X86_UV_BIOS_H */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index a45a1c5aabea..1584cbed0dce 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -28,8 +28,7 @@ static struct uv_systab uv_systab; -static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, - u64 a4, u64 a5) +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) { struct uv_systab *tab = &uv_systab; s64 ret; @@ -44,19 +43,6 @@ static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, a1, a2, a3, a4, a5); return ret; } - -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) -{ - s64 ret; - - if (down_interruptible(&__efi_uv_runtime_lock)) - return BIOS_STATUS_ABORT; - - ret = __uv_bios_call(which, a1, a2, a3, a4, a5); - up(&__efi_uv_runtime_lock); - - return ret; -} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -65,15 +51,10 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; - if (down_interruptible(&__efi_uv_runtime_lock)) - return BIOS_STATUS_ABORT; - local_irq_save(bios_flags); - ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + ret = uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); - up(&__efi_uv_runtime_lock); - return ret; } -- cgit v1.2.3 From ec117204466e3d32771efa7b66f75497169aaecb Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sat, 11 Feb 2017 22:14:56 +0200 Subject: ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420 commit 28928a3ce142b2e4e5a7a0f067cefb41a3d2c3f9 upstream. In Odroid XU3 Lite board, the temperature levels reported for thermal zone 0 were weird. In warm room: /sys/class/thermal/thermal_zone0/temp:32000 /sys/class/thermal/thermal_zone1/temp:51000 /sys/class/thermal/thermal_zone2/temp:55000 /sys/class/thermal/thermal_zone3/temp:54000 /sys/class/thermal/thermal_zone4/temp:51000 Sometimes after booting the value was even equal to ambient temperature which is highly unlikely to be a real temperature of sensor in SoC. The thermal sensor's calibration (trimming) is based on fused values. In case of the board above, the fused values are: 35, 52, 43, 58 and 43 (corresponding to each TMU device). However driver defined a minimum value for fused data as 40 and for smaller values it was using a hard-coded 55 instead. This lead to mapping data from sensor to wrong temperatures for thermal zone 0. Various vendor 3.10 trees (Hardkernel's based on Samsung LSI, Artik 10) do not impose any limits on fused values. Since we do not have any knowledge about these limits, use 0 as a minimum accepted fused value. This should essentially allow accepting any reasonable fused value thus behaving like vendor driver. The exynos5420-tmu-sensor-conf.dtsi is copied directly from existing exynos4412 with one change - the samsung,tmu_min_efuse_value. Signed-off-by: Krzysztof Kozlowski Acked-by: Bartlomiej Zolnierkiewicz Acked-by: Eduardo Valentin Reviewed-by: Javier Martinez Canillas Tested-by: Javier Martinez Canillas Reviewed-by: Anand Moon Tested-by: Anand Moon Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/exynos5420-tmu-sensor-conf.dtsi | 25 +++++++++++++++++++++++ arch/arm/boot/dts/exynos5420.dtsi | 10 ++++----- 2 files changed, 30 insertions(+), 5 deletions(-) create mode 100644 arch/arm/boot/dts/exynos5420-tmu-sensor-conf.dtsi (limited to 'arch') diff --git a/arch/arm/boot/dts/exynos5420-tmu-sensor-conf.dtsi b/arch/arm/boot/dts/exynos5420-tmu-sensor-conf.dtsi new file mode 100644 index 000000000000..c8771c660550 --- /dev/null +++ b/arch/arm/boot/dts/exynos5420-tmu-sensor-conf.dtsi @@ -0,0 +1,25 @@ +/* + * Device tree sources for Exynos5420 TMU sensor configuration + * + * Copyright (c) 2014 Lukasz Majewski + * Copyright (c) 2017 Krzysztof Kozlowski + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include + +#thermal-sensor-cells = <0>; +samsung,tmu_gain = <8>; +samsung,tmu_reference_voltage = <16>; +samsung,tmu_noise_cancel_mode = <4>; +samsung,tmu_efuse_value = <55>; +samsung,tmu_min_efuse_value = <0>; +samsung,tmu_max_efuse_value = <100>; +samsung,tmu_first_point_trim = <25>; +samsung,tmu_second_point_trim = <85>; +samsung,tmu_default_temp_offset = <50>; +samsung,tmu_cal_type = ; diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi index 1b3d6c769a3c..d5edb7766942 100644 --- a/arch/arm/boot/dts/exynos5420.dtsi +++ b/arch/arm/boot/dts/exynos5420.dtsi @@ -777,7 +777,7 @@ interrupts = <0 65 0>; clocks = <&clock CLK_TMU>; clock-names = "tmu_apbif"; - #include "exynos4412-tmu-sensor-conf.dtsi" + #include "exynos5420-tmu-sensor-conf.dtsi" }; tmu_cpu1: tmu@10064000 { @@ -786,7 +786,7 @@ interrupts = <0 183 0>; clocks = <&clock CLK_TMU>; clock-names = "tmu_apbif"; - #include "exynos4412-tmu-sensor-conf.dtsi" + #include "exynos5420-tmu-sensor-conf.dtsi" }; tmu_cpu2: tmu@10068000 { @@ -795,7 +795,7 @@ interrupts = <0 184 0>; clocks = <&clock CLK_TMU>, <&clock CLK_TMU>; clock-names = "tmu_apbif", "tmu_triminfo_apbif"; - #include "exynos4412-tmu-sensor-conf.dtsi" + #include "exynos5420-tmu-sensor-conf.dtsi" }; tmu_cpu3: tmu@1006c000 { @@ -804,7 +804,7 @@ interrupts = <0 185 0>; clocks = <&clock CLK_TMU>, <&clock CLK_TMU_GPU>; clock-names = "tmu_apbif", "tmu_triminfo_apbif"; - #include "exynos4412-tmu-sensor-conf.dtsi" + #include "exynos5420-tmu-sensor-conf.dtsi" }; tmu_gpu: tmu@100a0000 { @@ -813,7 +813,7 @@ interrupts = <0 215 0>; clocks = <&clock CLK_TMU_GPU>, <&clock CLK_TMU>; clock-names = "tmu_apbif", "tmu_triminfo_apbif"; - #include "exynos4412-tmu-sensor-conf.dtsi" + #include "exynos5420-tmu-sensor-conf.dtsi" }; thermal-zones { -- cgit v1.2.3 From 4e873fa2105264ff7847798153a464761d34e126 Mon Sep 17 00:00:00 2001 From: Yizhuo Date: Fri, 25 Jan 2019 22:32:20 -0800 Subject: ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized [ Upstream commit dc30e70391376ba3987aeb856ae6d9c0706534f1 ] In function omap4_dsi_mux_pads(), local variable "reg" could be uninitialized if function regmap_read() returns -EINVAL. However, it will be used directly in the later context, which is potentially unsafe. Signed-off-by: Yizhuo Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin --- arch/arm/mach-omap2/display.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/display.c b/arch/arm/mach-omap2/display.c index 6ab13d18c636..cde86d1199cf 100644 --- a/arch/arm/mach-omap2/display.c +++ b/arch/arm/mach-omap2/display.c @@ -115,6 +115,7 @@ static int omap4_dsi_mux_pads(int dsi_id, unsigned lanes) u32 enable_mask, enable_shift; u32 pipd_mask, pipd_shift; u32 reg; + int ret; if (dsi_id == 0) { enable_mask = OMAP4_DSI1_LANEENABLE_MASK; @@ -130,7 +131,11 @@ static int omap4_dsi_mux_pads(int dsi_id, unsigned lanes) return -ENODEV; } - regmap_read(omap4_dsi_mux_syscon, OMAP4_DSIPHY_SYSCON_OFFSET, ®); + ret = regmap_read(omap4_dsi_mux_syscon, + OMAP4_DSIPHY_SYSCON_OFFSET, + ®); + if (ret) + return ret; reg &= ~enable_mask; reg &= ~pipd_mask; -- cgit v1.2.3 From 823c717dbf5d63433713e1daaea3578624926210 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Mon, 21 Jan 2019 14:42:42 +0100 Subject: ARM: 8824/1: fix a migrating irq bug when hotplug cpu [ Upstream commit 1b5ba350784242eb1f899bcffd95d2c7cff61e84 ] Arm TC2 fails cpu hotplug stress test. This issue was tracked down to a missing copy of the new affinity cpumask for the vexpress-spc interrupt into struct irq_common_data.affinity when the interrupt is migrated in migrate_one_irq(). Fix it by replacing the arm specific hotplug cpu migration with the generic irq code. This is the counterpart implementation to commit 217d453d473c ("arm64: fix a migrating irq bug when hotplug cpu"). Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y). The vexpress-spc interrupt (irq=22) on this board is affine to CPU0. Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when CPU0 is hotplugged out. Suggested-by: Marc Zyngier Signed-off-by: Dietmar Eggemann Acked-by: Marc Zyngier Reviewed-by: Linus Walleij Signed-off-by: Russell King Signed-off-by: Sasha Levin --- arch/arm/Kconfig | 1 + arch/arm/include/asm/irq.h | 1 - arch/arm/kernel/irq.c | 62 ---------------------------------------------- arch/arm/kernel/smp.c | 2 +- 4 files changed, 2 insertions(+), 64 deletions(-) (limited to 'arch') diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 34e1569a11ee..3a0277c6c060 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1475,6 +1475,7 @@ config NR_CPUS config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP + select GENERIC_IRQ_MIGRATION help Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h index 1bd9510de1b9..cae4df39f02e 100644 --- a/arch/arm/include/asm/irq.h +++ b/arch/arm/include/asm/irq.h @@ -24,7 +24,6 @@ #ifndef __ASSEMBLY__ struct irqaction; struct pt_regs; -extern void migrate_irqs(void); extern void asm_do_IRQ(unsigned int, struct pt_regs *); void handle_IRQ(unsigned int, struct pt_regs *); diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c index 1d45320ee125..900c591913d5 100644 --- a/arch/arm/kernel/irq.c +++ b/arch/arm/kernel/irq.c @@ -31,7 +31,6 @@ #include #include #include -#include #include #include #include @@ -119,64 +118,3 @@ int __init arch_probe_nr_irqs(void) return nr_irqs; } #endif - -#ifdef CONFIG_HOTPLUG_CPU -static bool migrate_one_irq(struct irq_desc *desc) -{ - struct irq_data *d = irq_desc_get_irq_data(desc); - const struct cpumask *affinity = irq_data_get_affinity_mask(d); - struct irq_chip *c; - bool ret = false; - - /* - * If this is a per-CPU interrupt, or the affinity does not - * include this CPU, then we have nothing to do. - */ - if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity)) - return false; - - if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) { - affinity = cpu_online_mask; - ret = true; - } - - c = irq_data_get_irq_chip(d); - if (!c->irq_set_affinity) - pr_debug("IRQ%u: unable to set affinity\n", d->irq); - else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret) - cpumask_copy(irq_data_get_affinity_mask(d), affinity); - - return ret; -} - -/* - * The current CPU has been marked offline. Migrate IRQs off this CPU. - * If the affinity settings do not allow other CPUs, force them onto any - * available CPU. - * - * Note: we must iterate over all IRQs, whether they have an attached - * action structure or not, as we need to get chained interrupts too. - */ -void migrate_irqs(void) -{ - unsigned int i; - struct irq_desc *desc; - unsigned long flags; - - local_irq_save(flags); - - for_each_irq_desc(i, desc) { - bool affinity_broken; - - raw_spin_lock(&desc->lock); - affinity_broken = migrate_one_irq(desc); - raw_spin_unlock(&desc->lock); - - if (affinity_broken) - pr_warn_ratelimited("IRQ%u no longer affine to CPU%u\n", - i, smp_processor_id()); - } - - local_irq_restore(flags); -} -#endif /* CONFIG_HOTPLUG_CPU */ diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index e42be5800f37..08ce9e36dc5a 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -218,7 +218,7 @@ int __cpu_disable(void) /* * OK - migrate IRQs away from this CPU */ - migrate_irqs(); + irq_migrate_all_off_this_cpu(); /* * Flush user cache and TLB mappings, and then remove this CPU -- cgit v1.2.3 From aa5740d660ac628b300af200e6a36c094b25fffa Mon Sep 17 00:00:00 2001 From: Vladimir Murzin Date: Wed, 20 Feb 2019 11:43:05 +0000 Subject: arm64: Relax GIC version check during early boot [ Upstream commit 74698f6971f25d045301139413578865fc2bd8f9 ] Updates to the GIC architecture allow ID_AA64PFR0_EL1.GIC to have values other than 0 or 1. At the moment, Linux is quite strict in the way it handles this field at early boot stage (cpufeature is fine) and will refuse to use the system register CPU interface if it doesn't find the value 1. Fixes: 021f653791ad17e03f98aaa7fb933816ae16f161 ("irqchip: gic-v3: Initial support for GICv3") Reported-by: Chase Conklin Reviewed-by: Marc Zyngier Signed-off-by: Vladimir Murzin Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- arch/arm64/kernel/head.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0382eba4bf7b..6299a8a361ee 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -478,8 +478,7 @@ CPU_LE( bic x0, x0, #(3 << 24) ) // Clear the EE and E0E bits for EL1 /* GICv3 system register access */ mrs x0, id_aa64pfr0_el1 ubfx x0, x0, #24, #4 - cmp x0, #1 - b.ne 3f + cbz x0, 3f mrs_s x0, ICC_SRE_EL2 orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1 -- cgit v1.2.3 From 225bbd61b3ab4eb485dbc0ebd45a3968d400b3bf Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Tue, 5 Feb 2019 10:07:07 -0800 Subject: ARC: uacces: remove lp_start, lp_end from clobber list [ Upstream commit d5e3c55e01d8b1774b37b4647c30fb22f1d39077 ] Newer ARC gcc handles lp_start, lp_end in a different way and doesn't like them in the clobber list. Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin --- arch/arc/include/asm/uaccess.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h index 57387b567f34..f077a419cb51 100644 --- a/arch/arc/include/asm/uaccess.h +++ b/arch/arc/include/asm/uaccess.h @@ -209,7 +209,7 @@ __arc_copy_from_user(void *to, const void __user *from, unsigned long n) */ "=&r" (tmp), "+r" (to), "+r" (from) : - : "lp_count", "lp_start", "lp_end", "memory"); + : "lp_count", "memory"); return n; } @@ -438,7 +438,7 @@ __arc_copy_to_user(void __user *to, const void *from, unsigned long n) */ "=&r" (tmp), "+r" (to), "+r" (from) : - : "lp_count", "lp_start", "lp_end", "memory"); + : "lp_count", "memory"); return n; } @@ -658,7 +658,7 @@ static inline unsigned long __arc_clear_user(void __user *to, unsigned long n) " .previous \n" : "+r"(d_char), "+r"(res) : "i"(0) - : "lp_count", "lp_start", "lp_end", "memory"); + : "lp_count", "memory"); return res; } @@ -691,7 +691,7 @@ __arc_strncpy_from_user(char *dst, const char __user *src, long count) " .previous \n" : "+r"(res), "+r"(dst), "+r"(src), "=r"(val) : "g"(-EFAULT), "r"(count) - : "lp_count", "lp_start", "lp_end", "memory"); + : "lp_count", "memory"); return res; } -- cgit v1.2.3 From a2ef87f9d268190e27580ec5ede7cbf7dd6702aa Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 24 Jan 2019 17:33:45 +0100 Subject: crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling commit eaf46edf6ea89675bd36245369c8de5063a0272c upstream. The NEON MAC calculation routine fails to handle the case correctly where there is some data in the buffer, and the input fills it up exactly. In this case, we enter the loop at the end with w8 == 0, while a negative value is assumed, and so the loop carries on until the increment of the 32-bit counter wraps around, which is quite obviously wrong. So omit the loop altogether in this case, and exit right away. Reported-by: Eric Biggers Fixes: a3fd82105b9d1 ("arm64/crypto: AES in CCM mode using ARMv8 Crypto ...") Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- arch/arm64/crypto/aes-ce-ccm-core.S | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index 3363560c79b7..7bc459d9235c 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -74,12 +74,13 @@ ENTRY(ce_aes_ccm_auth_data) beq 10f ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ b 7b -8: mov w7, w8 +8: cbz w8, 91f + mov w7, w8 add w8, w8, #16 9: ext v1.16b, v1.16b, v1.16b, #1 adds w7, w7, #1 bne 9b - eor v0.16b, v0.16b, v1.16b +91: eor v0.16b, v0.16b, v1.16b st1 {v0.16b}, [x0] 10: str w8, [x3] ret -- cgit v1.2.3 From 22058c290c9498dd9f0d507b6f0e1a0e8465d87a Mon Sep 17 00:00:00 2001 From: Finn Thain Date: Wed, 16 Jan 2019 16:23:24 +1100 Subject: m68k: Add -ffreestanding to CFLAGS commit 28713169d879b67be2ef2f84dcf54905de238294 upstream. This patch fixes a build failure when using GCC 8.1: /usr/bin/ld: block/partitions/ldm.o: in function `ldm_parse_tocblock': block/partitions/ldm.c:153: undefined reference to `strcmp' This is caused by a new optimization which effectively replaces a strncmp() call with a strcmp() call. This affects a number of strncmp() call sites in the kernel. The entire class of optimizations is avoided with -fno-builtin, which gets enabled by -ffreestanding. This may avoid possible future build failures in case new optimizations appear in future compilers. I haven't done any performance measurements with this patch but I did count the function calls in a defconfig build. For example, there are now 23 more sprintf() calls and 39 fewer strcpy() calls. The effect on the other libc functions is smaller. If this harms performance we can tackle that regression by optimizing the call sites, ideally using semantic patches. That way, clang and ICC builds might benfit too. Cc: stable@vger.kernel.org Reference: https://marc.info/?l=linux-m68k&m=154514816222244&w=2 Signed-off-by: Finn Thain Signed-off-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- arch/m68k/Makefile | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/m68k/Makefile b/arch/m68k/Makefile index 0b29dcfef69f..0c736ed58abd 100644 --- a/arch/m68k/Makefile +++ b/arch/m68k/Makefile @@ -59,7 +59,10 @@ cpuflags-$(CONFIG_M5206e) := $(call cc-option,-mcpu=5206e,-m5200) cpuflags-$(CONFIG_M5206) := $(call cc-option,-mcpu=5206,-m5200) KBUILD_AFLAGS += $(cpuflags-y) -KBUILD_CFLAGS += $(cpuflags-y) -pipe +KBUILD_CFLAGS += $(cpuflags-y) + +KBUILD_CFLAGS += -pipe -ffreestanding + ifdef CONFIG_MMU # without -fno-strength-reduce the 53c7xx.c driver fails ;-( KBUILD_CFLAGS += -fno-strength-reduce -ffixed-a2 -- cgit v1.2.3 From 788b1a98f41511885ae432fca12df32fc4fb9473 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 27 Feb 2019 11:45:30 +0000 Subject: powerpc/32: Clear on-stack exception marker upon exception return commit 9580b71b5a7863c24a9bd18bcd2ad759b86b1eff upstream. Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order to avoid confusing stacktrace like the one below. Call Trace: [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable) [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180 [c0e9dd10] [c0895130] memchr+0x24/0x74 [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574 [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8 [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4 --- interrupt: c0e9df00 at 0x400f330 LR = init_stack+0x1f00/0x2000 [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable) [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108 [c0e9df50] [c0c15434] start_kernel+0x310/0x488 [c0e9dff0] [00003484] 0x3484 With this patch the trace becomes: Call Trace: [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable) [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180 [c0e9dd10] [c0895150] memchr+0x24/0x74 [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574 [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8 [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4 [c0e9de80] [c00ae3e4] printk+0xa8/0xcc [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108 [c0e9df50] [c0c15434] start_kernel+0x310/0x488 [c0e9dff0] [00003484] 0x3484 Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/entry_32.S | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 2405631e91a2..3728e617e17e 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -685,6 +685,9 @@ fast_exception_return: mtcr r10 lwz r10,_LINK(r11) mtlr r10 + /* Clear the exception_marker on the stack to avoid confusing stacktrace */ + li r10, 0 + stw r10, 8(r11) REST_GPR(10, r11) mtspr SPRN_SRR1,r9 mtspr SPRN_SRR0,r12 @@ -915,6 +918,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) mtcrf 0xFF,r10 mtlr r11 + /* Clear the exception_marker on the stack to avoid confusing stacktrace */ + li r10, 0 + stw r10, 8(r1) /* * Once we put values in SRR0 and SRR1, we are in a state * where exceptions are not recoverable, since taking an @@ -952,6 +958,9 @@ exc_exit_restart_end: mtlr r11 lwz r10,_CCR(r1) mtcrf 0xff,r10 + /* Clear the exception_marker on the stack to avoid confusing stacktrace */ + li r10, 0 + stw r10, 8(r1) REST_2GPRS(9, r1) .globl exc_exit_restart exc_exit_restart: -- cgit v1.2.3 From aa3995f04e39c6b7cb60ce66e8d20502a42bf78d Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Thu, 21 Feb 2019 19:08:37 +0000 Subject: powerpc/wii: properly disable use of BATs when requested. commit 6d183ca8baec983dc4208ca45ece3c36763df912 upstream. 'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC deny the use of BATS for mapping memory. This patch makes sure that the specific wii RAM mapping function takes it into account as well. Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram") Cc: stable@vger.kernel.org Reviewed-by: Jonathan Neuschafer Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/embedded6xx/wii.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c index 352592d3e44e..7fd19a480422 100644 --- a/arch/powerpc/platforms/embedded6xx/wii.c +++ b/arch/powerpc/platforms/embedded6xx/wii.c @@ -104,6 +104,10 @@ unsigned long __init wii_mmu_mapin_mem2(unsigned long top) /* MEM2 64MB@0x10000000 */ delta = wii_hole_start + wii_hole_size; size = top - delta; + + if (__map_without_bats) + return delta; + for (bl = 128<<10; bl < max_size; bl <<= 1) { if (bl * 2 > size) break; -- cgit v1.2.3 From d9fbe055bc95027b03fc87aba1f88afab317017c Mon Sep 17 00:00:00 2001 From: Jordan Niethe Date: Wed, 27 Feb 2019 14:02:29 +1100 Subject: powerpc/powernv: Make opal log only readable by root commit 7b62f9bd2246b7d3d086e571397c14ba52645ef1 upstream. Currently the opal log is globally readable. It is kernel policy to limit the visibility of physical addresses / kernel pointers to root. Given this and the fact the opal log may contain this information it would be better to limit the readability to root. Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Jordan Niethe Reviewed-by: Stewart Smith Reviewed-by: Andrew Donnellan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/opal-msglog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/platforms/powernv/opal-msglog.c b/arch/powerpc/platforms/powernv/opal-msglog.c index 44ed78af1a0d..9021b7272889 100644 --- a/arch/powerpc/platforms/powernv/opal-msglog.c +++ b/arch/powerpc/platforms/powernv/opal-msglog.c @@ -92,7 +92,7 @@ out: } static struct bin_attribute opal_msglog_attr = { - .attr = {.name = "msglog", .mode = 0444}, + .attr = {.name = "msglog", .mode = 0400}, .read = opal_msglog_read }; -- cgit v1.2.3 From 7ea0c2f9788a18a8cbd634a0ebfee0510aaabf78 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 25 Jan 2019 12:03:55 +0000 Subject: powerpc/83xx: Also save/restore SPRG4-7 during suspend commit 36da5ff0bea2dc67298150ead8d8471575c54c7d upstream. The 83xx has 8 SPRG registers and uses at least SPRG4 for DTLB handling LRU. Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/83xx/suspend-asm.S | 34 ++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/platforms/83xx/suspend-asm.S b/arch/powerpc/platforms/83xx/suspend-asm.S index 3d1ecd211776..8137f77abad5 100644 --- a/arch/powerpc/platforms/83xx/suspend-asm.S +++ b/arch/powerpc/platforms/83xx/suspend-asm.S @@ -26,13 +26,13 @@ #define SS_MSR 0x74 #define SS_SDR1 0x78 #define SS_LR 0x7c -#define SS_SPRG 0x80 /* 4 SPRGs */ -#define SS_DBAT 0x90 /* 8 DBATs */ -#define SS_IBAT 0xd0 /* 8 IBATs */ -#define SS_TB 0x110 -#define SS_CR 0x118 -#define SS_GPREG 0x11c /* r12-r31 */ -#define STATE_SAVE_SIZE 0x16c +#define SS_SPRG 0x80 /* 8 SPRGs */ +#define SS_DBAT 0xa0 /* 8 DBATs */ +#define SS_IBAT 0xe0 /* 8 IBATs */ +#define SS_TB 0x120 +#define SS_CR 0x128 +#define SS_GPREG 0x12c /* r12-r31 */ +#define STATE_SAVE_SIZE 0x17c .section .data .align 5 @@ -103,6 +103,16 @@ _GLOBAL(mpc83xx_enter_deep_sleep) stw r7, SS_SPRG+12(r3) stw r8, SS_SDR1(r3) + mfspr r4, SPRN_SPRG4 + mfspr r5, SPRN_SPRG5 + mfspr r6, SPRN_SPRG6 + mfspr r7, SPRN_SPRG7 + + stw r4, SS_SPRG+16(r3) + stw r5, SS_SPRG+20(r3) + stw r6, SS_SPRG+24(r3) + stw r7, SS_SPRG+28(r3) + mfspr r4, SPRN_DBAT0U mfspr r5, SPRN_DBAT0L mfspr r6, SPRN_DBAT1U @@ -493,6 +503,16 @@ mpc83xx_deep_resume: mtspr SPRN_IBAT7U, r6 mtspr SPRN_IBAT7L, r7 + lwz r4, SS_SPRG+16(r3) + lwz r5, SS_SPRG+20(r3) + lwz r6, SS_SPRG+24(r3) + lwz r7, SS_SPRG+28(r3) + + mtspr SPRN_SPRG4, r4 + mtspr SPRN_SPRG5, r5 + mtspr SPRN_SPRG6, r6 + mtspr SPRN_SPRG7, r7 + lwz r4, SS_SPRG+0(r3) lwz r5, SS_SPRG+4(r3) lwz r6, SS_SPRG+8(r3) -- cgit v1.2.3 From fd2ebccb58843f76d5d5c552a7d2079d76790a7d Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 3 Jan 2019 14:14:08 -0600 Subject: ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify commit e2477233145f2156434afb799583bccd878f3e9f upstream. Fix boolean expressions by using logical AND operator '&&' instead of bitwise operator '&'. This issue was detected with the help of Coccinelle. Fixes: 4fa084af28ca ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva [krzk: Fix -Wparentheses warning] Signed-off-by: Krzysztof Kozlowski Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-s3c24xx/mach-osiris-dvs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c index ce2db235dbaf..5e8a306163de 100644 --- a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c +++ b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c @@ -70,16 +70,16 @@ static int osiris_dvs_notify(struct notifier_block *nb, switch (val) { case CPUFREQ_PRECHANGE: - if (old_dvs & !new_dvs || - cur_dvs & !new_dvs) { + if ((old_dvs && !new_dvs) || + (cur_dvs && !new_dvs)) { pr_debug("%s: exiting dvs\n", __func__); cur_dvs = false; gpio_set_value(OSIRIS_GPIO_DVS, 1); } break; case CPUFREQ_POSTCHANGE: - if (!old_dvs & new_dvs || - !cur_dvs & new_dvs) { + if ((!old_dvs && new_dvs) || + (!cur_dvs && new_dvs)) { pr_debug("entering dvs\n"); cur_dvs = true; gpio_set_value(OSIRIS_GPIO_DVS, 0); -- cgit v1.2.3 From 2866808ffc0f0147e84d5643f9f4731401303df3 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 23 Jan 2019 14:39:23 -0800 Subject: KVM: nVMX: Sign extend displacements of VMX instr's mem operands MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream. The VMCS.EXIT_QUALIFCATION field reports the displacements of memory operands for various instructions, including VMX instructions, as a naturally sized unsigned value, but masks the value by the addr size, e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is reported as 0xffffffd8 for a 32-bit address size. Despite some weird wording regarding sign extension, the SDM explicitly states that bits beyond the instructions address size are undefined: In all cases, bits of this field beyond the instruction’s address size are undefined. Failure to sign extend the displacement results in KVM incorrectly treating a negative displacement as a large positive displacement when the address size of the VMX instruction is smaller than KVM's native size, e.g. a 32-bit address size on a 64-bit KVM. The very original decoding, added by commit 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions"), sort of modeled sign extension by truncating the final virtual/linear address for a 32-bit address size. I.e. it messed up the effective address but made it work by adjusting the final address. When segmentation checks were added, the truncation logic was kept as-is and no sign extension logic was introduced. In other words, it kept calculating the wrong effective address while mostly generating the correct virtual/linear address. As the effective address is what's used in the segment limit checks, this results in KVM incorreclty injecting #GP/#SS faults due to non-existent segment violations when a nested VMM uses negative displacements with an address size smaller than KVM's native address size. Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in KVM using 0x100000fd8 as the effective address when checking for a segment limit violation. This causes a 100% failure rate when running a 32-bit KVM build as L1 on top of a 64-bit KVM L0. Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 14553f6c03a6..b5520c17c2a1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6656,6 +6656,10 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, /* Addr = segment_base + offset */ /* offset = base + [index * scale] + displacement */ off = exit_qualification; /* holds the displacement */ + if (addr_size == 1) + off = (gva_t)sign_extend64(off, 31); + else if (addr_size == 0) + off = (gva_t)sign_extend64(off, 15); if (base_is_valid) off += kvm_register_read(vcpu, base_reg); if (index_is_valid) -- cgit v1.2.3 From 8c7543e3b8eb160b8458390e424b315855ddf13e Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 23 Jan 2019 14:39:25 -0800 Subject: KVM: nVMX: Ignore limit checks on VMX instructions using flat segments commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream. Regarding segments with a limit==0xffffffff, the SDM officially states: When the effective limit is FFFFFFFFH (4 GBytes), these accesses may or may not cause the indicated exceptions. Behavior is implementation-specific and may vary from one execution to another. In practice, all CPUs that support VMX ignore limit checks for "flat segments", i.e. an expand-up data or code segment with base=0 and limit=0xffffffff. This is subtly different than wrapping the effective address calculation based on the address size, as the flat segment behavior also applies to accesses that would wrap the 4g boundary, e.g. a 4-byte access starting at 0xffffffff will access linear addresses 0xffffffff, 0x0, 0x1 and 0x2. Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b5520c17c2a1..61e3ffe58bce 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6702,10 +6702,16 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, /* Protected mode: #GP(0)/#SS(0) if the segment is unusable. */ exn = (s.unusable != 0); - /* Protected mode: #GP(0)/#SS(0) if the memory - * operand is outside the segment limit. + + /* + * Protected mode: #GP(0)/#SS(0) if the memory operand is + * outside the segment limit. All CPUs that support VMX ignore + * limit checks for flat segments, i.e. segments with base==0, + * limit==0xffffffff and of type expand-up data or code. */ - exn = exn || (off + sizeof(u64) > s.limit); + if (!(s.base == 0 && s.limit == 0xffffffff && + ((s.type & 8) || !(s.type & 4)))) + exn = exn || (off + sizeof(u64) > s.limit); } if (exn) { kvm_queue_exception_e(vcpu, -- cgit v1.2.3 From 5d8f03acc1a485f47775a41ab5af4e8d050b716e Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Wed, 9 Aug 2017 22:33:12 -0700 Subject: KVM: X86: Fix residual mmio emulation request to userspace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bbeac2830f4de270bb48141681cb730aadf8dce1 upstream. Reported by syzkaller: The kvm-intel.unrestricted_guest=0 WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] CPU: 5 PID: 1014 Comm: warn_test Tainted: G W OE 4.13.0-rc3+ #8 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] Call Trace: ? put_pid+0x3a/0x50 ? rcu_read_lock_sched_held+0x79/0x80 ? kmem_cache_free+0x2f2/0x350 kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 ? __this_cpu_preempt_check+0x13/0x20 The syszkaller folks reported a residual mmio emulation request to userspace due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs several threads to launch the same vCPU, the thread which lauch this vCPU after the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will trigger the warning. #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include int kvmcpu; struct kvm_run *run; void* thr(void* arg) { int res; res = ioctl(kvmcpu, KVM_RUN, 0); printf("ret1=%d exit_reason=%d suberror=%d\n", res, run->exit_reason, run->internal.suberror); return 0; } void test() { int i, kvm, kvmvm; pthread_t th[4]; kvm = open("/dev/kvm", O_RDWR); kvmvm = ioctl(kvm, KVM_CREATE_VM, 0); kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0); run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0); srand(getpid()); for (i = 0; i < 4; i++) { pthread_create(&th[i], 0, thr, 0); usleep(rand() % 10000); } for (i = 0; i < 4; i++) pthread_join(th[i], 0); } int main() { for (;;) { int pid = fork(); if (pid < 0) exit(1); if (pid == 0) { test(); exit(0); } int status; while (waitpid(pid, &status, __WALL) != pid) {} } return 0; } This patch fixes it by resetting the vcpu->mmio_needed once we receive the triple fault to avoid the residue. Reported-by: Dmitry Vyukov Tested-by: Dmitry Vyukov Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Dmitry Vyukov Signed-off-by: Wanpeng Li Signed-off-by: Paolo Bonzini Cc: Zubin Mithra Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 1 + arch/x86/kvm/x86.c | 1 + 2 files changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 61e3ffe58bce..098be61a6b4c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5574,6 +5574,7 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu) static int handle_triple_fault(struct kvm_vcpu *vcpu) { vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed = 0; return 0; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6bd0538d8ebf..706c5d63a53f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6478,6 +6478,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) } if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) { vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed = 0; r = 0; goto out; } -- cgit v1.2.3