From db91aa793ff984ac048e199ea1c54202543952fe Mon Sep 17 00:00:00 2001
From: Mika Westerberg
Date: Mon, 3 Oct 2016 13:17:08 +0300
Subject: x86/irq: Prevent force migration of irqs which are not in the vector domain

When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).

For each IRQ we try to complete any on-going move regardless of whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectedly.

Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 ( |sysfs ) in hi

  # rtcwake -s10 -mmem
  <10 seconds pass>

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 ( |sysfs ) in ?

Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.

Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.

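The idea behind the check can be sketched outside the kernel. This is only
an illustration under stated assumptions: struct domain, struct irq_level,
struct vector_data, find_level() and force_complete_move() are hypothetical
stand-ins, not the kernel's structures or functions. The sketch shows
chip_data being touched only when a level of the hierarchy is owned by the
expected domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct domain {
	const char *name;
};

struct irq_level {
	const struct domain *domain;	/* domain that owns this level */
	void *chip_data;		/* private to that domain's irq_chip */
	struct irq_level *parent;	/* next level up the hierarchy */
};

struct vector_data {
	int move_in_progress;
};

/* Walk the hierarchy and return the level owned by 'want', or NULL. */
static struct irq_level *find_level(struct irq_level *lvl,
				    const struct domain *want)
{
	for (; lvl; lvl = lvl->parent)
		if (lvl->domain == want)
			return lvl;
	return NULL;
}

/* Returns true only if vector-domain data was found and updated. */
static bool force_complete_move(struct irq_level *irq,
				const struct domain *vector_domain)
{
	struct irq_level *lvl = find_level(irq, vector_domain);
	struct vector_data *data;

	if (!lvl)
		return false;	/* not ours: leave foreign chip_data alone */

	data = lvl->chip_data;
	if (!data)
		return false;

	data->move_in_progress = 0;
	return true;
}
```

In this model a GPIO interrupt without a vector-domain level is skipped
entirely, so its private chip_data is never reinterpreted.
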
Reported-and-tested-by: Sakari Ailus
Signed-off-by: Mika Westerberg
Cc: stable@vger.kernel.org # 4.4+
Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/apic/vector.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6066d945c40e..5d30c5e42bb1 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -661,11 +661,28 @@ void irq_complete_move(struct irq_cfg *cfg)
  */
 void irq_force_complete_move(struct irq_desc *desc)
 {
-	struct irq_data *irqdata = irq_desc_get_irq_data(desc);
-	struct apic_chip_data *data = apic_chip_data(irqdata);
-	struct irq_cfg *cfg = data ? &data->cfg : NULL;
+	struct irq_data *irqdata;
+	struct apic_chip_data *data;
+	struct irq_cfg *cfg;
 	unsigned int cpu;
 
+	/*
+	 * The function is called for all descriptors regardless of which
+	 * irqdomain they belong to. For example if an IRQ is provided by
+	 * an irq_chip as part of a GPIO driver, the chip data for that
+	 * descriptor is specific to the irq_chip in question.
+	 *
+	 * Check first that the chip_data is what we expect
+	 * (apic_chip_data) before touching it any further.
+	 */
+	irqdata = irq_domain_get_irq_data(x86_vector_domain,
+					  irq_desc_get_irq(desc));
+	if (!irqdata)
+		return;
+
+	data = apic_chip_data(irqdata);
+	cfg = data ? &data->cfg : NULL;
+
 	if (!cfg)
 		return;
 
-- 
cgit v1.2.3

From 2df0e78b44e2cbbaa1e319cbca34f23599a4daa0 Mon Sep 17 00:00:00 2001
From: "sylvain.bertrand@gmail.com"
Date: Thu, 29 Sep 2016 16:22:34 +0000
Subject: x86/syscalls: Remove bash-isms in syscall table generator

Signed-off-by: Sylvain BERTRAND
Link: http://lkml.kernel.org/r/20160929162234.GA29592@freedom
Signed-off-by: Thomas Gleixner
---
 arch/x86/entry/syscalls/syscalltbl.sh | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index cd3d3015d7df..751d1f992630 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -10,8 +10,11 @@ syscall_macro() {
 
     # Entry can be either just a function name or "function/qualifier"
     real_entry="${entry%%/*}"
-    qualifier="${entry:${#real_entry}}" # Strip the function name
-    qualifier="${qualifier:1}" # Strip the slash, if any
+    if [ "$entry" = "$real_entry" ]; then
+	qualifier=
+    else
+	qualifier=${entry#*/}
+    fi
 
     echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
 }
@@ -22,7 +25,7 @@ emit() {
     entry="$3"
     compat="$4"
 
-    if [ "$abi" == "64" -a -n "$compat" ]; then
+    if [ "$abi" = "64" -a -n "$compat" ]; then
 	echo "a compat entry for a 64-bit syscall makes no sense" >&2
 	exit 1
     fi
@@ -45,17 +48,17 @@ emit() {
 grep '^[0-9]' "$in" | sort -n | (
     while read nr abi name entry compat; do
 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
-	if [ "$abi" == "COMMON" -o "$abi" == "64" ]; then
+	if [ "$abi" = "COMMON" -o "$abi" = "64" ]; then
 	    # COMMON is the same as 64, except that we don't expect X32
 	    # programs to use it. Our expectation has nothing to do with
 	    # any generated code, so treat them the same.
 	    emit 64 "$nr" "$entry" "$compat"
-	elif [ "$abi" == "X32" ]; then
+	elif [ "$abi" = "X32" ]; then
 	    # X32 is equivalent to 64 on an X32-compatible kernel.
 	    echo "#ifdef CONFIG_X86_X32_ABI"
 	    emit 64 "$nr" "$entry" "$compat"
 	    echo "#endif"
-	elif [ "$abi" == "I386" ]; then
+	elif [ "$abi" = "I386" ]; then
 	    emit "$abi" "$nr" "$entry" "$compat"
 	else
 	    echo "Unknown abi $abi" >&2
-- 
cgit v1.2.3

From b91688f528fe96e09d17e6d87c1b2805eb0c445e Mon Sep 17 00:00:00 2001
From: Renat Valiullin
Date: Tue, 4 Oct 2016 13:11:48 -0700
Subject: x86/vmware: Skip lapic calibration on VMware

In a virtualized environment the APIC timer calibration can go wrong when
the host is overcommitted or the guest is running nested. This results in
the APIC timers operating at an incorrect frequency. Since VMware supports
a mechanism to retrieve the local APIC frequency we can ask the hypervisor
for it and skip the APIC calibration loop.

Signed-off-by: Renat Valiullin
Acked-by: Alok N Kataria
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161004201148.GA1421@uu64vm
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/vmware.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 1ff0598d309c..81160578b91a 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 #define CPUID_VMWARE_INFO_LEAF 0x40000000
 #define VMWARE_HYPERVISOR_MAGIC 0x564D5868
@@ -82,10 +83,17 @@ static void __init vmware_platform_setup(void)
 
 	VMWARE_PORT(GETHZ, eax, ebx, ecx, edx);
 
-	if (ebx != UINT_MAX)
+	if (ebx != UINT_MAX) {
 		x86_platform.calibrate_tsc = vmware_get_tsc_khz;
-	else
+#ifdef CONFIG_X86_LOCAL_APIC
+		/* Skip lapic calibration since we know the bus frequency. */
+		lapic_timer_frequency = ecx / HZ;
+		pr_info("Host bus clock speed read from hypervisor : %u Hz\n",
+			ecx);
+#endif
+	} else {
 		pr_warn("Failed to get TSC freq from the hypervisor\n");
+	}
 }
 
 /*
-- 
cgit v1.2.3

From cfee9eddcd61e28b73468647fc4aa7ff2d706254 Mon Sep 17 00:00:00 2001
From: Josh Poimboeuf
Date: Thu, 6 Oct 2016 00:28:40 -0500
Subject: x86/unwind: Fix oprofile module link error

When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n,
the oprofile module fails to link:

  ERROR: "ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined!

The problem was introduced when oprofile was converted to use the new x86
unwinder. When frame pointers are disabled, the "guess" unwinder's
unwind_get_return_address() is an inline function which calls
ftrace_graph_ret_addr(), which is not exported.

Fix it by converting the "guess" version of unwind_get_return_address() to
an exported out-of-line function, just like its frame pointer counterpart.

Reported-by: Karl Beldan
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: ec2ad9ccf12d ("oprofile/x86: Convert x86_backtrace() to use the new unwinder")
Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/unwind.h  | 14 ++------------
 arch/x86/kernel/unwind_guess.c | 10 ++++++++++
 2 files changed, 12 insertions(+), 12 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index c4b6d1cafa46..46de9ac4b990 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -23,6 +23,8 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 
 bool unwind_next_frame(struct unwind_state *state);
 
+unsigned long unwind_get_return_address(struct unwind_state *state);
+
 static inline bool unwind_done(struct unwind_state *state)
 {
 	return state->stack_info.type == STACK_TYPE_UNKNOWN;
@@ -48,8 +50,6 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 	return state->bp + 1;
 }
 
-unsigned long unwind_get_return_address(struct unwind_state *state);
-
 #else /* !CONFIG_FRAME_POINTER */
 
 static inline
@@ -58,16 +58,6 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 	return NULL;
 }
 
-static inline
-unsigned long unwind_get_return_address(struct unwind_state *state)
-{
-	if (unwind_done(state))
-		return 0;
-
-	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
-				     *state->sp, state->sp);
-}
-
 #endif /* CONFIG_FRAME_POINTER */
 
 #endif /* _ASM_X86_UNWIND_H */
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index b5a834c93065..9298993dc8b7 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -5,6 +5,16 @@
 #include
 #include
 
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return 0;
+
+	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				     *state->sp, state->sp);
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
 bool unwind_next_frame(struct unwind_state *state)
 {
 	struct stack_info *info = &state->stack_info;
-- 
cgit v1.2.3

From 2a51fe083eba7f99cbda72f5ef90cdf2f4df882c Mon Sep 17 00:00:00 2001
From: Prarit Bhargava
Date: Mon, 3 Oct 2016 13:07:12 -0400
Subject: arch/x86: Handle non enumerated CPU after physical hotplug

When a CPU is physically added to a system then the MADT table is not
updated.

If subsequently a kdump kernel is started on that physically added CPU then
the ACPI enumeration fails to provide the information for this CPU which is
now the boot CPU of the kdump kernel.

As a consequence, generic_processor_info() is not invoked for that CPU so
the number of enumerated processors is 0 and none of the initializations,
including the logical package id management, are performed.

We have code which relies on the correctness of the logical package map and
other information which is initialized via generic_processor_info().
Executing such code will result in undefined behaviour or kernel crashes.

This problem applies only to the kdump kernel because a normal kexec will
switch to the original boot CPU, which is enumerated in MADT, before
jumping into the kexec kernel.

The boot code already has a check for num_processors equal 0 in
prefill_possible_map(). We can use that check as an indicator that the
enumeration of the boot CPU did not happen and invoke
generic_processor_info() for it. That initializes the relevant data for the
boot CPU and therefore prevents subsequent failure.

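The fallback described above can be modelled in plain C. This is a
simplified sketch under stated assumptions, not the kernel code: struct
topo, BAD_ID, register_cpu() and ensure_boot_cpu() are hypothetical names
standing in for the enumeration state, BAD_APICID,
generic_processor_info() and the new branch in prefill_possible_map().

```c
#include <stdbool.h>

#define BAD_ID (-1)	/* hypothetical marker: no APIC id recorded for CPU 0 */

/* Hypothetical model of the enumeration state. */
struct topo {
	int num_processors;	/* CPUs registered so far */
	int cpu0_apicid;	/* APIC id recorded for logical CPU 0 */
};

/* Stand-in for the registration step; the first CPU becomes CPU 0. */
static void register_cpu(struct topo *t, int apicid)
{
	if (t->num_processors == 0)
		t->cpu0_apicid = apicid;
	t->num_processors++;
}

/* If firmware tables listed no CPU at all, register the boot CPU. */
static void ensure_boot_cpu(struct topo *t, int boot_apicid, bool id_valid)
{
	if (t->num_processors)
		return;		/* enumeration worked, nothing to do */

	if (t->cpu0_apicid == BAD_ID && id_valid)
		register_cpu(t, boot_apicid);

	/* Last resort, matching the old behaviour: count one CPU. */
	if (!t->num_processors)
		t->num_processors = 1;
}
```

The final fallback keeps the previous behaviour for the case where the boot
CPU's APIC id is not usable, so the count never stays at zero.
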
[ tglx: Refined the code and rewrote the changelog ]

Signed-off-by: Prarit Bhargava
Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Cc: Peter Zijlstra
Cc: Len Brown
Cc: Borislav Petkov
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Juergen Gross
Cc: dyoung@redhat.com
Cc: Eric Biederman
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/smpboot.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42a93621f5b0..951f093a96fe 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1407,9 +1407,21 @@ __init void prefill_possible_map(void)
 {
 	int i, possible;
 
-	/* no processor from mptable or madt */
-	if (!num_processors)
-		num_processors = 1;
+	/* No boot processor was found in mptable or ACPI MADT */
+	if (!num_processors) {
+		int apicid = boot_cpu_physical_apicid;
+		int cpu = hard_smp_processor_id();
+
+		pr_warn("Boot CPU (id %d) not listed by BIOS\n", cpu);
+
+		/* Make sure boot cpu is enumerated */
+		if (apic->cpu_present_to_apicid(0) == BAD_APICID &&
+		    apic->apic_id_valid(apicid))
+			generic_processor_info(apicid, boot_cpu_apic_version);
+
+		if (!num_processors)
+			num_processors = 1;
+	}
 
 	i = setup_max_cpus ?: 1;
 	if (setup_possible_cpus == -1) {
-- 
cgit v1.2.3

From f3bf1dbe64b62a2058dd1944c00990df203e8e7a Mon Sep 17 00:00:00 2001
From: Thomas Gleixner
Date: Fri, 7 Oct 2016 14:02:12 +0200
Subject: x86/acpi: Prevent LAPIC id 0xff from being accounted

Yinghai reported that the recent changes to make the cpuid - nodeid
relationship permanent cause a cpuid ordering regression on a system which
has x2apic enabled.

The reason is that the ACPI local APIC parser has no sanity check for
apicid 0xff, which is an invalid id. So a CPU id for this invalid local
APIC id is allocated and therefore breaks the cpuid ordering.

Add a sanity check to acpi_parse_lapic() which ignores the invalid id.

Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
Reported-by: Yinghai Lu
Signed-off-by: Thomas Gleixner
Cc: Gu Zheng
Cc: Tang Chen
Cc: douly.fnst@cn.fujitsu.com
Cc: zhugh.fnst@cn.fujitsu.com
Cc: Tony Luck
Cc: Rafael J. Wysocki
Cc: Len Brown
Cc: Lv Zheng
Cc: robert.moore@intel.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
---
 arch/x86/kernel/acpi/boot.c | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'arch')

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 32a7d70913ac..8a5abaa7d453 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,10 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
 
 	acpi_table_print_madt_entry(header);
 
+	/* Ignore invalid ID */
+	if (processor->id == 0xff)
+		return 0;
+
 	/*
 	 * We need to register disabled CPU as well to permit
 	 * counting disabled CPUs. This allows us to size
-- 
cgit v1.2.3

From df610d678893c85b82d3a68eea0d87dd4e03e615 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner
Date: Fri, 7 Oct 2016 15:55:13 +0200
Subject: x86/apic: Prevent pointless warning messages

Markus reported that he sees new warnings:

  APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored.
  APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored.

This comes from the recent persistent cpuid - nodeid changes. The code
which emits the warning has been called prior to these changes only for
enabled processors. Now it's called for disabled processors as well to get
the possible cpu accounting correct. So if the kernel is compiled for the
number of actual available/enabled CPUs and the BIOS reports disabled CPUs
as well then the above warnings are printed.
That's a pointless exercise as it only makes sense if there are more CPUs
enabled than the kernel supports.

Make the warning conditional on enabled processors so we are back to the
state before these changes.

Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
Reported-and-tested-by: Markus Trippelsdorf
Cc: One Thousand Gnomes
Cc: Dou Liyang
Cc: linux-acpi@vger.kernel.org
Cc: Gu Zheng
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/apic/apic.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f266b8a92a9e..88c657b057e2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2128,9 +2128,11 @@ int __generic_processor_info(int apicid, int version, bool enabled)
 	if (num_processors >= nr_cpu_ids) {
 		int thiscpu = max + disabled_cpus;
 
-		pr_warning(
-			"APIC: NR_CPUS/possible_cpus limit of %i reached."
-			" Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
+		if (enabled) {
+			pr_warning("APIC: NR_CPUS/possible_cpus limit of %i "
+				   "reached. Processor %d/0x%x ignored.\n",
+				   max, thiscpu, apicid);
+		}
 
 		disabled_cpus++;
 		return -EINVAL;
-- 
cgit v1.2.3

From d4b05923f579c234137317cdf9a5eb69ddab76d1 Mon Sep 17 00:00:00 2001
From: Dave Hansen
Date: Fri, 7 Oct 2016 09:23:42 -0700
Subject: x86/pkeys: Make protection keys an "eager" feature

Our XSAVE features are divided into two categories: those that generate
FPU exceptions, and those that do not. MPX and pkeys do not generate FPU
exceptions and thus can not be used lazily. We disable them when lazy mode
is forced on.

We have a pair of masks to collect these two sets of features, but
XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY. Fix it
by moving the feature to XFEATURE_MASK_EAGER.

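The effect of the misplaced bit can be shown with a small mask model. This
is a sketch under stated assumptions: the F_* names, EAGER_BUGGY,
EAGER_FIXED and usable_features() are hypothetical, the bit positions are
chosen for illustration, and the only behaviour modelled is "features in
the eager mask are dropped when lazy mode is forced on".

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; names and positions are assumptions. */
#define F_YMM		(1ULL << 2)
#define F_BNDREGS	(1ULL << 3)
#define F_BNDCSR	(1ULL << 4)
#define F_PKRU		(1ULL << 9)

/* Eager mask before the fix (PKRU missing) and after it. */
#define EAGER_BUGGY	(F_BNDREGS | F_BNDCSR)
#define EAGER_FIXED	(F_BNDREGS | F_BNDCSR | F_PKRU)

/* Features that remain usable; forced lazy mode drops the eager ones. */
static uint64_t usable_features(uint64_t hw, uint64_t eager_mask,
				bool lazy_forced)
{
	return lazy_forced ? (hw & ~eager_mask) : hw;
}
```

With the pre-fix mask, PKRU survives a forced lazy mode even though it
cannot be handled lazily; with the corrected mask it is dropped together
with the MPX state.
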
Note: this only causes a problem if you boot with lazy FPU mode
(eagerfpu=off) which is *not* the default. It also only affects hardware
which is not currently publicly available. It looks like lazy mode is going
away, but we still need this patch applied to any kernel that has
protection keys and lazy mode, which is 4.6 through 4.8 at this point, and
4.9 if the lazy removal isn't sent to Linus for 4.9.

Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
Signed-off-by: Dave Hansen
Cc: Dave Hansen
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/fpu/xstate.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

(limited to 'arch')

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4957ac72b48..430bacf73074 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -27,11 +27,12 @@
 				 XFEATURE_MASK_YMM | \
 				 XFEATURE_MASK_OPMASK | \
 				 XFEATURE_MASK_ZMM_Hi256 | \
-				 XFEATURE_MASK_Hi16_ZMM | \
-				 XFEATURE_MASK_PKRU)
+				 XFEATURE_MASK_Hi16_ZMM)
 
 /* Supported features which require eager state saving */
-#define XFEATURE_MASK_EAGER	(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)
+#define XFEATURE_MASK_EAGER	(XFEATURE_MASK_BNDREGS | \
+				 XFEATURE_MASK_BNDCSR | \
+				 XFEATURE_MASK_PKRU)
 
 /* All currently supported features */
 #define XCNTXT_MASK	(XFEATURE_MASK_LAZY | XFEATURE_MASK_EAGER)
-- 
cgit v1.2.3