summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)Author
2014-01-27Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...
2014-01-27Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc mremap fix from Ben Herrenschmidt: "This is the patch that I had sent after -rc8 and which we decided to wait before merging. It's based on a different tree than my -next branch (it needs some pre-reqs that were in -rc4 or so while my -next is based on -rc1) so I left it as a separate branch for your to pull. It's identical to the request I did 2 or 3 weeks back. This fixes crashes in mremap with THP on powerpc. The fix however requires a small change in the generic code. It moves a condition into a helper we can override from the arch which is harmless, but it *also* slightly changes the order of the set_pmd and the withdraw & deposit, which should be fine according to Kirill (who wrote that code) but I agree -rc8 is a bit late... It was acked by Kirill and Andrew told me to just merge it via powerpc" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/thp: Fix crash on mremap
2014-01-28powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()Michael Ellerman
At a glance these are just the inverse of each other. The one subtlety is that arch_spin_value_unlocked() takes the lock by value, rather than as a pointer, which is important for the lockref code. On the other hand arch_spin_is_locked() doesn't really care, so implement it in terms of arch_spin_value_unlocked(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-28powerpc: Add support for the optimised lockref implementationMichael Ellerman
This commit adds the architecture support required to enable the optimised implementation of lockrefs. That's as simple as defining arch_spin_value_unlocked() and selecting the Kconfig option. We also define cmpxchg64_relaxed(), because the lockref code does not need the cmpxchg to have barrier semantics. Using Linus' test case[1] on one system I see a 4x improvement for the basic enablement, and a further 1.3x for cmpxchg64_relaxed(), for a total of 5.3x vs the baseline. On another system I see more like 2x improvement. [1]: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...
2014-01-23powerpc: use generic fixmap.hMark Salter
Signed-off-by: Mark Salter <msalter@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add Intel RAPL energy counter support (Stephane Eranian) - Clean up uprobes (Oleg Nesterov) - Optimize ring-buffer writes (Peter Zijlstra) Tooling side changes, user visible: - 'perf diff': - Add column colouring improvements (Ramkumar Ramachandra) - 'perf kvm': - Add guest related improvements, including allowing to specify a directory with guest specific /proc information (Dongsheng Yang) - Add shell completion support (Ramkumar Ramachandra) - Add '-v' option (Dongsheng Yang) - Support --guestmount (Dongsheng Yang) - 'perf probe': - Support showing source code, asking for variables to be collected at probe time and other 'perf probe' operations that use DWARF information. This supports only binaries with debugging information at this time, detached debuginfo (aka debuginfo packages) support should come in later patches (Masami Hiramatsu) - 'perf record': - Rename --no-delay option to --no-buffering, better reflecting its purpose and freeing up '--delay' to take the place of '--initial-delay', so that 'record' and 'stat' are consistent (Arnaldo Carvalho de Melo) - Default the -t/--thread option to no inheritance (Adrian Hunter) - Make per-cpu mmaps the default (Adrian Hunter) - 'perf report': - Improve callchain processing performance (Frederic Weisbecker) - Retain bfd reference to lookup source line numbers, greatly optimizing, among other use cases, 'perf report -s srcline' (Adrian Hunter) - Improve callchain processing performance even more (Namhyung Kim) - Add a perf.data file header window in the 'perf report' TUI, associated with the 'i' hotkey, providing a counterpart to the --header option in the stdio UI (Namhyung Kim) - 'perf script': - Add an option in 'perf script' to print the source line number (Adrian Hunter) - Add --header/--header-only options to 'script' and 'report', the default is not tho show the header info, but as this has been the default for some time, leave a single line explaining how to obtain that information (Jiri Olsa) - Add options to show comm, fork, exit and mmap PERF_RECORD_ events (Namhyung Kim) - Print callchains and symbols if they exist (David Ahern) - 'perf timechart' - Add backtrace support to CPU info - Print pid along the name - Add support for CPU topology - Add new option --highlight'ing threads, be it by name or, if a numeric value is provided, that run more than given duration (Stanislav Fomichev) - 'perf top': - Make 'perf top -g' refer to callchains, for consistency with other tools (David Ahern) - 'perf trace': - Handle old kernels where the "raw_syscalls" tracepoints were called plain "syscalls" (David Ahern) - Remove thread summary coloring, by Pekka Enberg. - Honour -m option in 'trace', the tool was offering the option to set the mmap size, but wasn't using it when doing the actual mmap on the events file descriptors (Jiri Olsa) - generic: - Backport libtraceevent plugin support (trace-cmd repository, with plugins for jbd2, hrtimer, kmem, kvm, mac80211, sched_switch, function, xen, scsi, cfg80211 (Jiri Olsa) - Print session information only if --stdio is given (Namhyung Kim) Tooling side changes, developer visible (plumbing): - Improve 'perf probe' exit path, release resources (Masami Hiramatsu) - Improve libtraceevent plugins exit path, allowing the registering of an unregister handler to be called at exit time (Namhyung Kim) - Add an alias to the build test makefile (make -C tools/perf build-test) (Namhyung Kim) - Get rid of die() and friends (good riddance!) in libtraceevent (Namhyung Kim) - Fix cross build problems related to pkgconfig and CROSS_COMPILE not being propagated to the feature tests, leading to features being tested in the host and then being enabled on the target (Mark Rutland) - Improve forked workload error reporting by sending the errno in the signal data queueing integer field, using sigqueue and by doing the signal setup in the evlist methods, removing open coded equivalents in various tools (Arnaldo Carvalho de Melo) - Do more auto exit cleanup chores in the 'evlist' destructor, so that the tools don't have to all do that sequence (Arnaldo Carvalho de Melo) - Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho de Melo) - Add test for building detached source tarballs (Arnaldo Carvalho de Melo) - Move some header files (tools/perf/ to tools/include/ to make them available to other tools/ dwelling codebases (Namhyung Kim) - Move logic to warn about kptr_restrict'ed kernels to separate function in 'report' (Arnaldo Carvalho de Melo) - Move hist browser selection code to separate function (Arnaldo Carvalho de Melo) - Move histogram entries collapsing to separate function (Arnaldo Carvalho de Melo) - Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo) - Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri Olsa) - Move arch setup into seprate Makefile (Jiri Olsa) - Make libtraceevent install target quieter (Jiri Olsa) - Make tests/make output more compact (Jiri Olsa) - Ignore generated files in feature-checks (Chunwei Chen) - Introduce pevent_filter_strerror() in libtraceevent, similar in purpose to libc's strerror() function (Namhyung Kim) - Use perf_data_file methods to write output file in 'record' and 'inject' (Jiri Olsa) - Use pr_*() functions where applicable in 'report' (Namhyumg Kim) - Add 'machine' 'addr_location' struct to have full picture (machine, thread, map, symbol, addr) for a (partially) resolved address, reducing function signatures (Arnaldo Carvalho de Melo) - Reduce code duplication in the histogram entry creation/insertion (Arnaldo Carvalho de Melo) - Auto allocate annotation histogram data structures (Arnaldo Carvalho de Melo) - No need to test against NULL before calling free, also set freed memory in struct pointers to NULL, to help fixing use after free bugs (Arnaldo Carvalho de Melo) - Rename some struct DSO binary_type related members and methods, to clarify its purpose and need for differentiation (symtab_type, ie one is about the files .text, CFI, etc, i.e. its binary contents, and the other is about where the symbol table came from (Arnaldo Carvalho de Melo) - Convert to new topic libraries, starting with an API one (sysfs, debugfs, etc), renaming liblk in the process (Borislav Petkov) - Get rid of some more panic() like error handling in libtraceevent. (Namhyung Kim) - Get rid of panic() like calls in libtraceevent (Namyung Kim) - Start carving out symbol parsing routines (perf, just moving routines to topic files in tools/lib/symbol/, tools that want to use it need to integrate it directly, ie no tools/lib/symbol/Makefile is provided (Arnaldo Carvalho de Melo) - Assorted refactoring patches, moving code around and adding utility evlist methods that will be used in the IPT patchset (Adrian Hunter) - Assorted mmap_pages handling fixes (Adrian Hunter) - Several man pages typo fixes (Dongsheng Yang) - Get rid of several die() calls in libtraceevent (Namhyung Kim) - Use basename() in a more robust way, to avoid problems related to different system library implementations for that function (Stephane Eranian) - Remove open coded management of short_name_allocated member (Adrian Hunter) - Several cleanups in the "dso" methods, constifying some parameters and renaming some fields to clarify its purpose (Arnaldo Carvalho de Melo) - Add per-feature check flags, fixing libunwind related build problems on some architectures (Jean Pihet) - Do not disable source line lookup just because of one failure. (Adrian Hunter) - Several 'perf kvm' man page corrections (Dongsheng Yang) - Correct the message in feature-libnuma checking, swowing the right devel package names for various distros (Dongsheng Yang) - Polish 'readn()' function and introduce its counterpart, 'writen()' (Jiri Olsa) - Start moving timechart state from global variables to a 'perf_tool' derived 'timechart' struct (Arnaldo Carvalho de Melo) ... and lots of fixes and improvements I forgot to list" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf tools: Remove unnecessary callchain cursor state restore on unmatch perf callchain: Spare double comparison of callchain first entry perf tools: Do proper comm override error handling perf symbols: Export elf_section_by_name and reuse perf probe: Release all dynamically allocated parameters perf probe: Release allocated probe_trace_event if failed perf tools: Add 'build-test' make target tools lib traceevent: Unregister handler when xen plugin is unloaded tools lib traceevent: Unregister handler when scsi plugin is unloaded tools lib traceevent: Unregister handler when jbd2 plugin is is unloaded tools lib traceevent: Unregister handler when cfg80211 plugin is unloaded tools lib traceevent: Unregister handler when mac80211 plugin is unloaded tools lib traceevent: Unregister handler when sched_switch plugin is unloaded tools lib traceevent: Unregister handler when kvm plugin is unloaded tools lib traceevent: Unregister handler when kmem plugin is unloaded tools lib traceevent: Unregister handler when hrtimer plugin is unloaded tools lib traceevent: Unregister handler when function plugin is unloaded tools lib traceevent: Add pevent_unregister_print_function() tools lib traceevent: Add pevent_unregister_event_handler() tools lib traceevent: fix pointer-integer size mismatch ...
2014-01-20Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: - futex performance increases: larger hashes, smarter wakeups - mutex debugging improvements - lots of SMP ordering documentation updates - introduce the smp_load_acquire(), smp_store_release() primitives. (There are WIP patches that make use of them - not yet merged) - lockdep micro-optimizations - lockdep improvement: better cover IRQ contexts - liblockdep at last. We'll continue to monitor how useful this is * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) futexes: Fix futex_hashsize initialization arch: Re-sort some Kbuild files to hopefully help avoid some conflicts futexes: Avoid taking the hb->lock if there's nothing to wake up futexes: Document multiprocessor ordering guarantees futexes: Increase hash table size for better performance futexes: Clean up various details arch: Introduce smp_load_acquire(), smp_store_release() arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE mutexes: Give more informative mutex warning in the !lock->owner case powerpc: Full barrier for smp_mb__after_unlock_lock() rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier Documentation/memory-barriers.txt: Document ACCESS_ONCE() Documentation/memory-barriers.txt: Prohibit speculative writes Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" ...
2014-01-20Merge branch 'core-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core debug changes from Ingo Molnar: "Currently there are two methods to set the panic_timeout: via 'panic=X' boot commandline option, or via /proc/sys/kernel/panic. This tree adds a third panic_timeout configuration method: configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to distros that generally want their kernel defaults to come with the .config. CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default value of panic_timeout. Doing that unearthed a few arch trickeries regarding arch-special panic_timeout values and related complications - hopefully all resolved to the satisfaction of everyone" * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc: Clean up panic_timeout usage MIPS: Remove panic_timeout settings panic: Make panic_timeout configurable
2014-01-18net: introduce SO_BPF_EXTENSIONSMichal Sekletar
For user space packet capturing libraries such as libpcap, there's currently only one way to check which BPF extensions are supported by the kernel, that is, commit aa1113d9f85d ("net: filter: return -EINVAL if BPF_S_ANC* operation is not supported"). For querying all extensions at once this might be rather inconvenient. Therefore, this patch introduces a new option which can be used as an argument for getsockopt(), and allows one to obtain information about which BPF extensions are supported by the current kernel. As David Miller suggests, we do not need to define any bits right now and status quo can just return 0 in order to state that this versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later additions to BPF extensions need to add their bits to the bpf_tell_extensions() function, as documented in the comment. Signed-off-by: Michal Sekletar <msekleta@redhat.com> Cc: David Miller <davem@davemloft.net> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16Merge branch 'perf/urgent' into perf/coreIngo Molnar
Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-15powerpc/powernv: Call OPAL sync before kexec'ingVasant Hegde
Its possible that OPAL may be writing to host memory during kexec (like dump retrieve scenario). In this situation we might end up corrupting host memory. This patch makes OPAL sync call to make sure OPAL stops writing to host memory before kexec'ing. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Handle multiple EEH errorsGavin Shan
For one PCI error relevant OPAL event, we possibly have multiple EEH errors for that. For example, multiple frozen PEs detected on different PHBs. Unfortunately, we didn't cover the case. The patch enumarates the return value from eeh_ops::next_error() and change eeh_handle_special_event() and eeh_ops::next_error() to handle all existing EEH errors. As Ben pointed out, we needn't list_for_each_entry_safe() since we are not deleting any PHB from the hose_list and the EEH serialized lock should be held while purging EEH events. The patch covers those suggestions as well. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/thp: Fix crash on mremapAneesh Kumar K.V
This patch fix the below crash NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440 LR [c0000000000439ac] .hash_page+0x18c/0x5e0 ... Call Trace: [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable) [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0 [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58 On ppc64 we use the pgtable for storing the hpte slot information and store address to the pgtable at a constant offset (PTRS_PER_PMD) from pmd. On mremap, when we switch the pmd, we need to withdraw and deposit the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset from new pmd. We also want to move the withdraw and deposit before the set_pmd so that, when page fault find the pmd as trans huge we can be sure that pgtable can be located at the offset. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15Merge remote-tracking branch 'scott/next' into nextBenjamin Herrenschmidt
Freescale updates from Scott: << Highlights include 32-bit booke relocatable support, e6500 hardware tablewalk support, various e500 SPE fixes, some new/revived boards, and e6500 deeper idle and altivec powerdown modes. >>
2014-01-15powerpc: Don't corrupt transactional state when using FP/VMX in kernelPaul Mackerras
Currently, when we have a process using the transactional memory facilities on POWER8 (that is, the processor is in transactional or suspended state), and the process enters the kernel and the kernel then uses the floating-point or vector (VMX/Altivec) facility, we end up corrupting the user-visible FP/VMX/VSX state. This happens, for example, if a page fault causes a copy-on-write operation, because the copy_page function will use VMX to do the copy on POWER8. The test program below demonstrates the bug. The bug happens because when FP/VMX state for a transactional process is stored in the thread_struct, we store the checkpointed state in .fp_state/.vr_state and the transactional (current) state in .transact_fp/.transact_vr. However, when the kernel wants to use FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(), which saves the current state in .fp_state/.vr_state. Furthermore, when we return to the user process we return with FP/VMX/VSX disabled. The next time the process uses FP/VMX/VSX, we don't know which set of state (the current register values, .fp_state/.vr_state, or .transact_fp/.transact_vr) we should be using, since we have no way to tell if we are still in the same transaction, and if not, whether the previous transaction succeeded or failed. Thus it is necessary to strictly adhere to the rule that if FP has been enabled at any point in a transaction, we must keep FP enabled for the user process with the current transactional state in the FP registers, until we detect that it is no longer in a transaction. Similarly for VMX; once enabled it must stay enabled until the process is no longer transactional. In order to keep this rule, we add a new thread_info flag which we test when returning from the kernel to userspace, called TIF_RESTORE_TM. This flag indicates that there is FP/VMX/VSX state to be restored before entering userspace, and when it is set the .tm_orig_msr field in the thread_struct indicates what state needs to be restored. The restoration is done by restore_tm_state(). The TIF_RESTORE_TM bit is set by new giveup_fpu/altivec_maybe_transactional helpers, which are called from enable_kernel_fp/altivec, giveup_vsx, and flush_fp/altivec_to_thread instead of giveup_fpu/altivec. The other thing to be done is to get the transactional FP/VMX/VSX state from .fp_state/.vr_state when doing reclaim, if that state has been saved there by giveup_fpu/altivec_maybe_transactional. Having done this, we set the FP/VMX bit in the thread's MSR after reclaim to indicate that that part of the state is now valid (having been reclaimed from the processor's checkpointed state). Finally, in the signal handling code, we move the clearing of the transactional state bits in the thread's MSR a bit earlier, before calling flush_fp_to_thread(), so that we don't unnecessarily set the TIF_RESTORE_TM bit. This is the test program: /* Michael Neuling 4/12/2013 * * See if the altivec state is leaked out of an aborted transaction due to * kernel vmx copy loops. * * gcc -m64 htm_vmxcopy.c -o htm_vmxcopy * */ /* We don't use all of these, but for reference: */ int main(int argc, char *argv[]) { long double vecin = 1.3; long double vecout; unsigned long pgsize = getpagesize(); int i; int fd; int size = pgsize*16; char tmpfile[] = "/tmp/page_faultXXXXXX"; char buf[pgsize]; char *a; uint64_t aborted = 0; fd = mkstemp(tmpfile); assert(fd >= 0); memset(buf, 0, pgsize); for (i = 0; i < size; i += pgsize) assert(write(fd, buf, pgsize) == pgsize); unlink(tmpfile); a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); assert(a != MAP_FAILED); asm __volatile__( "lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value TBEGIN "beq 3f ;" TSUSPEND "xxlxor 40,40,40 ; " // set 40 to 0 "std 5, 0(%[map]) ;" // cause kernel vmx copy page TABORT TRESUME TEND "li %[res], 0 ;" "b 5f ;" "3: ;" // Abort handler "li %[res], 1 ;" "5: ;" "stxvd2x 40,0,%[vecoutptr] ; " : [res]"=r"(aborted) : [vecinptr]"r"(&vecin), [vecoutptr]"r"(&vecout), [map]"r"(a) : "memory", "r0", "r3", "r4", "r5", "r6", "r7"); if (aborted && (vecin != vecout)){ printf("FAILED: vector state leaked on abort %f != %f\n", (double)vecin, (double)vecout); exit(1); } munmap(a, size); close(fd); printf("PASSED!\n"); return 0; } Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Reclaim two unused thread_info flag bitsPaul Mackerras
TIF_PERFMON_WORK and TIF_PERFMON_CTXSW are completely unused. They appear to be related to the old perfmon2 code, which has been superseded by the perf_event infrastructure. This removes their definitions so that the bits can be used for other purposes. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15Move precessing of MCE queued event out from syscall exit path.Mahesh Salgaonkar
Huge Dickins reported an issue that b5ff4211a829 "powerpc/book3s: Queue up and process delayed MCE events" breaks the PowerMac G5 boot. This patch fixes it by moving the mce even processing away from syscall exit, which was wrong to do that in first place, and using irq work framework to delay processing of mce event. Reported-by: Hugh Dickins <hughd@google.com Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Fix the setup of CPU-to-Node mappings during CPU onlineSrivatsa S. Bhat
On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Hotplug improvementGavin Shan
When EEH error comes to one specific PCI device before its driver is loaded, we will apply hotplug to recover the error. During the plug time, the PCI device will be probed and its driver is loaded. Then we wrongly calls to the error handlers if the driver supports EEH explicitly. The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and set it before we remove the PCI device. In turn, we can avoid wrongly calls the error handlers of the PCI device after its driver loaded. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config spaceGavin Shan
The patch implements the EEH operation backend restore_config() for PowerNV platform. That relies on OPAL API opal_pci_reinit() where we reinitialize the error reporting properly after PE or PHB reset. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Add restore_config operationGavin Shan
After reset on the specific PE or PHB, we never configure AER correctly on PowerNV platform. We needn't care it on pSeries platform. The patch introduces additional EEH operation eeh_ops:: restore_config() so that we have chance to configure AER correctly for PowerNV platform. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-13Merge tag 'v3.13-rc8' into core/lockingIngo Molnar
Refresh the tree with the latest fixes, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12arch: Introduce smp_load_acquire(), smp_store_release()Peter Zijlstra
A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-10powerpc/85xx: handle the eLBC error interrupt if it exists in dtsShaohui Xie
On P1020, P1021, P1022, and P1023, eLBC event interrupts are routed to internal interrupt 3 while ELBC error interrupts are routed to internal interrupt 0. We need to call request_irq for each. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> [scottwood@freescale.com: reworded commit message and fixed author] Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09powerpc/e6500: TLB miss handler with hardware tablewalk supportScott Wood
There are a few things that make the existing hw tablewalk handlers unsuitable for e6500: - Indirect entries go in TLB1 (though the resulting direct entries go in TLB0). - It has threads, but no "tlbsrx." -- so we need a spinlock and a normal "tlbsx". Because we need this lock, hardware tablewalk is mandatory on e6500 unless we want to add spinlock+tlbsx to the normal bolted TLB miss handler. - TLB1 has no HES (nor next-victim hint) so we need software round robin (TODO: integrate this round robin data with hugetlb/KVM) - The existing tablewalk handlers map half of a page table at a time, because IBM hardware has a fixed 1MiB indirect page size. e6500 has variable size indirect entries, with a minimum of 2MiB. So we can't do the half-page indirect mapping, and even if we could it would be less efficient than mapping the full page. - Like on e5500, the linear mapping is bolted, so we don't need the overhead of supporting nested tlb misses. Note that hardware tablewalk does not work in rev1 of e6500. We do not expect to support e6500 rev1 in mainline Linux. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com>
2014-01-09powerpc: introduce macro LOAD_REG_ADDR_PICKevin Hao
This is used to get the address of a variable when the kernel is not running at the linked or relocated address. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 defineWang Dongsheng
E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle patches. Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07powerpc: fix exception clearing in e500 SPE float emulationJoseph Myers
The e500 SPE floating-point emulation code clears existing exceptions (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the emulated operation. However, these exception bits are the "sticky", cumulative exception bits, and should only be cleared by the user program setting SPEFSCR, not implicitly by any floating-point instruction (whether executed purely by the hardware or emulated). The spurious clearing of these bits shows up as missing exceptions in glibc testing. Fixing this, however, is not as simple as just not clearing the bits, because while the bits may be from previous floating-point operations (in which case they should not be cleared), the processor can also set the sticky bits itself before the interrupt for an exception occurs, and this can happen in cases when IEEE 754 semantics are that the sticky bit should not be set. Specifically, the "invalid" sticky bit is set in various cases with non-finite operands, where IEEE 754 semantics do not involve raising such an exception, and the "underflow" sticky bit is set in cases of exact underflow, whereas IEEE 754 semantics are that this flag is set only for inexact underflow. Thus, for correct emulation the kernel needs to know the setting of these two sticky bits before the instruction being emulated. When a floating-point operation raises an exception, the kernel can note the state of the sticky bits immediately afterwards. Some <fenv.h> functions that affect the state of these bits, such as fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and PR_SET_FPEXC anyway, and so it is natural to record the state of those bits during that call into the kernel and so avoid any need for a separate call into the kernel to inform it of a change to those bits. Thus, the interface I chose to use (in this patch and the glibc port) is that one of those prctl calls must be made after any userspace change to those sticky bits, other than through a floating-point operation that traps into the kernel anyway. feclearexcept and fesetexceptflag duly make those calls, which would not be required were it not for this issue. The previous EGLIBC port, and the uClibc code copied from it, is fundamentally broken as regards any use of prctl for floating-point exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl calls (and did various worse things, such as passing a pointer when prctl expected an integer). If you avoid anything where prctl is used, the clearing of sticky bits still means it will never give anything approximating correct exception semantics with existing kernels. I don't believe the patch makes things any worse for existing code that doesn't try to inform the kernel of changes to sticky bits - such code may get incorrect exceptions in some cases, but it would have done so anyway in other cases. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07powerpc/booke64: Add LRAT error exception handlerMihai Caraman
LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware translation from a logical page number (LPN) to a real page number (RPN) when tlbwe is executed by a guest or when a page table translation occurs from a guest virtual address. Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM handler to avoid build breakage. This is a prerequisite for KVM LRAT support that will follow. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-30Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "A bit more endian problems found during testing of 3.13 and a few other simple fixes and regressions fixes" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Fix alignment of secondary cpu spin vars powerpc: Align p_end powernv/eeh: Add buffer for P7IOC hub error data powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag() powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian powerpc: Make unaligned accesses endian-safe for powerpc powerpc: Fix bad stack check in exception entry powerpc/512x: dts: disable MPC5125 usb module powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
2013-12-30Merge branch 'merge' into nextBenjamin Herrenschmidt
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such as Anton Little Endian fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Update the generic code to use dynamic iommu page sizesAlistair Popple
This patch updates the generic iommu backend code to use the it_page_shift field to determine the iommu page size instead of using hardcoded values. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Add it_page_shift field to determine iommu page sizeAlistair Popple
This patch adds a it_page_shift field to struct iommu_table and initiliases it to 4K for all platforms. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Update constant names to reflect their hardcoded page sizeAlistair Popple
The powerpc iommu uses a hardcoded page size of 4K. This patch changes the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A future patch will use the existing names to support dynamic page sizes. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc: Make unaligned accesses endian-safe for powerpcRajesh B Prathipati
The generic put_unaligned/get_unaligned macros were made endian-safe by calling the appropriate endian dependent macros based on the endian type of the powerpc processor. Signed-off-by: Rajesh B Prathipati <rprathip@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc: Fix bad stack check in exception entryMichael Neuling
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1) is valid when coming from the kernel. If it's not valid, we die but with a nice oops message. Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we check to see if the stack pointer is negative. Unfortunately, this won't detect a bad stack where r1 is less than INT_FRAME_SIZE. This patch fixes the check to compare the modified r1 with -INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL pointers) are correctly detected again. Kudos to Paulus for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "The PPC folks had a large amount of changes queued for 3.13, and now they are fixing the bugs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Don't drop low-order page address bits powerpc: book3s: kvm: Don't abuse host r2 in exit path powerpc/kvm/booke: Fix build break due to stack frame size warning KVM: PPC: Book3S: PR: Enable interrupts earlier KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu KVM: PPC: Book3S: PR: Don't clobber our exit handler id powerpc: kvm: fix rare but potential deadlock scene KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call KVM: PPC: Book3S HV: Make tbacct_lock irq-safe KVM: PPC: Book3S HV: Refine barriers in guest entry/exit KVM: PPC: Book3S HV: Fix physical address calculations
2013-12-20Merge tag 'signed-for-3.13' of git://github.com/agraf/linux-2.6 into kvm-masterPaolo Bonzini
Patch queue for 3.13 - 2013-12-18 This fixes some grave issues we've only found after 3.13-rc1: - Make the modularized HV/PR book3s kvm work well as modules - Fix some race conditions - Fix compilation with certain compilers (booke) - Fix THP for book3s_hv - Fix preemption for book3s_pr Alexander Graf (4): KVM: PPC: Book3S: PR: Don't clobber our exit handler id KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy KVM: PPC: Book3S: PR: Enable interrupts earlier Aneesh Kumar K.V (1): powerpc: book3s: kvm: Don't abuse host r2 in exit path Paul Mackerras (5): KVM: PPC: Book3S HV: Fix physical address calculations KVM: PPC: Book3S HV: Refine barriers in guest entry/exit KVM: PPC: Book3S HV: Make tbacct_lock irq-safe KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call KVM: PPC: Book3S HV: Don't drop low-order page address bits Scott Wood (1): powerpc/kvm/booke: Fix build break due to stack frame size warning pingfan liu (1): powerpc: kvm: fix rare but potential deadlock scene
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18powerpc: book3s: kvm: Don't abuse host r2 in exit pathAneesh Kumar K.V
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-17lib: Add missing arch generic-y entries for asm-generic/hash.hDavid S. Miller
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge tag 'v3.13-rc4' into core/lockingIngo Molnar
Merge Linux 3.13-rc4, to refresh this rather old tree with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16Merge tag 'v3.13-rc4' into perf/coreIngo Molnar
Merge Linux 3.13-rc4, to refresh this branch with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16powerpc: Full barrier for smp_mb__after_unlock_lock()Paul E. McKenney
The powerpc lock acquisition sequence is as follows: lwarx; cmpwi; bne; stwcx.; lwsync; Lock release is as follows: lwsync; stw; If CPU 0 does a store (say, x=1) then a lock release, and CPU 1 does a lock acquisition then a load (say, r1=y), then there is no guarantee of a full memory barrier between the store to 'x' and the load from 'y'. To see this, suppose that CPUs 0 and 1 are hardware threads in the same core that share a store buffer, and that CPU 2 is in some other core, and that CPU 2 does the following: y = 1; sync; r2 = x; If 'x' and 'y' are both initially zero, then the lock acquisition and release sequences above can result in r1 and r2 both being equal to zero, which could not happen if unlock+lock was a full barrier. This commit therefore makes powerpc's smp_mb__after_unlock_lock() be a full barrier. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-8-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-13powerpc/powernv: Fix OPAL LPC access in Little EndianBenjamin Herrenschmidt
We are passing pointers to the firmware for reads, we need to properly convert the result as OPAL is always BE. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13powerpc/powernv: Fix endian issue in opal_xscom_readAnton Blanchard
opal_xscom_read uses a pointer to return the data so we need to byteswap it on LE builds. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-11powerpc/kvm/booke: Fix build break due to stack frame size warningScott Wood
Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>