summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-05-05rcu: Use WARN_ON_ONCE for DEBUG_OBJECTS_RCU_HEAD warningsPaul E. McKenney
Avoid additional multiple-warning confusion in memory-corruption scenarios. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: mark rcutorture boosting callback as being on-stackPaul E. McKenney
The CONFIG_DEBUG_OBJECTS_RCU_HEAD facility requires that on-stack RCU callbacks be flagged explicitly to debug-objects using the init_rcu_head_on_stack() and destroy_rcu_head_on_stack() functions. This commit applies those functions to the rcutorture code that tests RCU priority boosting. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: add DEBUG_OBJECTS_RCU_HEAD check for alignmentPaul E. McKenney
Verify that rcu_head structures are aligned to a four-byte boundary. This check is enabled by CONFIG_DEBUG_OBJECTS_RCU_HEAD. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Enable DEBUG_OBJECTS_RCU_HEAD from !PREEMPTMathieu Desnoyers
The prohibition of DEBUG_OBJECTS_RCU_HEAD from !PREEMPT was due to the fixup actions. So just produce a warning from !PREEMPT. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Add forward-progress diagnostic for per-CPU kthreadsPaul E. McKenney
Increment a per-CPU counter on each pass through rcu_cpu_kthread()'s service loop, and add it to the rcudata trace output. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: add grace-period age and more kthread state to tracingPaul E. McKenney
This commit adds the age in jiffies of the current grace period along with the duration in jiffies of the longest grace period since boot to the rcu/rcugp debugfs file. It also adds an additional "O" state to kthread tracing to differentiate between the kthread waiting due to having nothing to do on the one hand and waiting due to being on the wrong CPU on the other hand. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: fix tracing bug thinko on boost-balk attributionPaul E. McKenney
The rcu_initiate_boost_trace() function mis-attributed refusals to initiate RCU priority boosting that were in fact due to its not yet being time to boost. This patch fixes the faulty comparison. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: update tracing documentation for new rcutorture and rcuboostPaul E. McKenney
This commit documents the new debugfs rcu/rcutorture and rcu/rcuboost trace files. The description has been updated as suggested by Josh Triplett. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: make rcutorture version numbers available through debugfsPaul E. McKenney
It is not possible to accurately correlate rcutorture output with that of debugfs. This patch therefore adds a debugfs file that prints out the rcutorture version number, permitting easy correlation. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: add tracing for RCU's kthread run states.Paul E. McKenney
Add tracing to help debugging situations when RCU's kthreads are not running but are supposed to be. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: add callback-queue information to rcudata outputPaul E. McKenney
This commit adds an indication of the state of the callback queue using a string of four characters following the "ql=" integer queue length. The first character is "N" if there are callbacks that have been queued that are not yet ready to be handled by the next grace period, or "." otherwise. The second character is "R" if there are callbacks queued that are ready to be handled by the next grace period, or "." otherwise. The third character is "W" if there are callbacks waiting for the current grace period, or "." otherwise. Finally, the fourth character is "D" if there are callbacks that have been handled by a prior grace period and are waiting to be invoked, or ".". Note that callbacks that are in the process of being invoked are not shown. These callbacks would have been removed from the rcu_data structure's list by rcu_do_batch() prior to being executed. (These callbacks are also not reflected in the "ql=" total, FWIW.) Also, document the new callback-queue trace information. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Update RCU's trace.txt documentation for new formatPaul E. McKenney
The trace.txt file had obsolete output for the debugfs rcu/rcudata file, so update it. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Add boosting to TREE_PREEMPT_RCU tracingPaul E. McKenney
Includes total number of tasks boosted, number boosted on behalf of each of normal and expedited grace periods, and statistics on attempts to initiate boosting that failed for various reasons. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: eliminate unused boosting statisticsPaul E. McKenney
The n_rcu_torture_boost_allocerror and n_rcu_torture_boost_afferror statistics are not actually incremented anymore, so eliminate them. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: avoid hammering sched with yet another bound RT kthreadPaul E. McKenney
The scheduler does not appear to take kindly to having multiple real-time threads bound to a CPU that is going offline. So this commit is a temporary hack-around to avoid that happening. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: put per-CPU kthread at non-RT priority during CPU hotplug operationsPaul E. McKenney
If you are doing CPU hotplug operations, it is best not to have CPU-bound realtime tasks running CPU-bound on the outgoing CPU. So this commit makes per-CPU kthreads run at non-realtime priority during that time. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Force per-rcu_node kthreads off of the outgoing CPUPaul E. McKenney
The scheduler has had some heartburn in the past when too many real-time kthreads were affinitied to the outgoing CPU. So, this commit lightens the load by forcing the per-rcu_node and the boost kthreads off of the outgoing CPU. Note that RCU's per-CPU kthread remains on the outgoing CPU until the bitter end, as it must in order to preserve correctness. Also avoid disabling hardirqs across calls to set_cpus_allowed_ptr(), given that this function can block. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: priority boosting for TREE_PREEMPT_RCUPaul E. McKenney
Add priority boosting for TREE_PREEMPT_RCU, similar to that for TINY_PREEMPT_RCU. This is enabled by the default-off RCU_BOOST kernel parameter. The priority to which to boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter (defaulting to real-time priority 1) and the time to wait before boosting the readers who are blocking a given grace period is controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: move TREE_RCU from softirq to kthreadPaul E. McKenney
If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. Also add comments and properly synchronized all accesses to rcu_cpu_kthread_task, as suggested by Lai Jiangshan. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: merge TREE_PREEPT_RCU blocked_tasks[] listsPaul E. McKenney
Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the rcu_node structure into a single ->blkd_tasks list with ->gp_tasks and ->exp_tasks tail pointers. This is in preparation for RCU priority boosting, which will add a third dimension to the combinatorial explosion in the ->blocked_tasks[] case, but simply a third pointer in the new ->blkd_tasks case. Also update documentation to reflect blocked_tasks[] merge Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Decrease memory-barrier usage based on semi-formal proofPaul E. McKenney
Commit d09b62d fixed grace-period synchronization, but left some smp_mb() invocations in rcu_process_callbacks() that are no longer needed, but sheer paranoia prevented them from being removed. This commit removes them and provides a proof of correctness in their absence. It also adds a memory barrier to rcu_report_qs_rsp() immediately before the update to rsp->completed in order to handle the theoretical possibility that the compiler or CPU might move massive quantities of code into a lock-based critical section. This also proves that the sheer paranoia was not entirely unjustified, at least from a theoretical point of view. In addition, the old dyntick-idle synchronization depended on the fact that grace periods were many milliseconds in duration, so that it could be assumed that no dyntick-idle CPU could reorder a memory reference across an entire grace period. Unfortunately for this design, the addition of expedited grace periods breaks this assumption, which has the unfortunate side-effect of requiring atomic operations in the functions that track dyntick-idle state for RCU. (There is some hope that the algorithms used in user-level RCU might be applied here, but some work is required to handle the NMIs that user-space applications can happily ignore. For the short term, better safe than sorry.) This proof assumes that neither compiler nor CPU will allow a lock acquisition and release to be reordered, as doing so can result in deadlock. The proof is as follows: 1. A given CPU declares a quiescent state under the protection of its leaf rcu_node's lock. 2. If there is more than one level of rcu_node hierarchy, the last CPU to declare a quiescent state will also acquire the ->lock of the next rcu_node up in the hierarchy, but only after releasing the lower level's lock. The acquisition of this lock clearly cannot occur prior to the acquisition of the leaf node's lock. 3. Step 2 repeats until we reach the root rcu_node structure. Please note again that only one lock is held at a time through this process. The acquisition of the root rcu_node's ->lock must occur after the release of that of the leaf rcu_node. 4. At this point, we set the ->completed field in the rcu_state structure in rcu_report_qs_rsp(). However, if the rcu_node hierarchy contains only one rcu_node, then in theory the code preceding the quiescent state could leak into the critical section. We therefore precede the update of ->completed with a memory barrier. All CPUs will therefore agree that any updates preceding any report of a quiescent state will have happened before the update of ->completed. 5. Regardless of whether a new grace period is needed, rcu_start_gp() will propagate the new value of ->completed to all of the leaf rcu_node structures, under the protection of each rcu_node's ->lock. If a new grace period is needed immediately, this propagation will occur in the same critical section that ->completed was set in, but courtesy of the memory barrier in #4 above, is still seen to follow any pre-quiescent-state activity. 6. When a given CPU invokes __rcu_process_gp_end(), it becomes aware of the end of the old grace period and therefore makes any RCU callbacks that were waiting on that grace period eligible for invocation. If this CPU is the same one that detected the end of the grace period, and if there is but a single rcu_node in the hierarchy, we will still be in the single critical section. In this case, the memory barrier in step #4 guarantees that all callbacks will be seen to execute after each CPU's quiescent state. On the other hand, if this is a different CPU, it will acquire the leaf rcu_node's ->lock, and will again be serialized after each CPU's quiescent state for the old grace period. On the strength of this proof, this commit therefore removes the memory barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp(). The effect is to reduce the number of memory barriers by one and to reduce the frequency of execution from about once per scheduling tick per CPU to once per grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: Remove conditional compilation for RCU CPU stall warningsPaul E. McKenney
The RCU CPU stall warnings can now be controlled using the rcu_cpu_stall_suppress boot-time parameter or via the same parameter from sysfs. There is therefore no longer any reason to have kernel config parameters for this feature. This commit therefore removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter remains to allow task-stall information to be suppressed if desired. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-03Linux 2.6.39-rc6v2.6.39-rc6Linus Torvalds
2011-05-03Merge branch 'drm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/radeon/kms: fix gart setup on fusion parts (v2) drm: Send pending vblank events before disabling vblank. drm/radeon: fix regression on atom cards with hardcoded EDID record. drm/radeon/kms: add some new pci ids
2011-05-04drm/radeon/kms: fix gart setup on fusion parts (v2)Alex Deucher
Out of the entire GART/VM subsystem, the hw designers changed the location of 3 regs. v2: airlied: add parameter for userspace to work from. Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Jerome Glisse <jglisse@redhat.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-04drm: Send pending vblank events before disabling vblank.Christopher James Halse Rogers
This is the least-bad behaviour. It means that we signal the vblank event before it actually happens, but since we're disabling vblanks there's no guarantee that it will *ever* happen otherwise. This prevents GL applications which use WaitMSC from hanging indefinitely. Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-04drm/radeon: fix regression on atom cards with hardcoded EDID record.Dave Airlie
Since fafcf94e2b5732d1e13b440291c53115d2b172e9 introduced an edid size, it seems to have broken this path. This manifest as oops on T500 Lenovo laptops with dual graphics primarily. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33812 cc: stable@kernel.org Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-04drm/radeon/kms: add some new pci idsAlex Deucher
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-03logfs: initialize superblock entries earlierLinus Torvalds
In particular, s_freeing_list needs to be initialized early, since it is used on some of the error paths when mounts fail. The mapping inode, for example, would be initialized and then free'd on an error path before s_freeing_list was initialized, but the inode drop operation needs the s_freeing_list to be set up. Normally you'd never see this, because not only is logfs fairly rare, but a successful mount will never have any issues. Reported-by: werner <w.landgraf@ru.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-03Merge branch 'stable/bug-fixes-for-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top xen/mmu: Add workaround "x86-64, mm: Put early page table high"
2011-05-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: sdhci: Check mrq != NULL in sdhci_tasklet_finish mmc: sdhci: Check mrq->cmd in sdhci_tasklet_finish mmc: tmio: fix .set_ios(MMC_POWER_UP) handling mmc: fix a race between card-detect rescan and clock-gate work instances mmc: omap: Fix possible NULL pointer deref mmc: core: mmc_add_card(): fix missing break in switch statement mmc: sdhci-pci: Fix error case in sdhci_pci_probe_slot()
2011-05-03Merge branches 'x86-fixes-for-linus' and 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, reboot: Fix relocations in reboot_32.S x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL
2011-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wm831x-ts - move BTN_TOUCH reporting to data transfer Input: wm831x-ts - allow IRQ flags to be specified Input: wm831x-ts - fix races with IRQ management
2011-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) sysctl: net: call unregister_net_sysctl_table where needed Revert: veth: remove unneeded ifname code from veth_newlink() smsc95xx: fix reset check tg3: Fix failure to enable WoL by default when possible networking: inappropriate ioctl operation should return ENOTTY amd8111e: trivial typo spelling: Negotitate -> Negotiate ipv4: don't spam dmesg with "Using LC-trie" messages af_unix: Only allow recv on connected seqpacket sockets. mii: add support of pause frames in mii_get_an net: ftmac100: fix scheduling while atomic during PHY link status change usbnet: Transfer of maintainership usbnet: add support for some Huawei modems with cdc-ether ports bnx2: cancel timer on device removal iwl4965: fix "Received BA when not expected" iwlagn: fix "Received BA when not expected" dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085 usbnet: Resubmit interrupt URB if device is open iwl4965: fix "TX Power requested while scanning" iwlegacy: led stay solid on when no traffic b43: trivial: update module info about ucode16_mimo firmware ...
2011-05-02sysctl: net: call unregister_net_sysctl_table where neededLucian Adrian Grijincu
ctl_table_headers registered with register_net_sysctl_table should have been unregistered with the equivalent unregister_net_sysctl_table Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02Revert: veth: remove unneeded ifname code from veth_newlink()Jiri Pirko
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 ("veth: remove unneeded ifname code from veth_newlink()") caused regression on veth creation. This patch reverts the original one. Reported-by: Michał Mirosław <mirqus@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02smsc95xx: fix reset checkRabin Vincent
The reset loop check should check the MII_BMCR register value for BMCR_RESET rather than for MII_BMCR (the register address, which also happens to be zero). Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02tg3: Fix failure to enable WoL by default when possibleRafael J. Wysocki
tg3 is supposed to enable WoL by default on adapters which support that, but it fails to do so unless the adapter's /sys/devices/.../power/wakeup file contains 'enabled' during the initialization of the adapter. Fix that by making tg3 use device_set_wakeup_enable() to enable wakeup automatically whenever WoL should be enabled by default. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02networking: inappropriate ioctl operation should return ENOTTYLifeng Sun
ioctl() calls against a socket with an inappropriate ioctl operation are incorrectly returning EINVAL rather than ENOTTY: [ENOTTY] Inappropriate I/O control operation. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992 Signed-off-by: Lifeng Sun <lifongsun@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02x86, reboot: Fix relocations in reboot_32.SH. Peter Anvin
The use of base for %ebx in this file is arbitrary, *except* that we also use it to compute the real-mode segment. Therefore, make it so that r_base really is the true address to which %ebx points. This resolves kernel bugzilla 33302. Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
2011-05-02amd8111e: trivial typo spelling: Negotitate -> NegotiateJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_topStefano Stabellini
mask_rw_pte is currently checking if a pfn is a pagetable page if it falls in the range pgt_buf_start - pgt_buf_end but that is incorrect because pgt_buf_end is a moving target: pgt_buf_top is the real boundary. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02xen/mmu: Add workaround "x86-64, mm: Put early page table high"Konrad Rzeszutek Wilk
As a consequence of the commit: commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e Author: Yinghai Lu <yinghai@kernel.org> Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high it causes the Linux kernel to crash under Xen: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7) (XEN) mm.c:3027:d0 Error while pinning mfn b1d89 (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: ... The reason is that at some point init_memory_mapping is going to reach the pagetable pages area and map those pages too (mapping them as normal memory that falls in the range of addresses passed to init_memory_mapping as argument). Some of those pages are already pagetable pages (they are in the range pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and everything is fine. Some of these pages are not pagetable pages yet (they fall in the range pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they are going to be mapped RW. When these pages become pagetable pages and are hooked into the pagetable, xen will find that the guest has already a RW mapping of them somewhere and fail the operation. The reason Xen requires pagetables to be RO is that the hypervisor needs to verify that the pagetables are valid before using them. The validation operations are called "pinning" (more details in arch/x86/xen/mmu.c). In order to fix the issue we mark all the pages in the entire range pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation is completed only the range pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping. Hence the kernel is going to crash as soon as one of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those ranges are RO). For this reason, this function is introduced which is called _after_ the init_memory_mapping has completed (in a perfect world we would call this function from init_memory_mapping, but lets ignore that). Because we are called _after_ init_memory_mapping the pgt_buf_[start, end,top] have all changed to new values (b/c another init_memory_mapping is called). Hence, the first time we enter this function, we save away the pgt_buf_start value and update the pgt_buf_[end,top]. When we detect that the "old" pgt_buf_start through pgt_buf_end PFNs have been reserved (so memblock_x86_reserve_range has been called), we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top. And then we update those "old" pgt_buf_[end|top] with the new ones so that we can redo this on the next pagetable. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [v1: Updated with Jeremy's comments] [v2: Added the crash output] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2011-05-02Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'for-linus' of git://git.infradead.org/ubifs-2.6: UBIFS: seek journal heads to the latest bud in replay UBIFS: do not free write-buffers when in R/O mode
2011-05-02Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits) CLKDEV: Fix clkdev return value for NULL clk case ARM: 6891/1: prevent heap corruption in OABI semtimedop ARM: kprobes: Tidy-up kprobes-decode.c ARM: kprobes: Add emulation of hint instructions like NOP and WFI ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions ARM: kprobes: Add emulation of MOVW and MOVT instructions ARM: kprobes: Reject probing of undefined data processing instructions ARM: kprobes: Remove redundant code in space_1111 ARM: kprobes: Fix emulation of PLD instructions ARM: kprobes: Reject probing of SETEND instructions ARM: kprobes: Consolidate stub decoding functions ARM: kprobes: Reject probing of all coprocessor instructions ARM: kprobes: Fix emulation of USAD8 instructions ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions ARM: kprobes: Reject probing of undefined media instructions ARM: kprobes: Add emulation of RBIT instruction ARM: kprobes: Reject probing of LDRB instructions which load PC ARM: kprobes: Fix emulation of LDRD and STRD instructions ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably ...
2011-05-02genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVELGeert Uytterhoeven
commit ab7798ffcf98b11a9525cf65bacdae3fd58d357f ("genirq: Expand generic show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to accomodate PowerPC, but this doesn't actually enable the functionality due to a typo in the #ifdef check. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Linux/PPC Development <linuxppc-dev@lists.ozlabs.org> Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02UBIFS: seek journal heads to the latest bud in replayArtem Bityutskiy
This is the second fix of the following symptom: UBIFS error (pid 34456): could not find an empty LEB which sometimes happens after power cuts when we mount the file-system - UBIFS refuses it with the above error message which comes from the 'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test with the UBIFS power cut emulation enabled. Analysis of the problem. Currently UBIFS replay seeks the journal heads to the last _replayed_ bud. But the buds are replayed out-of-order, so the replay basically seeks journal heads to the "random" bud belonging to this head, and not to the _last_ one. The result of this is that the GC head may be seeked to a full LEB with no free space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a fully or mostly dirty LEB to match the current GC head (because we need to garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum). So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and also fail. As a result - recovery fails and mounting fails. This patch teaches the replay to initialize the GC heads exactly to the latest buds, i.e. the buds which have the largest sequence number in corresponding log reference nodes. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-05-02UBIFS: do not free write-buffers when in R/O modeArtem Bityutskiy
Currently UBIFS has a small optimization - it frees write-buffers when it is re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it does not allocate write-buffers as well. This optimization is nice but it leads to subtle problems and complications in recovery, which I can reproduce using the integck test. The symptoms are that after a power cut the file-system cannot be mounted if we first mount it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints: UBIFS error (pid 34456): could not find an empty LEB Analysis of the problem. When mounting R/W, the reply process sets journal heads to buds [1], but when mounting R/O - it does not do this, because the write-buffers are not allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the same file-system but for the following 2 cases: 1. mounting R/W after a power cut and recover 2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery In the former case, we have journal heads seeked to the a bud, in the latter case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not try to recover the GC LEB by garbage-collecting to the GC head, but we just try to find an empty LEB, and there may be no empty LEBs, so we just fail. On the other hand, in the former case (mount R/W), we are able to make a GC LEB (@c->gc_lnum) by garbage-collecting. Thus, let's remove this small nice optimization and always allocate write-buffers. This should not make too big difference - we have only 3 of them, each of max. write unit size, which is usually 2KiB. So this is about 6KiB of RAM for the typical case, and only when mounted R/O. [1]: Note, currently the replay process is setting (seeking) the journal heads to _some_ buds, not necessarily to the buds which had been the journal heads before the power cut happened. This will be fixed separately. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: hda - Fix Realtek's chained fixup checks Revert "ALSA: hda - Fix pin-config of Gigabyte mobo" ALSA: HDA: Fix automute for Gateway NV79 ALSA: hda: add beep quirk for Realtek 0x1043:831a ALSA: usb-audio - Terratec Aureon 7.1 USB ID as C-Media cm6206 quirks ALSA: hda - VIA: Fix notify_aa_path_ctls() invalid issue. ALSA - au88x0 - Add buffer bytes constraints