summaryrefslogtreecommitdiff
path: root/arch/powerpc/mm/hash_utils_64.c
AgeCommit message (Collapse)Author
2014-07-17powerpc/mm: Check paca psize is up to date for huge mappingsMichael Ellerman
commit 09567e7fd44291bfc08accfdd67ad8f467842332 upstream. We have a bug in our hugepage handling which exhibits as an infinite loop of hash faults. If the fault is being taken in the kernel it will typically trigger the softlockup detector, or the RCU stall detector. The bug is as follows: 1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..) 2. Slice code converts the slice psize to 16M. 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area() synchronises the mm->context with the paca->context. So the paca slice mask is updated to include the 16M slice. 3. Either: * mmap() fails because there are no huge pages available. * mmap() succeeds and the mapping is then munmapped. In both cases the slice psize remains at 16M in both the paca & mm. 4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..) 5. The slice psize is converted back to 64K. Because of the check on line 539 of slice.c we DO NOT update the paca->context. The paca slice mask is now out of sync with the mm slice mask. 6. User/kernel accesses 0xa0000000. 7. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask** to create an SLB entry and inserts it in the SLB. 18. With the 16M SLB entry in place the hardware does a hash lookup, no entry is found so a data access exception is generated. 19. The data access handler calls do_page_fault() -> handle_mm_fault(). 10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page(). 11. The hardware retries the access, there is still nothing in the hash table so once again a data access exception is generated. 12. hash_page() calls into __hash_page_thp() and inserts a mapping in the hash. Although the THP mapping maps 16M the hashing is done using 64K as the segment page size. 13. hash_page() returns immediately after calling __hash_page_thp(), skipping over the code at line 1125. Resulting in the mismatch between the paca->context and mm->context not being detected. 14. The hardware retries the access, the hash it generates using the 16M SLB entry does NOT match the hash we inserted. 15. We take another data access and go into __hash_page_thp(). 16. We see a valid entry in the hpte_slot_array and so we call updatepp() which succeeds. 17. Goto 14. We could fix this in two ways. The first would be to remove or modify the check on line 539 of slice.c. The second option is to cause the check of paca psize in hash_page() on line 1125 to also be done for THP pages. We prefer the latter, because the check & update of the paca psize is not done until we know it's necessary. It's also done only on the current cpu, so we don't need to IPI all other cpus. Without further rearranging the code, the simplest fix is to pull out the code that checks paca psize and call it in two places. Firstly for THP/hugetlb, and secondly for other mappings as before. Thanks to Dave Jones for trinity, which originally found this bug. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2013-08-14powerpc: Fix a number of sparse warningsAnton Blanchard
Address some of the trivial sparse warnings in arch/powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc: Delete __cpuinit usage from all usersPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc: Make linux pagetable walk safe with THP enabledAneesh Kumar K.V
We need to have irqs disabled to handle all the possible parallel update for linux page table without holding locks. Events that we are intersted in while walking page tables are 1) Page fault 2) umap 3) THP split 4) THP collapse A) local_irq_disabled: ------------------------ 1) page fault: A none to valid transition via page fault is not an issue because we would either see a none or valid. If it is none, we would error out the page table walk. We may need to use on stack values when checking for type of page table elements, because if we do if (!is_hugepd()) { if (!pmd_none() { if (pmd_bad() { We could take that bad condition because the pmd got converted to a hugepd after the !is_hugepd check via a hugetlb fault. The right way would be to check for pmd_none higher up or use on stack value. 2) A valid to none conversion via unmap: We can safely walk the upper level table, because we don't remove the the page table entries until rcu grace period. So even if we followed a wrong pointer we still have the pointer valid till the grace period. A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent pte_clear we take _PAGE_BUSY. 3) THP split: A valid transparent hugepage is converted to nomal page. Before we split we do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING So when walking page table we need to check for pmd_trans_splitting and handle that. The pte returned should also need to be checked for _PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save the value of PTE on stack and check for the flag in the local pte value. If we don't have the value set we can safely operate on the local pte value and we atomicaly set _PAGE_BUSY. 4) THP collapse: A normal page gets converted to hugepage. In the collapse path, we mark the pmd none early (pmdp_clear_flush). With irq disabled, if we are aleady walking page table we would see the pmd_none and won't continue. If we see a valid PMD, we should still check for _PAGE_PRESENT before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc/THP: Add code to handle HPTE faults for hugepagesAneesh Kumar K.V
The deposted PTE page in the second half of the PMD table is used to track the state on hash PTEs. After updating the HPTE, we mark the coresponding slot in the deposted PTE page valid. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc: Replace find_linux_pte with find_linux_pte_or_hugepteAneesh Kumar K.V
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc/mm: handle hugepage size correctly when invalidating hpte entriesAneesh Kumar K.V
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14powerpc: Exception hooks for context tracking subsystemLi Zhong
This is the exception hooks for context tracking subsystem, including data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, unknown exception, whose handlers might use RCU. This patch corresponds to [PATCH] x86: Exception hooks for userspace RCU extended QS commit 6ba3c97a38803883c2eee489505796cb0a727122 But after the exception handling moved to generic code, and some changes in following two commits: 56dd9470d7c8734f055da2a6bac553caf4a468eb context_tracking: Move exception handling to generic code 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit it is able for exception hooks to use the generic code above instead of a redundant arch implementation. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06powerpc/tm: Fix null pointer deference in flush_hash_pageMichael Neuling
Make sure that current->thread.reg exists before we deference it in flush_hash_page. Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: John J Miller <millerjo@us.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Print page size info during bootAneesh Kumar K.V
This gives hint about different base and actual page size combination supported by the platform. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: print both base and actual page size on hash failureAneesh Kumar K.V
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Decode the pte-lp-encoding bits correctly.Aneesh Kumar K.V
We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-18powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()Li Zhong
This patch fixes the following oops, which could be trigged by build the kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC. hpte_insert() might return -1, indicating that the bucket (primary here) is full. We are not necessarily reporting a BUG in this case. Instead, we could try repeatedly (try secondary, remove and try again) until we find a slot. [ 543.075675] ------------[ cut here ]------------ [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 543.075741] Modules linked in: binfmt_misc ehea [ 543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594 [ 543.075771] REGS: c0000000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222) [ 543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 22224482 XER: 00000000 [ 543.075816] SOFTE: 0 [ 543.075823] CFAR: c00000000004c200 [ 543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1 GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594 GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854 GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001 GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000 GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001 GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950 GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230 [ 543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8 [ 543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 [ 543.076025] Call Trace: [ 543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable) [ 543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc [ 543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c [ 543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4 [ 543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8 [ 543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0 [ 543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4 [ 543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30 [ 543.076155] Instruction dump: [ 543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008 [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000 [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18powerpc: Split the code trying to insert hpte repeatedly as an helper functionLi Zhong
Move the logic trying to insert hpte in __hash_page_huge() to an helper function, so it could also be used by others. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-03-17powerpc: Update kernel VSID rangeAneesh Kumar K.V
This patch change the kernel VSID range so that we limit VSID_BITS to 37. This enables us to support 64TB with 65 bit VA (37+28). Without this patch we have boot hangs on platforms that only support 65 bit VA. With this patch we now have proto vsid generated as below: We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated from mmu context id and effective segment id of the address. For user processes max context id is limited to ((1ul << 19) - 5) for kernel space, we use the top 4 context ids to map address as below 0x7fffc - [ 0xc000000000000000 - 0xc0003fffffffffff ] 0x7fffd - [ 0xd000000000000000 - 0xd0003fffffffffff ] 0x7fffe - [ 0xe000000000000000 - 0xe0003fffffffffff ] 0x7ffff - [ 0xf000000000000000 - 0xf0003fffffffffff ] Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]
2013-03-13powerpc: Fix STAB initializationBenjamin Herrenschmidt
Commit f5339277eb8d3aed37f12a27988366f68ab68930 accidentally removed more than just iSeries bits and took out the call to stab_initialize() thus breaking support for POWER3 processors. Put it back. (Yes, nobody noticed until now ...) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.4+]
2013-02-15powerpc: Hook in new transactional memory codeMichael Neuling
This hooks the new transactional memory code into context switching, FP/VMX/VMX unavailable and exception return. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17powerpc/mm: Increase the slice range to 64TBAneesh Kumar K.V
This patch makes the high psizes mask as an unsigned char array so that we can have more than 16TB. Currently we support upto 64TB Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17powerpc/mm: Convert virtual address to vpnAneesh Kumar K.V
This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05powerpc/mm: Replace abs_to_virt() with __va()Michael Ellerman
abs_to_virt() is just a wrapper around __va(), call __va() directly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-28Disintegrate asm/system.h for PowerPCDavid Howells
Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
2012-03-21powerpc: Remove FW_FEATURE ISERIES from arch codeStephen Rothwell
This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23fadump: Register for firmware assisted dump.Mahesh Salgaonkar
On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote: > On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote: > > If I have read the code correctly, we are going to get this printk on > non-pSeries machines or on older pSeries machines, even if the user > has not put the fadump=on option on the kernel command line. The > printk will be annoying since there is no actual error condition. It > seems to me that the condition for the printk should include > fw_dump.fadump_enabled. In other words you should probably add > > if (!fw_dump.fadump_enabled) > return 0; > > at the beginning of the function. Hi Paul, Thanks for pointing it out. Please find the updated patch below. The existing patches above this (4/10 through 10/10) cleanly applies on this update. Thanks, -Mahesh. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-10-31powerpc: add export.h to files making use of EXPORT_SYMBOLPaul Gortmaker
With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-20powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probeAnton Blanchard
If we echo an address the hypervisor doesn't like to /sys/devices/system/memory/probe we oops the box: # echo 0x10000000000 > /sys/devices/system/memory/probe kernel BUG at arch/powerpc/mm/hash_utils_64.c:541! The backtrace is: create_section_mapping arch_add_memory add_memory memory_probe_store sysdev_class_store sysfs_write_file vfs_write SyS_write In create_section_mapping we BUG if htab_bolt_mapping returned an error. A better approach is to return an error which will propagate back to userspace. Rerunning the test with this patch applied: # echo 0x10000000000 > /sys/devices/system/memory/probe -bash: echo: write error: Invalid argument Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20powerpc: Hugetlb for BookEBecky Bruce
Enable hugepages on Freescale BookE processors. This allows the kernel to use huge TLB entries to map pages, which can greatly reduce the number of TLB misses and the amount of TLB thrashing experienced by applications with large memory footprints. Care should be taken when using this on FSL processors, as the number of large TLB entries supported by the core is low (16-64) on current processors. The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g. Page sizes larger than the max zone size are called "gigantic" pages and must be allocated on the command line (and cannot be deallocated). This is currently only fully implemented for Freescale 32-bit BookE processors, but there is some infrastructure in the code for 64-bit BooKE. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27powerpc: Free up some CPU feature bits by moving out MMU-related featuresMatt Evans
Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20powerpc: Replace open coded instruction patching with ↵Anton Blanchard
patch_instruction/patch_branch There are a few places we patch instructions without using patch_instruction and patch_branch, probably because they predated it. Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2010-11-29powerpc/mm: Avoid avoidable void* pointerMichael Neuling
Change pgdir from a void to real type. Having this as a void is stupid and has already caused 1 bug. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-18powerpc: Fix call to subpage_protection()Michael Neuling
In: powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT commit d28513bc7f675d28b479db666d572e078ecf182d Author: David Gibson <david@gibson.dropbear.id.au> subpage_protection() was changed to to take an mm rather a pgdir but it didn't change calling site in hashpage_preload(). The change wasn't noticed at compile time since hashpage_preload() used a void* as the parameter to subpage_protection(). This is obviously wrong and can trigger the following crash when CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES CONFIG_PPC_SUBPAGE_PROT are enabled. Freeing unused kernel memory: 704k freed Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7 Faulting instruction address: 0xc0000000000410f4 cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590] pc: c0000000000410f4: .hash_preload+0x258/0x338 lr: c000000000041054: .hash_preload+0x1b8/0x338 sp: c00000004233f810 msr: 8000000000009032 dar: 6b6b6b6b6b6c49b7 dsisr: 40000000 current = 0xc00000007e2c0070 paca = 0xc000000007fe0500 pid = 1, comm = init enter ? for help [c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable) [c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0 [c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc [c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c [c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac [c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05memblock: Remove rmo_size, burry it in arch/powerpc where it belongsBenjamin Herrenschmidt
The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact server ppc64 though I hijack it on embedded ppc64 for similar purposes) and represents the area of memory that can be accessed in real mode (aka with MMU off), or on embedded, from the exception vectors (which is bolted in the TLB) which pretty much boils down to the same thing. We take that out of the generic MEMBLOCK data structure and move it into arch/powerpc where it belongs, renaming it to "RMA" while at it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05memblock: Introduce default allocation limit and use it to replace explicit onesBenjamin Herrenschmidt
This introduce memblock.current_limit which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears. Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you ensure that you set an appropriate limit during boot in order to guarantee that an memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05memblock: Expose MEMBLOCK_ALLOC_ANYWHEREBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04memblock/powerpc: Use new accessorsBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04memblock: Rename memblock_region to memblock_type and memblock_property to ↵Benjamin Herrenschmidt
memblock_region Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23powerpc/mm: Add some debug output when hash insertion failsBenjamin Herrenschmidt
This adds some debug output to our MMU hash code to print out some useful debug data if the hypervisor refuses the insertion (which should normally never happen). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---
2010-07-23powerpc/mm: Move around testing of _PAGE_PRESENT in hash codeBenjamin Herrenschmidt
Instead of adding _PAGE_PRESENT to the access permission mask in each low level routine independently, we add it once from hash_page(). We also move the preliminary access check (the racy one before the PTE is locked) up so it applies to the huge page case. This duplicates code in __hash_page_huge() which we'll remove in a subsequent patch to fix a race in there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14lmb: rename to memblockYinghai Lu
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18powerpc/mm: Fix stupid bug in subpge protection handlingDavid Gibson
Commit d28513bc7f675d28b479db666d572e078ecf182d ("Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for breakage caused by an earlier clean up patch of mine, contains a stupid bug. I changed the parameters of the subpage_protection() function, but failed to update one of the callers. This patch fixes it, and replaces a void * with a typed pointer so that the compiler will warn on such an error in future. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.Sachin P. Sant
This time without the funny characters. Fix following build errors generated with DEBUG=1 cc1: warnings being treated as errors arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes': arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize': arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' ... SNIP ... Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROTDavid Gibson
Commit a0668cdc154e54bf0c85182e0535eea237d53146 cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-02Revert "powerpc/mm: Fix bug in pagetable cache cleanup with ↵Benjamin Herrenschmidt
CONFIG_PPC_SUBPAGE_PROT" This reverts commit c045256d146800ea1d741a8e9e377dada6b7e195. It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will commit a fixed version separately Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROTDavid Gibson
Commit a0668cdc154e54bf0c85182e0535eea237d53146 cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05Export symbols for KVM moduleAlexander Graf
We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal ↵David Gibson
accessors The hugepage arch code provides a number of hook functions/macros which mirror the functionality of various normal page pte access functions. Various changes in the normal page accessors (in particular BenH's recent changes to the handling of lazy icache flushing and PAGE_EXEC) have caused the hugepage versions to get out of sync with the originals. In some cases, this is a bug, at least on some MMU types. One of the reasons that some hooks were not identical to the normal page versions, is that the fact we're dealing with a hugepage needed to be passed down do use the correct dcache-icache flush function. This patch makes the main flush_dcache_icache_page() function hugepage aware (by checking for the PageCompound flag). That in turn means we can make set_huge_pte_at() just a call to set_pte_at() bringing it back into sync. As a bonus, this lets us remove the hash_huge_page_do_lazy_icache() function, replacing it with a call to the hash_page_do_lazy_icache() function it was based on. Some other hugepage pte access hooks - huge_ptep_get_and_clear() and huge_ptep_clear_flush() - are not so easily unified, but this patch at least brings them back into sync with the current versions of the corresponding normal page functions. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30powerpc/mm: Cleanup initialization of hugepages on powerpcDavid Gibson
This patch simplifies the logic used to initialize hugepages on powerpc. The somewhat oddly named set_huge_psize() is renamed to add_huge_page_size() and now does all necessary verification of whether it's given a valid hugepage sizes (instead of just some) and instantiates the generic hstate structure (but no more). hugetlbpage_init() now steps through the available pagesizes, checks if they're valid for hugepages by calling add_huge_page_size() and initializes the kmem_caches for the hugepage pagetables. This means we can now eliminate the mmu_huge_psizes array, since we no longer need to pass the sizing information for the pagetable caches from set_huge_psize() into hugetlbpage_init() Determination of the default huge page size is also moved from the hash code into the general hugepage code. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30powerpc/mm: Allow more flexible layouts for hugepage pagetablesDavid Gibson
Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-04-22powerpc: Fix crash on CPU hotplugMichael Ellerman
early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>