summaryrefslogtreecommitdiff
path: root/arch/s390/include/asm/pgtable.h
AgeCommit message (Collapse)Author
2017-10-08s390/mm: make pmdp_invalidate() do invalidation onlyGerald Schaefer
commit 91c575b335766effa6103eba42a82aea560c365f upstream. Commit 227be799c39a ("s390/mm: uninline pmdp_xxx functions from pgtable.h") inadvertently changed the behavior of pmdp_invalidate(), so that it now clears the pmd instead of just marking it as invalid. Fix this by restoring the original behavior. A possible impact of the misbehaving pmdp_invalidate() would be the MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we should not have any negative impact on the related dirty/young flags, since those flags are not set by the hardware on s390. Fixes: 227be799c39a ("s390/mm: uninline pmdp_xxx functions from pgtable.h") Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-09s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangsChristian Borntraeger
commit fa41ba0d08de7c975c3e94d0067553f9b934221f upstream. Right now there is a potential hang situation for postcopy migrations, if the guest is enabling storage keys on the target system during the postcopy process. For storage key virtualization, we have to forbid the empty zero page as the storage key is a property of the physical page frame. As we enable storage key handling lazily we then drop all mappings for empty zero pages for lazy refaulting later on. This does not work with the postcopy migration, which relies on the empty zero page never triggering a fault again in the future. The reason is that postcopy migration will simply read a page on the target system if that page is a known zero page to fault in an empty zero page. At the same time postcopy remembers that this page was already transferred - so any future userfault on that page will NOT be retransmitted again to avoid races. If now the guest enters the storage key mode while in postcopy, we will break this assumption of postcopy. The solution is to disable the empty zero page for KVM guests early on and not during storage key enablement. With this change, the postcopy migration process is guaranteed to start after no zero pages are left. As guest pages are very likely not empty zero pages anyway the memory overhead is also pretty small. While at it this also adds proper page table locking to the zero page removal. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27s390/mm: fix CMMA vs KSM vs othersChristian Borntraeger
commit a8f60d1fadf7b8b54449fcc9d6b15248917478ba upstream. On heavy paging with KSM I see guest data corruption. Turns out that KSM will add pages to its tree, where the mapping return true for pte_unused (or might become as such later). KSM will unmap such pages and reinstantiate with different attributes (e.g. write protected or special, e.g. in replace_page or write_protect_page)). This uncovered a bug in our pagetable handling: We must remove the unused flag as soon as an entry becomes present again. Signed-of-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-24s390/mm: merge local / non-local IDTE helperMartin Schwidefsky
Merge the __p[m|u]xdp_idte and __p[m|u]dp_idte_local functions into a single __p[m|u]dp_idte function with an additional parameter. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24s390/mm: merge local / non-local IPTE helperMartin Schwidefsky
Merge the __ptep_ipte and __ptep_ipte_local functions into a single __ptep_ipte function with an additional parameter. The __pte_ipte_range function is still extra as the while loops makes it hard to merge. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-07-31s390/mm: clean up pte/pmd encodingGerald Schaefer
The hugetlbfs pte<->pmd conversion functions currently assume that the pmd bit layout is consistent with the pte layout, which is not really true. The SW read and write bits are encoded as the sequence "wr" in a pte, but in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence is identical in both cases, which results in swapped read and write bits in the pmd. In practice this is not a problem, because those pmd bits are only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code works on (fake) ptes, and the converted pte bits are correct. There is another variation in pte/pmd encoding which affects dirty prot-none ptes/pmds. In this case, a pmd has both its HW read-only and invalid bit set, while it is only the invalid bit for a pte. This also has no effect in practice, but it should better be consistent. This patch fixes both inconsistencies by changing the SW read/write bit layout for pmds as well as the PAGE_NONE encoding for ptes. It also makes the hugetlbfs conversion functions more robust by introducing a move_set_bit() macro that uses the pte/pmd bit #defines instead of constant shifts. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-06s390/mm: add support for 2GB hugepagesGerald Schaefer
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-20s390/mm: shadow pages with real guest requested protectionDavid Hildenbrand
We really want to avoid manually handling protection for nested virtualization. By shadowing pages with the protection the guest asked us for, the SIE can handle most protection-related actions for us (e.g. special handling for MVPG) and we can directly forward protection exceptions to the guest. PTEs will now always be shadowed with the correct _PAGE_PROTECT flag. Unshadowing will take care of any guest changes to the parent PTE and any host changes to the host PTE. If the host PTE doesn't have the fitting access rights or is not available, we have to fix it up. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: add shadow gmap supportMartin Schwidefsky
For a nested KVM guest the outer KVM host needs to create shadow page tables for the nested guest. This patch adds the basic support to the guest address space (gmap) code. For each guest address space the inner KVM host creates, the first outer KVM host needs to create shadow page tables. The address space is identified by the ASCE loaded into the control register 1 at the time the inner SIE instruction for the second nested KVM guest is executed. The outer KVM host creates the shadow tables starting with the table identified by the ASCE on a on-demand basis. The outer KVM host will get repeated faults for all the shadow tables needed to run the second KVM guest. While a shadow page table for the second KVM guest is active the access to the origin region, segment and page tables needs to be restricted for the first KVM guest. For region and segment and page tables the first KVM guest may read the memory, but write attempt has to lead to an unshadow. This is done using the page invalid and read-only bits in the page table of the first KVM guest. If the first guest re-accesses one of the origin pages of a shadow, it gets a fault and the affected parts of the shadow page table hierarchy needs to be removed again. PGSTE tables don't have to be shadowed, as all interpretation assist can't deal with the invalid bits in the shadow pte being set differently than the original ones provided by the first KVM guest. Many bug fixes and improvements by David Hildenbrand. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: extended gmap pte notifierMartin Schwidefsky
The current gmap pte notifier forces a pte into to a read-write state. If the pte is invalidated the gmap notifier is called to inform KVM that the mapping will go away. Extend this approach to allow read-write, read-only and no-access as possible target states and call the pte notifier for any change to the pte. This mechanism is used to temporarily set specific access rights for a pte without doing the heavy work of a true mprotect call. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-13s390/mm: align swapper_pg_dir to 16kHeiko Carstens
The segment/region table that is part of the kernel image must be properly aligned to 16k in order to make the crdte inline assembly work. Otherwise it will calculate a wrong segment/region table start address and access incorrect memory locations if the swapper_pg_dir is not aligned to 16k. Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir at the beginning of the bss section and also align the bss section to 16k just like other architectures did. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: add mapping statisticsHeiko Carstens
Add statistics that show how memory is mapped within the kernel identity mapping. This is more or less the same like git commit ce0c0e50f94e ("x86, generic: CPA add statistics about state of direct mapping v4") for x86. I also intentionally copied the lower case "k" within DirectMap4k vs the upper case "M" and "G" within the two other lines. Let's have consistent inconsistencies across architectures. The output of /proc/meminfo now contains these additional lines: DirectMap4k: 2048 kB DirectMap1M: 3991552 kB DirectMap2G: 4194304 kB The implementation on s390 is lockless unlike the x86 version, since I assume changes to the kernel mapping are a very rare event. Therefore it really doesn't matter if these statistics could potentially be inconsistent if read while kernel pages tables are being changed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pageattr: allow kernel page table splittingHeiko Carstens
set_memory_ro() and set_memory_rw() currently only work on 4k mappings, which is good enough for module code aka the vmalloc area. However we stumbled already twice into the need to make this also work on larger mappings: - the ro after init patch set - the crash kernel resize code Therefore this patch implements automatic kernel page table splitting if e.g. set_memory_ro() would be called on parts of a 2G mapping. This works quite the same as the x86 code, but is much simpler. In order to make this work and to be architecturally compliant we now always use the csp, cspg or crdte instructions to replace valid page table entries. This means that set_memory_ro() and set_memory_rw() will be much more expensive than before. In order to avoid huge latencies the code contains a couple of cond_resched() calls. The current code only splits page tables, but does not merge them if it would be possible. The reason for this is that currently there is no real life scenarion where this would really happen. All current use cases that I know of only change access rights once during the life time. If that should change we can still implement kernel page table merging at a later time. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: make pmd and pud helper functions availableHeiko Carstens
Make pmd_wrprotect() and pmd_mkwrite() available independently from CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_HUGETLB_PAGE so these can be used on the kernel mapping. Also introduce a couple of pud helper functions, namely pud_pfn(), pud_wrprotect(), pud_mkwrite(), pud_mkdirty() and pud_mkclean() which only work on the kernel mapping. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: get rid of _REGION3_ENTRY_ROHeiko Carstens
_REGION3_ENTRY_RO is a duplicate of _REGION_ENTRY_PROTECT. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: introduce and use SEGMENT_KERNEL and REGION3_KERNELHeiko Carstens
Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments use defines. Also to make e.g. pmd_wrprotect() work on the kernel mapping a couple more flags must be set. Therefore add the missing flags also. In order to make everything symmetrical this patch also adds software dirty, young, read and write bits for region 3 table entries. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: introduce and use generic csp inline asmHeiko Carstens
We have already two inline assemblies which make use of the csp instruction. Since I need a third instance let's introduce a generic inline assmebly which can be used by everyone. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-10KVM: s390: handle missing storage-key facilityDavid Hildenbrand
Without the storage-key facility, SIE won't interpret SSKE, ISKE and RRBE for us. So let's add proper interception handlers that will be called if lazy sske cannot be enabled. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: pfmf: support conditional-sske facilityDavid Hildenbrand
We already indicate that facility but don't implement it in our pfmf interception handler. Let's add a new storage key handling function for conditionally setting the guest storage key. As we will reuse this function later on, let's directly implement returning the old key via parameter and indicating if any change happened via rc. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: return key via pointer in get_guest_storage_keyDavid Hildenbrand
Let's just split returning the key and reporting errors. This makes calling code easier and avoids bugs as happened already. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: don't drop errors in get_guest_storage_keyDavid Hildenbrand
Commit 1e133ab296f3 ("s390/mm: split arch/s390/mm/pgtable.c") changed the return value of get_guest_storage_key to an unsigned char, resulting in -EFAULT getting interpreted as a valid storage key. Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-19arch: fix has_transparent_hugepage()Hugh Dickins
I've just discovered that the useful-sounding has_transparent_hugepage() is actually an architecture-dependent minefield: on some arches it only builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when not, but on some of those (arm and arm64) it then gives the wrong answer; and on mips alone it's marked __init, which would crash if called later (but so far it has not been called later). Straighten this out: make it available to all configs, with a sensible default in asm-generic/pgtable.h, removing its definitions from those arches (arc, arm, arm64, sparc, tile) which are served by the default, adding #define has_transparent_hugepage has_transparent_hugepage to those (mips, powerpc, s390, x86) which need to override the default at runtime, and removing the __init from mips (but maybe that kind of code should be avoided after init: set a static variable the first time it's called). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390] Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky
The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline pmdp_xxx functions from pgtable.hMartin Schwidefsky
The pmdp_xxx function are smaller than their ptep_xxx counterparts but to keep things symmetrical unline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline ptep_xxx functions from pgtable.hMartin Schwidefsky
The code in the various ptep_xxx functions has grown quite large, consolidate them to four out-of-line functions: ptep_xchg_direct to exchange a pte with another with immediate flushing ptep_xchg_lazy to exchange a pte with another in a batched update ptep_modify_prot_start to begin a protection flags update ptep_modify_prot_commit to commit a protection flags update Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23s390/mm: correct comment about segment table entriesMartin Schwidefsky
The comment describing the bit encoding for segment table entries is incorrect in regard to the read and write bits. The segment read bit is 0x0002 and write is 0x0001, not the other way around. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23s390/mm: remove unnecessary indirection with pgste_update_allMartin Schwidefsky
The first parameter of pgste_update_all is a pointer to a pte. Simplify the code by passing the pte value. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-15s390, thp: remove infrastructure for handling splitting PMDsKirill A. Shutemov
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-14s390/mm: try to avoid storage key operation in ptep_set_access_flagsMartin Schwidefsky
The call to pgste_set_key in ptep_set_access_flags can be avoided if the old pte is found to be valid at the time the new access rights are set. The function that created the old, valid pte already completed the required storage key operation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14s390/mm: implement soft-dirty bits for user memory change trackingMartin Schwidefsky
Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty bit. The fault mechanism to do dirty tracking is already in place. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03s390/mm: add NUMA balancing primitivesMartin Schwidefsky
Define pte_protnone and pmd_protnone for NUMA memory migration. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-24mm: clarify that the function operates on hugepage pteAneesh Kumar K.V
We have confusing functions to clear pmd, pmd_clear_* and pmd_clear. Add _huge_ to pmdp_clear functions so that we are clear that they operate on hugepage pte. We don't bother about other functions like pmdp_set_wrprotect, pmdp_clear_flush_young, because they operate on PTE bits and hence indicate they are operating on hugepage ptes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24powerpc/mm: use generic version of pmdp_clear_flush()Aneesh Kumar K.V
Also move the pmd_trans_huge check to generic code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "Pretty boring for a merge window pull. One change in behaviour is the patch for dasd driver, the module which provides the diagnose discipline is now loaded automatically. The SCLP code got a nice cleanup, a new global structure replaces a bunch of accessor functions. And a couple of random, small improvements" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: improve handling of hotplug event 0x301 s390/setup: fix DMA_API_DEBUG warnings s390/zcrypt: remove obsolete __constant s390/keyboard: avoid off-by-one when using strnlen_user() s390/sclp: pass timeout as HZ independent value s390/mm: s/specifiation/specification/, s/an specification/a specification/ s390/sclp: Use DECLARE_BITMAP s390/dasd: Enable automatic loading of dasd_diag_mod s390/sclp: move sclp_facilities into "struct sclp" s390/sclp: get rid of sclp_get_mtid() and sclp_get_mtid_max() s390/sclp: unify basic sclp access by exposing "struct sclp" s390/sclp: prepare smp_fill_possible_mask for global "struct sclp"
2015-06-15s390/mm: s/specifiation/specification/, s/an specification/a specification/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-05-19s390/mm: correct return value of pmd_pfnMartin Schwidefsky
Git commit 152125b7a882df36a55a8eadbea6d0edf1461ee7 "s390/mm: implement dirty bits for large segment table entries" broke the pmd_pfn function, it changed the return value from 'unsigned long' to 'int'. This breaks all machine configurations with memory above the 8TB line. Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23s390/mm: change swap pte encoding and pgtable cleanupMartin Schwidefsky
After the file ptes have been removed the bit combination used to encode non-linear mappings can be reused for the swap ptes. This frees up a precious pte software bit. Reflect the change in the swap encoding in the comments and do some cleanup while we are at it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23s390/kvm: remove delayed reallocation of page tables for KVMMartin Schwidefsky
Replacing a 2K page table with a 4K page table while a VMA is active for the affected memory region is fundamentally broken. Rip out the page table reallocation code and replace it with a simple system control 'vm.allocate_pgste'. If the system control is set the page tables for all processes are allocated as full 4K pages, even for processes that do not need it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25s390: remove 31 bit supportHeiko Carstens
Remove the 31 bit support in order to reduce maintenance cost and effectively remove dead code. Since a couple of years there is no distribution left that comes with a 31 bit kernel. The 31 bit kernel also has been broken since more than a year before anybody noticed. In addition I added a removal warning to the kernel shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning message") which let everybody know about the plan to remove 31 bit code. We didn't get any response. Given that the last 31 bit only machine was introduced in 1999 let's remove the code. Anybody with 31 bit user space code can still use the compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-28mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED definesKirill A. Shutemov
Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page table levels folded. Usually, these defines are provided by <asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>. But some architectures fold page table levels in a custom way. They need to define these macros themself. This patch adds missing defines. The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc() and __pud_alloc() on architectures without these page table levels. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second set of updates from Andrew Morton: "More of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() vmstat: Reduce time interval to stat update on idle cpu mm/page_owner.c: remove unnecessary stack_trace field Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files mm: incorporate read-only pages into transparent huge pages vmstat: do not use deferrable delayed work for vmstat_update mm: more aggressive page stealing for UNMOVABLE allocations mm: always steal split buddies in fallback allocations mm: when stealing freepages, also take pages created by splitting buddy page mincore: apply page table walker on do_mincore() mm: /proc/pid/clear_refs: avoid split_huge_page() mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP) mempolicy: apply page table walker on queue_pages_range() arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma() memcg: cleanup preparation for page table walk numa_maps: remove numa_maps->vma numa_maps: fix typo in gather_hugetbl_stats pagemap: use walk->vma instead of calling find_vma() clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk() ...
2015-02-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - The remaining patches for the z13 machine support: kernel build option for z13, the cache synonym avoidance, SMT support, compare-and-delay for spinloops and the CES5S crypto adapater. - The ftrace support for function tracing with the gcc hotpatch option. This touches common code Makefiles, Steven is ok with the changes. - The hypfs file system gets an extension to access diagnose 0x0c data in user space for performance analysis for Linux running under z/VM. - The iucv hvc console gets wildcard spport for the user id filtering. - The cacheinfo code is converted to use the generic infrastructure. - Cleanup and bug fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/process: free vx save area when releasing tasks s390/hypfs: Eliminate hypfs interval s390/hypfs: Add diagnose 0c support s390/cacheinfo: don't use smp_processor_id() in preemptible context s390/zcrypt: fixed domain scanning problem (again) s390/smp: increase maximum value of NR_CPUS to 512 s390/jump label: use different nop instruction s390/jump label: add sanity checks s390/mm: correct missing space when reporting user process faults s390/dasd: cleanup profiling s390/dasd: add locking for global_profile access s390/ftrace: hotpatch support for function tracing ftrace: let notrace function attribute disable hotpatching if necessary ftrace: allow architectures to specify ftrace compile options s390: reintroduce diag 44 calls for cpu_relax() s390/zcrypt: Add support for new crypto express (CEX5S) adapter. s390/zcrypt: Number of supported ap domains is not retrievable. s390/spinlock: add compare-and-delay to lock wait loops s390/tape: remove redundant if statement s390/hvc_iucv: add simple wildcard matches to the iucv allow filter ...
2015-02-11mm: make FIRST_USER_ADDRESS unsigned long on all archsKirill A. Shutemov
LKP has triggered a compiler warning after my recent patch "mm: account pmd page tables to the process": mm/mmap.c: In function 'exit_mmap': >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default] The code: > 2857 WARN_ON(mm_nr_pmds(mm) > 2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has the same type -- int. PUD_SHIFT. I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned long. On every arch for consistency. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10s390: drop pte_file()-related helpersKirill A. Shutemov
We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-22s390: avoid z13 cache aliasingMartin Schwidefsky
Avoid cache aliasing on z13 by aligning shared objects to multiples of 512K. The virtual addresses of a page from a shared file needs to have identical bits in the range 2^12 to 2^18. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: pmdp_get_and_clear_full optimizationMartin Schwidefsky
Analog to ptep_get_and_clear_full define a variant of the pmpd_get_and_clear primitive which gets the full hint from the mmu_gather struct. This allows s390 to avoid a costly instruction when destroying an address space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/ftrace,kprobes: allow to patch first instructionHeiko Carstens
If the function tracer is enabled, allow to set kprobes on the first instruction of a function (which is the function trace caller): If no kprobe is set handling of enabling and disabling function tracing of a function simply patches the first instruction. Either it is a nop (right now it's an unconditional branch, which skips the mcount block), or it's a branch to the ftrace_caller() function. If a kprobe is being placed on a function tracer calling instruction we encode if we actually have a nop or branch in the remaining bytes after the breakpoint instruction (illegal opcode). This is possible, since the size of the instruction used for the nop and branch is six bytes, while the size of the breakpoint is only two bytes. Therefore the first two bytes contain the illegal opcode and the last four bytes contain either "0" for nop or "1" for branch. The kprobes code will then execute/simulate the correct instruction. Instruction patching for kprobes and function tracer is always done with stop_machine(). Therefore we don't have any races where an instruction is patched concurrently on a different cpu. Besides that also the program check handler which executes the function trace caller instruction won't be executed concurrently to any stop_machine() execution. This allows to keep full fault based kprobes handling which generates correct pt_regs contents automatically. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: disable KSM for storage key enabled pagesDominik Dingel
When storage keys are enabled unmerge already merged pages and prevent new pages from being merged. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: prevent and break zero page mappings in case of storage keysDominik Dingel
As soon as storage keys are enabled we need to stop working on zero page mappings to prevent inconsistencies between storage keys and pgste. Otherwise following data corruption could happen: 1) guest enables storage key 2) guest sets storage key for not mapped page X -> change goes to PGSTE 3) guest reads from page X -> as X was not dirty before, the page will be zero page backed, storage key from PGSTE for X will go to storage key for zero page 4) guest sets storage key for not mapped page Y (same logic as above 5) guest reads from page Y -> as Y was not dirty before, the page will be zero page backed, storage key from PGSTE for Y will got to storage key for zero page overwriting storage key for X While holding the mmap sem, we are safe against changes on entries we already fixed, as every fault would need to take the mmap_sem (read). Other vCPUs executing storage key instructions will get a one time interception and be serialized also with mmap_sem. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>