summaryrefslogtreecommitdiff
path: root/arch/x86/mm/pat.c
AgeCommit message (Collapse)Author
2010-08-06Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Ioremap: fix wrong physical address handling in PAT code x86, tlb: Clean up and correct used type x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86, mm: Create symbolic index into address_markers array x86, ioremap: Fix normal ram range check x86, ioremap: Fix incorrect physical address handling in PAE mode x86-64, mm: Initialize VDSO earlier on 64 bits x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
2010-07-29x86: Ioremap: fix wrong physical address handling in PAT codeYasuaki Ishimatsu
The following two commits fixed a problem that x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. ffa71f33a820d1ab3f2fc5723819ac60fb76080b (x86, ioremap: Fix incorrect physical address handling in PAE mode) 35be1b716a475717611b2dc04185e9d80b9cb693 (x86, ioremap: Fix normal ram range check) But these fixes are not enough, since pat_pagerange_is_ram() in PAT code also has a same problem. This patch fixes it. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C47DDCF.80300@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-06-11x86, pat: Proper init of memtype subtree_max_endVenkatesh Pallipadi
subtree_max_end that was recently added to struct memtype was not getting properly initialized resulting in WARNING: kmemcheck: Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb() reported here https://bugzilla.kernel.org/show_bug.cgi?id=16092 This change fixes the problem. Reported-by: Christian Casteyde <casteyde.christian@free.fr> Tested-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-05-26x86, pat: Fix memory leak in free_memtypeXiaotian Feng
Reserve_memtype will allocate memory for new memtype, but in free_memtype, after the memtype erased from rbtree, the memory is not freed. Changes since V1: make rbt_memtype_erase return erased memtype so that it can be freed in free_memtype. [ hpa: not for -stable: 2.6.34 and earlier not affected ] Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-18Merge branch 'x86-pat-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Update the page flags for memtype atomically instead of using memtype_lock x86, pat: In rbt_memtype_check_insert(), update new->type only if valid x86, pat: Migrate to rbtree only backend for pat memtype management x86, pat: Preparatory changes in pat.c for bigger rbtree change rbtree: Add support for augmented rbtrees
2010-04-23x86, pat: Update the page flags for memtype atomically instead of using ↵Robin Holt
memtype_lock While testing an application using the xpmem (out of kernel) driver, we noticed a significant page fault rate reduction of x86_64 with respect to ia64. For one test running with 32 cpus, one thread per cpu, it took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages. For the same test running on 256 cpus, one thread per cpu, it took 14:48 to vm_insert_pfn 2 GB worth of pages. The slowdown was tracked to lookup_memtype which acquires the spinlock memtype_lock. This heavily contended lock was slowing down vm_insert_pfn(). With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu cases take approx 00:01.3 seconds to complete. Signed-off-by: Robin Holt <holt@sgi.com> LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Cc: Rafael Wysocki <rjw@novell.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-18x86, pat: Migrate to rbtree only backend for pat memtype managementPallipadi, Venkatesh
Move pat backend to fully rbtree based implementation from the existing rbtree and linked list hybrid. New rbtree based solution uses interval trees (augmented rbtrees) in order to store the PAT ranges. The new code seprates out the pat backend to pat_rbtree.c file, making is cleaner. The change also makes the PAT lookup, reserve and free operations more optimal, as we don't have to traverse linear linked list of few tens of entries in normal case. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210232607.GB11465@linux-os.sc.intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-18x86, pat: Preparatory changes in pat.c for bigger rbtree changevenkatesh.pallipadi@intel.com
Minor changes in pat.c to cleanup code and make it smoother to introduce bigger rbtree only change in the following patch. The changes are cleaup only and should not have any functional impact. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210195909.792781000@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10vfs: Implement proper O_SYNC semanticsChristoph Hellwig
While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-08Merge branch 'x86-pat-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: pat: Remove ioremap_default() x86: pat: Clean up req_type special case for reserve_memtype() x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
2009-11-26x86/pat: Trivial: don't create debugfs for memtype if pat is disabledXiaotian Feng
If pat is disabled (boot with nopat), there's no need to create debugfs for it, it's empty all the time. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-23x86, platform: Change is_untracked_pat_range() to bool; cleanup initH. Peter Anvin
- Change is_untracked_pat_range() to return bool. - Clean up the initialization of is_untracked_pat_range() -- by default, we simply point it at is_ISA_range() directly. - Move is_untracked_pat_range to the end of struct x86_platform, since it is the newest field. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com>
2009-11-23x86, mm: is_untracked_pat_range() takes a normal semiclosed rangeH. Peter Anvin
is_untracked_pat_range() -- like its components, is_ISA_range() and is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the PAT code called it as if it took a closed range (>=, <=). Fix. Although this is a bug, I believe it is non-manifest, simply because none of the callers will call this with non-page-aligned addresses. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091119202341.GA4420@sgi.com>
2009-11-23x86: UV SGI: Don't track GRU space in PATJack Steiner
GRU space is always mapped as WB in the page table. There is no need to track the mappings in the PAT. This also eliminates the "freeing invalid memtype" messages when the GRU space is unmapped. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com> [ v2: fix build failure ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10x86: pat: Clean up req_type special case for reserve_memtype()Xiaotian Feng
Commit: b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated code in pat_x_mtrr_type() and reserve_memtype(), which removed the special case (req_type is -1) for the PAT-enabled part. We should also change comments and the PAT-disabled part. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24x86: Reduce verbosity of "PAT enabled" kernel messageRoland Dreier
On modern systems, the kernel prints the message x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 once for every CPU. This gets kind of ridiculous on huge systems; for example, on a 64-thread system I was lucky enough to get: dmesg| grep 'PAT enabled' | wc 64 704 5174 There is already a BUG() if non-boot CPUs have PAT capabilities that don't match the boot CPU, so just print the message on the boot CPU. (I kept the print after the wrmsrl() that enables PAT, so that the log output continues to mean that the system survived enabling PAT on the boot CPU) Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <adavdj92sso.fsf@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: don't use rb-tree based lookup in reserve_memtype() x86: Increase MIN_GAP to include randomized stack
2009-09-17x86, pat: don't use rb-tree based lookup in reserve_memtype()Suresh Siddha
Recent enhancement of rb-tree based lookup exposed a bug with the lookup mechanism in the reserve_memtype() which ensures that there are no conflicting memtype requests for the memory range. memtype_rb_search() returns an entry which has a start address <= new start address. And from here we traverse the linear linked list to check if there any conflicts with the existing mappings. As the rbtree is based on the start address of the memory range, it is quite possible that we have several overlapped mappings whose start address is much less than new requested start but the end is >= new requested end. This results in conflicting memtype mappings. Same bug exists with the old code which uses cached_entry from where we traverse the linear linked list. But the new rb-tree code exposes this bug fairly easily. For now, don't use the memtype_rb_search() and always start the search from the head of linear linked list in reserve_memtype(). Linear linked list for most of the systems grow's to few 10's of entries(as we track memory type of RAM pages using struct page). So we should be ok for now. We still retain the rbtree and use it to speed up free_memtype() which doesn't have the same bug(as we know what exactly we are searching for in free_memtype). Also use list_for_each_entry_from() in free_memtype() so that we start the search from rb-tree lookup result. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-15Merge branch 'x86-pat-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Fix cacheflush address in change_page_attr_set_clr() mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set x86: Fix earlyprintk=dbgp for machines without NX x86, pat: Sanity check remap_pfn_range for RAM region x86, pat: Lookup the protection from memtype list on vm_insert_pfn() x86, pat: Add lookup_memtype to get the current memtype of a paddr x86, pat: Use page flags to track memtypes of RAM pages x86, pat: Generalize the use of page flag PG_uncached x86, pat: Add rbtree to do quick lookup in memtype tracking x86, pat: Add PAT reserve free to io_mapping* APIs x86, pat: New i/f for driver to request memtype for IO regions x86, pat: ioremap to follow same PAT restrictions as other PAT users x86, pat: Keep identity maps consistent with mmaps even when pat_disabled x86, mtrr: make mtrr_aps_delayed_init static bool x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled x86: Fix an incorrect argument of reserve_bootmem() x86: Fix system crash when loading with "reservetop" parameter
2009-09-14Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Make memtype_seq_ops const x86: uv: Clean up uv_ptc_init(), use proc_create() x86: Use printk_once() x86/cpu: Clean up various files a bit x86: Remove duplicated #include x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number x86: Further clean up of mtrr/generic.c x86: Clean up mtrr/main.c x86: Clean up mtrr/state.c x86: Clean up mtrr/mtrr.h x86: Clean up mtrr/if.c x86: Clean up mtrr/generic.c x86: Clean up mtrr/cyrix.c x86: Clean up mtrr/cleanup.c x86: Clean up mtrr/centaur.c x86: Clean up mtrr/amd.c: x86: ds.c fix invalid assignment
2009-09-06x86: Make memtype_seq_ops constTobias Klauser
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26Merge branch 'x86/urgent' into x86/patH. Peter Anvin
Reason: Change to is_new_memtype_allowed() in x86/urgent Resolved semantic conflicts in: arch/x86/mm/pat.c arch/x86/mm/ioremap.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Sanity check remap_pfn_range for RAM regionVenkatesh Pallipadi
Add sanity check for remap_pfn_range of RAM regions using lookup_memtype(). Previously, we did not have anyway to get the type of RAM memory regions as they were tracked using a single bit in page_struct (WB, nonWB). Now we can get the actual type from page struct (WB, WC, UC_MINUS) and make sure the requester gets that type. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Lookup the protection from memtype list on vm_insert_pfn()Venkatesh Pallipadi
Lookup the reserved memtype during vm_insert_pfn and use that memtype for the new mapping. This takes care or handling of vm_insert_pfn() interface in track_pfn_vma*/untrack_pfn_vma. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Add lookup_memtype to get the current memtype of a paddrVenkatesh Pallipadi
Add a new routine lookup_memtype() to get the current memtype based on the PAT reserves and frees. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Use page flags to track memtypes of RAM pagesVenkatesh Pallipadi
Change reserve_ram_pages_type and free_ram_pages_type to use 2 page flags to track UC_MINUS, WC, WB and default types. Previous RAM tracking just tracked WB or NonWB, which was not complete and did not allow tracking of RAM fully and there was no way to get the actual type reserved by looking at the page flags. We use the memtype_lock spinlock for atomicity in dealing with memtype tracking in struct page. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Add rbtree to do quick lookup in memtype trackingVenkatesh Pallipadi
PAT memtype tracking uses a linear link list to keep track of IO (non-RAM) regions and their memtypes. The code used a last_accessed pointer as a cache to speedup the lookup. As per discussions with H. Peter Anvin a while back, having a rbtree here will avoid bad performances in pathological cases where we may end up with huge linked list. This may not add any noticable performance speedup in normal case as the number of entires in PAT memtype list tend to be ~20-30 range. The patch removes the "cached_entry" logic as with rbtree we have more generic way of speeding up the lookup. With this patch, we use rbtree to do the quick lookup. We still use linked list as the memtype range tracked can be of different sizes and can overlap in different ways. We also keep track of usage counts with linked list. Example: Multiple ioremaps with different sizes uncached-minus @ 0xfffff00000-0xfffff04000 uncached-minus @ 0xfffff02000-0xfffff03000 And one userlevel mmap and the thread forks a new process uncached-minus @ 0xbf453000-0xbf454000 uncached-minus @ 0xbf453000-0xbf454000 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: New i/f for driver to request memtype for IO regionsVenkatesh Pallipadi
Add new routines to request memtype for IO regions. This will currently be a backend for io_mapping_* routines. But, it can also be made available to drivers directly in future, in case it is needed. reserve interface reserves the memory, makes sure we have a compatible memory type available and keeps the identity map in sync when needed. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26x86, pat: Keep identity maps consistent with mmaps even when pat_disabledVenkatesh Pallipadi
Make reserve_memtype internally take care of pat disabled case and fallback to default return values. Remove the specific pat_disabled checks in track_* routines. Change kernel_map_sync_memtype to sync identity map even when pat_disabled. This change ensures that, even for pat_disabled case, we take care of keeping identity map in sync. Before this patch, in pat disabled case, ioremap() keeps the identity maps in sync and other APIs like pci and /dev/mem mmap don't, which is not a very consistent behavior. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-17x86, pat: Allow ISA memory range uncacheable mapping requestsSuresh Siddha
Max Vozeler reported: > Bug 13877 - bogl-term broken with CONFIG_X86_PAT=y, works with =n > > strace of bogl-term: > 814 mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) > = -1 EAGAIN (Resource temporarily unavailable) > 814 write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n", > 57) = 57 PAT code maps the ISA memory range as WB in the PAT attribute, so that fixed range MTRR registers define the actual memory type (UC/WC/WT etc). But the upper level is_new_memtype_allowed() API checks are failing, as the request here is for UC and the return tracked type is WB (Tracked type is WB as MTRR type for this legacy range potentially will be different for each 4k page). Fix is_new_memtype_allowed() by always succeeding the ISA address range checks, as the null PAT (WB) and def MTRR fixed range register settings satisfy the memory type needs of the applications that map the ISA address range. Reported-and-Tested-by: Max Vozeler <xam@debian.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-04-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix microcode driver newly spewing warnings x86, PAT: Remove page granularity tracking for vm_insert_pfn maps x86: disable X86_PTRACE_BTS for now x86, documentation: kernel-parameters replace X86-32,X86-64 with X86 x86: pci-swiotlb.c swiotlb_dma_ops should be static x86, PAT: Remove duplicate memtype reserve in devmem mmap x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() x86, PAT: Changing memtype to WC ensuring no WB alias x86, PAT: Handle faults cleanly in set_memory_ APIs x86, PAT: Change order of cpa and free in set_memory_wb x86, CPA: Change idmap attribute before ioremap attribute setup
2009-04-17x86, PAT: Remove page granularity tracking for vm_insert_pfn mapsPallipadi, Venkatesh
This change resolves the problem of too many single page entries in pat_memtype_list and "freeing invalid memtype" errors with i915, reported here: http://marc.info/?l=linux-kernel&m=123845244713183&w=2 Remove page level granularity track and untrack of vm_insert_pfn. memtype tracking at page granularity does not scale and cleaner approach would be for the driver to request a type for a bigger IO address range or PCI io memory range for that device, either at mmap time or driver init time and just use that type during vm_insert_pfn. This patch just removes the track/untrack of vm_insert_pfn. That means we will be in same state as 2.6.28, with respect to these APIs. Newer APIs for the drivers to request a memtype for a bigger region is coming soon. [ Impact: fix Xorg startup warnings and hangs ] Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <20090408223716.GC3493@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12x86: fix wrong section of pat_disable & make it staticMarcin Slusarz
pat_disable cannot be __cpuinit anymore because it's called from pat_init and the callchain looks like this: pat_disable [cpuinit] <- pat_init <- generic_set_all <- ipi_handler <- set_mtrr <- (other non init/cpuinit functions) WARNING: arch/x86/mm/built-in.o(.text+0x449e): Section mismatch in reference from the function pat_init() to the function .cpuinit.text:pat_disable() The function pat_init() references the function __cpuinit pat_disable(). This is often because pat_init lacks a __cpuinit annotation or the annotation of pat_disable is wrong. Non CONFIG_X86_PAT version of pat_disable is static inline, so this version can be static too (and there are no callers outside of this file). Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <49DFB055.6070405@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86, PAT: Remove duplicate memtype reserve in devmem mmapSuresh Siddha
/dev/mem mmap code was doing memtype reserve/free for a while now. Recently we added memtype tracking in remap_pfn_range, and /dev/mem mmap uses it indirectly. So, we don't need seperate tracking in /dev/mem code any more. That means another ~100 lines of code removed :-). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212709.085210000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()Suresh Siddha
Fix pat_x_mtrr_type() to use UC_MINUS when the mtrr type return UC. This is to be consistent with ioremap() and ioremap_nocache() which uses UC_MINUS. Consolidate the code such that reserve_memtype() also uses pat_x_mtrr_type() when the caller doesn't specify any special attribute (non WB attribute). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212708.939936000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14Merge branches 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/debug', ↵Ingo Molnar
'x86/kconfig', 'x86/mm', 'x86/ptrace', 'x86/setup' and 'x86/urgent'; commit 'v2.6.29-rc8' into x86/core
2009-03-13VM, x86, PAT: Change is_linear_pfn_mapping to not use vm_pgoffPallipadi, Venkatesh
Impact: fix false positive PAT warnings - also fix VirtalBox hang Use of vma->vm_pgoff to identify the pfnmaps that are fully mapped at mmap time is broken. vm_pgoff is set by generic mmap code even for cases where drivers are setting up the mappings at the fault time. The problem was originally reported here: http://marc.info/?l=linux-kernel&m=123383810628583&w=2 Change is_linear_pfn_mapping logic to overload VM_INSERTPAGE flag along with VM_PFNMAP to mean full PFNMAP setup at mmap time. Problem also tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=12800 Reported-by: Thomas Hellstrom <thellstrom@vmware.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha>@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: "ebiederm@xmission.com" <ebiederm@xmission.com> Cc: <stable@kernel.org> # only for 2.6.29.1, not .28 LKML-Reference: <20090313004527.GA7176@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-01Merge branch 'x86/urgent' into x86/patIngo Molnar
2009-02-28x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()Ingo Molnar
Impact: build fix Theodore Ts reported that the i915 driver needs these symbols: ERROR: "pgprot_writecombine" [drivers/gpu/drm/i915/i915.ko] undefined! ERROR: "is_io_mapping_possible" [drivers/gpu/drm/i915/i915.ko] undefined! Reported-by: Theodore Ts'o <tytso@mit.edu> wrote: Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26Merge branches 'x86/urgent' and 'x86/pat' into x86/coreIngo Molnar
Conflicts: arch/x86/include/asm/pat.h
2009-02-25gpu/drm, x86, PAT: routine to keep identity map in syncVenkatesh Pallipadi
Add a function to check and keep identity maps in sync, when changing any memory type. One of the follow on patches will also use this routine. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Eric Anholt <eric@anholt.net> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-13Merge branches 'x86/paravirt', 'x86/pat', 'x86/setup-v2', 'x86/subarch', ↵Ingo Molnar
'x86/uaccess' and 'x86/urgent' into x86/core
2009-02-12x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/memSuresh Siddha
Jeff Mahoney reported: > With Suse's hwinfo tool, on -tip: > WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d() reserve_pfn_range() is not tracking the memory range below 1MB as non-RAM and as such is inconsistent with similar checks in reserve_memtype() and free_memtype() Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the "track legacy 1MB region as non RAM" condition. And also, fix reserve_pfn_range() to return -EINVAL, when the pfn range is RAM. This is to be consistent with this API design. Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-28Merge branches 'x86/asm', 'x86/cleanups', 'x86/cpudetect', 'x86/debug', ↵Ingo Molnar
'x86/doc', 'x86/header-fixes', 'x86/mm', 'x86/paravirt', 'x86/pat', 'x86/setup-v2', 'x86/subarch', 'x86/uaccess' and 'x86/urgent' into x86/core
2009-01-26Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) xen: unitialised return value in xenbus_write_transaction x86: fix section mismatch warning x86: unmask CPUID levels on Intel CPUs, fix x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn. x86: use standard PIT frequency xen: handle highmem pages correctly when shrinking a domain x86, mm: fix pte_free() xen: actually release memory when shrinking domain x86: unmask CPUID levels on Intel CPUs x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> x86: fix PTE corruption issue while mapping RAM using /dev/mem x86: mtrr fix debug boot parameter x86: fix page attribute corruption with cpa() Revert "x86: signal: change type of paramter for sys_rt_sigreturn()" x86: use early clobbers in usercopy*.c x86: remove kernel_physical_mapping_init() from init section fix: crash: IP: __bitmap_intersects+0x48/0x73 cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write work_on_cpu: Use our own workqueue. work_on_cpu: don't try to get_online_cpus() in work_on_cpu. ...
2009-01-23x86: handle PAT more like other CPU featuresH. Peter Anvin
Impact: Cleanup When PAT was originally introduced, it was handled specially for a few reasons: - PAT bugs are hard to track down, so we wanted to maintain a whitelist of CPUs. - The i386 and x86-64 CPUID code was not yet unified. Both of these are now obsolete, so handle PAT like any other features, including ordinary feature blacklisting due to known bugs. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-21x86: fix PTE corruption issue while mapping RAM using /dev/memSuresh Siddha
Beschorner Daniel reported: > hwinfo problem since 2.6.28, showing this in the oops: > Corrupted page table at address 7fd04de3ec00 Also, PaX Team reported a regression with this commit: > commit 9542ada803198e6eba29d3289abb39ea82047b92 > Author: Suresh Siddha <suresh.b.siddha@intel.com> > Date: Wed Sep 24 08:53:33 2008 -0700 > > x86: track memtype for RAM in page struct This commit breaks mapping any RAM page through /dev/mem, as the reserve_memtype() was not initializing the return attribute type and as such corrupting the PTE entry that was setup with the return attribute type. Because of this bug, application mapping this RAM page through /dev/mem will die with "Corrupted page table at address xxxx" message in the kernel log and also the kernel identity mapping which maps the underlying RAM page gets converted to UC. Fix this by initializing the return attribute type before calling reserve_ram_pages_type() Reported-by: PaX Team <pageexec@freemail.hu> Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com> Tested-and-Acked-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15Revert "x86 PAT: remove CPA WARN_ON for zero pte"Linus Torvalds
This reverts commit 58dab916dfb57328d50deb0aa9b3fc92efa248ff, which makes my Nehalem come to a nasty crawling almost-halt. It looks like it turns off caching of regular kernel RAM, with the understandable slowdown of a few orders of magnitude as a result. Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14x86, pat: fix reserve_memtype() for legacy 1MB rangeSuresh Siddha
Thierry Vignaud reported: > http://bugzilla.kernel.org/show_bug.cgi?id=12372 > > On P4 with an SiS motherboard (video card is a SiS 651) > X server fails to start with error: > xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid > argument) Here X is trying to map first 8KB of memory using /dev/mem. Existing code treats first 0-4KB of memory as non-RAM and 4KB-8KB as RAM. Recent code changes don't allow to map memory with different attributes at the same time. Fix this by treating the first 1MB legacy region as special and always track the attribute requests with in this region using linear linked list (and don't bother if the range is RAM or non-RAM or mixed) Reported-and-tested-by: Thierry Vignaud <tvignaud@mandriva.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>