summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2013-04-25crypto: aesni_intel - add more optimized XTS mode for x86-64Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B 1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B 1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: x86/camellia-aesni-avx - add more optimized XTS codeJussi Kivilinna
Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage and small boost for speed. tcrypt results, with Intel i5-2450M: enc dec 16B 1.10x 1.01x 64B 0.82x 0.77x 256B 1.14x 1.10x 1024B 1.17x 1.16x 8192B 1.10x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of camellia-2way for block sized smaller than 256 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: cast6-avx: use new optimized XTS codeJussi Kivilinna
Change cast6-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.01x 1.01x 64B 1.01x 1.00x 256B 1.09x 1.02x 1024B 1.08x 1.06x 8192B 1.08x 1.07x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: x86/twofish-avx - use optimized XTS codeJussi Kivilinna
Change twofish-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.03x 1.02x 64B 0.91x 0.91x 256B 1.10x 1.09x 1024B 1.12x 1.11x 8192B 1.12x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of twofish-3way for block sized smaller than 128 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: x86 - add more optimized XTS-mode for serpent-avxJussi Kivilinna
This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided. tcrypt results, with Intel i5-2450M: enc dec 16B 1.00x 1.00x 64B 1.00x 1.00x 256B 1.04x 1.06x 1024B 1.09x 1.09x 8192B 1.10x 1.09x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: crc32-pclmul - Use gas macro for pclmulqdqSandy Wu
Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y. Older versions of bintuils do not support the pclmulqdq instruction. The PCLMULQDQ gas macro is used instead. Signed-off-by: Sandy Wu <sandyw@twitter.com> Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha512 - Create module providing optimized SHA512 routines using ↵Tim Chen
SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX ↵Tim Chen
instruction. Provides SHA512 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX ↵Tim Chen
instructions. Provides SHA512 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental ↵Tim Chen
SSE3 instructions. Provides SHA512 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha256 - Create module providing optimized SHA256 routines using ↵Tim Chen
SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-03crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructionsTim Chen
Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-03crypto: sha256 - Optimized sha256 x86_64 assembly routine with AVX instructions.Tim Chen
Provides SHA256 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-03crypto: sha256 - Optimized sha256 x86_64 assembly routine using Supplemental ↵Tim Chen
SSE3 instructions. Provides SHA256 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-03crypto: x86 - build AVX block cipher implementations only if assembler ↵Jussi Kivilinna
supports AVX instructions These modules require AVX support in assembler, so add new check to Makefile for this. Other option would be to use CONFIG_AS_AVX inside source files, but that would result dummy/empty/no-fuctionality modules being created. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-03crypto: x86/crc32-pclmul - assembly clean-ups: use ENTRY/ENDPROCJussi Kivilinna
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-03-10crypto: crc32c - Update the links to the white papers on CRC32C calculations ↵Tim Chen
with PCLMULQDQ instructions. Herbert, The following patch update the stale link to the CRC32C white paper that was referenced. Tim Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-03-10ARM: AT91SAM9G45: same platform data structure for all crypto peripheralsNicolas Royer
Only AES use DMA in AT91SAM9G45 (TDES and SHA use PDC). However latest Atmel TDES and SHA IP releases use DMA instead of PDC. --> Atmel TDES and SHA drivers need DMA platform data for those IP releases. Goal of this patch is to use the same platform data structure for all Atmel crypto peripherals. This structure contains information about DMA interface. Signed-off-by: Nicolas Royer <nicolas@eukrea.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Eric Bénard <eric@eukrea.com> Tested-by: Eric Bénard <eric@eukrea.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-02-26crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 optionHerbert Xu
This bool option can never be set to anything other than y. So let's just kill it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-02-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: "Here is the crypto update for 3.9: - Added accelerated implementation of crc32 using pclmulqdq. - Added test vector for fcrypt. - Added support for OMAP4/AM33XX cipher and hash. - Fixed loose crypto_user input checks. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits) crypto: user - ensure user supplied strings are nul-terminated crypto: user - fix empty string test in report API crypto: user - fix info leaks in report API crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding. crypto: use ERR_CAST crypto: atmel-aes - adjust duplicate test crypto: crc32-pclmul - Kill warning on x86-32 crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_* crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: aesni-intel - add ENDPROC statements for assembler functions crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets crypto: testmgr - add test vector for fcrypt ...
2013-02-25Merge tag 'please-pull-vm_unwrapped' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 update from Tony Luck: "ia64 vm patch series that was cooking in -mm tree" * tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: mm: use vm_unmapped_area() in hugetlbfs on ia64 architecture mm: use vm_unmapped_area() on ia64 architecture
2013-02-25Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Add option to not sign modules during modules_install MODSIGN: Add -s <signature> option to sign-file MODSIGN: Specify the hash algorithm on sign-file command line MODSIGN: Simplify Makefile with a Kconfig helper module: clean up load_module a little more. modpost: Ignore ARC specific non-alloc sections module: constify within_module_* taint: add explicit flag to show whether lock dep is still OK. module: printk message when module signature fail taints kernel.
2013-02-24Merge tag 'mfd-3.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFS updates from Samuel Ortiz: "This is the MFD pull request for the 3.9 merge window. No new drivers this time, but a bunch of fairly big cleanups: - Roger Quadros worked on a OMAP USBHS and TLL platform data consolidation, OMAP5 support and clock management code cleanup. - The first step of a major sync for the ab8500 driver from Lee Jones. In particular, the debugfs and the sysct interfaces got extended and improved. - Peter Ujfalusi sent a nice patchset for cleaning and fixing the twl-core driver, with a much needed module id lookup code improvement. - The regular wm5102 and arizona cleanups and fixes from Mark Brown. - Laxman Dewangan extended the palmas APIs in order to implement the palmas GPIO and rt drivers. - Laxman also added DT support for the tps65090 driver. - The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra and Darren Hart. - Linus Walleij patchset for the ab8500 driver allowed ab8500 and ab9540 based devices to switch to the new abx500 pin-ctrl driver. - The max8925 now has device tree and irqdomain support thanks to Qing Xu. - The recently added rtsx driver got a few cleanups and fixes for a better card detection code path and now also supports the RTS5227 chipset, thanks to Wei Wang and Roger Tseng." * tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (109 commits) mfd: lpc_ich: Use devres API to allocate private data mfd: lpc_ich: Add Device IDs for Intel Wellsburg PCH mfd: lpc_sch: Accomodate partial population of the MFD devices mfd: da9052-i2c: Staticize da9052_i2c_fix() mfd: syscon: Fix sparse warning mfd: twl-core: Fix kernel panic on boot mfd: rtsx: Fix issue that booting OS with SD card inserted mfd: ab8500: Fix compile error mfd: Add missing GENERIC_HARDIRQS dependecies Documentation: Add docs for max8925 dt mfd: max8925: Add dts mfd: max8925: Support dt for backlight mfd: max8925: Fix onkey driver irq base mfd: max8925: Fix mfd device register failure mfd: max8925: Add irqdomain for dt mfd: vexpress: Allow vexpress-sysreg to self-initialise mfd: rtsx: Support RTS5227 mfd: rtsx: Implement driving adjustment to device-dependent callbacks mfd: vexpress: Add pseudo-GPIO based LEDs mfd: ab8500: Rename ab8500 to abx500 for hwmon driver ...
2013-02-24Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - Some cleanups at V4L2 documentation - new drivers: ts2020 frontend, ov9650 sensor, s5c73m3 sensor, sh-mobile veu mem2mem driver, radio-ma901, davinci_vpfe staging driver - Lots of missing MAINTAINERS entries added - several em28xx driver improvements, including its conversion to videobuf2 - several fixups on drivers to make them to better comply with the API - DVB core: add support for DVBv5 stats, allowing the implementation of statistics for new standards like ISDB - mb86a20s: add statistics to the driver - lots of new board additions, cleanups, and driver improvements. * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (596 commits) [media] media: Add 0x3009 USB PID to ttusb2 driver (fixed diff) [media] rtl28xxu: Add USB IDs for Compro VideoMate U620F [media] em28xx: add usb id for terratec h5 rev. 3 [media] media: rc: gpio-ir-recv: add support for device tree parsing [media] mceusb: move check earlier to make smatch happy [media] radio-si470x doc: add info about v4l2-ctl and sox+alsa [media] staging: media: Remove unnecessary OOM messages [media] sh_vou: Use vou_dev instead of vou_file wherever possible [media] sh_vou: Use video_drvdata() [media] drivers/media/platform/soc_camera/pxa_camera.c: use devm_ functions [media] mt9t112: mt9t111 format set up differs from mt9t112 [media] sh-mobile-ceu-camera: fix SHARPNESS control default Revert "[media] fc0011: Return early, if the frequency is already tuned" [media] cx18/ivtv: fix regression: remove __init from a non-init function [media] em28xx: fix analog streaming with USB bulk transfers [media] stv0900: remove unnecessary null pointer check [media] fc0011: Return early, if the frequency is already tuned [media] fc0011: Add some sanity checks and cleanups [media] fc0011: Fix xin value clamping Revert "[media] [PATH,1/2] mxl5007 move reset to attach" ...
2013-02-24Merge tag 'stable/for-linus-3.9-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "This has two new ACPI drivers for Xen - a physical CPU offline/online and a memory hotplug. The way this works is that ACPI kicks the drivers and they make the appropiate hypercall to the hypervisor to tell it that there is a new CPU or memory. There also some changes to the Xen ARM ABIs and couple of fixes. One particularly nasty bug in the Xen PV spinlock code was fixed by Stefan Bader - and has been there since the 2.6.32! Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery" Fix up a few semantic conflicts with the ACPI interface changes in drivers/xen/xen-acpi-{cpu,mem}hotplug.c. * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: event channel arrays are xen_ulong_t and not unsigned long xen: Send spinlock IPI to all waiters xen: introduce xen_remap, use it instead of ioremap xen: close evtchn port if binding to irq fails xen-evtchn: correct comment and error output xen/tmem: Add missing %s in the printk statement. xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0 xen/acpi: ACPI cpu hotplug xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h xen/stub: driver for CPU hotplug xen/acpi: ACPI memory hotplug xen/stub: driver for memory hotplug xen: implement updated XENMEM_add_to_physmap_range ABI xen/smp: Move the common CPU init code a bit to prep for PVH patch.
2013-02-24Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...
2013-02-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...
2013-02-23Merge branch 'akpm' (more incoming from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...
2013-02-23ia64: use %ld to print pages calculated in nr_free_buffer_pagesZhang Yanfei
Now the function nr_free_buffer_pages returns unsigned long, so use %ld to print its return value. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23swap: add per-partition lock for swapfileShaohua Li
swap_lock is heavily contended when I test swap to 3 fast SSD (even slightly slower than swap to 2 such SSD). The main contention comes from swap_info_get(). This patch tries to fix the gap with adding a new per-partition lock. Global data like nr_swapfiles, total_swap_pages, least_priority and swap_list are still protected by swap_lock. nr_swap_pages is an atomic now, it can be changed without swap_lock. In theory, it's possible get_swap_page() finds no swap pages but actually there are free swap pages. But sounds not a big problem. Accessing partition specific data (like scan_swap_map and so on) is only protected by swap_info_struct.lock. Changing swap_info_struct.flags need hold swap_lock and swap_info_struct.lock, because scan_scan_map() will check it. read the flags is ok with either the locks hold. If both swap_lock and swap_info_struct.lock must be hold, we always hold the former first to avoid deadlock. swap_entry_free() can change swap_list. To delete that code, we add a new highest_priority_index. Whenever get_swap_page() is called, we check it. If it's valid, we use it. It's a pity get_swap_page() still holds swap_lock(). But in practice, swap_lock() isn't heavily contended in my test with this patch (or I can say there are other much more heavier bottlenecks like TLB flush). And BTW, looks get_swap_page() doesn't really need the lock. We never free swap_info[] and we check SWAP_WRITEOK flag. The only risk without the lock is we could swapout to some low priority swap, but we can quickly recover after several rounds of swap, so sounds not a big deal to me. But I'd prefer to fix this if it's a real problem. "swap: make each swap partition have one address_space" improved the swapout speed from 1.7G/s to 2G/s. This patch further improves the speed to 2.3G/s, so around 15% improvement. It's a multi-process test, so TLB flush isn't the biggest bottleneck before the patches. [arnd@arndb.de: fix it for nommu] [hughd@google.com: add missing unlock] [minchan@kernel.org: get rid of lockdep whinge on sys_swapon] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23acpi, memory-hotplug: support getting hotplug info from SRATTang Chen
We now provide an option for users who don't want to specify physical memory address in kernel commandline. /* * For movablemem_map=acpi: * * SRAT: |_____| |_____| |_________| |_________| ...... * node id: 0 1 1 2 * hotpluggable: n y y n * movablemem_map: |_____| |_________| * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. */ So user just specify movablemem_map=acpi, and the kernel will use hotpluggable info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE. If all the memory ranges in SRAT is hotpluggable, then no memory can be used by kernel. But before parsing SRAT, memblock has already reserve some memory ranges for other purposes, such as for kernel image, and so on. We cannot prevent kernel from using these memory. So we need to exclude these ranges even if these memory is hotpluggable. Furthermore, there could be several memory ranges in the single node which the kernel resides in. We may skip one range that have memory reserved by memblock, but if the rest of memory is too small, then the kernel will fail to boot. So, make the whole node which the kernel resides in un-hotpluggable. Then the kernel has enough memory to use. NOTE: Using this way will cause NUMA performance down because the whole node will be set as ZONE_MOVABLE, and kernel cannot use memory on it. If users don't want to lose NUMA performance, just don't use it. [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: use strcmp()] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23acpi, memory-hotplug: extend movablemem_map ranges to the end of nodeTang Chen
When implementing movablemem_map boot option, we introduced an array movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE. Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify the whole node memory range, we need to extend it to the node end so that we can use it to prevent memblock from allocating memory in the ranges user didn't specify. We now implement movablemem_map boot option like this: /* * For movablemem_map=nn[KMG]@ss[KMG]: * * SRAT: |_____| |_____| |_________| |_________| ...... * node id: 0 1 1 2 * user specified: |__| |___| * movablemem_map: |___| |_________| |______| ...... * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. * * NOTE: In this case, SRAT info will be ingored. */ [akpm@linux-foundation.org: clean up code, fix build warning] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23acpi, memory-hotplug: parse SRAT before memblock is readyTang Chen
On linux, the pages used by kernel could not be migrated. As a result, if a memory range is used by kernel, it cannot be hot-removed. So if we want to hot-remove memory, we should prevent kernel from using it. The way now used to prevent this is specify a memory range by movablemem_map boot option and set it as ZONE_MOVABLE. But when the system is booting, memblock will allocate memory, and reserve the memory for kernel. And before we parse SRAT, and know the node memory ranges, memblock is working. And it may allocate memory in ranges to be set as ZONE_MOVABLE. This memory can be used by kernel, and never be freed. So, let's parse SRAT before memblock is called first. And it is early enough. The first call of memblock_find_in_range_node() is in: setup_arch() |-->setup_real_mode() so, this patch add a function early_parse_srat() to parse SRAT, and call it before setup_real_mode() is called. NOTE: 1) early_parse_srat() is called before numa_init(), and has initialized numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO NOT zero numa_meminfo in numa_init(), otherwise we will lose memory numa info. 2) I don't know why using count of memory affinities parsed from SRAT as a return value in original acpi_numa_init(). So I add a static variable srat_mem_cnt to remember this count and use it as the return value of the new acpi_numa_init() [mhocko@suse.cz: parse SRAT before memblock is ready fix] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23x86: get pg_data_t's memory from other nodeYasuaki Ishimatsu
During the implementation of SRAT support, we met a problem. In setup_arch(), we have the following call series: 1) memblock is ready; 2) some functions use memblock to allocate memory; 3) parse ACPI tables, such as SRAT. Before 3), we don't know which memory is hotpluggable, and as a result, we cannot prevent memblock from allocating hotpluggable memory. So, in 2), there could be some hotpluggable memory allocated by memblock. Now, we are trying to parse SRAT earlier, before memblock is ready. But I think we need more investigation on this topic. So in this v5, I dropped all the SRAT support, and v5 is just the same as v3, and it is based on 3.8-rc3. As we planned, we will support getting info from SRAT without users' participation at last. And we will post another patch-set to do so. And also, I think for now, we can add this boot option as the first step of supporting movable node. Since Linux cannot migrate the direct mapped pages, the only way for now is to limit the whole node containing only movable memory. Using SRAT is one way. But even if we can use SRAT, users still need an interface to enable/disable this functionality if they don't want to loose their NUMA performance. So I think, a user interface is always needed. For now, users can disable this functionality by not specifying the boot option. Later, we will post SRAT support, and add another option value "movablecore_map=acpi" to using SRAT. This patch: If system can create movable node which all memory of the node is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid() to retry when the first allocation fails. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the nodeWen Congyang
When the node is offlined, there is no memory/cpu on the node. If a sleep task runs on a cpu of this node, it will be migrated to the cpu on the other node. So we can clear cpu-to-node mapping. [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23cpu_hotplug: clear apicid to node when the cpu is hotremovedWen Congyang
When a cpu is hotpluged, we call acpi_map_cpu2node() in _acpi_map_lsapic() to store the cpu's node and apicid's node. But we don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is hotremoved. If the node is also hotremoved, we will get the following messages: kernel BUG at include/linux/gfp.h:329! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB RIP: 0010:[<ffffffff811bc3fd>] [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP: 0018:ffff88078a049cf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246 RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0 R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003 FS: 00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650) Call Trace: new_slab+0x30/0x1b0 __slab_alloc+0x358/0x4c0 kmem_cache_alloc_node_trace+0xb4/0x1e0 alloc_fair_sched_group+0xd0/0x1b0 sched_create_group+0x3e/0x110 sched_autogroup_create_attach+0x4d/0x180 sys_setsid+0xd4/0xf0 system_call_fastpath+0x16/0x1b Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44 RIP [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP <ffff88078a049cf8> ---[ end trace adf84c90f3fea3e5 ]--- The reason is that the cpu's node is not NUMA_NO_NODE, we will call alloc_pages_exact_node() to alloc memory on the node, but the node is offlined. If the node is onlined, we still need cpu's node. For example: a task on the cpu is sleeped when the cpu is hotremoved. We will choose another cpu to run this task when it is waked up. If we know the cpu's node, we will choose the cpu on the same node first. So we should clear cpu-to-node mapping when the node is offlined. This patch only clears apicid-to-node mapping when the cpu is hotremoved. [akpm@linux-foundation.org: fix section error] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: remove memmap of sparse-vmemmapTang Chen
Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: remove page table of x86_64 architectureTang Chen
Search a page table about the removed memory, and clear page table for x86_64 architecture. [akpm@linux-foundation.org: make kernel_physical_mapping_remove() static] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: common APIs to support page tables hot-removeWen Congyang
When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmapYasuaki Ishimatsu
For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: introduce new arch_remove_memory() for removing page tableWen Congyang
For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: remove flags argument to mmap_regionMichel Lespinasse
After the MAP_POPULATE handling has been moved to mmap_region() call sites, the only remaining use of the flags argument is to pass the MAP_NORESERVE flag. This can be just as easily handled by do_mmap_pgoff(), so do that and remove the mmap_region() flags parameter. [akpm@linux-foundation.org: remove double parens] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "So from the depth of frozen Minnesota, here's the powerpc pull request for 3.9. It has a few interesting highlights, in addition to the usual bunch of bug fixes, minor updates, embedded device tree updates and new boards: - Hand tuned asm implementation of SHA1 (by Paulus & Michael Ellerman) - Support for Doorbell interrupts on Power8 (kind of fast thread-thread IPIs) by Ian Munsie - Long overdue cleanup of the way we handle relocation of our open firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard - Support for saving/restoring & context switching the PPR (Processor Priority Register) on server processors that support it. This allows the kernel to preserve thread priorities established by userspace. By Haren Myneni. - DAWR (new watchpoint facility) support on Power8 by Michael Neuling - Ability to change the DSCR (Data Stream Control Register) which controls cache prefetching on a running process via ptrace by Alexey Kardashevskiy - Support for context switching the TAR register on Power8 (new branch target register meant to be used by some new specific userspace perf event interrupt facility which is yet to be enabled) by Ian Munsie. - Improve preservation of the CFAR register (which captures the origin of a branch) on various exception conditions by Paulus. - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where it belongs by Philippe De Muyter - Support for Transactional Memory on Power8 by Michael Neuling (based on original work by Matt Evans). For those curious about the feature, the patch contains a pretty good description." (See commit db8ff907027b: "powerpc: Documentation for transactional memory on powerpc" for the mentioned description added to the file Documentation/powerpc/transactional_memory.txt) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits) powerpc/kexec: Disable hard IRQ before kexec powerpc/85xx: l2sram - Add compatible string for BSC9131 platform powerpc/85xx: bsc9131 - Correct typo in SDHC device node powerpc/e500/qemu-e500: enable coreint powerpc/mpic: allow coreint to be determined by MPIC version powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct powerpc/85xx: Board support for ppa8548 powerpc/fsl: remove extraneous DIU platform functions arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test powerpc: Documentation for transactional memory on powerpc powerpc: Add transactional memory to pseries and ppc64 defconfigs powerpc: Add config option for transactional memory powerpc: Add transactional memory to POWER8 cpu features powerpc: Add new transactional memory state to the signal context powerpc: Hook in new transactional memory code powerpc: Routines for FP/VSX/VMX unavailable during a transaction powerpc: Add transactional memory unavaliable execption handler powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes powerpc: Add FP/VSX and VMX register load functions for transactional memory powerpc: Add helper functions for transactional memory context switching ...
2013-02-24powerpc/kexec: Disable hard IRQ before kexecPhileas Fogg
Disable hard IRQ before kexec a new kernel image. Not doing it can result in corrupted data in the memory segments reserved for the new kernel. Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin Pull small blackfin update from Bob Liu. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin: blackfin: time-ts: Remove duplicate assignment blackfin: pm: fix build error blackfin: sync data in blackfin write buffer blackfin: use bitmap library functions blackfin: mem_init: update dmc config register
2013-02-22Merge branch 'parisc-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller. The bulk of this is optimized page coping/clearing and cache flushing (virtual caches are lovely) by John David Anglin. * 'parisc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (31 commits) arch/parisc/include/asm: use ARRAY_SIZE macro in mmzone.h parisc: remove empty lines and unnecessary #ifdef coding in include/asm/signal.h parisc: sendfile and sendfile64 syscall cleanups parisc: switch to available compat_sched_rr_get_interval implementation parisc: fix fallocate syscall parisc: fix error return codes for rt_sigaction and rt_sigprocmask parisc: convert msgrcv and msgsnd syscalls to use compat layer parisc: correctly wire up mq_* functions for CONFIG_COMPAT case parisc: fix personality on 32bit kernel parisc: wire up process_vm_readv, process_vm_writev, kcmp and finit_module syscalls parisc: led driver requires CONFIG_VM_EVENT_COUNTERS parisc: remove unused compat_rt_sigframe.h header parisc/mm/fault.c: Port OOM changes to do_page_fault parisc: space register variables need to be in native length (unsigned long) parisc: fix ptrace breakage parisc: always detect multiple physical ranges parisc: ensure that mmapped shared pages are aligned at SHMLBA addresses parisc: disable preemption while flushing D- or I-caches through TMPALIAS region parisc: remove IRQF_DISABLED parisc: fixes and cleanups in page cache flushing (4/4) ...
2013-02-22Merge branch 'x86/microcode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading update from Peter Anvin: "This patchset lets us update the CPU microcode very, very early in initialization if the BIOS fails to do so (never happens, right?) This is handy for dealing with things like the Atom erratum where we have to run without PSE because microcode loading happens too late. As I mentioned in the x86/mm push request it depends on that infrastructure but it is otherwise a standalone feature." * 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Make early microcode loading a configuration feature x86/mm/init.c: Copy ucode from initrd image to kernel memory x86/head64.c: Early update ucode in 64-bit x86/head_32.S: Early update ucode in 32-bit x86/microcode_intel_early.c: Early update ucode on Intel's CPU x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() x86/microcode_intel_lib.c: Early update ucode on Intel's CPU x86/microcode_core_early.c: Define interfaces for early loading ucode x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP x86/common.c: Make have_cpuid_p() a global function x86/microcode_intel.h: Define functions and macros for early loading ucode x86, doc: Documentation for early microcode loading
2013-02-22x86-64, xen, mmu: Provide an early version of write_cr3.Konrad Rzeszutek Wilk
With commit 8170e6bed465 ("x86, 64bit: Use a #PF handler to materialize early mappings on demand") we started hitting an early bootup crash where the Xen hypervisor would inform us that: (XEN) d7:v0: unhandled page fault (ec=0000) (XEN) Pagetable walk from ffffea000005b2d0: (XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S (XEN) Domain 7 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]---- .. that Xen was unable to context switch back to dom0. Looking at the calling stack we find: [<ffffffff8103feba>] xen_get_user_pgd+0x5a <-- [<ffffffff8103feba>] xen_get_user_pgd+0x5a [<ffffffff81042d27>] xen_write_cr3+0x77 [<ffffffff81ad2d21>] init_mem_mapping+0x1f9 [<ffffffff81ac293f>] setup_arch+0x742 [<ffffffff81666d71>] printk+0x48 We are trying to figure out whether we need to up-date the user PGD as well. Please keep in mind that under 64-bit PV guests we have a limited amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel and user-space. As such the Linux pvops'fied version of write_cr3 checks if it has to update the user-space cr3 as well. That clearly is not needed during early bootup. The recent changes (see above git commit) streamline the x86 page table allocation to be much simpler (And also incidentally the #PF handler ends up in spirit being similar to how the Xen toolstack sets up the initial page-tables). The fix is to have an early-bootup version of cr3 that just loads the kernel %cr3. The later version - which also handles user-page modifications will be used after the initial page tables have been setup. [ hpa: removed a redundant #ifdef and made the new function __init. Also note that x86-32 already has such an early xen_write_cr3. ] Tested-by: "H. Peter Anvin" <hpa@zytor.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22mm: use vm_unmapped_area() in hugetlbfs on ia64 architectureMichel Lespinasse
Update the ia64 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-02-22mm: use vm_unmapped_area() on ia64 architectureMichel Lespinasse
Update the ia64 arch_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>