summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2011-06-23Revert "x86, efi: Retain boot service code until after switching to virtual ↵Greg Kroah-Hartman
mode" This reverts commit 0aed459e8487eb6ebdb4efe8cefe1eafbc704b30, which was commit 916f676f8dc016103f983c7ec54c18ecdbb6e349 upstream. It breaks some people's machines, so this will all get worked out in the 3.0 kernel release, it's not quite ready for 2.6.39 just yet. Thanks to Maarten Lankhorst <m.b.lankhorst@gmail.com> for reporting the issue. Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com> Cc: Jim Bos <jim876@xs4all.nl> Cc: Matthew Garrett <mjg@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23usb-storage: redo incorrect readsAlan Stern
commit 21c13a4f7bc185552c4b402b792c3bbb9aa69df0 upstream. Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Sven Geggus <sven-usbst@geggus.net> Tested-by: Paul Hartman <paul.hartman+linux@gmail.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03block: don't block events on excl write for non-optical devicesTejun Heo
commit d4dc210f69bcb0b4bef5a83b1c323817be89bad1 upstream. Disk event code automatically blocks events on excl write. This is primarily to avoid issuing polling commands while burning is in progress. This behavior doesn't fit other types of devices with removeable media where polling commands don't have adverse side effects and door locking usually doesn't exist. This patch introduces new genhd flag which controls the auto-blocking behavior and uses it to enable auto-blocking only on optical devices. Note for stable: 2.6.38 and later only Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03idle governor: Avoid lock acquisition to read pm_qos before entering idleTim Chen
commit 333c5ae9948194428fe6c5ef5c088304fc98263b upstream. Thanks to the reviews and comments by Rafael, James, Mark and Andi. Here's version 2 of the patch incorporating your comments and also some update to my previous patch comments. I noticed that before entering idle state, the menu idle governor will look up the current pm_qos target value according to the list of qos requests received. This look up currently needs the acquisition of a lock to access the list of qos requests to find the qos target value, slowing down the entrance into idle state due to contention by multiple cpus to access this list. The contention is severe when there are a lot of cpus waking and going into idle. For example, for a simple workload that has 32 pair of processes ping ponging messages to each other, where 64 cpu cores are active in test system, I see the following profile with 37.82% of cpu cycles spent in contention of pm_qos_lock: - 37.82% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave - _raw_spin_lock_irqsave - 95.65% pm_qos_request menu_select cpuidle_idle_call - cpu_idle 99.98% start_secondary A better approach will be to cache the updated pm_qos target value so reading it does not require lock acquisition as in the patch below. With this patch the contention for pm_qos_lock is removed and I saw a 2.2X increase in throughput for my message passing workload. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: James Bottomley <James.Bottomley@suse.de> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03seqlock: Don't smp_rmb in seqlock reader spin loopMilton Miller
commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream. Move the smp_rmb after cpu_relax loop in read_seqlock and add ACCESS_ONCE to make sure the test and return are consistent. A multi-threaded core in the lab didn't like the update from 2.6.35 to 2.6.36, to the point it would hang during boot when multiple threads were active. Bisection showed af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents: Remove the per cpu tick skew) as the culprit and it is supported with stack traces showing xtime_lock waits including tick_do_update_jiffies64 and/or update_vsyscall. Experimentation showed the combination of cpu_relax and smp_rmb was significantly slowing the progress of other threads sharing the core, and this patch is effective in avoiding the hang. A theory is the rmb is affecting the whole core while the cpu_relax is causing a resource rebalance flush, together they cause an interfernce cadance that is unbroken when the seqlock reader has interrupts disabled. At first I was confused why the refactor in 3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise seqlock) didn't affect this patch application, but after some study that affected seqcount not seqlock. The new seqcount was not factored back into the seqlock. I defer that the future. While the removal of the timer interrupt offset created contention for the xtime lock while a cpu does the additonal work to update the system clock, the seqlock implementation with the tight rmb spin loop goes back much further, and is just waiting for the right trigger. Signed-off-by: Milton Miller <miltonm@bga.com> Cc: <linuxppc-dev@lists.ozlabs.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03x86, efi: Retain boot service code until after switching to virtual modeMatthew Garrett
commit 916f676f8dc016103f983c7ec54c18ecdbb6e349 upstream. UEFI stands for "Unified Extensible Firmware Interface", where "Firmware" is an ancient African word meaning "Why do something right when you can do it so wrong that children will weep and brave adults will cower before you", and "UEI" is Celtic for "We missed DOS so we burned it into your ROMs". The UEFI specification provides for runtime services (ie, another way for the operating system to be forced to depend on the firmware) and we rely on these for certain trivial tasks such as setting up the bootloader. But some hardware fails to work if we attempt to use these runtime services from physical mode, and so we have to switch into virtual mode. So far so dreadful. The specification makes it clear that the operating system is free to do whatever it wants with boot services code after ExitBootServices() has been called. SetVirtualAddressMap() can't be called until ExitBootServices() has been. So, obviously, a whole bunch of EFI implementations call into boot services code when we do that. Since we've been charmingly naive and trusted that the specification may be somehow relevant to the real world, we've already stuffed a picture of a penguin or something in that address space. And just to make things more entertaining, we've also marked it non-executable. This patch allocates the boot services regions during EFI init and makes sure that they're executable. Then, after SetVirtualAddressMap(), it discards them and everyone lives happily ever after. Except for the ones who have to work on EFI, who live sad lives haunted by the knowledge that someone's eventually going to write yet another firmware specification. [ hpa: adding this to urgent with a stable tag since it fixes currently-broken hardware. However, I do not know what the dependencies are and so I do not know which -stable versions this may be a candidate for. ] Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1306331593-28715-1-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03pata_cm64x: fix boot crash on pariscJames Bottomley
commit 9281b16caac1276817b77033c5b8a1f5ca30102c upstream. The old IDE cmd64x checks the status of the CNTRL register to see if the ports are enabled before probing them. pata_cmd64x doesn't do this, which causes a HPMC on parisc when it tries to poke at the secondary port because apparently the BAR isn't wired up (and a non-responding piece of memory causes a HPMC). Fix this by porting the CNTRL register port detection logic from IDE cmd64x. In addition, following converns from Alan Cox, add a check to see if a mobility electronics bridge is the immediate parent and forgo the check if it is (prevents problems on hotplug controllers). Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03block: Fix discard topology stacking and reportingMartin K. Petersen
commit a934a00a69e940b126b9bdbf83e630ef5fe43523 upstream. In some cases we would end up stacking discard_zeroes_data incorrectly. Fix this by enabling the feature by default for stacking drivers and clearing it for low-level drivers. Incorporating a device that does not support dzd will then cause the feature to be disabled in the stacking driver. Also ensure that the maximum discard value does not overflow when exported in sysfs and return 0 in the alignment and dzd fields for devices that don't support discard. Reported-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03block: hold queue if flush is running for non-queueable flush driveshaohua.li@intel.com
commit 3ac0cc4508709d42ec9aa351086c7d38bfc0660c upstream. In some drives, flush requests are non-queueable. When flush request is running, normal read/write requests can't run. If block layer dispatches such request, driver can't handle it and requeue it. Tejun suggested we can hold the queue when flush is running. This can avoid unnecessary requeue. Also this can improve performance. For example, we have request flush1, write1, flush 2. flush1 is dispatched, then queue is hold, write1 isn't inserted to queue. After flush1 is finished, flush2 will be dispatched. Since disk cache is already clean, flush2 will be finished very soon, so looks like flush2 is folded to flush1. In my test, the queue holding completely solves a regression introduced by commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794: block: make the flush insertion use the tail of the dispatch list It's not a preempt type request, in fact we have to insert it behind requests that do specify INSERT_FRONT. which causes about 20% regression running a sysbench fileio workload. Stable: 2.6.39 only Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-03block: add a non-queueable flush flagshaohua.li@intel.com
commit f3876930952390a31c3a7fd68dd621464a36eb80 upstream. flush request isn't queueable in some drives. Add a flag to let driver notify block layer about this. We can optimize flush performance with the knowledge. Stable: 2.6.39 only Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-18drivercore: revert addition of of_match to struct deviceGrant Likely
Commit b826291c, "drivercore/dt: add a match table pointer to struct device" added an of_match pointer to struct device to cache the of_match_table entry discovered at driver match time. This was unsafe because matching is not an atomic operation with probing a driver. If two or more drivers are attempted to be matched to a driver at the same time, then the cached matching entry pointer could get overwritten. This patch reverts the of_match cache pointer and reworks all users to call of_match_device() directly instead. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-05-18of: fix race when matching driversMilton Miller
If two drivers are probing devices at the same time, both will write their match table result to the dev->of_match cache at the same time. Only write the result if the device matches. In a thread titled "SBus devices sometimes detected, sometimes not", Meelis reported his SBus hme was not detected about 50% of the time. From the debug suggested by Grant it was obvious another driver matched some devices between the call to match the hme and the hme discovery failling. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Milton Miller <miltonm@bga.com> [grant.likely: modified to only call of_match_device() once] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-05-18procfs: add stub for proc_mkdir_mode()Randy Dunlap
Provide a stub for proc_mkdir_mode() when CONFIG_PROC_FS is not enabled, just like the stub for proc_mkdir(). Fixes this linux-next build error: drivers/net/wireless/airo.c:4504: error: implicit declaration of function 'proc_mkdir_mode' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-16Revert "mmc: fix a race between card-detect rescan and clock-gate work ↵Chris Ball
instances" This reverts commit 26fc8775b51484d8c0a671198639c6d5ae60533e, which has been reported to cause boot/resume-time crashes for some users: https://bbs.archlinux.org/viewtopic.php?id=118751. Signed-off-by: Chris Ball <cjb@laptop.org> Cc: <stable@kernel.org>
2011-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix FS_IOC_SETFLAGS ioctl Btrfs: fix FS_IOC_GETFLAGS ioctl fs: remove FS_COW_FL Btrfs: fix easily get into ENOSPC in mixed case Prevent oopsing in posix_acl_valid()
2011-05-14fs: remove FS_COW_FLLi Zefan
FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-13Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4.1: Ensure that layoutget uses the correct gfp modes NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP
2011-05-13Cache user_ns in struct credSerge E. Hallyn
If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns). Get rid of _current_user_ns. This requires nsown_capable() to be defined in capability.c rather than as static inline in capability.h, so do that. Request_key needs init_user_ns defined at current_user_ns if !CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS at current_user_ns() define. Compile-tested with and without CONFIG_USERNS. Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> [ This makes a huge performance difference for acl_permission_check(), up to 30%. And that is one of the hottest kernel functions for loads that are pathname-lookup heavy. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-12Merge branch 'fbmem'Linus Torvalds
* fbmem: fbmem: make read/write/ioctl use the frame buffer at open time fbcon: add lifetime refcount to opened frame buffers
2011-05-12fbcon: add lifetime refcount to opened frame buffersLinus Torvalds
This just adds the refcount and the new registration lock logic. It does not (for example) actually change the read/write/ioctl routines to actually use the frame buffer that was opened: those function still end up alway susing whatever the current frame buffer is at the time of the call. Without this, if something holds the frame buffer open over a framebuffer switch, the close() operation after the switch will access a fb_info that has been free'd by the unregistering of the old frame buffer. (The read/write/ioctl operations will normally not cause problems, because they will - illogically - pick up the new fbcon instead. But a switch that happens just as one of those is going on might see problems too, the window is just much smaller: one individual op rather than the whole open-close sequence.) This use-after-free is apparently fairly easily triggered by the Ubuntu 11.04 boot sequence. Acked-by: Tim Gardner <tim.gardner@canonical.com> Tested-by: Daniel J Blueman <daniel.blueman@gmail.com> Tested-by: Anca Emanuel <anca.emanuel@gmail.com> Cc: Bruno Prémont <bonbons@linux-vserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-11NFSv4.1: Ensure that layoutget uses the correct gfp modesTrond Myklebust
Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-11mm: add alloc_pages_exact_nid()Andi Kleen
Add a alloc_pages_exact_nid() that allocates on a specific node. The naming is quite broken, but fixing that would need a larger renaming action. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: tweak comment] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-11mm: use alloc_bootmem_node_nopanic() on really needed pathYinghai Lu
Stefan found nobootmem does not work on his system that has only 8M of RAM. This causes an early panic: BIOS-provided physical RAM map: BIOS-88: 0000000000000000 - 000000000009f000 (usable) BIOS-88: 0000000000100000 - 0000000000840000 (usable) bootconsole [earlyser0] enabled Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS! DMI not present or invalid. last_pfn = 0x840 max_arch_pfn = 0x100000 init_memory_mapping: 0000000000000000-0000000000840000 8MB LOWMEM available. mapped low ram: 0 - 00840000 low ram: 0 - 00840000 Zone PFN ranges: DMA 0x00000001 -> 0x00001000 Normal empty Movable zone start PFN for each node early_node_map[2] active PFN ranges 0: 0x00000001 -> 0x0000009f 0: 0x00000100 -> 0x00000840 BUG: Int 6: CR2 (null) EDI c034663c ESI (null) EBP c0329f38 ESP c0329ef4 EBX c0346380 EDX 00000006 ECX ffffffff EAX fffffff4 err (null) EIP c0353191 CS c0320060 flg 00010082 Stack: (null) c030c533 000007cd (null) c030c533 00000001 (null) (null) 00000003 0000083f 00000018 00000002 00000002 c0329f6c c03534d6 (null) (null) 00000100 00000840 (null) c0329f64 00000001 00001000 (null) Pid: 0, comm: swapper Not tainted 2.6.36 #5 Call Trace: [<c02e3707>] ? 0xc02e3707 [<c035e6e5>] 0xc035e6e5 [<c0353191>] ? 0xc0353191 [<c03534d6>] 0xc03534d6 [<c034f1cd>] 0xc034f1cd [<c034a824>] 0xc034a824 [<c03513cb>] ? 0xc03513cb [<c0349432>] 0xc0349432 [<c0349066>] 0xc0349066 It turns out that we should ignore the low limit of 16M. Use alloc_bootmem_node_nopanic() in this case. [akpm@linux-foundation.org: less mess] Signed-off-by: Yinghai LU <yinghai@kernel.org> Reported-by: Stefan Hellermann <stefan@the2masters.de> Tested-by: Stefan Hellermann <stefan@the2masters.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09Don't lock guardpage if the stack is growing upMikulas Patocka
Linux kernel excludes guard page when performing mlock on a VMA with down-growing stack. However, some architectures have up-growing stack and locking the guard page should be excluded in this case too. This patch fixes lvm2 on PA-RISC (and possibly other architectures with up-growing stack). lvm2 calculates number of used pages when locking and when unlocking and reports an internal error if the numbers mismatch. [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the grows-up case, and to move things around a bit to clean it all up and share the infrstructure with the /proc bits. Tested on ia64 that has both grow-up and grow-down segments - Linus ] Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-07Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf tools: Makefile: Use gcc to determine ARCH perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg() sh, hw_breakpoints: Fix racy access to ptrace breakpoints arm, hw_breakpoints: Fix racy access to ptrace breakpoints powerpc, hw_breakpoints: Fix racy access to ptrace breakpoints x86, hw_breakpoints: Fix racy access to ptrace breakpoints ptrace: Prepare to fix racy accesses on task breakpoints
2011-05-06Regression: partial revert "tracing: Remove lock_depth from event entry"Arjan van de Ven
This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266. That commit changed the structure layout of the trace structure, which in turn broke PowerTOP (1.9x generation) quite badly. I appreciate not wanting to expose the variable in question, and PowerTOP was not using it, so I've replaced the variable with just a padding field - that way if in the future a new field is needed it can just use this padding field. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06Merge branch 'master' of ↵Ingo Molnar
ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent
2011-05-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: flex_arrays: allow zero length flex arrays flex_array: flex_array_prealloc takes a number of elements, not an end SELinux: pass last path component in may_create
2011-05-04slub: Fix the lockless code on 32-bit platforms with no 64-bit cmpxchgThomas Gleixner
The SLUB allocator use of the cmpxchg_double logic was wrong: it actually needs the irq-safe one. That happens automatically when we use the native unlocked 'cmpxchg8b' instruction, but when compiling the kernel for older x86 CPUs that do not support that instruction, we fall back to the generic emulation code. And if you don't specify that you want the irq-safe version, the generic code ends up just open-coding the cmpxchg8b equivalent without any protection against interrupts or preemption. Which definitely doesn't work for SLUB. This was reported by Werner Landgraf <w.landgraf@ru.ru>, who saw instability with his distro-kernel that was compiled to support pretty much everything under the sun. Most big Linux distributions tend to compile for PPro and later, and would never have noticed this problem. This also fixes the prototypes for the irqsafe cmpxchg_double functions to use 'bool' like they should. [ Btw, that whole "generic code defaults to no protection" design just sounds stupid - if the code needs no protection, there is no reason to use "cmpxchg_double" to begin with. So we should probably just remove the unprotected version entirely as pointless. - Linus ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: werner <w.landgraf@ru.ru> Acked-and-tested-by: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-04Merge branch 'perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2011-05-04Merge branch 'for-linus' of git://git.infradead.org/users/eparis/selinux ↵James Morris
into for-linus
2011-05-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: sdhci: Check mrq != NULL in sdhci_tasklet_finish mmc: sdhci: Check mrq->cmd in sdhci_tasklet_finish mmc: tmio: fix .set_ios(MMC_POWER_UP) handling mmc: fix a race between card-detect rescan and clock-gate work instances mmc: omap: Fix possible NULL pointer deref mmc: core: mmc_add_card(): fix missing break in switch statement mmc: sdhci-pci: Fix error case in sdhci_pci_probe_slot()
2011-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wm831x-ts - move BTN_TOUCH reporting to data transfer Input: wm831x-ts - allow IRQ flags to be specified Input: wm831x-ts - fix races with IRQ management
2011-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) sysctl: net: call unregister_net_sysctl_table where needed Revert: veth: remove unneeded ifname code from veth_newlink() smsc95xx: fix reset check tg3: Fix failure to enable WoL by default when possible networking: inappropriate ioctl operation should return ENOTTY amd8111e: trivial typo spelling: Negotitate -> Negotiate ipv4: don't spam dmesg with "Using LC-trie" messages af_unix: Only allow recv on connected seqpacket sockets. mii: add support of pause frames in mii_get_an net: ftmac100: fix scheduling while atomic during PHY link status change usbnet: Transfer of maintainership usbnet: add support for some Huawei modems with cdc-ether ports bnx2: cancel timer on device removal iwl4965: fix "Received BA when not expected" iwlagn: fix "Received BA when not expected" dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085 usbnet: Resubmit interrupt URB if device is open iwl4965: fix "TX Power requested while scanning" iwlegacy: led stay solid on when no traffic b43: trivial: update module info about ucode16_mimo firmware ...
2011-05-01i2c-i801: Move device ID definitions to driverJean Delvare
Move the SMBus device ID definitions of recent devices from pci_ids.h to the i2c-i801.c driver file. They don't have to be shared, as they are clearly identified and only used in this driver. In the future, such IDs will go to i2c-i801 directly. This will make adding support for new devices much faster and easier, as it will avoid cross- subsystem patch sets and merge conflicts. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Seth Heasley <seth.heasley@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-04-28Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: don't lose MS_SYNCHRONOUS on remount of noac mount NFS: Return meaningful status from decode_secinfo() NFSv4: Ensure we request the ordinary fileid when doing readdirplus NFSv4: Ensure that clientid and session establishment can time out SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO NFSv4.1: Don't loop forever in nfs4_proc_create_session NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception() NFSv4.1: Don't update sequence number if rpc_task is not sent NFSv4.1: Ensure state manager thread dies on last umount SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies NFS: Use correct variable for page bounds checking NFS: don't negotiate when user specifies sec flavor NFS: Attempt mount with default sec flavor first NFS: flav_array honors NFS_MAX_SECFLAVORS NFS: Fix infinite loop in gss_create_upcall() Don't mark_inode_dirty_sync() while holding lock NFS: Get rid of pointless test in nfs_commit_done NFS: Remove unused argument from nfs_find_best_sec() NFS: Eliminate duplicate call to nfs_mark_request_dirty NFS: Remove dead code from nfs_fs_mount()
2011-04-28flex_array: flex_array_prealloc takes a number of elements, not an endEric Paris
Change flex_array_prealloc to take the number of elements for which space should be allocated instead of the last (inclusive) element. Users and documentation are updated accordingly. flex_arrays got introduced before they had users. When folks started using it, they ended up needing a different API than was coded up originally. This swaps over to the API that folks apparently need. Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Chris Richards <gizmo@giz-works.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: stable@kernel.org [2.6.38+]
2011-04-28usbnet: Resubmit interrupt URB if device is openPaul Stewart
Resubmit interrupt URB if device is open. Use a flag set in usbnet_open() to determine this state. Also kill and free interrupt URB in usbnet_disconnect(). [Rebased off git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git] Signed-off-by: Paul Stewart <pstew@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanupsAndrea Arcangeli
The huge_memory.c THP page fault was allowed to run if vm_ops was null (which would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't setup a special vma->vm_ops and it would fallback to regular anonymous memory) but other THP logics weren't fully activated for vmas with vm_file not NULL (/dev/zero has a not NULL vma->vm_file). So this removes the vm_file checks so that /dev/zero also can safely use THP (the other albeit safer approach to fix this bug would have been to prevent the THP initial page fault to run if vm_file was set). After removing the vm_file checks, this also makes huge_memory.c stricter in khugepaged for the DEBUG_VM=y case. It doesn't replace the vm_file check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should only be allowed to exist before the first page fault, and in turn when vma->anon_vma is null (so preventing khugepaged registration). So I tend to think the previous comment saying if vm_file was set, VM_PFNMAP might have been set and we could still be registered in khugepaged (despite anon_vma was not NULL to be registered in khugepaged) was too paranoid. The is_linear_pfn_mapping check is also I think superfluous (as described by comment) but under DEBUG_VM it is safe to stay. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682 Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Caspar Zhang <bugs@casparzhang.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-27Input: wm831x-ts - allow IRQ flags to be specifiedMark Brown
This allows maximum flexibility for configuring the direct GPIO based interrupts. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2011-04-27mmc: fix a race between card-detect rescan and clock-gate work instancesGuennadi Liakhovetski
Currently there is a race in the MMC core between a card-detect rescan work and the clock-gating work, scheduled from a command completion. Fix it by removing the dedicated clock-gating mutex and using the MMC standard locking mechanism instead. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Simon Horman <horms@verge.net.au> Cc: Magnus Damm <damm@opensource.se> Acked-by: Linus Walleij <linus.walleij@linaro.org> Cc: <stable@kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-04-27Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (42 commits) [media] media: vb2: correct queue initialization order [media] media: vb2: fix incorrect v4l2_buffer->flags handling [media] s5p-fimc: Add support for the buffer timestamps and sequence [media] s5p-fimc: Fix bytesperline and plane payload setup [media] s5p-fimc: Do not allow changing format after REQBUFS [media] s5p-fimc: Fix FIMC3 pixel limits on Exynos4 [media] tda18271: update tda18271c2_rf_cal as per NXP's rev.04 datasheet [media] tda18271: update tda18271_rf_band as per NXP's rev.04 datasheet [media] tda18271: fix bad calculation of main post divider byte [media] tda18271: prog_cal and prog_tab variables should be s32, not u8 [media] tda18271: fix calculation bug in tda18271_rf_tracking_filters_init [media] omap3isp: queue: Don't corrupt buf->npages when get_user_pages() fails [media] v4l: Don't register media entities for subdev device nodes [media] omap3isp: Don't increment node entity use count when poweron fails [media] omap3isp: lane shifter support [media] omap3isp: ccdc: support Y10/12, 8-bit bayer fmts [media] media: add missing 8-bit bayer formats and Y12 [media] v4l: add V4L2_PIX_FMT_Y12 format cx23885: Fix stv0367 Kconfig dependency [media] omap3isp: Use isp xclk defines ... Fix up trivial conflict (spelink errurs) in drivers/media/video/omap3isp/isp.c
2011-04-27NFSv4: Ensure we request the ordinary fileid when doing readdirplusTrond Myklebust
When readdir() returns a directory entry for the root of a mounted filesystem, Linux follows the old convention of returning the inode number of the covered directory (despite newer versions of POSIX declaring that this is a bug). To ensure this continues to work, the NFSv4 readdir implementation requests the 'mounted-on-fileid' from the server. However, readdirplus also needs to instantiate an inode for this entry, and for that, we also need to request the real fileid as per this patch. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-25add hlist_bl_lock/unlock helpersChristoph Hellwig
Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-25bit_spinlock: don't play preemption games inside the busy loopLinus Torvalds
When we are waiting for the bit-lock to be released, and are looping over the 'cpu_relax()' should not be doing anything else - otherwise we miss the point of trying to do the whole 'cpu_relax()'. Do the preemption enable/disable around the loop, rather than inside of it. Noticed when I was looking at the code generation for the dcache __d_drop usage, and the code just looked very odd. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-25ptrace: Prepare to fix racy accesses on task breakpointsFrederic Weisbecker
When a task is traced and is in a stopped state, the tracer may execute a ptrace request to examine the tracee state and get its task struct. Right after, the tracee can be killed and thus its breakpoints released. This can happen concurrently when the tracer is in the middle of reading or modifying these breakpoints, leading to dereferencing a freed pointer. Hence, to prepare the fix, create a generic breakpoint reference holding API. When a reference on the breakpoints of a task is held, the breakpoints won't be released until the last reference is dropped. After that, no more ptrace request on the task's breakpoints can be serviced for the tracer. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: v2.6.33.. <stable@kernel.org> Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
2011-04-24SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIOTrond Myklebust
On occasion, it is useful for the NFS layer to distinguish between soft timeouts and other EIO errors due to (say) encoding errors, or authentication errors. The following patch ensures that the default behaviour of the RPC layer remains to return EIO on soft timeouts (until we have audited all the callers). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24NFSv4.1: Don't loop forever in nfs4_proc_create_sessionTrond Myklebust
If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end up looping forever inside nfs4_proc_create_session, and so the usual mechanisms for detecting if the nfs_client is dead don't work. Fix this by ensuring that we loop inside the nfs4_state_manager thread instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24Merge branch 'dcache-cleanup'Linus Torvalds
* dcache-cleanup: vfs: get rid of insane dentry hashing rules
2011-04-24libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65Tejun Heo
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM is used. Implement ATA_FLAG_NO_DIPM and apply it. This problem was reported by Stefan Bader in the following thread. http://thread.gmane.org/gmane.linux.ide/48841 stable: applicable to 2.6.37 and 38. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>