summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-07-05perf/x86: Improve debug output in check_hw_exists()Robert Richter
It might be of interest which perfctr msr failed. Signed-off-by: Robert Richter <robert.richter@amd.com> [ added hunk to avoid GCC warn ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-5-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86/amd: Unify AMD's generic and family 15h pmusRobert Richter
There is no need for keeping separate pmu structs. We can enable amd_{get,put}_event_constraints() functions also for family 15h event. The advantage is that there is only a single pmu struct for all AMD cpus. This patch introduces functions to setup the pmu to enabe core performance counters or counter constraints. Also, cpuid checks are used instead of family checks where possible. Thus, it enables the code independently of cpu families if the feature flag is set. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Move Intel specific code to intel_pmu_init()Robert Richter
There is some Intel specific code in the generic x86 path. Move it to intel_pmu_init(). Since p4 and p6 pmus don't have fixed counters we may skip the check in case such a pmu is detected. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Rename Intel specific macrosRobert Richter
There are macros that are Intel specific and not x86 generic. Rename them into INTEL_*. This patch removes X86_PMC_IDX_GENERIC and does: $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \ arch/x86/include/asm/kvm_host.h \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_intel_ds.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'x86/microcode' into perf/coreIngo Molnar
Merge this branch because we want to rely on the newer (and saner) microcode loading and checking facilities. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'x86/cpu' into perf/coreIngo Molnar
Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Fix USER/KERNEL tagging of samplesPeter Zijlstra
Several perf interrupt handlers (PEBS,IBS,BTS) re-write regs->ip but do not update the segment registers. So use an regs->ip based test instead of an regs->cs/regs->flags based test. Reported-and-tested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-xxrt0a1zronm1sm36obwc2vy@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-03Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds
Pull fix to common clk framework from Michael Turquette: "The previous set of common clk fixes for -rc5 left an uninitialized int which could lead to bad array indexing when switching clock parents. The issue is fixed with a trivial change to the code flow in __clk_set_parent." * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux: clk: fix parent validation in __clk_set_parent()
2012-07-03Merge tag 'md-3.5-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull raid10 build failure fix from NeilBrown: "I really shouldn't do important things late in the day. It seems that I get careless." * tag 'md-3.5-fixes' of git://neil.brown.name/md: md/raid10: fix careless build error
2012-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking update from David Miller: 1) Fix RX sequence number handling in mwifiex, from Stone Piao. 2) Netfilter ipset mis-compares device names, fix from Florian Westphal. 3) Fix route leak in ipv6 IPVS, from Eric Dumazet. 4) NFS fixes. Several buffer overflows in NCI layer from Dan Rosenberg, and release sock OOPS'er fix from Eric Dumazet. 5) Fix WEP handling ath9k, we started using a bit the chip provides to indicate undecrypted packets but that bit turns out to be unreliable in certain configurations. Fix from Felix Fietkau. 6) Fix Kconfig dependency bug in wlcore, from Randy Dunlap. 7) New USB IDs for rtlwifi driver from Larry Finger. 8) Fix crashes in qmi_wwan usbnet driver when disconnecting, from Bjørn Mork. 9) Gianfar driver programs coalescing settings properly in single queue mode, but does not do so in multi-queue mode. Fix from Claudiu Manoil. 10) Missing module.h include in davinci_cpdma.c, from Daniel Mack. 11) Need dummy handler for IPSET_CMD_NONE otherwise we crash in ipset if we get this via nfnetlink, fix from Tomasz Bursztyka. 12) Missing RCU unlock in nfnetlink error path, also from Tomasz. 13) Fix divide by zero in igbvf when the user tries to set an RX coalescing value of 0 usecs, from Mitch A Williams. 14) We can process SCTP sacks for the wrong transport, oops. Fix from Neil Horman. 15) Remove hw IP payload checksumming from e1000e driver. This has zery value in our stack, and turning it on creates a very unintuitive restriction for users when using jumbo MTUs. Specifically, when IP payload checksums are on you cannot use both receive hashing offload and jumbo MTU. Fix from Bruce Allan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) e1000e: remove use of IP payload checksum sctp: be more restrictive in transport selection on bundled sacks igbvf: fix divide by zero netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent davinci_cpdma: include linux/module.h gianfar: Fix RXICr/TXICr programming for multi-queue mode net: Downgrade CAP_SYS_MODULE deprecated message from error to warning. net: qmi_wwan: fix Oops while disconnecting mwifiex: fix memory leak associated with IE manamgement ath9k: fix panic caused by returning a descriptor we have queued for reuse mac80211: correct behaviour on unrecognised action frames ath9k: enable serialize_regmode for non-PCIE AR9287 rtlwifi: rtl8192cu: New USB IDs NFC: Return from rawsock_release when sk is NULL iwlwifi: fix activating inactive stations wlcore: drop INET dependency ath9k: fix dynamic WEP related regression NFC: Prevent multiple buffer overflows in NCI netfilter: update location of my trees ...
2012-07-04md/raid10: fix careless build errorNeilBrown
build error introduced by commit b357f04a67c2aeee8 That function doesn't get extra args until a later patch. Bother. Reported-by: Fengguang Wu <wfg@linux.intel.com> Reported-by: Simon Kirby <sim@hostway.ca> Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03floppy: cancel any pending fd_timeouts before adding a new oneLinus Torvalds
In commit 070ad7e793dc ("floppy: convert to delayed work and single-thread wq") the 'fd_timeout' timer was converted to a delayed work. However, the "del_timer(&fd_timeout)" was lost in the process, and any previous pending timeouts would stay active when we then re-queued the timeout. This resulted in the floppy probe sequence having a (stale) 20s timeout rather than the intended 3s timeout, and thus made booting with the floppy driver (but no actual floppy controller) take much longer than it should. Of course, there's little reason for most people to compile the floppy driver into the kernel at all, which is why most people never noticed. Canceling the delayed work where we used to do the del_timer() fixes the issue, and makes the floppy probing use the proper new timeout instead. The three second timeout is still very wasteful, but better than the 20s one. Reported-and-tested-by: Andi Kleen <ak@linux.intel.com> Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-03Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block bits from Jens Axboe: "As vacation is coming up, thought I'd better get rid of my pending changes in my for-linus branch for this iteration. It contains: - Two patches for mtip32xx. Killing a non-compliant sysfs interface and moving it to debugfs, where it belongs. - A few patches from Asias. Two legit bug fixes, and one killing an interface that is no longer in use. - A patch from Jan, making the annoying partition ioctl warning a bit less annoying, by restricting it to !CAP_SYS_RAWIO only. - Three bug fixes for drbd from Lars Ellenberg. - A fix for an old regression for umem, it hasn't really worked since the plugging scheme was changed in 3.0. - A few fixes from Tejun. - A splice fix from Eric Dumazet, fixing an issue with pipe resizing." * 'for-linus' of git://git.kernel.dk/linux-block: scsi: Silence unnecessary warnings about ioctl to partition block: Drop dead function blk_abort_queue() block: Mitigate lock unbalance caused by lock switching block: Avoid missed wakeup in request waitqueue umem: fix up unplugging splice: fix racy pipe->buffers uses drbd: fix null pointer dereference with on-congestion policy when diskless drbd: fix list corruption by failing but already aborted reads drbd: fix access of unallocated pages and kernel panic xen/blkfront: Add WARN to deal with misbehaving backends. blkcg: drop local variable @q from blkg_destroy() mtip32xx: Create debugfs entries for troubleshooting mtip32xx: Remove 'registers' and 'flags' from sysfs blkcg: fix blkg_alloc() failure path block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED block: fix return value on cfq_init() failure mtip32xx: Remove version.h header file inclusion xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-03clk: fix parent validation in __clk_set_parent()Rajendra Nayak
The below commit introduced a bug in __clk_set_parent() which could cause it to *skip* the parent validation which makes sure the parent passed to the api is a valid one. commit 7975059db572eb47f0fb272a62afeae272a4b209 Author: Rajendra Nayak <rnayak@ti.com> Date: Wed Jun 6 14:41:31 2012 +0530 clk: Allow late cache allocation for clk->parents This was identified by the following compiler warning.. drivers/clk/clk.c: In function '__clk_set_parent': drivers/clk/clk.c:1083:5: warning: 'i' may be used uninitialized in this function [-Wuninitialized] .. as reported by Marc Kleine-Budde. There were various options discussed on how to fix this, one being initing 'i' to clk->num_parents, but the below approach was found to be more appropriate as it also makes the 'parent validation' code simpler to read. Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Rajendra Nayak <rnayak@ti.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> Cc: stable@kernel.org
2012-07-03Merge tag 'sound-3.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just a few driver-specific fixes for ASoC and HD-audio." * tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix no sound from ALC662 after Windows reboot ASoC: tlv320aic3x: Fix codec pll configure bug ASoC: wm2200: Add missing BCLK rate
2012-07-03Merge tag 'dm-3.5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper fixes from Alasdair G Kergon: "Four minor thin provisioning fixes and correct and update dm-verity documentation." * tag 'dm-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm: verity fix documentation dm persistent data: fix allocation failure in space map checker init dm persistent data: handle space map checker creation failure dm persistent data: fix shadow_info_leak on dm_tm_destroy dm thin: commit metadata before creating metadata snapshot
2012-07-03Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "One regression fix, two radeon fixes (one for an oops), and an i915 fix to unload framebuffers earlier. We originally were going to leave the i915 fix until -next, but grub2 in some situations causes vesafb/efifb to be loaded now, and this causes big slowdowns, and I have reports in rawhide I'd like to have fixed." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/i915: kick any firmware framebuffers before claiming the gtt drm: edid: Don't add inferred modes with higher resolution drm/radeon: fix rare segfault drm/radeon: fix VM page table setup on SI
2012-07-03Merge tag 'md-3.5-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md fixes from NeilBrown: "md: collection of bug fixes for 3.5 You go away for 2 weeks vacation and what do you get when you come back? Piles of bugs :-) Some found by inspection, some by testing, some during use in the field, and some while developing for the next window..." * tag 'md-3.5-fixes' of git://neil.brown.name/md: md: fix up plugging (again). md: support re-add of recovering devices. md/raid1: fix bug in read_balance introduced by hot-replace raid5: delayed stripe fix md/raid456: When read error cannot be recovered, record bad block md: make 'name' arg to md_register_thread non-optional. md/raid10: fix failure when trying to repair a read error. md/raid5: fix refcount problem when blocked_rdev is set. md:Add blk_plug in sync_thread. md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev md/raid5: Do not add data_offset before call to is_badblock md/raid5: prefer replacing failed devices over want-replacement devices. md/raid10: Don't try to recovery unmatched (and unused) chunks.
2012-07-03Merge branch 'for-linus2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer fixes from James Morris. A documentation update, and a nommu build fix. * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: Fix nommu build. security: document no_new_privs
2012-07-03dm: verity fix documentationMilan Broz
Veritysetup is now part of cryptsetup package. Remove on-disk header description (which is not parsed in kernel) and point users to cryptsetup where it the format is documented. Mention units for block size paramaters. Fix target line specification and dmsetup parameters. Signed-off-by: Milan Broz <mbroz@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm persistent data: fix allocation failure in space map checker initMike Snitzer
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and memory is fragmented and a sufficiently-large metadata device is used in a thin pool then the space map checker will fail to allocate the memory it requires. Switch from kmalloc to vmalloc to allow larger virtually contiguous allocations for the space map checker's internal count arrays. Reported-by: Vivek Goyal <vgoyal@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm persistent data: handle space map checker creation failureMike Snitzer
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create() fails, dm_tm_create_internal() would still return success even though it cleaned up all resources it was supposed to have created. This will lead to a kernel crash: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC ... RIP: 0010:[<ffffffff81593659>] [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20 Call Trace: [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10 [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0 [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0 [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160 [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0 [<ffffffff815aa010>] pool_ctr+0x3f0/0x900 [<ffffffff8158d565>] dm_table_add_target+0x195/0x450 [<ffffffff815904c4>] table_load+0xe4/0x330 [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0 [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20 [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560 [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0 [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b Fix the space map checker code to return an appropriate ERR_PTR and have dm_sm_disk_create() and dm_tm_create_internal() check for it with IS_ERR. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm persistent data: fix shadow_info_leak on dm_tm_destroyMike Snitzer
Cleanup the shadow table before destroying the transaction manager. Reference: leak was identified with kmemleak when running test_discard_random_sectors in the thinp-test-suite. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm thin: commit metadata before creating metadata snapshotJoe Thornber
Userland sometimes sees a corrupt metadata block if metadata is changing rapidly when a metadata snapshot is reserved for userland, To make the problem go away, commit before we take the metadata snapshot (which is a sensible thing to do anyway). The checksums mean userland spots this corruption immediately so there's no risk of acting on incorrect data. No corruption exists from the kernel's point of view, and thin_check passes after pool shutdown. I believe this is to do with shared blocks at the first level of the {device, mapping} btree. Prior to the metadata-snap support no sharing at this level was possible, so this patch is only required after commit cc8394d86f045b86ff303d3c9e4ce47d97148951 ("dm thin: provide userspace access to pool metadata"). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03security: Fix nommu build.Paul Mundt
The security + nommu configuration presently blows up with an undefined reference to BDI_CAP_EXEC_MAP: security/security.c: In function 'mmap_prot': security/security.c:687:36: error: dereferencing pointer to incomplete type security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function) security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in include backing-dev.h directly to fix it up. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-03drm/i915: kick any firmware framebuffers before claiming the gttDaniel Vetter
Especially vesafb likes to map everything as uc- (yikes), and if that mapping hangs around still while we try to map the gtt as wc the kernel will downgrade our request to uc-, resulting in abyssal performance. Unfortunately we can't do this as early as readon does (i.e. as the first thing we do when initializing the hw) because our fb/mmio space region moves around on a per-gen basis. So I've had to move it below the gtt initialization, but that seems to work, too. The important thing is that we do this before we set up the gtt wc mapping. Now an altogether different question is why people compile their kernels with vesafb enabled, but I guess making things just work isn't bad per se ... v2: - s/radeondrmfb/inteldrmfb/ - fix up error handling v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky. v4: Jani Nikula complained about the pointless bool primary initialization. v5: Don't oops if we can't allocate, noticed by Chris Wilson. v6: Resolve conflicts with agp rework and fixup whitespace. This is commit e188719a2891f01b3100d in drm-next. Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub using vesa on fedora their initrd seems to load vesafb before loading the real kms driver. So tons more people actually experience a dead-slow gpu. Hence also the Cc: stable. Cc: stable@vger.kernel.org Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03drm: edid: Don't add inferred modes with higher resolutionTakashi Iwai
When a monitor EDID doesn't give the preferred bit, driver assumes that the mode with the higest resolution and rate is the preferred mode. Meanwhile the recent changes for allowing more modes in the GFT/CVT ranges give actually more modes, and some modes may be over the native size. Thus such a mode would be picked up as the preferred mode although it's no native resolution. For avoiding such a problem, this patch limits the addition of inferred modes by checking not to be greater than other modes. Also, it checks the duplicated mode entry at the same time. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03drm/radeon: fix rare segfaultJerome Glisse
In gem idle/busy ioctl the radeon object was derefenced after drm_gem_object_unreference_unlocked which in case the object have been destroyed lead to use of a possibly free pointer with possibly wrong data. Signed-off-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03md: fix up plugging (again).NeilBrown
The value returned by "mddev_check_plug" is only valid until the next 'schedule' as that will unplug things. This could happen at any call to mempool_alloc. So just calling mddev_check_plug at the start doesn't really make sense. So call it just before, or just after, queuing things for the thread. As the action that happens at unplug is to wake the thread, this makes lots of sense. If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we wake thread immediately. RAID5 is a bit different. Requests are queued for the thread and the thread is woken by release_stripe. So we don't need to wake the thread on failure. However the thread doesn't perform certain actions when there is any active plug, so it is important to install a plug before waking the thread. So for RAID5 we install the plug *before* queuing the request and waking the thread. Without this patch it is possible for raid1 or raid10 to queue a request without then waking the thread, resulting in the array locking up. Also change raid10 to only flush_pending_write when there are not active plugs, just like raid1. This patch is suitable for 3.0 or later. I plan to submit it to -stable, but I'll like to let it spend a few weeks in mainline first to be sure it is completely safe. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: support re-add of recovering devices.NeilBrown
We currently only allow a device to be re-added if it appear to be in-sync. This is overly restrictive as it may be desirable to re-add a device that is in the middle of recovery. So remove the test for "InSync" - the test on rdev->raid_disk is sufficient to ensure that the re-add will succeed. Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid1: fix bug in read_balance introduced by hot-replaceNeilBrown
When we added hot_replace we doubled the number of devices that could be in a RAID1 array. So we doubled how far read_balance would search. Unfortunately we didn't double the point at which it looped back to the beginning - so it effectively loops over all non-replacement disks twice. This doesn't cause bad behaviour, but it pointless and means we never read from replacement devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03raid5: delayed stripe fixShaohua Li
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but the two bits have relationship. A delayed stripe can be moved to hold list only when preread active stripe count is below IO_THRESHOLD. If a stripe has both the bits set, such stripe will be in delayed list and preread count not 0, which will make such stripe never leave delayed list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid456: When read error cannot be recovered, record bad blockmajianpeng
We may not be able to fix a bad block if: - the array is degraded - the over-write fails. In these cases we currently eject the device, but we should record a bad block if possible. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: make 'name' arg to md_register_thread non-optional.NeilBrown
Having the 'name' arg optional and defaulting to the current personality name is no necessary and leads to errors, as when changing the level of an array we can end up using the name of the old level instead of the new one. So make it non-optional and always explicitly pass the name of the level that the array will be. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid10: fix failure when trying to repair a read error.NeilBrown
commit 58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf md/raid10: handle further errors during fix_read_error better. in 3.1 added "r10_sync_page_io" which takes an IO size in sectors. But we were passing the IO size in bytes!!! This resulting in bio_add_page failing, and empty request being sent down, and a consequent BUG_ON in scsi_lib. [fix missing space in error message at same time] This fix is suitable for 3.1.y and later. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-02Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull a couple more powerpc fixes from Benjamin Herrenschmidt: "Here are two more fixes that I "missed" when scrubbing patchwork last week which are worth still having in 3.5." * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/kvm: sldi should be sld powerpc/xmon: Use cpumask iterator to avoid warning
2012-07-03security: document no_new_privsAndy Lutomirski
Document no_new_privs. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-03md/raid5: fix refcount problem when blocked_rdev is set.NeilBrown
commit 43220aa0f22cd3ce5b30246d50ccd696d119edea md/raid5: fix a hang on device failure. fixed a hang, but introduced a refcounting in-balance so that if the presence of bad-blocks ever caused an rdev to be 'blocked' we would increment the refcount on the rdev and never decrement it. So added the needed rdev_dec_pending when md_wait_for_blocked_rdev is not called. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md:Add blk_plug in sync_thread.majianpeng
Add blk_plug in sync_thread will increase the performance of sync. Because sync_thread did not blk_plug,so when raid sync, the bio merge not well. Testing environment: SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller. OS:Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012 x86_64 x86_64 x86_64 GNU/Linux. RAID5: four ST31000524NS disk. Without blk_plug:recovery speed about 63M/Sec; Add blk_plug:recovery speed about 120M/Sec. Using blktrace: blktrace -d /dev/sdb -w 60 -o -|blkparse -i - without blk_plug: Total (8,16): Reads Queued: 309811, 1239MiB Writes Queued: 0, 0KiB Read Dispatches: 283583, 1189MiB Write Dispatches: 0, 0KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 273351, 1149MiB Writes Completed: 0, 0KiB Read Merges: 23533, 94132KiB Write Merges: 0, 0KiB IO unplugs: 0 Timer unplugs: 0 add blk_plug: Total (8,16): Reads Queued: 428697, 1714MiB Writes Queued: 0, 0KiB Read Dispatches: 3954, 1714MiB Write Dispatches: 0, 0KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 3956, 1715MiB Writes Completed: 0, 0KiB Read Merges: 424743, 1698MiB Write Merges: 0, 0KiB IO unplugs: 0 Timer unplugs: 3384 The ratio of merge will be markedly increased. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdevmajianpeng
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement nr_pending so we lose the reference we hold on the rdev. So atomic_inc it first to maintain the reference. This bug was introduced by commit 73e92e51b7969ef5477d md/raid5. Don't write to known bad block on doubtful devices. which appeared in 3.0, so patch is suitable for stable kernels since then. Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: Do not add data_offset before call to is_badblockmajianpeng
In chunk_aligned_read() we are adding data_offset before calling is_badblock. But is_badblock also adds data_offset, so that is bad. So move the addition of data_offset to after the call to is_badblock. This bug was introduced by commit 31c176ecdf3563140e639 md/raid5: avoid reading from known bad blocks. which first appeared in 3.0. So that patch is suitable for any -stable kernel from 3.0.y onwards. However it will need minor revision for most of those (as the comment didn't appear until recently). Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: prefer replacing failed devices over want-replacement devices.NeilBrown
If a RAID5 has both a failed device and a device marked as 'WantReplacement', then we should preferentially replace the failed device. However the current code replaces whichever is found first. So split into 2 loops, check fail failed/missing first, and only check for WantReplacement if nothing is failed or missing. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid10: Don't try to recovery unmatched (and unused) chunks.NeilBrown
If a RAID10 has an odd number of chunks - as might happen when there are an odd number of devices - the last chunk has no pair and so is not mirrored. We don't store data there, but when recovering the last device in an array we retry to recover that last chunk from a non-existent location. This results in an error, and the recovery aborts. When we get to that last chunk we should just stop - there is nothing more to do anyway. This bug has been present since the introduction of RAID10, so the patch is appropriate for any -stable kernel. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Tested-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-02powerpc/kvm: sldi should be sldMichael Neuling
Since we are taking a registers, this should never have been an sldi. Talking to paulus offline, this is the correct fix. Was introduced by: commit 19ccb76a1938ab364a412253daec64613acbf3df Author: Paul Mackerras <paulus@samba.org> Date: Sat Jul 23 17:42:46 2011 +1000 Talking to paulus, this shouldn't be a literal. Signed-off-by: Michael Neuling <mikey@neuling.org> CC: <stable@kernel.org> [v3.2+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-02powerpc/xmon: Use cpumask iterator to avoid warningAnton Blanchard
We have a bug report where the kernel hits a warning in the cpumask code: WARNING: at include/linux/cpumask.h:107 Which is: WARN_ON_ONCE(cpu >= nr_cpumask_bits); The backtrace is: cpu_cmd cmds xmon_core xmon die xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still open coding this but iterating above nr_cpu_ids is definitely a bug. This patch iterates through all possible cpus, in case we issue a system reset and CPUs in an offline state call in. Perhaps the old code was trying to handle CPUs that were in the partition but were never started (eg kexec into a kernel with an nr_cpus= boot option). They are going to die way before we get into xmon since we haven't set any kernel state up for them. Signed-off-by: Anton Blanchard <anton@samba.org> CC: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-01Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull two ARM fixes from Russell King: "It's been fairly quiet with the fixes. Just two this time. One fixes a long standing problem with KALLSYMS needing an additional pass, and the other sorts a problem with the vmalloc space interacting with static IO mappings." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7438/1: fill possible PMD empty section gaps ARM: 7428/1: Prevent KALLSYM size mismatch on ARM.
2012-07-01x86, microcode: Make reload interface per systemBorislav Petkov
The reload interface should be per-system so that a full system ucode reload happens (on each core) when doing echo 1 > /sys/devices/system/cpu/microcode/reload Move it to the cpu subsys directory instead of it being per-cpu. Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1340280437-7718-3-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-01x86, microcode: Sanitize per-cpu microcode reloading interfaceBorislav Petkov
Microcode reloading in a per-core manner is a very bad idea for both major x86 vendors. And the thing is, we have such interface with which we can end up with different microcode versions applied on different cores of an otherwise homogeneous wrt (family,model,stepping) system. So turn off the possibility of doing that per core and allow it only system-wide. This is a minimal fix which we'd like to see in stable too thus the more-or-less arbitrary decision to allow system-wide reloading only on the BSP: $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload ... and disable the interface on the other cores: $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload -bash: echo: write error: Invalid argument Also, allowing the reload only from one CPU (the BSP in that case) doesn't allow the reload procedure to degenerate into an O(n^2) deal when triggering reloads from all /sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes simultaneously. A more generic fix will follow. Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2012-07-01ARM: 7438/1: fill possible PMD empty section gapsNicolas Pitre
On ARM with the 2-level page table format, a PMD entry is represented by two consecutive section entries covering 2MB of virtual space. However, static mappings always were allowed to use separate 1MB section entries. This means in practice that a static mapping may create half populated PMDs via create_mapping(). Since commit 0536bdf33f (ARM: move iotable mappings within the vmalloc region) those static mappings are located in the vmalloc area. We must ensure no such half populated PMDs are accessible once vmalloc() or ioremap() start looking at the vmalloc area for nearby free virtual address ranges, or various things leading to a kernel crash will happen. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: "R, Sricharan" <r.sricharan@ti.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>