Age | Commit message (Collapse) | Author |
|
Bug 200004122
Conflicts:
drivers/cpufreq/cpufreq.c
drivers/regulator/core.c
sound/soc/codecs/max98090.c
Change-Id: I9418a05ad5c56b2e902249218bac2fa594d99f56
Signed-off-by: Ishan Mittal <imittal@nvidia.com>
|
|
[ Upstream commit 8b7b932434f5eee495b91a2804f5b64ebb2bc835 ]
nla_strcmp compares the string length plus one, so it's implicitly
including the nul-termination in the comparison.
int nla_strcmp(const struct nlattr *nla, const char *str)
{
int len = strlen(str) + 1;
...
d = memcmp(nla_data(nla), str, len);
However, if NLA_STRING is used, userspace can send us a string without
the nul-termination. This is a problem since the string
comparison will not match as the last byte may be not the
nul-termination.
Fix this by skipping the comparison of the nul-termination if the
attribute data is nul-terminated. Suggested by Thomas Graf.
Cc: Florian Westphal <fw@strlen.de>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Bug 1456092
Change-Id: I3021247ec68a3c2dddd9e98cde13d70a45191d53
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
|
|
commit 6583327c4dd55acbbf2a6f25e775b28b3abf9a42 upstream.
Commit d61931d89b, "x86: Add optimized popcnt variants" introduced
compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
options -fprofile-arcs and -O2, this flag causes gcc to generate
broken constructor code. As a result, a 64 bit x86 kernel compiled
with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
file" and runs into sproadic BUGs during boot.
The gcc people indicate that these kinds of problems are endemic when
using ad hoc calling conventions. It is therefore best to treat any
file compiled with ad hoc calling conventions as an isolated
environment and avoid things like profiling or coverage analysis,
since those subsystems assume a "normal" calling conventions.
This patch avoids the bug by excluding lib/hweight.o from coverage
profiling.
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1431574a1c4c669a0c198e4763627837416e4443 upstream.
When decompressing into memory, the output buffer length is set to some
arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
unlimited in size.
The problem with this is that some platforms have their physical memory at
high physical addresses (0x80000000 or more), and that the output buffer
address and its "unlimited" length cannot be added without overflowing.
An example of this can be found in inflate_fast():
/* next_out is the output buffer address */
out = strm->next_out - OFF;
/* avail_out is the output buffer size. end will overflow if the output
* address is >= 0x80000104 */
end = out + (strm->avail_out - 257);
This has huge consequences on the performance of kernel decompression,
since the following exit condition of inflate_fast() will be always true:
} while (in < last && out < end);
Indeed, "end" has overflowed and is now always lower than "out". As a
result, inflate_fast() will return after processing one single byte of
input data, and will thus need to be called an unreasonably high number of
times. This probably went unnoticed because kernel decompression is fast
enough even with this issue.
Nonetheless, adjusting the output buffer length in such a way that the
above pointer arithmetic never overflows results in a kernel decompression
that is about 3 times faster on affected machines.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is the 3.10.24 stable release
Change-Id: Ibd2734f93d44385ab86867272a1359158635133b
|
|
commit 674470d97958a0ec72f72caf7f6451da40159cc7 upstream.
In struct gen_pool_chunk, end_addr means the end address of memory chunk
(inclusive), but in the implementation it is treated as address + size of
memory chunk (exclusive), so it points to the address plus one instead of
correct ending address.
The ending address of memory chunk plus one will cause overflow on the
memory chunk including the last address of memory map, e.g. when starting
address is 0xFFF00000 and size is 0x100000 on 32bit machine, ending
address will be 0x100000000.
Use correct ending address like starting address + size - 1.
[akpm@linux-foundation.org: add comment to struct gen_pool_chunk:end_addr]
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 51c37a70aaa3f95773af560e6db3073520513912 ]
For properly initialising the Tausworthe generator [1], we have
a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.
Commit 697f8d0348 ("random32: seeding improvement") introduced
a __seed() function that imposes boundary checks proposed by the
errata paper [2] to properly ensure above conditions.
However, we're off by one, as the function is implemented as:
"return (x < m) ? x + m : x;", and called with __seed(X, 1),
__seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
would be possible, whereas the lower boundary should actually
be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise
an initialization with an unwanted seed could have the effect
that Tausworthe's PRNG properties cannot not be ensured.
Note that this PRNG is *not* used for cryptography in the kernel.
[1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
[2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
Joint work with Hannes Frederic Sowa.
Fixes: 697f8d0348a6 ("random32: seeding improvement")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 312b4e226951f707e120b95b118cbc14f3d162b2 upstream.
Some setuid binaries will allow reading of files which have read
permission by the real user id. This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time. If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.
This happens for example with the setuid pppd application on Ubuntu 12.04:
$ head -1 /proc/kallsyms
00000000 T startup_32
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.
Fix this by adding a check that in addition to the current process having
CAP_SYSLOG, that effective user and group ids are equal to the real ids.
If a setuid binary reads the contents of a file which uses %pK then the
pointer values will be printed as NULL if the real user is unprivileged.
Update the sysctl documentation to reflect the changes, and also correct
the documentation to state the kptr_restrict=0 is the default.
This is a only temporary solution to the issue. The correct solution is
to do the permission check at open() time on files, and to replace %pK
with a function which checks the open() time permission. %pK uses in
printk should be removed since no sane permission check can be done, and
instead protected by using dmesg_restrict.
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3d77b50c5874b7e923be946ba793644f82336b75 upstream.
Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.
Unfortunately, the commit may introduce a potential bug:
- Before sending some SCSI commands, kmalloc() buffer may be passed to
block layper, so flush_kernel_dcache_page() can see a slab page
finally
- According to cachetlb.txt, flush_kernel_dcache_page() is only called
on "a user page", which surely can't be a slab page.
- ARCH's implementation of flush_kernel_dcache_page() may use page
mapping information to do optimization so page_mapping() will see the
slab page, then VM_BUG_ON() is triggered.
Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix undefined reference to debug_dma_platformdata
Bug 1373902
Change-Id: I77544b64f84e8e43a9bfb873f6b2af375d341f0d
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Reviewed-on: http://git-master/r/278134
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
GVS: Gerrit_Virtual_Submit
|
|
Store mapping/allocation counts of devices through their lifetime and
export via debugfs the current, all time total, and maximum number of
mappings and mapped bytes.
Bug 1351794
Bug 1173494
Change-Id: I07ba73d44bcd34e37b2036507da65706a973fc92
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Reviewed-on: http://git-master/r/268456
GVS: Gerrit_Virtual_Submit
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
|
|
Export via debugfs the debug-dma infrastructure's data about allocated
mappings, and architecture specific information about possible mappings.
Bug 1173494
Change-Id: I6c64364dad69f83fd301a89938fe184dde33806a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Reviewed-on: http://git-master/r/268384
GVS: Gerrit_Virtual_Submit
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
|
|
When decompressing into memory, the output buffer length is set to some
arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
unlimited in size.
The problem with this is that some platforms have their physical memory at
high physical addresses (0x80000000 or more), and that the output buffer
address and its "unlimited" length cannot be added without overflowing.
An example of this can be found in inflate_fast():
/* next_out is the output buffer address */
out = strm->next_out - OFF;
/* avail_out is the output buffer size. end will overflow if the output
* address is >= 0x80000104 */
end = out + (strm->avail_out - 257);
This has huge consequences on the performance of kernel decompression,
since the following exit condition of inflate_fast() will be always true:
} while (in < last && out < end);
Indeed, "end" has overflowed and is now always lower than "out". As a
result, inflate_fast() will return after processing one single byte of
input data, and will thus need to be called an unreasonably high number of
times. This probably went unnoticed because kernel decompression is fast
enough even with this issue.
Nonetheless, adjusting the output buffer length in such a way that the
above pointer arithmetic never overflows results in a kernel decompression
that is about 3 times faster on affected machines.
Change-Id: I7bae4c30195d587d0ae892baf7bd5fe0e0c73a4b
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-on: http://git-master/r/274082
Reviewed-by: David Schalig <dschalig@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
|
|
DMA actions can happen before the dma-debug api has been initialized,
and because it's not initialized, there is no memory for the entries and
the system disables itself before even having added the debugfs nodes
that could be used for enabling it again. Use another flag for testing
if the debugging utility has been initialized.
Bug 1303110
Bug 1173494
Change-Id: I27c972ede0a1b64caee62c6552f3ea15e21030c3
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/245338
Reviewed-by: Hiroshi Doyu <hdoyu@nvidia.com>
Tested-by: Hiroshi Doyu <hdoyu@nvidia.com>
|
|
Change-Id: Ic8bbc5bbf9ef9cff65d93b14a655df1bdc7cd691
Signed-off-by: Jeff Smith <jsmith@nvidia.com>
Reviewed-on: http://git-master/r/141352
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
|
|
Add closing quote for LESS_GCC_OPT.
Change-Id: I6b48b19625ac3ab2cc9f4d651cc216fb890063fa
Signed-off-by: Jong Kim <jongk@nvidia.com>
Reviewed-on: http://git-master/r/131257
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: Dan Willemsen <dwillemsen@nvidia.com>
Reviewed-by: Allen Martin <amartin@nvidia.com>
Rebase-Id: R3f0f660244fb7e5f29bd8cf4ec480c7d6a19b796
|
|
Adds a kernel config option to lower the kernel compile optimization
level from O2 to O1 for clearer GDB debugging. Default on for simulation
builds, default off for others.
Change-Id: I17fd63b5b391984d28b275e516df89d6a223021c
Reviewed-on: http://git-master/r/121822
Reviewed-by: Chao Xu <cxu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Ken Adams <kadams@nvidia.com>
Tested-by: Richard Wiley <rwiley@nvidia.com>
Reviewed-by: Mark Stadler <mastadler@nvidia.com>
Reviewed-by: Bo Yan <byan@nvidia.com>
Rebase-Id: Ra7c14ac91b0cfbe441dbfea33f0d5b1a6fd44d5f
|
|
after-upstream-android
Conflicts:
arch/arm/common/Kconfig
arch/arm/mm/Makefile
arch/arm/mm/cache-l2x0.c
arch/arm/mm/mmu.c
drivers/input/Kconfig
drivers/input/Makefile
drivers/power/Kconfig
kernel/futex.c
|
|
Add API to allocate at specified alloc address.
Change-Id: I188e5430220c050026c6a3e17a586012d9a9fa04
Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-on: http://git-master/r/74468
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Rebase-Id: R93cbd801fbaea15cd4b0b579826b659d220618d7
|
|
commit 25c87eae1725ed77a8b44d782a86abdc279b4ede upstream.
FAULT_INJECTION_STACKTRACE_FILTER selects FRAME_POINTER but
that symbol is not available for MIPS.
Fixes the following problem on a randconfig:
warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP &&
KMEMCHECK) selects FRAME_POINTER which has unmet direct dependencies
(DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN ||
MN10300 || METAG) || ARCH_WANT_FRAME_POINTERS)
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Acked-by: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5441/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Emulate NMIs on systems where they are not available by using timer
interrupts on other cpus. Each cpu will use its softlockup hrtimer
to check that the next cpu is processing hrtimer interrupts by
verifying that a counter is increasing.
This patch is useful on systems where the hardlockup detector is not
available due to a lack of NMIs, for example most ARM SoCs.
Without this patch any cpu stuck with interrupts disabled can
cause a hardware watchdog reset with no debugging information,
but with this patch the kernel can detect the lockup and panic,
which can result in useful debugging info.
Signed-off-by: Colin Cross <ccross@android.com>
|
|
This allows us to get a kernel stacktrace for a thread though /proc.
Also enable it by default.
Change-Id: If8c21cd02feaf9863f4841ace524fa30c7328d49
Signed-off-by: Arve Hjønnevåg <arve@android.com>
|
|
'EXTRA_FLAGS=-W'.
For 'while' looping, need stop when 'nbytes == 0', or will cause issue.
('nbytes' is size_t which is always bigger or equal than zero).
The related warning: (with EXTRA_CFLAGS=-W)
lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The umul_ppmm() macro for parisc uses the xmpyu assembler statement
which does calculation via a floating point register.
But usage of floating point registers inside the Linux kernel are not
allowed and gcc will stop compilation due to the -mdisable-fpregs
compiler option.
Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The
mpilib will then use the generic built-in implementations instead.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg Kroah-Hartman:
"Here are 3 tiny driver core fixes for 3.10-rc2.
A needed symbol export, a change to make it easier to track down
offending sysfs files with incorrect attributes, and a klist bugfix.
All have been in linux-next for a while"
* tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
klist: del waiter from klist_remove_waiters before wakeup waitting process
driver core: print sysfs attribute name when warning about bogus permissions
driver core: export subsys_virtual_register
|
|
Fix build error io vmw_vmci.ko when CONFIG_VMWARE_VMCI=m by chaning
iovec.o from lib-y to obj-y.
ERROR: "memcpy_toiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!
ERROR: "memcpy_fromiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a race between klist_remove and klist_release. klist_remove
uses a local var waiter saved on stack. When klist_release calls
wake_up_process(waiter->process) to wake up the waiter, waiter might run
immediately and reuse the stack. Then, klist_release calls
list_del(&waiter->list) to change previous
wait data and cause prior waiter thread corrupt.
The patch fixes it against kernel 3.9.
Signed-off-by: wang, biao <biao.wang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!
That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.
In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.
socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Pull block driver updates from Jens Axboe:
"It might look big in volume, but when categorized, not a lot of
drivers are touched. The pull request contains:
- mtip32xx fixes from Micron.
- A slew of drbd updates, this time in a nicer series.
- bcache, a flash/ssd caching framework from Kent.
- Fixes for cciss"
* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
bcache: Use bd_link_disk_holder()
bcache: Allocator cleanup/fixes
cciss: bug fix to prevent cciss from loading in kdump crash kernel
cciss: add cciss_allow_hpsa module parameter
drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
mtip32xx: Workaround for unaligned writes
bcache: Make sure blocksize isn't smaller than device blocksize
bcache: Fix merge_bvec_fn usage for when it modifies the bvm
bcache: Correctly check against BIO_MAX_PAGES
bcache: Hack around stuff that clones up to bi_max_vecs
bcache: Set ra_pages based on backing device's ra_pages
bcache: Take data offset from the bdev superblock.
mtip32xx: mtip32xx: Disable TRIM support
mtip32xx: fix a smatch warning
bcache: Disable broken btree fuzz tester
bcache: Fix a format string overflow
bcache: Fix a minor memory leak on device teardown
bcache: Documentation updates
bcache: Use WARN_ONCE() instead of __WARN()
bcache: Add missing #include <linux/prefetch.h>
...
|
|
This patch tries to reduce the amount of cmpxchg calls in the writer
failed path by checking the counter value first before issuing the
instruction. If ->count is not set to RWSEM_WAITING_BIAS then there is
no point wasting a cmpxchg call.
Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem->wait_lock."
Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8 socket box with 80 cores.
Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:
| db_size | clients | tps-rwsem | tps-patch |
+---------+----------+----------------+--------------+
| 160 MB | 1 | 6906 | 9153 | + 32.5
| 160 MB | 2 | 15931 | 22487 | + 41.1%
| 160 MB | 4 | 33021 | 32503 |
| 160 MB | 8 | 34626 | 34695 |
| 160 MB | 16 | 33098 | 34003 |
| 160 MB | 20 | 31343 | 31440 |
| 160 MB | 30 | 28961 | 28987 |
| 160 MB | 40 | 26902 | 26970 |
| 160 MB | 50 | 25760 | 25810 |
------------------------------------------------------
| 1.6 GB | 1 | 7729 | 7537 |
| 1.6 GB | 2 | 19009 | 23508 | + 23.7%
| 1.6 GB | 4 | 33185 | 32666 |
| 1.6 GB | 8 | 34550 | 34318 |
| 1.6 GB | 16 | 33079 | 32689 |
| 1.6 GB | 20 | 31494 | 31702 |
| 1.6 GB | 30 | 28535 | 28755 |
| 1.6 GB | 40 | 27054 | 27017 |
| 1.6 GB | 50 | 25591 | 25560 |
------------------------------------------------------
| 7.6 GB | 1 | 6224 | 7469 | + 20.0%
| 7.6 GB | 2 | 13611 | 12778 |
| 7.6 GB | 4 | 33108 | 32927 |
| 7.6 GB | 8 | 34712 | 34878 |
| 7.6 GB | 16 | 32895 | 33003 |
| 7.6 GB | 20 | 31689 | 31974 |
| 7.6 GB | 30 | 29003 | 28806 |
| 7.6 GB | 40 | 26683 | 26976 |
| 7.6 GB | 50 | 25925 | 25652 |
------------------------------------------------------
For the aim7 worloads, they overall improved on top of Michel's
patchset. For full graphs on how the rwsem series plus this patch
behaves on a large 8 socket machine against a vanilla kernel:
http://stgolabs.net/rwsem-aim7-results.tar.gz
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
- make warning smp-safe
- result of atomic _unless_zero functions should be checked by caller
to avoid use-after-free error
- trivial whitespace fix.
Link: https://lkml.org/lkml/2013/4/12/391
Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
[ Removed line-break, changed to use WARN_ON_ONCE() - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge rwsem optimizations from Michel Lespinasse:
"These patches extend Alex Shi's work (which added write lock stealing
on the rwsem slow path) in order to provide rwsem write lock stealing
on the fast path (that is, without taking the rwsem's wait_lock).
I have unfortunately been unable to push this through -next before due
to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
things. However, this has gotten some attention from Rik van Riel and
Davidlohr Bueso who both commented that they felt this was ready for
v3.10, and Ingo Molnar has said that he was OK with me pushing
directly to you. So, here goes :)
Davidlohr got the following test results from pgbench running on a
quad-core laptop:
| db_size | clients | tps-vanilla | tps-rwsem |
+---------+----------+----------------+--------------+
| 160 MB | 1 | 5803 | 6906 | + 19.0%
| 160 MB | 2 | 13092 | 15931 |
| 160 MB | 4 | 29412 | 33021 |
| 160 MB | 8 | 32448 | 34626 |
| 160 MB | 16 | 32758 | 33098 |
| 160 MB | 20 | 26940 | 31343 | + 16.3%
| 160 MB | 30 | 25147 | 28961 |
| 160 MB | 40 | 25484 | 26902 |
| 160 MB | 50 | 24528 | 25760 |
------------------------------------------------------
| 1.6 GB | 1 | 5733 | 7729 | + 34.8%
| 1.6 GB | 2 | 9411 | 19009 | + 101.9%
| 1.6 GB | 4 | 31818 | 33185 |
| 1.6 GB | 8 | 33700 | 34550 |
| 1.6 GB | 16 | 32751 | 33079 |
| 1.6 GB | 20 | 30919 | 31494 |
| 1.6 GB | 30 | 28540 | 28535 |
| 1.6 GB | 40 | 26380 | 27054 |
| 1.6 GB | 50 | 25241 | 25591 |
------------------------------------------------------
| 7.6 GB | 1 | 5779 | 6224 |
| 7.6 GB | 2 | 10897 | 13611 | + 24.9%
| 7.6 GB | 4 | 32683 | 33108 |
| 7.6 GB | 8 | 33968 | 34712 |
| 7.6 GB | 16 | 32287 | 32895 |
| 7.6 GB | 20 | 27770 | 31689 | + 14.1%
| 7.6 GB | 30 | 26739 | 29003 |
| 7.6 GB | 40 | 24901 | 26683 |
| 7.6 GB | 50 | 17115 | 25925 | + 51.5%
------------------------------------------------------
(Davidlohr also has one additional patch which further improves
throughput, though I will ask him to send it directly to you as I have
suggested some minor changes)."
* emailed patches from Michel Lespinasse <walken@google.com>:
rwsem: no need for explicit signed longs
x86 rwsem: avoid taking slow path when stealing write lock
rwsem: do not block readers at head of queue if other readers are active
rwsem: implement support for write lock stealing on the fastpath
rwsem: simplify __rwsem_do_wake
rwsem: skip initial trylock in rwsem_down_write_failed
rwsem: avoid taking wait_lock in rwsem_down_write_failed
rwsem: use cmpxchg for trying to steal write lock
rwsem: more agressive lock stealing in rwsem_down_write_failed
rwsem: simplify rwsem_down_write_failed
rwsem: simplify rwsem_down_read_failed
rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
rwsem: shorter spinlocked section in rwsem_down_failed_common()
rwsem: make the waiter type an enumeration rather than a bitmask
|
|
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.
In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.
Thanks to Peter Hurley for noticing this possible race.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers. But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue. This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting. To that end, we grant the first reader lock
before counting how many more readers are queued.
We also require some adjustments to the wake_type semantics.
RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock. This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.
Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available. We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is mostly for cleanup value:
- We don't need several gotos to handle the case where the first
waiter is a writer. Two simple tests will do (and generate very
similar code).
- In the remainder of the function, we know the first waiter is a reader,
so we don't have to double check that. We can use do..while loops
to iterate over the readers to wake (generates slightly better code).
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e. the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Using rwsem_atomic_update to try stealing the write lock forced us to
undo the adjustment in the failure path. We can have simpler and faster
code by using cmpxchg instead.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some small code simplifications can be achieved by doing more agressive
lock stealing:
- When rwsem_down_write_failed() notices that there are no active locks
(and thus no thread to wake us if we decided to sleep), it used to wake
the first queued process. However, stealing the lock is also sufficient
to deal with this case, so we don't need this check anymore.
- In try_get_writer_sem(), we can steal the lock even when the first waiter
is a reader. This is correct because the code path that wakes readers is
protected by the wait_lock. As to the performance effects of this change,
they are expected to be minimal: readers are still granted the lock
(rather than having to acquire it themselves) when they reach the front
of the wait queue, so we have essentially the same behavior as in
rwsem-spinlock.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When waking writers, we never grant them the lock - instead, they have
to acquire it themselves when they run, and remove themselves from the
wait_list when they succeed.
As a result, we can do a few simplifications in rwsem_down_write_failed():
- We don't need to check for !waiter.task since __rwsem_do_wake() doesn't
remove writers from the wait_list
- There is no point releaseing the wait_lock before entering the wait loop,
as we will need to reacquire it immediately. We can change the loop so
that the lock is always held at the start of each loop iteration.
- We don't need to get a reference on the task structure, since the task
is responsible for removing itself from the wait_list. There is no risk,
like in the rwsem_down_read_failed() case, that a task would wake up and
exit (thus destroying its task structure) while __rwsem_do_wake() is
still running - wait_lock protects against that.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS
adjustment doesn't cause other readers to block, so we never have to
worry about waking them back after canceling this adjustment in
rwsem_down_read_failed().
We also never want to steal the lock in rwsem_down_read_failed(), so we
don't have to grab the wait_lock either.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove the rwsem_down_failed_common function and replace it with two
identical copies of its code in rwsem_down_{read,write}_failed.
This is because we want to make different optimizations in
rwsem_down_{read,write}_failed; we are adding this pure-duplication
step as a separate commit in order to make it easier to check the
following steps.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE
sections in rwsem_down_failed_common():
- We only need the sem->wait_lock to insert ourselves on the wait_list;
the waiter node can be prepared outside of the wait_lock.
- The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately
before checking if we actually need to sleep; it doesn't need to protect
the entire function.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We are not planning to add some new waiter flags, so we can convert the
waiter type into an enumeration.
Background: David Howells suggested I do this back when I tried adding
a new waiter type for unfair readers. However, I believe the cleanup
applies regardless of that use case.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.
Reported-by: Dros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: stable@vger.kernel.org
cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 3.10.
Wierd bits:
- OMAP drm changes required OMAP dss changes, in drivers/video, so I
took them in here.
- one more fbcon fix for font handover
- VT switch avoidance in pm code
- scatterlist helpers for gpu drivers - have acks from akpm
Highlights:
- qxl kms driver - driver for the spice qxl virtual GPU
Nouveau:
- fermi/kepler VRAM compression
- GK110/nvf0 modesetting support.
Tegra:
- host1x core merged with 2D engine support
i915:
- vt switchless resume
- more valleyview support
- vblank fixes
- modesetting pipe config rework
radeon:
- UVD engine support
- SI chip tiling support
- GPU registers initialisation from golden values.
exynos:
- device tree changes
- fimc block support
Otherwise:
- bunches of fixes all over the place."
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits)
qxl: update to new idr interfaces.
drm/nouveau: fix build with nv50->nvc0
drm/radeon: fix handling of v6 power tables
drm/radeon: clarify family checks in pm table parsing
drm/radeon: consolidate UVD clock programming
drm/radeon: fix UPLL_REF_DIV_MASK definition
radeon: add bo tracking debugfs
drm/radeon: add new richland pci ids
drm/radeon: add some new SI PCI ids
drm/radeon: fix scratch reg handling for UVD fence
drm/radeon: allocate SA bo in the requested domain
drm/radeon: fix possible segfault when parsing pm tables
drm/radeon: fix endian bugs in atom_allocate_fb_scratch()
OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found
OMAPDSS: VENC: Add error handling for venc_probe_pdata
OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata
OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata
OMAPDSS: DSI: Add error handling for dsi_probe_pdata
OMAPDSS: SDI: Add error handling for sdi_probe_pdata
OMAPDSS: DPI: Add error handling for dpi_probe_pdata
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"This fixes the cputime scaling overflow problems for good without
having bad 32-bit overhead, and gets rid of the div64_u64_rem() helper
as well."
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "math64: New div64_u64_rem helper"
sched: Avoid prev->stime underflow
sched: Do not account bogus utime
sched: Avoid cputime scaling overflow
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
|