Age | Commit message (Collapse) | Author |
|
commit 62b4d2041117f35ab2409c9f5c4b8d3dc8e59d0f upstream.
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
the middle of the options for the EXPERT menu. However,
HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
placing items in the EXPERT menu, and displays the remaining several
EXPERT items (starting with EPOLL) directly in the General Setup menu.
Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
subsequent items display as part of the EXPERT menu again; the EMBEDDED
menu now appears as the next top-level item in the General Setup menu,
which makes General Setup much shorter and more usable.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 82c04ff89eba09d0e46e3f3649c6d3aa18e764a0 upstream.
The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it
to show up in the toplevel of the kernel configuration. Fix this by
moving it under the General Setup menu.
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream.
If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtime check necessary, allow to skip the test within futex_init().
This allows to get rid of some code which would always give the same result,
and also allows the compiler to optimize a couple of if statements away.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Zhenglong.cai <zhenglong.cai@cs2c.com.cn>
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespaces work from Eric Biederman:
"The work to convert the kernel to use kuid_t and kgid_t has been
finished since 3.12 so it is time to remove the scaffolding that
allowed the work to progress incrementally.
The first patch on this branch just removes the scaffolding, ensuring
we will always get compile errors if people accidentally try the
userspace and the kernel uid and gid types. The second patch an
overlooked and unused chunk of mips code that that fails to build
after the first patch.
The code hasn't been in linux-next for long (as I was out of it and
could not sheppared the cold properly) but the patch has been around
for a long time just waiting for the day when I had finished the
uid/gid conversions. Putting the code in linux-next did find the
compile failure on mips so I took the time to get that fix reviewed
and included. Beyond that I am not too worried about errors because
all these two patches do is delete a modest amount of code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
MIPS: VPE: Remove vpe_getuid and vpe_getgid
userns: userns: Remove UIDGID_STRICT_TYPE_CHECKS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"The bulk of changes are cleanups and preparations for the upcoming
kernfs conversion.
- cgroup_event mechanism which is and will be used only by memcg is
moved to memcg.
- pidlist handling is updated so that it can be served by seq_file.
Also, the list is not sorted if sane_behavior. cgroup
documentation explicitly states that the file is not sorted but it
has been for quite some time.
- All cgroup file handling now happens on top of seq_file. This is
to prepare for kernfs conversion. In addition, all operations are
restructured so that they map 1-1 to kernfs operations.
- Other cleanups and low-pri fixes"
* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
cgroup: trivial style updates
cgroup: remove stray references to css_id
doc: cgroups: Fix typo in doc/cgroups
cgroup: fix fail path in cgroup_load_subsys()
cgroup: fix missing unlock on error in cgroup_load_subsys()
cgroup: remove for_each_root_subsys()
cgroup: implement for_each_css()
cgroup: factor out cgroup_subsys_state creation into create_css()
cgroup: combine css handling loops in cgroup_create()
cgroup: reorder operations in cgroup_create()
cgroup: make for_each_subsys() useable under cgroup_root_mutex
cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
cgroup: unify pidlist and other file handling
cgroup: replace cftype->read_seq_string() with cftype->seq_show()
cgroup: attach cgroup_open_file to all cgroup files
cgroup: generalize cgroup_pidlist_open_file
cgroup: unify read path so that seq_file is always used
cgroup: unify cgroup_write_X64() and cgroup_write_string()
cgroup: remove cftype->read(), ->read_map() and ->write()
hugetlb_cgroup: convert away from cftype->read()
...
|
|
Pick up the latest fixes and refresh the branch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
allows using 64x64->128 muls on 64bit archs and recent GCC
which defines __SIZEOF_INT128__ and __int128.
(This new method will be used by the scheduler.)
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
|
|
Removing UIDGID_STRICT_TYPE_CHECKS simplifies the code and always
generates a compile error if the uids and kuids or gids and kgids are
mixed by accident. Now that the appropriate conversions have been
placed throughout the kernel there is no longer a need for a mode where
we don't detect them as compile errors.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Merge v3.12 based patch series to move cgroup_event implementation to
memcg into for-3.14. The following two commits cause a conflict in
kernel/cgroup.c
2ff2a7d03bbe4 ("cgroup: kill css_id")
79bd9814e5ec9 ("cgroup, memcg: move cgroup_event implementation to memcg")
Each patch removes a struct definition from kernel/cgroup.c. As the
two are adjacent, they cause a context conflict. Easily resolved by
removing both structs.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface. This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent. Simply generating
events at fixed points, or if that's too restrictive, configureable
cadence or single set of configureable points should be enough.
Thankfully, memcg is the only user and gets to keep it. Replacing it
with something simpler on sane_behavior is strongly recommended.
This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c. Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.
cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.
Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it. While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.
Aside from the above change, this is pure code relocation.
v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
poll.h inclusion moved from cgroup.c to memcontrol.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"In this patchset, we finally get an SELinux update, with Paul Moore
taking over as maintainer of that code.
Also a significant update for the Keys subsystem, as well as
maintenance updates to Smack, IMA, TPM, and Apparmor"
and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:
"Okay. There are a number of separate bits. I'll go over the big bits
and the odd important other bit, most of the smaller bits are just
fixes and cleanups. If you want the small bits accounting for, I can
do that too.
(1) Keyring capacity expansion.
KEYS: Consolidate the concept of an 'index key' for key access
KEYS: Introduce a search context structure
KEYS: Search for auth-key by name rather than target key ID
Add a generic associative array implementation.
KEYS: Expand the capacity of a keyring
Several of the patches are providing an expansion of the capacity of a
keyring. Currently, the maximum size of a keyring payload is one page.
Subtract a small header and then divide up into pointers, that only gives
you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
a keyring to store ID mapping data, that has proven to be insufficient to
the cause.
Whatever data structure I use to handle the keyring payload, it can only
store pointers to keys, not the keys themselves because several keyrings
may point to a single key. This precludes inserting, say, and rb_node
struct into the key struct for this purpose.
I could make an rbtree of records such that each record has an rb_node
and a key pointer, but that would use four words of space per key stored
in the keyring. It would, however, be able to use much existing code.
I selected instead a non-rebalancing radix-tree type approach as that
could have a better space-used/key-pointer ratio. I could have used the
radix tree implementation that we already have and insert keys into it by
their serial numbers, but that means any sort of search must iterate over
the whole radix tree. Further, its nodes are a bit on the capacious side
for what I want - especially given that key serial numbers are randomly
allocated, thus leaving a lot of empty space in the tree.
So what I have is an associative array that internally is a radix-tree
with 16 pointers per node where the index key is constructed from the key
type pointer and the key description. This means that an exact lookup by
type+description is very fast as this tells us how to navigate directly to
the target key.
I made the data structure general in lib/assoc_array.c as far as it is
concerned, its index key is just a sequence of bits that leads to a
pointer. It's possible that someone else will be able to make use of it
also. FS-Cache might, for example.
(2) Mark keys as 'trusted' and keyrings as 'trusted only'.
KEYS: verify a certificate is signed by a 'trusted' key
KEYS: Make the system 'trusted' keyring viewable by userspace
KEYS: Add a 'trusted' flag and a 'trusted only' flag
KEYS: Separate the kernel signature checking keyring from module signing
These patches allow keys carrying asymmetric public keys to be marked as
being 'trusted' and allow keyrings to be marked as only permitting the
addition or linkage of trusted keys.
Keys loaded from hardware during kernel boot or compiled into the kernel
during build are marked as being trusted automatically. New keys can be
loaded at runtime with add_key(). They are checked against the system
keyring contents and if their signatures can be validated with keys that
are already marked trusted, then they are marked trusted also and can
thus be added into the master keyring.
Patches from Mimi Zohar make this usable with the IMA keyrings also.
(3) Remove the date checks on the key used to validate a module signature.
X.509: Remove certificate date checks
It's not reasonable to reject a signature just because the key that it was
generated with is no longer valid datewise - especially if the kernel
hasn't yet managed to set the system clock when the first module is
loaded - so just remove those checks.
(4) Make it simpler to deal with additional X.509 being loaded into the kernel.
KEYS: Load *.x509 files into kernel keyring
KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate
The builder of the kernel now just places files with the extension ".x509"
into the kernel source or build trees and they're concatenated by the
kernel build and stuffed into the appropriate section.
(5) Add support for userspace kerberos to use keyrings.
KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
KEYS: Implement a big key type that can save to tmpfs
Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
We looked at storing it in keyrings instead as that confers certain
advantages such as tickets being automatically deleted after a certain
amount of time and the ability for the kernel to get at these tokens more
easily.
To make this work, two things were needed:
(a) A way for the tickets to persist beyond the lifetime of all a user's
sessions so that cron-driven processes can still use them.
The problem is that a user's session keyrings are deleted when the
session that spawned them logs out and the user's user keyring is
deleted when the UID is deleted (typically when the last log out
happens), so neither of these places is suitable.
I've added a system keyring into which a 'persistent' keyring is
created for each UID on request. Each time a user requests their
persistent keyring, the expiry time on it is set anew. If the user
doesn't ask for it for, say, three days, the keyring is automatically
expired and garbage collected using the existing gc. All the kerberos
tokens it held are then also gc'd.
(b) A key type that can hold really big tickets (up to 1MB in size).
The problem is that Active Directory can return huge tickets with lots
of auxiliary data attached. We don't, however, want to eat up huge
tracts of unswappable kernel space for this, so if the ticket is
greater than a certain size, we create a swappable shmem file and dump
the contents in there and just live with the fact we then have an
inode and a dentry overhead. If the ticket is smaller than that, we
slap it in a kmalloc()'d buffer"
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
KEYS: Fix keyring content gc scanner
KEYS: Fix error handling in big_key instantiation
KEYS: Fix UID check in keyctl_get_persistent()
KEYS: The RSA public key algorithm needs to select MPILIB
ima: define '_ima' as a builtin 'trusted' keyring
ima: extend the measurement list to include the file signature
kernel/system_certificate.S: use real contents instead of macro GLOBAL()
KEYS: fix error return code in big_key_instantiate()
KEYS: Fix keyring quota misaccounting on key replacement and unlink
KEYS: Fix a race between negating a key and reading the error set
KEYS: Make BIG_KEYS boolean
apparmor: remove the "task" arg from may_change_ptraced_domain()
apparmor: remove parent task info from audit logging
apparmor: remove tsk field from the apparmor_audit_struct
apparmor: fix capability to not use the current task, during reporting
Smack: Ptrace access check mode
ima: provide hash algo info in the xattr
ima: enable support for larger default filedata hash algorithms
ima: define kernel parameter 'ima_template=' to change configured default
ima: add Kconfig default measurement list template
...
|
|
Pull audit updates from Eric Paris:
"Nothing amazing. Formatting, small bug fixes, couple of fixes where
we didn't get records due to some old VFS changes, and a change to how
we collect execve info..."
Fixed conflict in fs/exec.c as per Eric and linux-next.
* git://git.infradead.org/users/eparis/audit: (28 commits)
audit: fix type of sessionid in audit_set_loginuid()
audit: call audit_bprm() only once to add AUDIT_EXECVE information
audit: move audit_aux_data_execve contents into audit_context union
audit: remove unused envc member of audit_aux_data_execve
audit: Kill the unused struct audit_aux_data_capset
audit: do not reject all AUDIT_INODE filter types
audit: suppress stock memalloc failure warnings since already managed
audit: log the audit_names record type
audit: add child record before the create to handle case where create fails
audit: use given values in tty_audit enable api
audit: use nlmsg_len() to get message payload length
audit: use memset instead of trying to initialize field by field
audit: fix info leak in AUDIT_GET requests
audit: update AUDIT_INODE filter rule to comparator function
audit: audit feature to set loginuid immutable
audit: audit feature to only allow unsetting the loginuid
audit: allow unsetting the loginuid (with priv)
audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
audit: loginuid functions coding style
selinux: apply selinux checks on new audit message types
...
|
|
This reverts commit 69f0554ec261fd686ac7fa1c598cc9eb27b83a80.
This patch breaks randconfig on at least the x86-64 architecture, and
most likely on others. There is work underway to support uncompressed
kernels in a generic way, but it looks like it will amount to
rewriting the support from scratch; see the LKML thread in the Link:
for info.
Therefore, revert this change and wait for the fix.
Reported-by: Pavel Roskin <proski@gnu.org>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131113113418.167b8ffd@IRBT4585
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
|
|
Some ARC users say they can boot faster with without kernel compression.
This probably depends on things like the FLASH chip they use etc.
Until now, kernel compression can only be disabled by removing "select
HAVE_<compression>" lines from the architecture Kconfig. So add the
Kconfig logic to permit disabling of kernel compression.
Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Ingo Molnar:
"Main changes in this cycle were:
- Updated full dynticks support.
- Event stream support for architected (ARM) timers.
- ARM clocksource driver updates.
- Move arm64 to using the generic sched_clock framework & resulting
cleanup in the generic sched_clock code.
- Misc fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
clocksource: sun4i: remove IRQF_DISABLED
clocksource: sun4i: Report the minimum tick that we can program
clocksource: sun4i: Select CLKSRC_MMIO
clocksource: Provide timekeeping for efm32 SoCs
clocksource: em_sti: convert to clk_prepare/unprepare
time: Fix signedness bug in sysfs_get_uname() and its callers
timekeeping: Fix some trivial typos in comments
alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
clocksource: arch_timer: Do not register arch_sys_counter twice
timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
sched_clock: Remove sched_clock_func() hook
arch_timer: Move to generic sched_clock framework
clocksource: tcb_clksrc: Remove IRQF_DISABLED
clocksource: tcb_clksrc: Improve driver robustness
clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
clocksource: dw_apb_timer_of: Mark a few more functions as __init
clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
arm: zynq: Enable arm_global_timer
...
|
|
Implement missing functions for parisc to provide kernel audit feature.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
After trying to use this feature in Fedora we found the hard coding
policy like this into the kernel was a bad idea. Surprise surprise.
We ran into these problems because it was impossible to launch a
container as a logged in user and run a login daemon inside that container.
This reverts back to the old behavior before this option was added. The
option will be re-added in a userspace selectable manor such that
userspace can choose when it is and when it is not appropriate.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
The CONFIG_64BIT requirement on vtime can finally be removed
since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which
already takes care of the arch ability to handle nsecs based
cputime_t safely.
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
to use that feature, arch code should be audited to ensure there are no
races in concurrent read/write of cputime_t. For example,
reading/writing 64-bit cputime_t on some 32-bit arches may require
multiple accesses for low and high value parts, so proper locking
is needed to protect against concurrent accesses.
Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
enable after they've been audited for potential races.
This option is automatically enabled on 64-bit platforms.
Feature requested by Frederic Weisbecker.
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
Separate the kernel signature checking keyring from module signing so that it
can be used by code other than the module-signing code.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB update from Pekka Enberg:
"Nothing terribly exciting here apart from Christoph's kmalloc
unification patches that brings sl[aou]b implementations closer to
each other"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slab: Use correct GFP_DMA constant
slub: remove verify_mem_not_deleted()
mm/sl[aou]b: Move kmallocXXX functions to common code
mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
mm/slub.c: beautify code for removing redundancy 'break' statement.
slub: Remove unnecessary page NULL check
slub: don't use cpu partial pages on UP
mm/slub: beautify code for 80 column limitation and tab alignment
mm/slub: remove 'per_cpu' which is useless variable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kconfig updates from Michal Marek:
"This is the kconfig part of kbuild for v3.12-rc1:
- post-3.11 search code fixes and micro-optimizations
- CONFIG_MODULES is no longer a special case; this is needed to
eventually fix the bug that using KCONFIG_ALLCONFIG breaks
allmodconfig
- long long is used to store hex and int values
- make silentoldconfig no longer warns when a symbol changes from
tristate to bool (it's a job for make oldconfig)
- scripts/diffconfig updated to work with newer Pythons
- scripts/config does not rely on GNU sed extensions"
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig: do not allow more than one symbol to have 'option modules'
kconfig: regenerate bison parser
kconfig: do not special-case 'MODULES' symbol
diffconfig: Update script to support python versions 2.5 through 3.3
diffconfig: Gracefully exit if the default config files are not present
modules: do not depend on kconfig to set 'modules' option to symbol MODULES
kconfig: silence warning when parsing auto.conf when a symbol has changed type
scripts/config: use sed's POSIX interface
kconfig: switch to "long long" for sanity
kconfig: simplify symbol-search code
kconfig: don't allocate n+1 elements in temporary array
kconfig: minor style fixes in symbol-search code
kconfig/[mn]conf: shorten title in search-box
kconfig: avoid multiple calls to strlen
Documentation/kconfig: more concise and straightforward search explanation
|
|
Pull xfs updates from Ben Myers:
"For 3.12-rc1 there are a number of bugfixes in addition to work to
ease usage of shared code between libxfs and the kernel, the rest of
the work to enable project and group quotas to be used simultaneously,
performance optimisations in the log and the CIL, directory entry file
type support, fixes for log space reservations, some spelling/grammar
cleanups, and the addition of user namespace support.
- introduce readahead to log recovery
- add directory entry file type support
- fix a number of spelling errors in comments
- introduce new Q_XGETQSTATV quotactl for project quotas
- add USER_NS support
- log space reservation rework
- CIL optimisations
- kernel/userspace libxfs rework"
* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
xfs: finish removing IOP_* macros.
xfs: inode log reservations are too small
xfs: check correct status variable for xfs_inobt_get_rec() call
xfs: inode buffers may not be valid during recovery readahead
xfs: check LSN ordering for v5 superblocks during recovery
xfs: btree block LSN escaping to disk uninitialised
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
xfs: fix bad dquot buffer size in log recovery readahead
xfs: don't account buffer cancellation during log recovery readahead
xfs: check for underflow in xfs_iformat_fork()
xfs: xfs_dir3_sfe_put_ino can be static
xfs: introduce object readahead to log recovery
xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
xfs: Register hotcpu notifier after initialization
xfs: add xfs sb v4 support for dirent filetype field
xfs: Add write support for dirent filetype field
xfs: Add read-only support for dirent filetype field
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz changes from Ingo Molnar:
"It mostly contains fixes and full dynticks off-case optimizations, by
Frederic Weisbecker"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
nohz: Include local CPU in full dynticks global kick
nohz: Optimize full dynticks's sched hooks with static keys
nohz: Optimize full dynticks state checks with static keys
nohz: Rename a few state variables
vtime: Always debug check snapshot source _before_ updating it
vtime: Always scale generic vtime accounting results
vtime: Optimize full dynticks accounting off case with static keys
vtime: Describe overriden functions in dedicated arch headers
m68k: hardirq_count() only need preempt_mask.h
hardirq: Split preempt count mask definitions
context_tracking: Split low level state headers
vtime: Fix racy cputime delta update
vtime: Remove a few unneeded generic vtime state checks
context_tracking: User/kernel broundary cross trace events
context_tracking: Optimize context switch off case with static keys
context_tracking: Optimize guest APIs off case with static key
context_tracking: Optimize main APIs off case with static key
context_tracking: Ground setup for static key use
context_tracking: Remove full dynticks' hacky dependency on wide context tracking
nohz: Only enable context tracking on full dynticks CPUs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
"
* Update RCU documentation. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/611.
* Miscellaneous fixes. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/619.
* Full-system idle detection. This is for use by Frederic
Weisbecker's adaptive-ticks mechanism. Its purpose is
to allow the timekeeping CPU to shut off its tick when
all other CPUs are idle. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/648.
* Improve rcutorture test coverage. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/675.
"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The swapaccount kernel parameter without any values has been removed by
commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but
it seems that we didn't get rid of all the left overs.
Make sure that menuconfig help text and kernel-parameters.txt are clear
about value for the paramter and remove the stalled comment which is not
very much useful on its own.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built,
but only TREE_RCU selects IRQ_WORK, which can result in an undefined
reference to irq_work_queue for some (random) configs:
kernel/built-in.o In function `rcu_start_gp_advanced':
kernel/rcutree.c:1564: undefined reference to `irq_work_queue'
Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Currently, the MODULES symbol is special-cased in different places in the
kconfig language. For example, if no symbol is defined to enable tristates,
then kconfig looks up for a symbol named 'MODULES', and forces the 'modules'
option onto that symbol.
This causes problems as such:
- since MODULES is special-cased, reading the configuration with
KCONFIG_ALLCONFIG set will forcibly set MODULES to be 'valid' (ie.
it has a valid value), when no such value was previously set. So
MODULES defaults to 'n' unless it is present in KCONFIG_ALLCONFIG
- other third-party projects may decide that 'MODULES' plays a different
role for them
This has been exposed by cset #cfa98f2e:
kconfig: do not override symbols already set
and reported by Stephen in:
http://marc.info/?l=linux-next&m=137592137915234&w=2
As suggested by Sam, we explicitly define the MODULES symbol to be the
tristate-enabler. This will allow us to drop special-casing of MODULES
in the kconfig language, later.
(Note: this patch is not a fix to Stephen's issue, just a first step).
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: yann.morin.1998@free.fr
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: sedat.dilek@gmail.com
Cc: Theodore Ts'o <tytso@mit.edu>
|
|
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
cpu partial pages are used to avoid contention which does not exist in
the UP case. So let SLUB_CPU_PARTIAL depend on SMP.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
tracking
Now that the full dynticks subsystem only enables the context tracking
on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE
This dependency was a hack to enable the context tracking widely for the
full dynticks susbsystem until the latter becomes able to enable it in a
more CPU-finegrained fashion.
Now CONTEXT_TRACKING_FORCE only stands for testing on archs that
work on support for the context tracking while full dynticks can't be
used yet due to unmet dependencies. It simulates a system where all CPUs
are full dynticks so that RCU user extended quiescent states and dynticks
cputime accounting can be tested on the given arch.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab update from Pekka Enberg:
"Highlights:
- Fix for boot-time problems on some architectures due to
init_lock_keys() not respecting kmalloc_caches boundaries
(Christoph Lameter)
- CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)
- Fix for excessive slab freelist draining (Wanpeng Li)
- SLUB and SLOB cleanups and fixes (various people)"
I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
slub: Check for page NULL before doing the node_match check
mm/slab: Give s_next and s_stop slab-specific names
slob: Check for NULL pointer before calling ctor()
slub: Make cpu partial slab support configurable
slab: add kmalloc() to kernel API documentation
slab: fix init_lock_keys
slob: use DIV_ROUND_UP where possible
slub: do not put a slab to cpu partial list when cpu_partial is 0
mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
mm/slub: Drop unnecessary nr_partials
mm/slab: Fix /proc/slabinfo unwriteable for slab
mm/slab: Sharing s_next and s_stop between slab and slub
mm/slab: Fix drain freelist excessively
slob: Rework #ifdeffery in slab.h
mm, slab: moved kmem_cache_alloc_node comment to correct place
|
|
Add support for extracting LZ4-compressed kernel images, as well as
LZ4-compressed ramdisk images in the kernel boot process.
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CPU partial support can introduce level of indeterminism that is not
wanted in certain context (like a realtime kernel). Make it
configurable.
This patch is based on Christoph Lameter's "slub: Make cpu partial slab
support configurable V2".
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
"The timer changes contain:
- posix timer code consolidation and fixes for odd corner cases
- sched_clock implementation moved from ARM to core code to avoid
duplication by other architectures
- alarm timer updates
- clocksource and clockevents unregistration facilities
- clocksource/events support for new hardware
- precise nanoseconds RTC readout (Xen feature)
- generic support for Xen suspend/resume oddities
- the usual lot of fixes and cleanups all over the place
The parts which touch other areas (ARM/XEN) have been coordinated with
the relevant maintainers. Though this results in an handful of
trivial to solve merge conflicts, which we preferred over nasty cross
tree merge dependencies.
The patches which have been committed in the last few days are bug
fixes plus the posix timer lot. The latter was in akpms queue and
next for quite some time; they just got forgotten and Frederic
collected them last minute."
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
hrtimer: Remove unused variable
hrtimers: Move SMP function call to thread context
clocksource: Reselect clocksource when watchdog validated high-res capability
posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
posix_timers: fix racy timer delta caching on task exit
posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
selftests: add basic posix timers selftests
posix_cpu_timers: consolidate expired timers check
posix_cpu_timers: consolidate timer list cleanups
posix_cpu_timer: consolidate expiry time type
tick: Sanitize broadcast control logic
tick: Prevent uncontrolled switch to oneshot mode
tick: Make oneshot broadcast robust vs. CPU offlining
x86: xen: Sync the CMOS RTC as well as the Xen wallclock
x86: xen: Sync the wallclock when the system time is set
timekeeping: Indicate that clock was set in the pvclock gtod notifier
timekeeping: Pass flags instead of multiple bools to timekeeping_update()
xen: Remove clock_was_set() call in the resume path
hrtimers: Support resuming with two or more CPUs online (but stopped)
timer: Fix jiffies wrap behavior of round_jiffies_common()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Now there are only 2 members in struct page_cgroup. Update config MEMCG
description accordingly.
Signed-off-by: Sergey Dyasly <dserrg@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The major changes:
- Simplify RCU's grace-period and callback processing based on the new
numbering for callbacks.
- Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
single-CPU low-latency systems.
- SRCU-related changes and fixes.
- Miscellaneous fixes, including converting a few remaining printk()
calls to pr_*().
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
rcu: Shrink TINY_RCU by moving exit_rcu()
rcu: Remove TINY_PREEMPT_RCU tracing documentation
rcu: Consolidate rcutiny_plugin.h ifdefs
rcu: Remove rcu_preempt_note_context_switch()
rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
rcu: Remove check_cpu_stall_preempt()
rcu: Simplify RCU_TINY RCU callback invocation
rcu: Remove rcu_preempt_process_callbacks()
rcu: Remove rcu_preempt_remove_callbacks()
rcu: Remove rcu_preempt_check_callbacks()
rcu: Remove show_tiny_preempt_stats()
rcu: Remove TINY_PREEMPT_RCU
powerpc,kvm: fix imbalance srcu_read_[un]lock()
rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
rcu: Apply Dave Jones's NOCB Kconfig help feedback
rcu: Merge adjacent identical ifdefs
rcu: Drive quiescent-state-forcing delay from HZ
rcu: Remove "Experimental" flags
kthread: Add kworker kthreads to OS-jitter documentation
...
|
|
Some drivers can be built on more platforms than they run on. This is
a burden for users and distributors who package a kernel. They have to
manually deselect some (for them useless) drivers when updating their
configs via oldconfig. And yet, sometimes it is even impossible to
disable the drivers without patching the kernel.
Introduce a new config option COMPILE_TEST and make all those drivers
to depend on the platform they run on, or on the COMPILE_TEST option.
Now, when users/distributors choose COMPILE_TEST=n they will not have
the drivers in their allmodconfig setups, but developers still can
compile-test them with COMPILE_TEST=y.
Now the drivers where we use this new option:
* PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
processors so it should depend on x86.
* FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
* USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
systems -- which do not actually support the hardware via that
method.
* INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom
device.
[v2]
* remove EXPERT dependency
[gregkh - remove chipidea portion, as it's incorrect, and also doesn't
apply to my driver-core tree]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-usb@vger.kernel.org
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: linux-geode@lists.infradead.org
Cc: linux-fbdev@vger.kernel.org
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: netdev@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: "Keller, Jacob E" <jacob.e.keller@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We want these fixes here too.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Nothing about the sched_clock implementation in the ARM port is
specific to the architecture. Generalize the code so that other
architectures can use it by selecting GENERIC_SCHED_CLOCK.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[jstultz: Merge minor collisions with other patches in my tree]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD
cbnum.2013.06.10a: Apply simplifications stemming from the new callback
numbering.
doc.2013.06.10a: Documentation updates.
fixes.2013.06.10a: Miscellaneous fixes.
srcu.2013.06.10a: Updates to SRCU.
tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
|
|
TINY_PREEMPT_RCU adds significant code and complexity, but does not
offer commensurate benefits. People currently using TINY_PREEMPT_RCU
can get much better memory footprint with TINY_RCU, or, if they really
need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
minor degradation in memory footprint. Please note that this move
has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
and on LWN (http://lwn.net/Articles/541037/).
This commit therefore removes TINY_PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO,
and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit
adds a bit more detail.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
After a release or two, features are no longer experimental. Therefore,
this commit removes the "Experimental" tag from them.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section. Please see below for
the long version of this story.
On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
Notice the above call chain.
rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.
You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).
> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
Notice the above trace.
perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:
rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();
Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.
The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.
CPU0 CPU1 CPU2
---- ---- ----
rcu_nocb_kthread()
lock(rq->lock);
lock(ctx->lock);
lock(rnp->lock);
wake_up();
lock(rq->lock);
rcu_read_unlock();
rcu_read_unlock_special();
lock(rnp->lock);
lock(ctx->lock);
**** DEADLOCK ****
> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499] CPU0 CPU1
> [12573.445599] ---- ----
> [12573.452691] lock(&ctx->lock);
> [12573.459799] lock(&rq->lock);
> [12573.467010] lock(&ctx->lock);
> [12573.474192] lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|