summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-12-31Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fix from Paul Moore: "One audit patch to resolve a panic/oops when recording filenames in the audit log, see the mail archive link below. The fix isn't as nice as I would like, as it involves an allocate/copy of the filename, but it solves the problem and the overhead should only affect users who have configured audit rules involving file names. We'll revisit this issue with future kernels in an attempt to make this suck less, but in the meantime I think this fix should go into the next release of v3.19-rcX. [ https://marc.info/?t=141986927600001&r=1&w=2 ]" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: create private file name copies when auditing inodes
2014-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen. 2) Fix receive checksum handling in enic driver, from Govindarajulu Varadarajan. 3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from Herbert Xu. Also, add code to detect drivers that have this mistake in the future. 4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai. 5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP input path,f rom Nicolas Dichtel. 6) Fix MPLS action validation in openvswitch, from Pravin B Shelar. 7) Fix double SKB free in vxlan driver, also from Pravin. 8) When we scrub a packet, which happens when we are switching the context of the packet (namespace, etc.), we should reset the secmark. From Thomas Graf. 9) ->ndo_gso_check() needs to do more than return true/false, it also has to allow the driver to clear netdev feature bits in order for the caller to be able to proceed properly. From Jesse Gross. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) genetlink: A genl_bind() to an out-of-range multicast group should not WARN(). netlink/genetlink: pass network namespace to bind/unbind ne2k-pci: Add pci_disable_device in error handling bonding: change error message to debug message in __bond_release_one() genetlink: pass multicast bind/unbind to families netlink: call unbind when releasing socket netlink: update listeners directly when removing socket genetlink: pass only network namespace to genl_has_listeners() netlink: rename netlink_unbind() to netlink_undo_bind() net: Generalize ndo_gso_check to ndo_features_check net: incorrect use of init_completion fixup neigh: remove next ptr from struct neigh_table net: xilinx: Remove unnecessary temac_property in the driver net: phy: micrel: use generic config_init for KSZ8021/KSZ8031 net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding openvswitch: fix odd_ptr_err.cocci warnings Bluetooth: Fix accepting connections when not using mgmt Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR brcmfmac: Do not crash if platform data is not populated ipw2200: select CFG80211_WEXT ...
2014-12-30audit: create private file name copies when auditing inodesPaul Moore
Unfortunately, while commit 4a928436 ("audit: correctly record file names with different path name types") fixed a problem where we were not recording filenames, it created a new problem by attempting to use these file names after they had been freed. This patch resolves the issue by creating a copy of the filename which the audit subsystem frees after it is done with the string. At some point it would be nice to resolve this issue with refcounts, or something similar, instead of having to allocate/copy strings, but that is almost surely beyond the scope of a -rcX patch so we'll defer that for later. On the plus side, only audit users should be impacted by the string copying. Reported-by: Toralf Foerster <toralf.foerster@gmx.de> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-12-27netlink/genetlink: pass network namespace to bind/unbindJohannes Berg
Netlink families can exist in multiple namespaces, and for the most part multicast subscriptions are per network namespace. Thus it only makes sense to have bind/unbind notifications per network namespace. To achieve this, pass the network namespace of a given client socket to the bind/unbind functions. Also do this in generic netlink, and there also make sure that any bind for multicast groups that only exist in init_net is rejected. This isn't really a problem if it is accepted since a client in a different namespace will never receive any notifications from such a group, but it can confuse the family if not rejected (it's also possible to silently (without telling the family) accept it, but it would also have to be ignored on unbind so families that take any kind of action on bind/unbind won't do unnecessary work for invalid clients like that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-23Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fixes from Paul Moore: "Four patches to fix various problems with the audit subsystem, all are fairly small and straightforward. One patch fixes a problem where we weren't using the correct gfp allocation flags (GFP_KERNEL regardless of context, oops), one patch fixes a problem with old userspace tools (this was broken for a while), one patch fixes a problem where we weren't recording pathnames correctly, and one fixes a problem with PID based filters. In general I don't think there is anything controversial with this patchset, and it fixes some rather unfortunate bugs; the allocation flag one can be particularly scary looking for users" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: restore AUDIT_LOGINUID unset ABI audit: correctly record file names with different path name types audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb audit: don't attempt to lookup PIDs when changing PID filtering audit rules
2014-12-23audit: restore AUDIT_LOGINUID unset ABIRichard Guy Briggs
A regression was caused by commit 780a7654cee8: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by e1760bd) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Cc: stable@vger.kernel.org # v3.10-rc1+ Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-12-22audit: correctly record file names with different path name typesPaul Moore
There is a problem with the audit system when multiple audit records are created for the same path, each with a different path name type. The root cause of the problem is in __audit_inode() when an exact match (both the path name and path name type) is not found for a path name record; the existing code creates a new path name record, but it never sets the path name in this record, leaving it NULL. This patch corrects this problem by assigning the path name to these newly created records. There are many ways to reproduce this problem, but one of the easiest is the following (assuming auditd is running): # mkdir /root/tmp/test # touch /root/tmp/test/567 # auditctl -a always,exit -F dir=/root/tmp/test # touch /root/tmp/test/567 Afterwards, or while the commands above are running, check the audit log and pay special attention to the PATH records. A faulty kernel will display something like the following for the file creation: type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2 success=yes exit=3 ... comm="touch" exe="/usr/bin/touch" type=CWD msg=audit(1416957442.025:93): cwd="/root/tmp" type=PATH msg=audit(1416957442.025:93): item=0 name="test/" inode=401409 ... nametype=PARENT type=PATH msg=audit(1416957442.025:93): item=1 name=(null) inode=393804 ... nametype=NORMAL type=PATH msg=audit(1416957442.025:93): item=2 name=(null) inode=393804 ... nametype=NORMAL While a patched kernel will show the following: type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2 success=yes exit=3 ... comm="touch" exe="/usr/bin/touch" type=CWD msg=audit(1416955786.566:89): cwd="/root/tmp" type=PATH msg=audit(1416955786.566:89): item=0 name="test/" inode=401409 ... nametype=PARENT type=PATH msg=audit(1416955786.566:89): item=1 name="test/567" inode=393804 ... nametype=NORMAL This issue was brought up by a number of people, but special credit should go to hujianyang@huawei.com for reporting the problem along with an explanation of the problem and a patch. While the original patch did have some problems (see the archive link below), it did demonstrate the problem and helped kickstart the fix presented here. * https://lkml.org/lkml/2014/9/5/66 Reported-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com>
2014-12-20Merge tag 'pm-config-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki: "This removes the last few uses of CONFIG_PM_RUNTIME introduced recently and makes that config option finally go away. CONFIG_PM will be available directly from the menu now and also it will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION is set" * tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Eliminate CONFIG_PM_RUNTIME tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
2014-12-19audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skbRichard Guy Briggs
Eric Paris explains: Since kauditd_send_multicast_skb() gets called in audit_log_end(), which can come from any context (aka even a sleeping context) GFP_KERNEL can't be used. Since the audit_buffer knows what context it should use, pass that down and use that. See: https://lkml.org/lkml/2014/12/16/542 BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin 2 locks held by sulogin/885: #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b #1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30 Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014 ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375 ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006 0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38 Call Trace: [<ffffffff916ba529>] dump_stack+0x50/0xa8 [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be [<ffffffff910632a6>] __might_sleep+0x119/0x128 [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9 [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3 [<ffffffff914e2b62>] skb_copy+0x3e/0xa3 [<ffffffff910c263e>] audit_log_end+0x83/0x100 [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103 [<ffffffff91252a73>] common_lsm_audit+0x441/0x450 [<ffffffff9123c163>] slow_avc_audit+0x63/0x67 [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3 [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65 [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10 [<ffffffff911515e6>] install_exec_creds+0xe/0x79 [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7 [<ffffffff9115198e>] search_binary_handler+0x81/0x18c [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7 [<ffffffff91153669>] do_execve+0x1f/0x21 [<ffffffff91153967>] SyS_execve+0x25/0x29 [<ffffffff916c61a9>] stub_execve+0x69/0xa0 Cc: stable@vger.kernel.org #v3.16-rc1 Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-12-19audit: don't attempt to lookup PIDs when changing PID filtering audit rulesPaul Moore
Commit f1dc4867 ("audit: anchor all pid references in the initial pid namespace") introduced a find_vpid() call when adding/removing audit rules with PID/PPID filters; unfortunately this is problematic as find_vpid() only works if there is a task with the associated PID alive on the system. The following commands demonstrate a simple reproducer. # auditctl -D # auditctl -l # autrace /bin/true # auditctl -l This patch resolves the problem by simply using the PID provided by the user without any additional validation, e.g. no calls to check to see if the task/PID exists. Cc: stable@vger.kernel.org # 3.15 Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
2014-12-19PM: Eliminate CONFIG_PM_RUNTIMERafael J. Wysocki
Having switched over all of the users of CONFIG_PM_RUNTIME to use CONFIG_PM directly, turn the latter into a user-selectable option and drop the former entirely from the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org>
2014-12-19Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull NOHZ update from Thomas Gleixner: "Remove the call into the nohz idle code from the fake 'idle' thread in the powerclamp driver along with the export of those functions which was smuggeled in via the thermal tree. People have tried to hack around it in the nohz core code, but it just violates all rightful assumptions of that code about the only valid calling context (i.e. the proper idle task). The powerclamp trainwreck will still work, it just wont get the benefit of long idle sleeps" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/powerclamp: Remove tick_nohz_idle abuse
2014-12-19Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core fix from Thomas Gleixner: "A single fix plugging a long standing race between proc/stat and proc/interrupts access and freeing of interrupt descriptors" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent proc race against freeing of irq descriptors
2014-12-19Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes and cleanups from Ingo Molnar: "A kernel fix plus mostly tooling fixes, but also some tooling restructuring and cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) perf: Fix building warning on ARM 32 perf symbols: Fix use after free in filename__read_build_id perf evlist: Use roundup_pow_of_two tools: Adopt roundup_pow_of_two perf tools: Make the mmap length autotuning more robust tools: Adopt rounddown_pow_of_two and deps tools: Adopt fls_long and deps tools: Move bitops.h from tools/perf/util to tools/ tools: Introduce asm-generic/bitops.h tools lib: Move asm-generic/bitops/find.h code to tools/include and tools/lib tools: Whitespace prep patches for moving bitops.h tools: Move code originally from asm-generic/atomic.h into tools/include/asm-generic/ tools: Move code originally from linux/log2.h to tools/include/linux/ tools: Move __ffs implementation to tools/include/asm-generic/bitops/__ffs.h perf evlist: Do not use hard coded value for a mmap_pages default perf trace: Let the perf_evlist__mmap autosize the number of pages to use perf evlist: Improve the strerror_mmap method perf evlist: Clarify sterror_mmap variable names perf evlist: Fixup brown paper bag on "hint" for --mmap-pages cmdline arg perf trace: Provide a better explanation when mmap fails ...
2014-12-19tick/powerclamp: Remove tick_nohz_idle abuseThomas Gleixner
commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers. The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself. Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable and just papers over the real problem. There are way more subtle issues lurking around the corner. The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this. So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports. Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-18Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "The exciting thing here is the getting rid of stop_machine on module removal. This is possible by using a simple atomic_t for the counter, rather than our fancy per-cpu counter: it turns out that no one is doing a module increment per net packet, so the slowdown should be in the noise" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: param: do not set store func without write perm params: cleanup sysfs allocation kernel:module Fix coding style errors and warnings. module: Remove stop_machine from module unloading module: Replace module_ref with atomic_t refcnt lib/bug: Use RCU list ops for module_bug_list module: Unlink module with RCU synchronizing instead of stop_machine module: Wait for RCU synchronizing before releasing a module
2014-12-18Merge tag 'pm+acpi-3.19-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: "These are regression fixes (leds-gpio, ACPI backlight driver, operating performance points library, ACPI device enumeration messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device PM), some cleanups in the operating performance points (OPP) framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of minor intel_pstate driver changes, a new MAINTAINERS entry for it and an ACPI fan driver change needed for better support of thermal management in user space. Specifics: - Fix a regression in leds-gpio introduced by a recent commit that inadvertently changed the name of one of the properties used by the driver (Fabio Estevam). - Fix a regression in the ACPI backlight driver introduced by a recent fix that missed one special case that had to be taken into account (Aaron Lu). - Drop the level of some new kernel messages from the ACPI core introduced by a recent commit to KERN_DEBUG which they should have used from the start and drop some other unuseful KERN_ERR messages printed by ACPI (Rafael J Wysocki). - Revert an incorrect commit modifying the cpupower tool (Prarit Bhargava). - Fix two regressions introduced by recent commits in the OPP library and clean up some existing minor issues in that code (Viresh Kumar). - Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the tree (or drop it where that can be done) in order to make it possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf Hansson, Ludovic Desroches). There will be one more "CONFIG_PM_RUNTIME removal" batch after this one, because some new uses of it have been introduced during the current merge window, but that should be sufficient to finally get rid of it. - Make the ACPI EC driver more robust against race conditions related to GPE handler installation failures (Lv Zheng). - Prevent the ACPI device PM core code from attempting to disable GPEs that it has not enabled which confuses ACPICA and makes it report errors unnecessarily (Rafael J Wysocki). - Add a "force" command line switch to the intel_pstate driver to make it possible to override the blacklisting of some systems in that driver if needed (Ethan Zhao). - Improve intel_pstate code documentation and add a MAINTAINERS entry for it (Kristen Carlson Accardi). - Make the ACPI fan driver create cooling device interfaces witn names that reflect the IDs of the ACPI device objects they are associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B"). That's necessary for user space thermal management tools to be able to connect the fans with the parts of the system they are supposed to be cooling properly. From Srinivas Pandruvada" * tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) MAINTAINERS: add entry for intel_pstate ACPI / video: update the skip case for acpi_video_device_in_dod() power / PM: Eliminate CONFIG_PM_RUNTIME NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / EC: Fix unexpected ec_remove_handlers() invocations Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()" tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c PM: Remove the SET_PM_RUNTIME_PM_OPS() macro mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro PM / Kconfig: Replace PM_RUNTIME with PM in dependencies ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / PM: Do not disable wakeup GPEs that have not been enabled ACPI / utils: Drop error messages from acpi_evaluate_reference() ...
2014-12-18param: do not set store func without write permKees Cook
When a module_param is defined without DAC write permissions, it can still be changed at runtime and updated. Drivers using a 0444 permission may be surprised that these values can still be changed. For drivers that want to allow updates, any S_IW* flag will set the "store" function as before. Drivers without S_IW* flags will have the "store" function unset, unforcing a read-only value. Drivers that wish neither "store" nor "get" can continue to use "0" for perms to stay out of sysfs entirely. Old behavior: # cd /sys/module/snd/parameters # ls -l total 0 -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit -r--r--r-- 1 root root 4096 Dec 11 13:55 major -r--r--r-- 1 root root 4096 Dec 11 13:55 slots # cat major 116 # echo -1 > major -bash: major: Permission denied # chmod u+w major # echo -1 > major # cat major -1 New behavior: ... # chmod u+w major # echo -1 > major -bash: echo: write error: Input/output error Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-12-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace related fixes from Eric Biederman: "As these are bug fixes almost all of thes changes are marked for backporting to stable. The first change (implicitly adding MNT_NODEV on remount) addresses a regression that was created when security issues with unprivileged remount were closed. I go on to update the remount test to make it easy to detect if this issue reoccurs. Then there are a handful of mount and umount related fixes. Then half of the changes deal with the a recently discovered design bug in the permission checks of gid_map. Unix since the beginning has allowed setting group permissions on files to less than the user and other permissions (aka ---rwx---rwx). As the unix permission checks stop as soon as a group matches, and setgroups allows setting groups that can not later be dropped, results in a situtation where it is possible to legitimately use a group to assign fewer privileges to a process. Which means dropping a group can increase a processes privileges. The fix I have adopted is that gid_map is now no longer writable without privilege unless the new file /proc/self/setgroups has been set to permanently disable setgroups. The bulk of user namespace using applications even the applications using applications using user namespaces without privilege remain unaffected by this change. Unfortunately this ix breaks a couple user space applications, that were relying on the problematic behavior (one of which was tools/selftests/mount/unprivileged-remount-test.c). To hopefully prevent needing a regression fix on top of my security fix I rounded folks who work with the container implementations mostly like to be affected and encouraged them to test the changes. > So far nothing broke on my libvirt-lxc test bed. :-) > Tested with openSUSE 13.2 and libvirt 1.2.9. > Tested-by: Richard Weinberger <richard@nod.at> > Tested on Fedora20 with libvirt 1.2.11, works fine. > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Unbreak the unprivileged remount tests userns; Correct the comment in map_write userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks mnt: Clear mnt_expire during pivot_root mnt: Carefully set CL_UNPRIVILEGED in clone_mnt mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. umount: Do not allow unmounting rootfs. umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
2014-12-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc/*/ns/* weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc/*/ns/* symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns
2014-12-16Merge tag 'trace-3.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "As the merge window is still open, and this code was not as complex as I thought it might be. I'm pushing this in now. This will allow Thomas to debug his irq work for 3.20. This adds two new features: 1) Allow traceopoints to be enabled right after mm_init(). By passing in the trace_event= kernel command line parameter, tracepoints can be enabled at boot up. For debugging things like the initialization of interrupts, it is needed to have tracepoints enabled very early. People have asked about this before and this has been on my todo list. As it can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm pushing this now. This way he can add tracepoints into the IRQ set up and have users enable them when things go wrong. 2) Have the tracepoints printed via printk() (the console) when they are triggered. If the irq code locks up or reboots the box, having the tracepoint output go into the kernel ring buffer is useless for debugging. But being able to add the tp_printk kernel command line option along with the trace_event= option will have these tracepoints printed as they occur, and that can be really useful for debugging early lock up or reboot problems. This code is not that intrusive and it passed all my tests. Thomas tried them out too and it works for his needs. Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org" * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add tp_printk cmdline to have tracepoints go to printk() tracing: Move enabling tracepoints to just after rcu_init()
2014-12-15Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - AMD KFD driver merge This is the AMD HSA interface for exposing a lowlevel interface for GPGPU use. They have an open source userspace built on top of this interface, and the code looks as good as it was going to get out of tree. - Initial atomic modesetting work The need for an atomic modesetting interface to allow userspace to try and send a complete set of modesetting state to the driver has arisen, and been suffering from neglect this past year. No more, the start of the common code and changes for msm driver to use it are in this tree. Ongoing work to get the userspace ioctl finished and the code clean will probably wait until next kernel. - DisplayID 1.3 and tiled monitor exposed to userspace. Tiled monitor property is now exposed for userspace to make use of. - Rockchip drm driver merged. - imx gpu driver moved out of staging Other stuff: - core: panel - MIPI DSI + new panels. expose suggested x/y properties for virtual GPUs - i915: Initial Skylake (SKL) support gen3/4 reset work start of dri1/ums removal infoframe tracking fixes for lots of things. - nouveau: tegra k1 voltage support GM204 modesetting support GT21x memory reclocking work - radeon: CI dpm fixes GPUVM improvements Initial DPM fan control - rcar-du: HDMI support added removed some support for old boards slave encoder driver for Analog Devices adv7511 - exynos: Exynos4415 SoC support - msm: a4xx gpu support atomic helper conversion - tegra: iommu support universal plane support ganged-mode DSI support - sti: HDMI i2c improvements - vmwgfx: some late fixes. - qxl: use suggested x/y properties" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits) drm: sti: fix module compilation issue drm/i915: save/restore GMBUS freq across suspend/resume on gen4 drm: sti: correctly cleanup CRTC and planes drm: sti: add HQVDP plane drm: sti: add cursor plane drm: sti: enable auxiliary CRTC drm: sti: fix delay in VTG programming drm: sti: prepare sti_tvout to support auxiliary crtc drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off} drm: sti: fix hdmi avi infoframe drm: sti: remove event lock while disabling vblank drm: sti: simplify gdp code drm: sti: clear all mixer control drm: sti: remove gpio for HDMI hot plug detection drm: sti: allow to change hdmi ddc i2c adapter drm/doc: Document drm_add_modes_noedid() usage drm/i915: Remove '& 0xffff' from the mask given to WA_REG() drm/i915: Invert the mask and val arguments in wa_add() and WA_REG() drm: Zero out DRM object memory upon cleanup drm/i915/bdw: Fix the write setting up the WIZ hashing mode ...
2014-12-15tracing: Add tp_printk cmdline to have tracepoints go to printk()Steven Rostedt (Red Hat)
Add the kernel command line tp_printk option that will have tracepoints that are active sent to printk() as well as to the trace buffer. Passing "tp_printk" will activate this. To turn it off, the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note, this only works if the cmdline option is used. Echoing 1 into the sysctl file without the cmdline option will have no affect. Note, this is a dangerous option. Having high frequency tracepoints send their data to printk() can possibly cause a live lock. This is another reason why this is only active if the command line option is used. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15tracing: Move enabling tracepoints to just after rcu_init()Steven Rostedt (Red Hat)
Enabling tracepoints at boot up can be very useful. The tracepoint can be initialized right after RCU has been. There's no need to wait for the early_initcall() to be called. That's too late for some things that can use tracepoints for debugging. Move the logic to enable tracepoints out of the initcalls and into init/main.c to right after rcu_init(). This also allows trace_printk() to be used early too. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-14Merge tag 'tty-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here's the big tty/serial driver update for 3.19-rc1. There are a number of TTY core changes/fixes in here from Peter Hurley that have all been teted in linux-next for a long time now. There are also the normal serial driver updates as well, full details in the changelog below" * tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits) serial: pxa: hold port.lock when reporting modem line changes tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put" tty: Deletion of unnecessary checks before two function calls n_tty: Fix read_buf race condition, increment read_head after pushing data serial: of-serial: add PM suspend/resume support Revert "serial: of-serial: add PM suspend/resume support" Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type" serial: 8250: don't attempt a trylock if in sysrq serial: core: Add big-endian iotype serial: samsung: use port->fifosize instead of hardcoded values serial: samsung: prefer to use fifosize from driver data serial: samsung: fix style problems serial: samsung: wait for transfer completion before clock disable serial: icom: fix error return code serial: tegra: clean up tty-flag assignments serial: Fix io address assign flow with Fintek PCI-to-UART Product serial: mxs-auart: fix tx_empty against shift register serial: mxs-auart: fix gpio change detection on interrupt serial: mxs-auart: Fix mxs_auart_set_ldisc() serial: 8250_dw: Use 64-bit access for OCTEON. ...
2014-12-13Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference, when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing buts in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. - Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...
2014-12-13Merge tag 'trace-seq-buf-3.19-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixlet from Steven Rostedt: "Remove unnecessary preempt_disable in printk()" * tag 'trace-seq-buf-3.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk: Do not disable preemption for accessing printk_func
2014-12-13Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit updates from Paul Moore: "Two small patches from the audit next branch; only one of which has any real significant code changes, the other is simply a MAINTAINERS update for audit. The single code patch is pretty small and rather straightforward, it changes the audit "version" number reported to userspace from an integer to a bitmap which is used to indicate the functionality of the running kernel. This really doesn't have much impact on the kernel, but it will make life easier for the audit userspace folks. Thankfully we were still on a version number which allowed us to do this without breaking userspace" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: convert status version to a feature bitmap audit: add Paul Moore to the MAINTAINERS entry
2014-12-13fsnotify: unify inode and mount marks handlingJan Kara
There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13gcov: enable GCOV_PROFILE_ALL from ARCH KconfigsRiku Voipio
Following the suggestions from Andrew Morton and Stephen Rothwell, Dont expand the ARCH list in kernel/gcov/Kconfig. Instead, define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures can enable. set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was previously allowed + ARM64 which I tested. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13kexec: remove unnecessary KERN_ERR from kexec.cMasanari Iida
Remove unnecessary KERN_ERR from pr_err() within kexec.c. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13syscalls: implement execveat() system callDavid Drysdale
This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13stacktrace: introduce snprint_stack_trace for buffer outputJoonsoo Kim
Current stacktrace only have the function for console output. page_owner that will be introduced in following patch needs to print the output of stacktrace into the buffer for our own output format so so new function, snprint_stack_trace(), is needed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13uprobes: share the i_mmap_rwsemDavidlohr Bueso
Both register and unregister call build_map_info() in order to create the list of mappings before installing or removing breakpoints for every mm which maps file backed memory. As such, there is no reason to hold the i_mmap_rwsem exclusively, so share it and allow concurrent readers to build the mapping data. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm: convert i_mmap_mutex to rwsemDavidlohr Bueso
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm: use new helper functions around the i_mmap_mutexDavidlohr Bueso
Convert all open coded mutex_lock/unlock calls to the i_mmap_[lock/unlock]_write() helpers. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13genirq: Prevent proc race against freeing of irq descriptorsThomas Gleixner
Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors there exists a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor. CPU0 CPU1 show_interrupts() desc = irq_to_desc(X); free_desc(desc) remove_from_radix_tree(); kfree(desc); raw_spinlock_irq(&desc->lock); /proc/interrupts is the only interface which can actively corrupt kernel memory via the lock access. /proc/stat can only read from freed memory. Extremly hard to trigger, but possible. The interfaces in /proc/irq/N/ are not affected by this because the removal of the proc file is serialized in procfs against concurrent readers/writers. The removal happens before the descriptor is freed. For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue as the descriptor is never freed. It's merely cleared out with the irq descriptor lock held. So any concurrent proc access will either see the old correct value or the cleared out ones. Protect the lookup and access to the irq descriptor in show_interrupts() with the sparse_irq_lock. Provide kstat_irqs_usr() which is protecting the lookup and access with sparse_irq_lock and switch /proc/stat to use it. Document the existing kstat_irqs interfaces so it's clear that the caller needs to take care about protection. The users of these interfaces are either not affected due to SPARSE_IRQ=n or already protected against removal. Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2014-12-13tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so files that are build conditionally if CONFIG_PM_RUNTIME is set may now be build if CONFIG_PM is set. Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile for this reason. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org.
2014-12-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...
2014-12-12Merge branch 'linus' into perf/urgent, to pick up the upstream merged bitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11Merge branch 'for-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup update from Tejun Heo: "cpuset got simplified a bit. cgroup core got a fix on unified hierarchy and grew some effective css related interfaces which will be used for blkio support for writeback IO traffic which is currently being worked on" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: implement cgroup_get_e_css() cgroup: add cgroup_subsys->css_e_css_changed() cgroup: add cgroup_subsys->css_released() cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cpuset: lock vs unlock typo cpuset: simplify cpuset_node_allowed API cpuset: convert callback_mutex to a spinlock
2014-12-11Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue update from Tejun Heo: "Work items which may be involved in memory reclaim path may be executed by the rescuer under memory pressure. When a rescuer gets activated, it processes whatever are on the pending list and then goes back to sleep until the manager kicks it again which involves 100ms delay. This is problematic for self-requeueing work items or the ones running on ordered workqueues as there always is only one work item on the pending list when the rescuer kicks in. The execution of that work item produces more to execute but the rescuer won't see them until after the said 100ms has passed, so such workqueues would only execute one work item every 100ms under prolonged memory pressure, which BTW may be being prolonged due to the slow execution. Neil wrote up a patch which fixes this issue by keeping the rescuer working as long as the target workqueue is busy but doesn't have enough workers" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: allow rescuer thread to do more work. workqueue: invert the order between pool->lock and wq_mayday_lock workqueue: cosmetic update in rescuer_thread()
2014-12-11Merge branch 'for-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "Nothing interesting. A patch to convert the remaining __get_cpu_var() users, another to fix non-critical off-by-one in an assertion and a cosmetic conversion to lockless_dereference() in percpu-ref. The back-merge from mainline is to receive lockless_dereference()" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Replace smp_read_barrier_depends() with lockless_dereference() percpu: Convert remaining __get_cpu_var uses in 3.18-rcX percpu: off by one in BUG_ON()
2014-12-12Merge tag 'drm-intel-next-fixes-2014-12-11' of ↵Dave Airlie
git://anongit.freedesktop.org/drm-intel into drm-next Here's a batch of i915 fixes for 3.19. * tag 'drm-intel-next-fixes-2014-12-11' of git://anongit.freedesktop.org/drm-intel: drm/i915: save/restore GMBUS freq across suspend/resume on gen4 drm/i915: Remove '& 0xffff' from the mask given to WA_REG() drm/i915: Invert the mask and val arguments in wa_add() and WA_REG() drm/i915/bdw: Fix the write setting up the WIZ hashing mode drm/i915: Don't complain about stolen conflicts on gen3 drm/i915: resume MST after reading back hw state drm/i915: Handle inaccurate time conversion issues drm/i915: compute wait_ioctl timeout correctly drm/i915: don't always do full mode sets when infoframes are enabled
2014-12-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The most notable change for this pull request is the ftrace rework from Heiko. It brings a small performance improvement and the ground work to support a new gcc option to replace the mcount blocks with a single nop. Two new s390 specific system calls are added to emulate user space mmio for PCI, an artifact of the how PCI memory is accessed. Two patches for the memory management with changes to common code. For KVM mm_forbids_zeropage is added which disables the empty zero page for an mm that is used by a KVM process. And an optimization, pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full. Some micro optimization for the cmpxchg and the spinlock code. And as usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits) s390/cputime: fix 31-bit compile s390/scm_block: make the number of reqs per HW req configurable s390/scm_block: handle multiple requests in one HW request s390/scm_block: allocate aidaw pages only when necessary s390/scm_block: use mempool to manage aidaw requests s390/eadm: change timeout value s390/mm: fix memory leak of ptlock in pmd_free_tlb s390: use local symbol names in entry[64].S s390/ptrace: always include vector registers in core files s390/simd: clear vector register pointer on fork/clone s390: translate cputime magic constants to macros s390/idle: convert open coded idle time seqcount s390/idle: add missing irq off lockdep annotation s390/debug: avoid function call for debug_sprintf_* s390/kprobes: fix instruction copy for out of line execution s390: remove diag 44 calls from cpu_relax() s390/dasd: retry partition detection s390/dasd: fix list corruption for sleep_on requests s390/dasd: fix infinite term I/O loop s390/dasd: remove unused code ...
2014-12-11userns; Correct the comment in map_writeEric W. Biederman
It is important that all maps are less than PAGE_SIZE or else setting the last byte of the buffer to '0' could write off the end of the allocated storage. Correct the misleading comment. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-11userns: Allow setting gid_maps without privilege when setgroups is disabledEric W. Biederman
Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-11userns: Add a knob to disable setgroups on a per user namespace basisEric W. Biederman
- Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...
2014-12-11printk: Do not disable preemption for accessing printk_funcSteven Rostedt (Red Hat)
As printk_func will either be the default function, or a per_cpu function for the current CPU, there's no reason to disable preemption to access it from printk. That's because if the printk_func is not the default then the caller had better disabled preemption as they were the one to change it. Link: http://lkml.kernel.org/r/CA+55aFz5-_LKW4JHEBoWinN9_ouNcGRWAF2FUA35u46FRN-Kxw@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>