Age | Commit message (Collapse) | Author |
|
[ Upstream commit eb70db8756717b90c01ccc765fdefc4dd969fc74 ]
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream.
Also simplify the logic a bit.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c5ea0a9cd02d6aa8adc86e100b2a4cff8d614ff upstream.
The following scenario is possible:
CPU 1 CPU 2
static_key_slow_inc()
atomic_inc_not_zero()
-> key.enabled == 0, no increment
jump_label_lock()
atomic_inc_return()
-> key.enabled == 1 now
static_key_slow_inc()
atomic_inc_not_zero()
-> key.enabled == 1, inc to 2
return
** static key is wrong!
jump_label_update()
jump_label_unlock()
Testing the static key at the point marked by (**) will follow the
wrong path for jumps that have not been patched yet. This can
actually happen when creating many KVM virtual machines with userspace
LAPIC emulation; just run several copies of the following program:
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
int main(void)
{
for (;;) {
int kvmfd = open("/dev/kvm", O_RDONLY);
int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
close(vmfd);
close(kvmfd);
}
return 0;
}
Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
The static key's purpose is to skip NULL pointer checks and indeed one
of the processes eventually dereferences NULL.
As explained in the commit that introduced the bug:
706249c222f6 ("locking/static_keys: Rework update logic")
jump_label_update() needs key.enabled to be true. The solution adopted
here is to temporarily make key.enabled == -1, and use go down the
slow path when key.enabled <= 0.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 706249c222f6 ("locking/static_keys: Rework update logic")
Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
[ Small stylistic edits to the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream.
The HOSTPC extension registers found in some EHCI implementations form
a variable-length array, with one element for each port. Therefore
the hostpc field in struct ehci_regs should be declared as a
zero-length array, not a single-element array.
This fixes a problem reported by UBSAN.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3697649ff29e0f647565eed04b27a7779c646a22 ]
When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().
Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ceb56070359b7329b5678b5d95a376fcb24767be ]
Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).
Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9a0fee2b552b1235fb1706ae1fc664ae74573be8 ]
Diag intends to broadcast tcp_sk and udp_sk socket destruction.
Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not
sufficient for this. Raw sockets can have the same type.
Add a test for sk->sk_type.
Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 ]
The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.
However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.
Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream.
Always returned 0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream.
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream.
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream.
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream.
The INTID mask is wrong, and is made a signed value, which has
nteresting effects in the KVM emulation. Let's sanitize it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 759c01142a5d0f364a462346168a56de28a80f52 upstream.
On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.
This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).
Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Moritz Muehlenhoff <moritz@wikimedia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5 upstream.
Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
memblock region") the reserved bit is set on reserved memblock regions.
However start and end address are passed as unsigned long. This is only
32bit on i386, so it can end up marking the wrong pages reserved for
ranges at 4GB and above.
This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing to balloon in memory
(dom0_mem=1024M for example). This would define a reserved bootmem
region for the additional memory (for example on a 8GB system there was
a reverved region covering the 4GB-8GB range). But since the addresses
were passed on as unsigned long, this was actually marking all pages
from 0 to 4GB as reserved.
Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region")
Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca9eb49aa9562eaadf3cea071ec7018ad6800425 upstream.
The generic copy_siginfo() is currently defined in
asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which
defines the generic struct siginfo. However this makes it awkward for an
architecture to use it if it has to define its own struct siginfo (e.g.
MIPS and potentially IA64), since it means that asm-generic/siginfo.h
can only be included after defining the arch-specific siginfo, which may
be problematic if the arch-specific definition needs definitions from
uapi/asm-generic/siginfo.h.
It is possible to work around this by first including
uapi/asm-generic/siginfo.h to get the constants before defining the
arch-specific siginfo, and include asm-generic/siginfo.h after. However
uapi headers can't be included by other uapi headers, so that first
include has to be in an ifdef __kernel__, with the non __kernel__ case
including the non-UAPI header instead.
Instead of that mess, move the generic copy_siginfo() definition into
linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to
include asm-generic/siginfo.h and define the arch-specific siginfo, and
for the generic copy_siginfo() to see that arch-specific definition.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Petr Malat <oss@malat.biz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Christopher Ferris <cferris@google.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12478/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream.
OpenSSH expects the (non-blocking) read() of pty master to return
EAGAIN only if it has received all of the slave-side output after
it has received SIGCHLD. This used to work on pre-3.12 kernels.
This fix effectively forces non-blocking read() and poll() to
block for parallel i/o to complete for all ttys. It also unwinds
these changes:
1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
tty: Fix pty master read() after slave closes
2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
pty, n_tty: Simplify input processing on final close
3) 1a48632ffed61352a7810ce089dc5a8bcd505a60
pty: Fix input race when closing
Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net>
Reported-by: Volth <openssh@volth.com>
Reported-by: Marc Aurele La France <tsi@tuyoix.net>
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6fb650d43da3e7054984dc548eaa88765a94d49f upstream.
When a USB driver is bound to an interface (either through probing or
by claiming it) or is unbound from an interface, the USB core always
disables Link Power Management during the transition and then
re-enables it afterward. The reason is because the driver might want
to prevent hub-initiated link power transitions, in which case the HCD
would have to recalculate the various LPM parameters. This
recalculation takes place when LPM is re-enabled and the new
parameters are sent to the device and its parent hub.
However, if the driver does not want to prevent hub-initiated link
power transitions then none of this work is necessary. The parameters
don't need to be recalculated, and LPM doesn't need to be disabled and
re-enabled.
It turns out that disabling and enabling LPM can be time-consuming,
enough so that it interferes with user programs that want to claim and
release interfaces rapidly via usbfs. Since the usbfs kernel driver
doesn't set the disable_hub_initiated_lpm flag, we can speed things up
and get the user programs to work by leaving LPM alone whenever the
flag isn't set.
And while we're improving the way disable_hub_initiated_lpm gets used,
let's also fix its kerneldoc.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Matthew Giassa <matthew@giassa.net>
CC: Mathias Nyman <mathias.nyman@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bb208f144cf3f59d8f89a09a80efd04389718907 upstream.
As described in 'can: m_can: tag current CAN FD controllers as non-ISO'
(6cfda7fbebe) it is possible to define fixed configuration options by
setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'.
This leads to the incovenience that the fixed configuration bits can not be
passed by netlink even when they have the correct values (e.g. non-ISO, FD).
This patch fixes that issue and not only allows fixed set bit values to be set
again but now requires(!) to provide these fixed values at configuration time.
A valid CAN FD configuration consists of a nominal/arbitration bittiming, a
data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now
enforced by a new can_validate() function. This fix additionally removed the
inconsistency that was prohibiting the support of 'CANFD-only' controller
drivers, like the RCar CAN FD.
For this reason a new helper can_set_static_ctrlmode() has been introduced to
provide a proper interface to handle static enabled CAN controller options.
Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3b672623079bb3e5685b8549e514f2dfaa564406 upstream.
The buck9 regulator of S2MPS11 PMIC had incorrect vsel_mask (0xff
instead of 0x1f) thus reading entire register as buck9's voltage. This
effectively caused regulator core to interpret values as higher voltages
than they were and then to set real voltage much lower than intended.
The buck9 provides power to other regulators, including LDO13
and LDO19 which supply the MMC2 (SD card). On Odroid XU3/XU4 the lower
voltage caused SD card detection errors on Odroid XU3/XU4:
mmc1: card never left busy state
mmc1: error -110 whilst initialising SD card
During driver probe the regulator core was checking whether initial
voltage matches the constraints. With incorrect vsel_mask of 0xff and
default value of 0x50, the core interpreted this as 5 V which is outside
of constraints (3-3.775 V). Then the regulator core was adjusting the
voltage to match the constraints. With incorrect vsel_mask this new
voltage mapped to a vere low voltage in the driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 54d5ca871e72f2bb172ec9323497f01cd5091ec7 upstream.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2c94b53738549d81dc7464a32117d1f5112c64d3 ]
Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op
when !DEBUG") the implementation of net_dbg_ratelimited() was buggy
for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases.
The bug was that net_ratelimit() was being called and, despite
returning true, nothing was being printed to the console. This
resulted in messages like the following -
"net_ratelimit: %d callbacks suppressed"
with no other output nearby.
After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when
!DEBUG") the bug is fixed for the DEBUG case. However, there's no
output at all for CONFIG_DYNAMIC_DEBUG case.
This patch restores debug output (if enabled) for the
CONFIG_DYNAMIC_DEBUG case.
Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG
case. The implementation takes care to check that dynamic debugging is
enabled before calling net_ratelimit().
Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG")
Signed-off-by: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 92117d8443bc5afacc8d5ba82e541946310f106e ]
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
Impose 32k hard limit which means that the same bpf program or
map cannot be shared by more than 32k processes.
Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 046339eaab26804f52f6604877f5674f70815b26 ]
For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 689de1d6ca95b3b5bd8ee446863bf81a4883ea25 upstream.
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.
In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much. In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.
There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem. It
simply picks a much better constant multiplier, spreading the bits out a
lot better.
NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive. The bigger hashing
cleanup will replace the 32-bit case with something better.
The new constants were picked by George Spelvin who wrote that bigger
cleanup series. I just picked out the constants and part of the comment
from that series.
Cc: George Spelvin <linux@horizon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 50359819794b4a16ae35051cd80f2dab025f6019 upstream.
Commit e6d5e7d90be9 ("clk-divider: Fix READ_ONLY when divider > 1") removed
the special ops struct for read-only clocks and instead opted to handle
them inside the regular ops.
On the rk3368 this results in breakage as aclkm now gets set a value.
While it is the same divider value, the A53 core still doesn't like it,
which can result in the cpu ending up in a hang.
The reason being that "ACLKENMasserts one clock cycle before the rising
edge of ACLKM" and the clock should only be touched when STANDBYWFIL2
is asserted.
To fix this, reintroduce the read-only ops but do include the round_rate
callback. That way no writes that may be unsafe are done to the divider
register in any case.
The Rockchip use of the clk_divider_ops is adapted to this split again,
as is the nxp, lpc18xx-ccu driver that was included since the original
commit. On lpc18xx-ccu the divider seems to always be read-only
so only uses the new ops now.
Fixes: e6d5e7d90be9 ("clk-divider: Fix READ_ONLY when divider > 1")
Reported-by: Zhang Qing <zhangqing@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 28093f9f34cedeaea0f481c58446d9dac6dd620f upstream.
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
cgroup_subsys->post_attach callback
commit 5cf1cacb49aee39c3e02ae87068fc3c6430659b0 upstream.
Since e93ad19d0564 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write(). This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.
memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function. This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.
The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex. This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished. cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem. We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 986ef95ecdd3eb6fa29433e68faa94c7624083be upstream.
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)
So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 67b4eab91caf2ad574cab1b17ae09180ea2e116e upstream.
Revert 811a4e6fce09 ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").
This is part of reverting 991de2e59090 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d101a125954eae1d397adda94ca6319485a50493 upstream.
This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").
Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.
This confuses filesystems which get the dentry from struct file and assume
it's theirs.
Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file. This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).
In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).
The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.
[*] If possible, it's better simply to use file_inode() instead.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1363074667a6b7d0507527742ccd7bbed5e3ceaa upstream.
Add a new NO_REPORT_LUNS quirk and set it for Seagate drives with
an usb-id of: 0bc2:331a, as these will fail to respond to a
REPORT_LUNS command.
Reported-and-tested-by: David Webb <djw@noc.ac.uk>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5a5abb1fa3b05dd6aa821525832644c1e7d2905f ]
Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:
[ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
[ 52.765688] other info that might help us debug this:
[ 52.765695] rcu_scheduler_active = 1, debug_locks = 1
[ 52.765701] 1 lock held by a.out/1525:
[ 52.765704] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
[ 52.765721] stack backtrace:
[ 52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
[...]
[ 52.765768] Call Trace:
[ 52.765775] [<ffffffff813e488d>] dump_stack+0x85/0xc8
[ 52.765784] [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
[ 52.765792] [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
[ 52.765801] [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
[ 52.765810] [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
[ 52.765818] [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
[ 52.765827] [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
[ 52.765834] [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
[ 52.765843] [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
[ 52.765850] [<ffffffff81261519>] SyS_ioctl+0x79/0x90
[ 52.765858] [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
[ 52.765866] [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
Since the fix in f91ff5b9ff52 ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.
Since its introduction in 994051625981 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.
Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4c656c13b254d598e83e586b7b4d36a2043dad85 ]
This fixes a regression in the bridge ageing time caused by:
commit c62987bbd8a1 ("bridge: push bridge setting ageing_time down to switchdev")
There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2793a23aacbd754dbbb5cb75093deb7e4103bace ]
Netdevice parameter hard_header_len is variously interpreted both as
an upper and lower bound on link layer header length. The field is
used as upper bound when reserving room at allocation, as lower bound
when validating user input in PF_PACKET.
Clarify the definition to be maximum header length. For validation
of untrusted headers, add an optional validate member to header_ops.
Allow bypassing of validation by passing CAP_SYS_RAWIO, for instance
for deliberate testing of corrupt input. In this case, pad trailing
bytes, as some device drivers expect completely initialized headers.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1837b2e2bcd23137766555a63867e649c0b637f0 ]
The current reserved_tailroom calculation fails to take hlen and tlen into
account.
skb:
[__hlen__|__data____________|__tlen___|__extra__]
^ ^
head skb_end_offset
In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:
[__hlen__|__data____________|__extra__|__tlen___]
^ ^
head skb_end_offset
The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.
The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
reserved_tailroom = skb_tailroom - mtu
else:
reserved_tailroom = tlen
Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9b368814b336b0a1a479135eb2815edbc00efd3c ]
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:
[...]
[ 43.986968] eth0: hw csum failure
[ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[ 43.996193] Hardware name: BCM2709
[ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14)
[ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90)
[ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac)
[ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178)
[ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188)
[ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338)
[ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00)
[ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c)
[...]
Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 95272c29378ee7dc15f43fa2758cb28a5913a06d upstream.
-ftracer can duplicate asm blocks causing compilation to fail in
noclone functions. For example, KVM declares a global variable
in an asm like
asm("2: ... \n
.pushsection data \n
.global vmx_return \n
vmx_return: .long 2b");
and -ftracer causes a double declaration.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: kvm@vger.kernel.org
Reported-by: Linda Walsh <lkml@tlinx.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3debb0a9ddb16526de8b456491b7db60114f7b5e upstream.
The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).
If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.
Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 378c6520e7d29280f400ef2ceaf155c86f05a71a upstream.
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b021cbf3cb6208f0d40fd2f1869f237934340ed upstream.
Before 2e91fa7f6d45 ("cgroup: keep zombies associated with their
original cgroups"), all dead tasks were associated with init_css_set.
If a zombie task is requested for migration, while migration prep
operations would still be performed on init_css_set, the actual
migration would ignore zombie tasks. As init_css_set is always valid,
this worked fine.
However, after 2e91fa7f6d45, zombie tasks stay with the css_set it was
associated with at the time of death. Let's say a task T associated
with cgroup A on hierarchy H-1 and cgroup B on hiearchy H-2. After T
becomes a zombie, it would still remain associated with A and B. If A
only contains zombie tasks, it can be removed. On removal, A gets
marked offline but stays pinned until all zombies are drained. At
this point, if migration is initiated on T to a cgroup C on hierarchy
H-2, migration path would try to prepare T's css_set for migration and
trigger the following.
WARNING: CPU: 0 PID: 1576 at kernel/cgroup.c:474 cgroup_get+0x121/0x160()
CPU: 0 PID: 1576 Comm: bash Not tainted 4.4.0-work+ #289
...
Call Trace:
[<ffffffff8127e63c>] dump_stack+0x4e/0x82
[<ffffffff810445e8>] warn_slowpath_common+0x78/0xb0
[<ffffffff810446d5>] warn_slowpath_null+0x15/0x20
[<ffffffff810c33e1>] cgroup_get+0x121/0x160
[<ffffffff810c349b>] link_css_set+0x7b/0x90
[<ffffffff810c4fbc>] find_css_set+0x3bc/0x5e0
[<ffffffff810c5269>] cgroup_migrate_prepare_dst+0x89/0x1f0
[<ffffffff810c7547>] cgroup_attach_task+0x157/0x230
[<ffffffff810c7a17>] __cgroup_procs_write+0x2b7/0x470
[<ffffffff810c7bdc>] cgroup_tasks_write+0xc/0x10
[<ffffffff810c4790>] cgroup_file_write+0x30/0x1b0
[<ffffffff811c68fc>] kernfs_fop_write+0x13c/0x180
[<ffffffff81151673>] __vfs_write+0x23/0xe0
[<ffffffff81152494>] vfs_write+0xa4/0x1a0
[<ffffffff811532d4>] SyS_write+0x44/0xa0
[<ffffffff814af2d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
It doesn't make sense to prepare migration for css_sets pointing to
dead cgroups as they are guaranteed to contain only zombies which are
ignored later during migration. This patch makes cgroup destruction
path mark all affected css_sets as dead and updates the migration path
to ignore them during preparation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f33798deecbd59a2955f40ac0ae2bc7dff54c069 upstream.
commit 9ce119f318ba ("tty: Fix GPF in flush_to_ldisc()") fixed a
GPF caused by a line discipline which does not define a receive_buf()
method.
However, the vt driver (and speakup driver also) pushes selection
data directly to the line discipline receive_buf() method via
tty_ldisc_receive_buf(). Fix the same problem in tty_ldisc_receive_buf().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4df2bf466a9c9c92f40d27c4aa9120f4e8227bfc upstream.
Otherwise loading a "snapshot" table using the same device for the
origin and COW devices, e.g.:
echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap
will trigger:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1958.989655] PGD 0
[ 1958.991903] Oops: 0000 [#1] SMP
...
[ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G IO 4.5.0-rc5.snitm+ #150
...
[ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000
[ 1959.091865] RIP: 0010:[<ffffffffa040efba>] [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.104295] RSP: 0018:ffff88032a957b30 EFLAGS: 00010246
[ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
[ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00
[ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001
[ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80
[ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80
[ 1959.150021] FS: 00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000
[ 1959.159047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0
[ 1959.173415] Stack:
[ 1959.175656] ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc
[ 1959.183946] ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000
[ 1959.192233] ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf
[ 1959.200521] Call Trace:
[ 1959.203248] [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot]
[ 1959.211986] [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot]
[ 1959.219469] [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod]
[ 1959.226656] [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod]
[ 1959.234328] [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod]
[ 1959.241129] [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
[ 1959.248607] [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod]
[ 1959.255307] [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20
[ 1959.261816] [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 1959.268615] [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0
[ 1959.274637] [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100
[ 1959.281726] [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 1959.288814] [<ffffffff81216449>] SyS_ioctl+0x79/0x90
[ 1959.294450] [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71
...
[ 1959.323277] RIP [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.333090] RSP <ffff88032a957b30>
[ 1959.336978] CR2: 0000000000000098
[ 1959.344121] ---[ end trace b049991ccad1169e ]---
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899
Signed-off-by: Ding Xiang <dingxiang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b84106b4e2290c081cdab521fa832596cdfea246 upstream.
The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.
Some non-spec-compliant devices put registers other than BARs where the
BARs should be. When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.
Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81ad4276b505e987dd8ebbdf63605f92cd172b52 upstream.
In some cases, platform thermal driver may report invalid trip points,
thermal core should not take any action for these trip points.
This fixed a regression that bogus trip point starts to screw up thermal
control on some Lenovo laptops, after
commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
Author: Zhang Rui <rui.zhang@intel.com>
Date: Fri Oct 30 16:31:47 2015 +0800
Thermal: initialize thermal zone device correctly
After thermal zone device registered, as we have not read any
temperature before, thus tz->temperature should not be 0,
which actually means 0C, and thermal trend is not available.
In this case, we need specially handling for the first
thermal_zone_device_update().
Both thermal core framework and step_wise governor is
enhanced to handle this. And since the step_wise governor
is the only one that uses trends, so it's the only thermal
governor that needs to be updated.
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Tested-by: Matthias <morpheusxyz123@yahoo.de>
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1317190
Link: https://bugzilla.kernel.org/show_bug.cgi?id=114551
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9a1a743818ea3265abf98f9431623afa8c50c86 upstream.
ARM64 allmodconfig produces a bunch of warnings when building the
samsung ASoC code:
sound/soc/samsung/dmaengine.c: In function 'samsung_asoc_init_dma_data':
sound/soc/samsung/dmaengine.c:53:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
playback_data->filter_data = (void *)playback->channel;
sound/soc/samsung/dmaengine.c:60:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
capture_data->filter_data = (void *)capture->channel;
We could easily shut up the warning by adding an intermediate cast,
but there is a bigger underlying problem: The use of IORESOURCE_DMA
to pass data from platform code to device drivers is dubious to start
with, as what we really want is a pointer that can be passed into
a filter function.
Note that on s3c64xx, the pl08x DMA data is already a pointer, but
gets cast to resource_size_t so we can pass it as a resource, and it
then gets converted back to a pointer. In contrast, the data we pass
for s3c24xx is an index into a device specific table, and we artificially
convert that into a pointer for the filter function.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90d0f0f11588ec692c12f9009089b398be395184 upstream.
For !BIO_CLONED bio, we can use .bi_vcnt safely, but it
doesn't mean we can just simply return .bi_io_vec[.bi_vcnt - 1]
because the start postion may have been moved in the middle of
the bvec, such as splitting in the middle of bvec.
Fixes: 7bcd79ac50d9(block: bio: introduce helpers to get the 1st and last bvec)
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc17147de328a74bbdee67c1bf37d2f1992de756 upstream.
Commit f37755490fe9b ("tracepoints: Do not trace when cpu is offline") added
a check to make sure that tracepoints only get called when the cpu is
online, as it uses rcu_read_lock_sched() for protection.
Commit 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints
are disabled") added lockdep checks (including rcu checks) for events that
are not enabled to catch possible RCU issues that would only be triggered if
a trace event was enabled. Commit f37755490fe9b only stopped the warnings
when the trace event was enabled but did not prevent warnings if the trace
event was called when disabled.
To fix this, the cpu online check is moved to where the condition is added
to the trace event. This will place the cpu online check in all places that
it may be used now and in the future.
Fixes: f37755490fe9b ("tracepoints: Do not trace when cpu is offline")
Fixes: 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints are disabled")
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8244062ef1e54502ef55f54cced659913f244c3e upstream.
For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
There's one full copy, marked SHF_ALLOC and laid out at the end of the
module's init section. There's also a cut-down version that only
contains core symbols and strings, and lives in the module's core
section.
After module init (and before we free the module memory), we switch
the mod->symtab, mod->num_symtab and mod->strtab to point to the core
versions. We do this under the module_mutex.
However, kallsyms doesn't take the module_mutex: it uses
preempt_disable() and rcu tricks to walk through the modules, because
it's used in the oops path. It's also used in /proc/kallsyms.
There's nothing atomic about the change of these variables, so we can
get the old (larger!) num_symtab and the new symtab pointer; in fact
this is what I saw when trying to reproduce.
By grouping these variables together, we can use a
carefully-dereferenced pointer to ensure we always get one or the
other (the free of the module init section is already done in an RCU
callback, so that's safe). We allocate the init one at the end of the
module init section, and keep the core one inside the struct module
itself (it could also have been allocated at the end of the module
core, but that's probably overkill).
[ Rebased for 4.4-stable and older, because the following changes aren't
in the older trees:
- e0224418516b4d8a6c2160574bac18447c354ef0: adds arg to is_core_symbol
- 7523e4dc5057e157212b4741abd6256e03404cf1: module_init/module_core/init_size/core_size
become init_layout.base/core_layout.base/init_layout.size/core_layout.size.
]
Reported-by: Weilong Chen <chenweilong@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|