Age | Commit message (Collapse) | Author |
|
[ Upstream commit dc17147de328a74bbdee67c1bf37d2f1992de756 ]
Commit f37755490fe9b ("tracepoints: Do not trace when cpu is offline") added
a check to make sure that tracepoints only get called when the cpu is
online, as it uses rcu_read_lock_sched() for protection.
Commit 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints
are disabled") added lockdep checks (including rcu checks) for events that
are not enabled to catch possible RCU issues that would only be triggered if
a trace event was enabled. Commit f37755490fe9b only stopped the warnings
when the trace event was enabled but did not prevent warnings if the trace
event was called when disabled.
To fix this, the cpu online check is moved to where the condition is added
to the trace event. This will place the cpu online check in all places that
it may be used now and in the future.
Cc: stable@vger.kernel.org # v3.18+
Fixes: f37755490fe9b ("tracepoints: Do not trace when cpu is offline")
Fixes: 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints are disabled")
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit cb150b9d23be6ee7f3a0fff29784f1c5b5ac514d ]
Since cfg80211 frequently takes actions from its netdev notifier
call, wireless extensions messages could still be ordered badly
since the wext netdev notifier, since wext is built into the
kernel, runs before the cfg80211 netdev notifier. For example,
the following can happen:
5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
5: wlan1: <BROADCAST,MULTICAST,UP>
link/ether
when setting the interface down causes the wext message.
To also fix this, export the wireless_nlevent_flush() function
and also call it from the cfg80211 notifier.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit a528aca7f359f4b0b1d72ae406097e491a5ba9ea ]
Games with ordering and barriers are way too brittle. Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit a7c6f571ff51cc77d90dd54968f7c5c938c43998 ]
I'll shortly be introducing another seqcount primitive that's useful
to provide ordering semantics and would like to use the
write_seqcount_barrier() name for that.
Seeing how there's only one user of the current primitive, lets rename
it to invalidate, as that appears what its doing.
While there, employ lockdep_assert_held() instead of
assert_spin_locked() to not generate debug code for regular kernels.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: wanpeng.li@linux.intel.com
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 4ee34ea3a12396f35b26d90a094c75db95080baa ]
The id buffer in ata_device is a DMA target, but it isn't explicitly
cacheline aligned. Due to this, adjacent fields can be overwritten with
stale data from memory on non coherent architectures. As a result, the
kernel is sometimes unable to communicate with an ATA device.
Fix this by ensuring that the id buffer is cacheline aligned.
This issue is similar to that fixed by Commit 84bda12af31f
("libata: align ap->sector_buf").
Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 2.6.18
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 287e6611ab1eac76c2c5ebf6e345e04c80ca9c61 ]
As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not
work correctly in compat mode with libata.
I have investigated the issue further and found multiple problems
that all appeared with the same commit that originally introduced
HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably
also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy
a 'long' variable containing either 0 or 1 to user space.
The problems with this are:
* On big-endian machines, this will always write a zero because it
stores the wrong byte into user space.
* In compat mode, the upper three bytes of the variable are updated
by the compat_hdio_ioctl() function, but they now contain
uninitialized stack data.
* The hdparm tool calling this ioctl uses a 'static long' variable
to store the result. This means at least the upper bytes are
initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT
would fill them with data that remains stale when the low byte
is overwritten. Fortunately libata doesn't implement any of the
affected ioctl commands, so this would only happen when we query
both an IDE and an ATA device in the same command such as
"hdparm -N -c /dev/hda /dev/sda"
* The libata code for unknown reasons started using ATA_IOC_GET_IO32
and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT,
while the ioctl commands that were added later use the normal
HDIO_* names. This is harmless but rather confusing.
This addresses all four issues by changing the code to use put_user()
on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem
does, and by clarifying the names of the ioctl commands.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Soohoon Lee <Soohoon.Lee@f5.com>
Tested-by: Soohoon Lee <Soohoon.Lee@f5.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 50ab8ec74a153eb30db26529088bc57dd700b24c ]
See http: //www.infradead.org/rpr.html
X-Evolution-Source: 1451162204.2173.11@leira.trondhjem.org
Content-Transfer-Encoding: 8bit
Mime-Version: 1.0
We support OFFSET_MAX just fine, so don't round down below it. Also
switch to using min_t to make the helper more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 433c92379d9c ("NFS: Clean up nfs_size_to_loff_t()")
Cc: stable@vger.kernel.org # 2.6.23+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
commit 0f4a943168f31d29a1701908931acaba518b131a upstream.
To address the bug where fabric driver level shutdown
of se_cmd occurs at the same time when TMR CMD_T_ABORTED
is happening resulting in a -1 ->cmd_kref, this patch
adds a CMD_T_FABRIC_STOP bit that is used to determine
when TMR + driver I_T nexus shutdown is happening
concurrently.
It changes target_sess_cmd_list_set_waiting() to obtain
se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and drop local
reference in target_wait_for_sess_cmds() and invoke extra
target_put_sess_cmd() during Task Aborted Status (TAS)
when necessary.
Also, it adds a new target_wait_free_cmd() wrapper around
transport_wait_for_tasks() for the special case within
transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
and is now aware of CMD_T_ABORTED + CMD_T_TAS status
bits to know when an extra transport_put_cmd() during
TAS is required.
Note transport_generic_free_cmd() is expected to block on
cmd->cmd_wait_comp in order to follow what iscsi-target
expects during iscsi_conn context se_cmd shutdown.
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
commit febe562c20dfa8f33bee7d419c6b517986a5aa33 upstream.
This patch fixes a NULL pointer se_cmd->cmd_kref < 0
refcount bug during TMR LUN_RESET with active se_cmd
I/O, that can be triggered during se_cmd descriptor
shutdown + release via core_tmr_drain_state_list() code.
To address this bug, add common __target_check_io_state()
helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
both cases ahead of last target_put_sess_cmd() after
TFO->aborted_task() -> transport_cmd_finish_abort()
callback has completed.
It also introduces SCF_ACK_KREF to determine when
transport_cmd_finish_abort() needs to drop the second
extra reference, ahead of calling target_put_sess_cmd()
for the final kref_put(&se_cmd->cmd_kref).
It also updates transport_cmd_check_stop() to avoid
holding se_cmd->t_state_lock while dropping se_cmd
device state via target_remove_from_state_list(), now
that core_tmr_drain_state_list() is holding the
se_device lock while checking se_cmd state from
within TMR logic.
Finally, move transport_put_cmd() release of SGL +
TMR + extended CDB memory into target_free_cmd_mem()
in order to avoid potential resource leaks in TMR
ABORT_TASK + LUN_RESET code-paths. Also update
target_release_cmd_kref() accordingly.
Reviewed-by: Quinn Tran <quinn.tran@qlogic.com>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 057085e522f8bf94c2e691a5b76880f68060f8ba ]
This patch addresses a race + use after free where the first
stage of COMPARE_AND_WRITE in compare_and_write_callback()
is rescheduled after the backend sends the secondary WRITE,
resulting in second stage compare_and_write_post() callback
completing in target_complete_ok_work() before the first
can return.
Because current code depends on checking se_cmd->se_cmd_flags
after return from se_cmd->transport_complete_callback(),
this results in first stage having SCF_COMPARE_AND_WRITE_POST
set, which incorrectly falls through into second stage CAW
processing code, eventually triggering a NULL pointer
dereference due to use after free.
To address this bug, pass in a new *post_ret parameter into
se_cmd->transport_complete_callback(), and depend upon this
value instead of ->se_cmd_flags to determine when to return
or fall through into ->queue_status() code for CAW.
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit deed49df7390d5239024199e249190328f1651e7 ]
Since the gc of ipv4 route was removed, the route cached would has
no chance to be removed, and even it has been timeout, it still could
be used, cause no code to check it's expires.
Fix this issue by checking and removing route cache when we get route.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 5f74f82ea34c0da80ea0b49192bb5ea06e063593 ]
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 ]
The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.
To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.
Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4 ]
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 6f21c96a78b835259546d8f3fb4edff0f651d478 ]
The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.
This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.
ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit b33c8ff4431a343561e2319f17c14286f2aa52e2 ]
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
u32 adc_clock = 450560; /* 45.056 MHz */
if (state->config.adc_clock)
adc_clock = state->config.adc_clock;
do_div(value, adc_clock);
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit f37755490fe9bf76f6ba1d8c6591745d3574a6a6 ]
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can probuce the following warning:
[ INFO: suspicious RCU usage. ]
4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
Call Trace:
[c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
[c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
[c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
[c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
[c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
[c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
[c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
[c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
[c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
[c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
[c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
[c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 1f55c718c290616889c04946864a13ef30f64929 ]
Considering current pty code and multiple devpts instances, it's possible
to umount a devpts file system while a program still has /dev/tty opened
pointing to a previosuly closed pty pair in that instance. In the case all
ptmx and pts/N files are closed, umount can be done. If the program closes
/dev/tty after umount is done, devpts_kill_index will use now an invalid
super_block, which was already destroyed in the umount operation after
running ->kill_sb. This is another "use after free" type of issue, but now
related to the allocated super_block instance.
To avoid the problem (warning at ida_remove and potential crashes) for
this specific case, I added two functions in devpts which grabs additional
references to the super_block, which pty code now uses so it makes sure
the super block structure is still valid until pty shutdown is done.
I also moved the additional inode references to the same functions, which
also covered similar case with inode being freed before /dev/tty final
close/shutdown.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Cc: stable@vger.kernel.org # 2.6.29+
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit afc16604c06414223478df3e42301ab630b9960a ]
The first argument of these two functions is always identical
to se_cmd->se_sess. Hence remove the first argument.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: <qla2xxx-upstream@qlogic.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 0f26922fe5dc5724b1adbbd54b21bad03590b4f3 ]
The datatype __kernel_time_t is u32 on 32bit platform, so its subject to
overflows in the timeval/timespec to cputime conversion.
Currently the following functions are affected:
1. setitimer()
2. timer_create/timer_settime()
3. sys_clock_nanosleep
This can happen on MIPS32 and ARM32 with "Full dynticks CPU time accounting"
enabled, which is required for CONFIG_NO_HZ_FULL.
Enforce u64 conversion to prevent the overflow.
Fixes: 31c1fc818715 ("ARM: Kconfig: allow full nohz CPU accounting")
Signed-off-by: zengtao <prime.zeng@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: <fweisbec@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1454384314-154784-1-git-send-email-prime.zeng@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit aa226ff4a1ce79f229c6b7a4c0a14e17fececd01 ]
There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free(). Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children. This behavior is unexpected and led to bugs in cpu and
memory controller.
This patch updates offline path so that a parent css is never offlined
before its children. Each css keeps online_cnt which reaches zero iff
itself and all its children are offline and offline_css() is invoked
only after online_cnt reaches zero.
This fixes the memory controller bug and allows the fix for cpu
controller.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reported-by: Brian Christiansen <brian.o.christiansen@gmail.com>
Link: http://lkml.kernel.org/g/5698A023.9070703@de.ibm.com
Link: http://lkml.kernel.org/g/CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg@mail.gmail.com
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit b4a04ab7a37b490cad48e69abfe14288cacb669c ]
From 2d728f74bfc071df06773e2fd7577dd5dab6425d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 13 May 2015 15:37:01 -0400
This patch separates out cgroup-defs.h from cgroup.h which has grown a
lot of dependencies. cgroup-defs.h currently only contains constant
and type definitions and can be used to break circular include
dependency. While moving, definitions are reordered so that
cgroup-defs.h has consistent logical structure.
This patch is pure reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 732042821cfa106b3c20b9780e4c60fee9d68900 ]
Helper radix_tree_iter_retry() resets next_index to the current index.
In following radix_tree_next_slot current chunk size becomes zero. This
isn't checked and it tries to dereference null pointer in slot.
Tagged iterator is fine because retry happens only at slot 0 where tag
bitmask in iter->tags is filled with single bit.
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 12352d3cae2cebe18805a91fab34b534d7444231 ]
Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
anon_vma appeared between lock and unlock. We have to check anon_vma
first or call anon_vma_prepare() to be sure that it's here. There are
only few users of these legacy helpers. Let's get rid of them.
This patch fixes anon_vma lock imbalance in validate_mm(). Write lock
isn't required here, read lock is enough.
And reorders expand_downwards/expand_upwards: security_mmap_addr() and
wrapping-around check don't have to be under anon vma lock.
Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 4772ff03df8094fd99d28de5fcf5df3a3e9c68bb ]
The port is removed synchronously, but the connector delayed.
This causes a use after free which can cause a kernel BUG with
slug_debug=FPZU. This is fixed by freeing the port after the
connector.
This fixes a regression introduced with
6b8eeca65b18ae77e175cc2b6571731f0ee413bf
"drm/dp/mst: close deadlock in connector destruction."
Cc: stable@vger.kernel.org
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 64566b5e767f9bc3161055ca1b443a51afb52aad ]
drm_fixp_from_fraction allows us to create a fixed point directly
from a fraction, rather than creating fixed point values and dividing
later. This avoids overflow of our 64 bit value for large numbers.
drm_fixp2int_ceil allows us to return the ceiling of our fixed point
value.
[airlied: squash Jordan's fix]
32-bit-build-fix: Jordan Lazare <Jordan.Lazare@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 46437f9a554fbe3e110580ca08ab703b59f2f95a ]
If the indirect_ptr bit is set on a slot, that indicates we need to redo
the lookup. Introduce a new function radix_tree_iter_retry() which
forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.
This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0. The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.
Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 06ab30034ed9c200a570ab13c017bde248ddb2a6 ]
A kernel WARNING in snd_rawmidi_transmit_ack() is triggered by
syzkaller fuzzer:
WARNING: CPU: 1 PID: 20739 at sound/core/rawmidi.c:1136
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<ffffffff84f80bd5>] snd_rawmidi_transmit_ack+0x275/0x400 sound/core/rawmidi.c:1136
[<ffffffff84fdb3c1>] snd_virmidi_output_trigger+0x4b1/0x5a0 sound/core/seq/seq_virmidi.c:163
[< inline >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
[<ffffffff84f87ed9>] snd_rawmidi_kernel_write1+0x549/0x780 sound/core/rawmidi.c:1223
[<ffffffff84f89fd3>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1273
[<ffffffff817b0323>] __vfs_write+0x113/0x480 fs/read_write.c:528
[<ffffffff817b1db7>] vfs_write+0x167/0x4a0 fs/read_write.c:577
[< inline >] SYSC_write fs/read_write.c:624
[<ffffffff817b50a1>] SyS_write+0x111/0x220 fs/read_write.c:616
[<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
Also a similar warning is found but in another path:
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82be2c0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff81355139>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<ffffffff81355369>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<ffffffff8527e69a>] rawmidi_transmit_ack+0x24a/0x3b0 sound/core/rawmidi.c:1133
[<ffffffff8527e851>] snd_rawmidi_transmit_ack+0x51/0x80 sound/core/rawmidi.c:1163
[<ffffffff852d9046>] snd_virmidi_output_trigger+0x2b6/0x570 sound/core/seq/seq_virmidi.c:185
[< inline >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
[<ffffffff85285a0b>] snd_rawmidi_kernel_write1+0x4bb/0x760 sound/core/rawmidi.c:1252
[<ffffffff85287b73>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1302
[<ffffffff817ba5f3>] __vfs_write+0x113/0x480 fs/read_write.c:528
[<ffffffff817bc087>] vfs_write+0x167/0x4a0 fs/read_write.c:577
[< inline >] SYSC_write fs/read_write.c:624
[<ffffffff817bf371>] SyS_write+0x111/0x220 fs/read_write.c:616
[<ffffffff86660276>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
In the former case, the reason is that virmidi has an open code
calling snd_rawmidi_transmit_ack() with the value calculated outside
the spinlock. We may use snd_rawmidi_transmit() in a loop just for
consuming the input data, but even there, there is a race between
snd_rawmidi_transmit_peek() and snd_rawmidi_tranmit_ack().
Similarly in the latter case, it calls snd_rawmidi_transmit_peek() and
snd_rawmidi_tranmit_ack() separately without protection, so they are
racy as well.
The patch tries to address these issues by the following ways:
- Introduce the unlocked versions of snd_rawmidi_transmit_peek() and
snd_rawmidi_transmit_ack() to be called inside the explicit lock.
- Rewrite snd_rawmidi_transmit() to be race-free (the former case).
- Make the split calls (the latter case) protected in the rawmidi spin
lock.
BugLink: http://lkml.kernel.org/r/CACT4Y+YPq1+cYLkadwjWa5XjzF1_Vki1eHnVn-Lm0hzhSpu5PA@mail.gmail.com
BugLink: http://lkml.kernel.org/r/CACT4Y+acG4iyphdOZx47Nyq_VHGbpJQK-6xNpiqUjaZYqsXOGw@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 4b0e4e4af6c6dc8354dcb72182d52c1bc55f12fc ]
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 67645d7619738e51c668ca69f097cb90b5470422 ]
There are a number of problems with revoking a "was sending" message:
(1) We never make any attempt to revoke data - only kvecs contibute to
con->out_skip. However, once the header (envelope) is written to the
socket, our peer learns data_len and sets itself to expect at least
data_len bytes to follow front or front+middle. If ceph_msg_revoke()
is called while the messenger is sending message's data portion,
anything we send after that call is counted by the OSD towards the now
revoked message's data portion. The effects vary, the most common one
is the eventual hang - higher layers get stuck waiting for the reply to
the message that was sent out after ceph_msg_revoke() returned and
treated by the OSD as a bunch of data bytes. This is what Matt ran
into.
(2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
is wrong. If ceph_msg_revoke() is called before the tag is sent out or
while the messenger is sending the header, we will get a connection
reset, either due to a bad tag (0 is not a valid tag) or a bad header
CRC, which kind of defeats the purpose of revoke. Currently the kernel
client refuses to work with header CRCs disabled, but that will likely
change in the future, making this even worse.
(3) con->out_skip is not reset on connection reset, leading to one or
more spurious connection resets if we happen to get a real one between
con->out_skip is set in ceph_msg_revoke() and before it's cleared in
write_partial_skip().
Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never
zero the tag or the header, i.e. send out tag+header regardless of when
ceph_msg_revoke() is called. That way the header is always correct, no
unnecessary resets are induced and revoke stands ready for disabled
CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new
"message out temp" and copy the header into it before sending.
Cc: stable@vger.kernel.org # 4.0+
Reported-by: Matt Conner <matt.conner@keepertech.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Matt Conner <matt.conner@keepertech.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 ]
This patch allows af_alg_release_parent to be called even for
nokey sockets.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit a5596d6332787fd383b3b5427b41f94254430827 ]
This patch adds a way for ahash users to determine whether a key
is required by a crypto_ahash transform.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 37766586c965d63758ad542325a96d5384f4a8c9 ]
This patch adds a compatibility path to support old applications
that do acept(2) before setkey.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit c840ac6af3f8713a71b4d2363419145760bd6044 ]
Each af_alg parent socket obtained by socket(2) corresponds to a
tfm object once bind(2) has succeeded. An accept(2) call on that
parent socket creates a context which then uses the tfm object.
Therefore as long as any child sockets created by accept(2) exist
the parent socket must not be modified or freed.
This patch guarantees this by using locks and a reference count
on the parent socket. Any attempt to modify the parent socket will
fail with EBUSY.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 8d91f8b15361dfb438ab6eb3b319e2ded43458ff ]
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 1f16ee7fa13649f4e55aa48ad31c3eb0722a62d3 ]
We should always send reply for UP request in order
to make downstream device clean-up resources appropriately.
Issue was that reply for UP request was sent only once.
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Mykola Lysenko <Mykola.Lysenko@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit cd812599796f500b042f5464b6665755eca21137 ]
Setting the change attribute has been mandatory for all NFS versions, since
commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should
therefore not have anything be conditional on it being set/unset.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 4511f7166a2deb5f7a578cf87fd2fe1ae83527e3 ]
When a new cooling device is registered, we need to update the
thermal zone to set the new registered cooling device to a proper
state.
This fixes a problem that the system is cool, while the fan devices
are left running on full speed after boot, if fan device is registered
after thermal zone device.
Here is the history of why current patch looks like this:
https://patchwork.kernel.org/patch/7273041/
CC: <stable@vger.kernel.org> #3.18+
Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 ]
After thermal zone device registered, as we have not read any
temperature before, thus tz->temperature should not be 0,
which actually means 0C, and thermal trend is not available.
In this case, we need specially handling for the first
thermal_zone_device_update().
Both thermal core framework and step_wise governor is
enhanced to handle this. And since the step_wise governor
is the only one that uses trends, so it's the only thermal
governor that needs to be updated.
CC: <stable@vger.kernel.org> #3.18+
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Tested-by: Matthias <morpheusxyz123@yahoo.de>
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
commit fbc416ff86183e2203cdf975e2881d7c164b0271 upstream.
As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
disabled currently fails because the system call table still needs to
reference the individual function entry points that are provided by
kernel/sys_ni.c in this case, and the declarations are hidden inside
of #ifdef CONFIG_UID16:
arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
__SYSCALL(__NR_lchown, sys_lchown16)
I believe this problem only exists on ARM64, because older architectures
tend to not need declarations when their system call table is built
in assembly code, while newer architectures tend to not need UID16
support. ARM64 only uses these system calls for compatibility with
32-bit ARM binaries.
This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
declarations whenever we need them, but otherwise the behavior is
unchanged.
Fixes: af1839eb4bd4 ("Kconfig: clean up the long arch list for the UID16 config option")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8695a144da9e500a5a60fa34c06694346ec1048f ]
I’m using the compilation flag -Werror=old-style-declaration, which
requires that the “inline” word would come at the beginning of the code
line.
$ make drivers/net/ethernet/intel/e1000e/e1000e.ko
...
include/net/inet_timewait_sock.h:116:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_schedule(struct inet_timewait_sock *tw, int
timeo)
include/net/inet_timewait_sock.h:121:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_reschedule(struct inet_timewait_sock *tw,
int timeo)
Fixes: ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ed2e923945892a8372ab70d2f61d364b0b6d9054 ]
When creating a timewait socket, we need to arm the timer before
allowing other cpus to find it. The signal allowing cpus to find
the socket is setting tw_refcnt to non zero value.
As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
call inet_twsk_schedule() first.
This also means we need to remove tw_refcnt changes from
inet_twsk_schedule() and let the caller handle it.
Note that because we use mod_timer_pinned(), we have the guarantee
the timer wont expire before we set tw_refcnt as we run in BH context.
To make things more readable I introduced inet_twsk_reschedule() helper.
When rearming the timer, we can use mod_timer_pending() to make sure
we do not rearm a canceled timer.
Note: This bug can possibly trigger if packets of a flow can hit
multiple cpus. This does not normally happen, unless flow steering
is broken somehow. This explains this bug was spotted ~5 months after
its introduction.
A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
but will be provided in a separate patch for proper tracking.
Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 34ae6a1aa0540f0f781dd265366036355fdc8930 ]
When a tunnel decapsulates the outer header, it has to comply
with RFC 6080 and eventually propagate CE mark into inner header.
It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9207f9d45b0ad071baa128e846d7e7ed85016df3 ]
Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.
This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 55795ef5469290f89f04e12e662ded604909e462 ]
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value. All the BPF JITs fail to clear A if this is used as
the first instruction in a filter. This was found using american fuzzy
lop.
Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs. Except for ARM, the
rest have only been compile-tested.
Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 712f4aad406bb1ed67f3f98d04c044191f0ff593 ]
It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.
This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.
Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1dfddff5fcd869fcab0c52fafae099dfa435a935 ]
NCM buffer sizes are negotiated with the device independently of
the network device MTU. The RX buffers are allocated by the
usbnet framework based on the rx_urb_size value set by cdc_ncm. A
single RX buffer can hold a number of MTU sized packets.
The default usbnet change_mtu ndo only modifies rx_urb_size if it
is equal to hard_mtu. And the cdc_ncm driver will set rx_urb_size
and hard_mtu independently of each other, based on dwNtbInMaxSize
and dwNtbOutMaxSize respectively. It was therefore assumed that
usbnet_change_mtu() would never touch rx_urb_size. This failed to
consider the case where dwNtbInMaxSize and dwNtbOutMaxSize happens
to be equal.
Fix by implementing an NCM specific change_mtu ndo, modifying the
netdev MTU without touching the buffer size settings.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5037e9ef9454917b047f9f3a19b4dd179fbf7cd4 ]
David Wilder reported crashes caused by dst reuse.
<quote David>
I am seeing a crash on a distro V4.2.3 kernel caused by a double
release of a dst_entry. In ipv4_dst_destroy() the call to
list_empty() finds a poisoned next pointer, indicating the dst_entry
has already been removed from the list and freed. The crash occurs
18 to 24 hours into a run of a network stress exerciser.
</quote>
Thanks to his detailed report and analysis, we were able to understand
the core issue.
IP early demux can associate a dst to skb, after a lookup in TCP/UDP
sockets.
When socket cache is not properly set, we want to store into
sk->sk_dst_cache the dst for future IP early demux lookups,
by acquiring a stable refcount on the dst.
Problem is this acquisition is simply using an atomic_inc(),
which works well, unless the dst was queued for destruction from
dst_release() noticing dst refcount went to zero, if DST_NOCACHE
was set on dst.
We need to make sure current refcount is not zero before incrementing
it, or risk double free as David reported.
This patch, being a stable candidate, adds two new helpers, and use
them only from IP early demux problematic paths.
It might be possible to merge in net-next skb_dst_force() and
skb_dst_force_safe(), but I prefer having the smallest patch for stable
kernels : Maybe some skb_dst_force() callers do not expect skb->dst
can suddenly be cleared.
Can probably be backported back to linux-3.6 kernels
Reported-by: David J. Wilder <dwilder@us.ibm.com>
Tested-by: David J. Wilder <dwilder@us.ibm.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ]
Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
related to disabling sock timestamp.
When SCTP accepts an association or peel one off, it copies sock flags
but forgot to call net_enable_timestamp() if a packet timestamping flag
was copied, leading to extra calls to net_disable_timestamp() whenever
such clones were closed.
The fix is to call net_enable_timestamp() whenever we copy a sock with
that flag on, like tcp does.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|