Age | Commit message (Collapse) | Author |
|
[ Upstream commit 2cc683e88c0c993ac3721d9b702cb0630abe2879 ]
Need to lock lower socket in order to provide mutual exclusion
with kcm_unattach.
v2: Add Reported-by for syzbot
Fixes: ab7ac4eb9832e32a09f4e804 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6e5d58fdc9bedd0255a8781b258f10bbdc63e975 ]
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.
Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.witt@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2cbb4ea7de167b02ffa63e9cdfdb07a7e7094615 ]
Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings
if the index is actually set in the message.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 02a2385f37a7c6594c9d89b64c4a1451276f08eb ]
nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.
Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fa6a91e9b907231d2e38ea5ed89c537b3525df3d ]
Free memory by calling put_device(), if afiucv_iucv_init is not
successful.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 17cfe79a65f98abe535261856c5aef14f306dff7 ]
syzkaller found an issue caused by lack of sufficient checks
in l2tp_tunnel_create()
RAW sockets can not be considered as UDP ones for instance.
In another patch, we shall replace all pr_err() by less intrusive
pr_debug() so that syzkaller can find other bugs faster.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Acked-by: James Chapman <jchapman@katalix.com>
==================================================================
BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
dst_release: dst:00000000d53d0d0f refcnt:-1
Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242
CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23b/0x360 mm/kasan/report.c:412
__asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435
setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596
pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707
SYSC_connect+0x213/0x4a0 net/socket.c:1640
SyS_connect+0x24/0x30 net/socket.c:1621
do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9f62c15f28b0d1d746734666d88a79f08ba1e43e ]
Fix the following slab-out-of-bounds kasan report in
ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not
linear and the accessed data are not in the linear data region of orig_skb.
[ 1503.122508] ==================================================================
[ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990
[ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932
[ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124
[ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 1503.123527] Call Trace:
[ 1503.123579] <IRQ>
[ 1503.123638] print_address_description+0x6e/0x280
[ 1503.123849] kasan_report+0x233/0x350
[ 1503.123946] memcpy+0x1f/0x50
[ 1503.124037] ndisc_send_redirect+0x94e/0x990
[ 1503.125150] ip6_forward+0x1242/0x13b0
[...]
[ 1503.153890] Allocated by task 1932:
[ 1503.153982] kasan_kmalloc+0x9f/0xd0
[ 1503.154074] __kmalloc_track_caller+0xb5/0x160
[ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70
[ 1503.154324] __alloc_skb+0x130/0x3e0
[ 1503.154415] sctp_packet_transmit+0x21a/0x1810
[ 1503.154533] sctp_outq_flush+0xc14/0x1db0
[ 1503.154624] sctp_do_sm+0x34e/0x2740
[ 1503.154715] sctp_primitive_SEND+0x57/0x70
[ 1503.154807] sctp_sendmsg+0xaa6/0x1b10
[ 1503.154897] sock_sendmsg+0x68/0x80
[ 1503.154987] ___sys_sendmsg+0x431/0x4b0
[ 1503.155078] __sys_sendmsg+0xa4/0x130
[ 1503.155168] do_syscall_64+0x171/0x3f0
[ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.155436] Freed by task 1932:
[ 1503.155527] __kasan_slab_free+0x134/0x180
[ 1503.155618] kfree+0xbc/0x180
[ 1503.155709] skb_release_data+0x27f/0x2c0
[ 1503.155800] consume_skb+0x94/0xe0
[ 1503.155889] sctp_chunk_put+0x1aa/0x1f0
[ 1503.155979] sctp_inq_pop+0x2f8/0x6e0
[ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230
[ 1503.156164] sctp_inq_push+0x117/0x150
[ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0
[ 1503.156346] __release_sock+0x142/0x250
[ 1503.156436] release_sock+0x80/0x180
[ 1503.156526] sctp_sendmsg+0xbb0/0x1b10
[ 1503.156617] sock_sendmsg+0x68/0x80
[ 1503.156708] ___sys_sendmsg+0x431/0x4b0
[ 1503.156799] __sys_sendmsg+0xa4/0x130
[ 1503.156889] do_syscall_64+0x171/0x3f0
[ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.157158] The buggy address belongs to the object at ffff8800298ab600
which belongs to the cache kmalloc-1024 of size 1024
[ 1503.157444] The buggy address is located 176 bytes inside of
1024-byte region [ffff8800298ab600, ffff8800298aba00)
[ 1503.157702] The buggy address belongs to the page:
[ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 1503.158053] flags: 0x4000000000008100(slab|head)
[ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e
[ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000
[ 1503.158523] page dumped because: kasan: bad access detected
[ 1503.158698] Memory state around the buggy address:
[ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1503.159338] ^
[ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159785] ==================================================================
[ 1503.159964] Disabling lock debugging due to kernel taint
The test scenario to trigger the issue consists of 4 devices:
- H0: data sender, connected to LAN0
- H1: data receiver, connected to LAN1
- GW0 and GW1: routers between LAN0 and LAN1. Both of them have an
ethernet connection on LAN0 and LAN1
On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for
data from LAN0 to LAN1.
Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent
data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send
buffer size is set to 16K). While data streams are active flush the route
cache on HA multiple times.
I have not been able to identify a given commit that introduced the issue
since, using the reproducer described above, the kasan report has been
triggered from 4.14 and I have not gone back further.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 67f93df79aeefc3add4e4b31a752600f834236e2 ]
dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL,
therefore if DCCP socket is disconnected and dccp_sendmsg() is
called after it, it will cause a NULL pointer dereference in
dccp_write_xmit().
This crash and the reproducer was reported by syzbot. Looks like
it is reproduced if commit 69c64866ce07 ("dccp: CVE-2017-8824:
use-after-free in DCCP code") is applied.
Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a560002437d3646dafccecb1bf32d1685112ddda ]
inet_evict_bucket() iterates global list, and
several tasks may call it in parallel. All of
them hash the same fq->list_evictor to different
lists, which leads to list corruption.
This patch makes fq be hashed to expired list
only if this has not been made yet by another
task. Since inet_frag_alloc() allocates fq
using kmem_cache_zalloc(), we may rely on
list_evictor is initially unhashed.
The problem seems to exist before async
pernet_operations, as there was possible to have
exit method to be executed in parallel with
inet_frags::frags_work, so I add two Fixes tags.
This also may go to stable.
Fixes: d1fe19444d82 "inet: frag: don't re-use chainlist for evictor"
Fixes: f84c6821aa54 "net: Convert pernet_subsys, registered from inet_init()"
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4dcb31d4649df36297296b819437709f5407059c ]
Andrei Vagin reported a KASAN: slab-out-of-bounds error in
skb_update_prio()
Since SYNACK might be attached to a request socket, we need to
get back to the listener socket.
Since this listener is manipulated without locks, add const
qualifiers to sock_cgroup_prioidx() so that the const can also
be used in skb_update_prio()
Also add the const qualifier to sock_cgroup_classid() for consistency.
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca0edb131bdf1e6beaeb2b8289fd6b374b74147d ]
A tun device type can trivially be set to arbitrary value using
TUNSETLINK ioctl().
Therefore, lowpan_device_event() must really check that ieee802154_ptr
is not NULL.
Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 35d889d10b649fda66121891ec05eca88150059d ]
When we exceed current packets limit and we have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:
unreferenced object 0xffff880b5d23b600 (size 1024):
comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
hex dump (first 32 bytes):
00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
[<000000001709b32f>] skb_segment+0x8c8/0x3710
[<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
[<00000000c921cba1>] inet_gso_segment+0x476/0x1370
[<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
[<000000002182660a>] __skb_gso_segment+0x1dd/0x620
[<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
[<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
[<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
[<00000000d309e9d3>] ip_output+0x1aa/0x2c0
[<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
[<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
[<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
[<0000000013d06d02>] tasklet_action+0x1ca/0x250
[<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
[<00000000e7ed027c>] irq_exit+0x1e2/0x210
Fix it by adding the rest of the segments, if any, to skb 'to_free'
list. Add new __qdisc_drop_all() and qdisc_drop_all() functions
because they can be useful in the future if we need to drop segmented
GSO packets in other places.
Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 51d4740f88affd85d49c04e3c9cd129c0e33bcb9 ]
If set/unset mode of the tunnel_key action is not provided, ->init() still
returns 0, and the caller proceeds with bogus 'struct tc_action *' object,
this results in crash:
% tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1
[ 35.805515] general protection fault: 0000 [#1] SMP PTI
[ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
crypto_simd glue_helper cryptd serio_raw
[ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
[ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190
[ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
[ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
[ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
[ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
[ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
[ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
[ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
knlGS:0000000000000000
[ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
[ 35.816457] Call Trace:
[ 35.817158] tc_ctl_action+0x11a/0x220
[ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0
[ 35.818457] ? __slab_alloc+0x1c/0x30
[ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0
[ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0
[ 35.820231] netlink_rcv_skb+0xce/0x100
[ 35.820744] netlink_unicast+0x164/0x220
[ 35.821500] netlink_sendmsg+0x293/0x370
[ 35.822040] sock_sendmsg+0x30/0x40
[ 35.822508] ___sys_sendmsg+0x2c5/0x2e0
[ 35.823149] ? pagecache_get_page+0x27/0x220
[ 35.823714] ? filemap_fault+0xa2/0x640
[ 35.824423] ? page_add_file_rmap+0x108/0x200
[ 35.825065] ? alloc_set_pte+0x2aa/0x530
[ 35.825585] ? finish_fault+0x4e/0x70
[ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0
[ 35.826723] ? __sys_sendmsg+0x41/0x70
[ 35.827230] __sys_sendmsg+0x41/0x70
[ 35.827710] do_syscall_64+0x68/0x120
[ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 35.828859] RIP: 0033:0x7f3d0ca4da67
[ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
[ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
[ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
[ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
[ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48
8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
[ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
[ 35.838291] ---[ end trace a095c06ee4b97a26 ]---
Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 53c81e95df1793933f87748d36070a721f6cb287 ]
LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over
ip6_vti that require fragmentation and the underlying device has an
MTU smaller than 1500 plus some extra space for headers. This happens
because ip6_vti, by default, sets MTU to ETH_DATA_LEN and not updating
it depending on a destination address or link parameter. Further
attempts to send UDP packets may succeed because pmtu gets updated on
ICMPV6_PKT_TOOBIG in vti6_err().
In case the lower device has larger MTU size, e.g. 9000, ip6_vti works
but not using the possible maximum size, output packets have 1500 limit.
The above cases require manual MTU setup after ip6_vti creation. However
ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev().
Here is the example when the lower device MTU is set to 9000:
# ip a sh ltp_ns_veth2
ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ...
inet 10.0.0.2/24 scope global ltp_ns_veth2
inet6 fd00::2/64 scope global
# ip li add vti6 type vti6 local fd00::2 remote fd00::1
# ip li show vti6
vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ...
link/tunnel6 fd00::2 peer fd00::1
After the patch:
# ip li add vti6 type vti6 local fd00::2 remote fd00::1
# ip li show vti6
vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
link/tunnel6 fd00::2 peer fd00::1
Reported-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7dde07e9c53617549d67dd3e1d791496d0d3868e ]
According to my static checker we should unlock here before the return.
That seems reasonable to me as well.
Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 21a8e9dd52b64f0170bad208293ef8c30c3c1403 ]
Existing API 'ieee80211_get_sdata_band' returns default 2 GHz band even
if the channel context configuration is NULL. This crashes for chipsets
which support 5 Ghz alone when it tries to access members of 'sband'.
Channel context configuration can be NULL in multivif case and when
channel switch is in progress (or) when it fails. Fix this by replacing
the API 'ieee80211_get_sdata_band' with 'ieee80211_get_sband' which
returns a NULL pointer for sband when the channel configuration is NULL.
An example scenario is as below:
In multivif mode (AP + STA) with drivers like ath10k, when we do a
channel switch in the AP vif (which has a number of clients connected)
and a STA vif which is connected to some other AP, when the channel
switch in AP vif fails, while the STA vifs tries to connect to the
other AP, there is a window where the channel context is NULL/invalid
and this results in a crash while the clients connected to the AP vif
tries to reconnect and this race is very similar to the one investigated
by Michal in https://patchwork.kernel.org/patch/3788161/ and this does
happens with hardware that supports 5Ghz alone after long hours of
testing with continuous channel switch on the AP vif
ieee80211 phy0: channel context reservation cannot be finalized because
some interfaces aren't switching
wlan0: failed to finalize CSA, disconnecting
wlan0-1: deauthenticating from 8c:fd:f0:01:54:9c by local choice
(Reason: 3=DEAUTH_LEAVING)
WARNING: CPU: 1 PID: 19032 at net/mac80211/ieee80211_i.h:1013 sta_info_alloc+0x374/0x3fc [mac80211]
[<bf77272c>] (sta_info_alloc [mac80211])
[<bf78776c>] (ieee80211_add_station [mac80211]))
[<bf73cc50>] (nl80211_new_station [cfg80211])
Unable to handle kernel NULL pointer dereference at virtual
address 00000014
pgd = d5f4c000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
PC is at sta_info_alloc+0x380/0x3fc [mac80211]
LR is at sta_info_alloc+0x37c/0x3fc [mac80211]
[<bf772738>] (sta_info_alloc [mac80211])
[<bf78776c>] (ieee80211_add_station [mac80211])
[<bf73cc50>] (nl80211_new_station [cfg80211]))
Cc: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1442f6f7c1b77de1c508318164a527e240c24a4d ]
When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.
This allows the user-space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configuration, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.
This change addresses the issue adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.
According to git history, this issue is apparently present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")
Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cf147085fdda044622973a12e4e06f1c753ab677 ]
ieee80211_frame_acked is called when a frame is acked by
the peer. In case this is a management frame, we check
if this an SMPS frame, in which case we can update our
antenna configuration.
When we parse the management frame we look at the category
in case it is an action frame. That byte sits after the IV
in case the frame was encrypted. This means that if the
frame was encrypted, we basically look at the IV instead
of looking at the category. It is then theorically
possible that we think that an SMPS action frame was acked
where really we had another frame that was encrypted.
Since the only management frame whose ack needs to be
tracked is the SMPS action frame, and that frame is not
a robust management frame, it will never be encrypted.
The easiest way to fix this problem is then to not look
at frames that were encrypted.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9378b274e1eb6925db315e345f48850d2d5d9789 ]
Trying to create MRs while the transport is being torn down can
cause a crash.
Fixes: e2ac236c0b65 ("xprtrdma: Allocate MRs on demand")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 277a292835c196894ef895d5e1fd6170bb916f55 ]
Currently, after adding the following nft rules:
# nft add set x target1 { type ipv4_addr \; flags timeout \;}
# nft add rule x y set add ip daddr timeout 1d @target1 counter
the counters will always be zero despite of the elements are added
to the dynamic set "target1" or not, as we will break the nft expr
traversal unconditionally:
# nft list ruleset
...
set target1 {
...
elements = { 8.8.8.8 expires 23h59m53s}
}
chain output {
...
set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
^ ^
...
}
Since we add the elements to the set successfully, we should continue
to the next expression.
Additionally, if elements are added to "flow table" successfully, we
will _always_ continue to the next expr, even if the operation is
_OP_ADD. So it's better to keep them to be consistent.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 78302fd405769c9a9379e9adda119d533dce2eed ]
Function nlmsg_new() will return a NULL pointer if there is no enough
memory, and its return value should be checked before it is used.
However, in function tipc_nl_node_get_monitor(), the validation of the
return value of function nlmsg_new() is missed. This patch fixes the
bug.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 66e5a6b18bd09d0431e97cd3c162e76c5c2aebba ]
cthelpers added via nfnetlink may have the same tuple, i.e. except for
the l3proto and l4proto, other fields are all zero. So even with the
different names, we will also fail to add them:
# nfct helper add ssdp inet udp
# nfct helper add tftp inet udp
nfct v1.4.3: netlink error: File exists
So in order to avoid unpredictable behaviour, we should:
1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
report error if duplicated { name, l3proto, l4proto } tuple exist.
2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
nf_ct_auto_assign_helper is enabled, so also report error if duplicated
{ l3proto, l4proto, src-port } tuple exist.
Also note, if the cthelper is added from userspace, then the src-port will
always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no
need to check the second point listed above.
Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cf5d70918877c6a6655dc1e92e2ebb661ce904fd ]
Conntrack helpers do not check for a potentially clashing conntrack
entry when creating a new expectation. Also, nf_conntrack_in() will
check expectations (via init_conntrack()) only if a conntrack entry
can not be found. The expectation for a packet which also matches an
existing conntrack entry will not be removed by conntrack, and is
currently handled inconsistently by OVS, as OVS expects the
expectation to be removed when the connection tracking entry matching
that expectation is confirmed.
It should be noted that normally an IP stack would not allow reuse of
a 5-tuple of an old (possibly lingering) connection for a new data
connection, so this is somewhat unlikely corner case. However, it is
possible that a misbehaving source could cause conntrack entries be
created that could then interfere with new related connections.
Fix this in the OVS module by deleting the clashing conntrack entry
after an expectation has been matched. This causes the following
nf_conntrack_in() call also find the expectation and remove it when
creating the new conntrack entry, as well as the forthcoming reply
direction packets to match the new related connection instead of the
old clashing conntrack entry.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 470acf55a021713869b9bcc967268ac90c8a0fac ]
There are two cases which causes refcnt leak.
1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should
free the timeout refcnt.
Now goto the err_put_timeout error handler instead of going ahead.
2. When the time policy is not found, we should call module_put.
Otherwise, the related cthelper module cannot be removed anymore.
It is easy to reproduce by typing the following command:
# iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0f9fa831aecfc297b7b45d4f046759bcefcf87f0 ]
When using TCP FastOpen for an active session, we send one wakeup event
from tcp_finish_connect(), right before the data eventually contained in
the received SYNACK is queued to sk->sk_receive_queue.
This means that depending on machine load or luck, poll() users
might receive POLLOUT events instead of POLLIN|POLLOUT
To fix this, we need to move the call to sk->sk_state_change()
after the (optional) call to tcp_rcv_fastopen_synack()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4a6e3c5def13c91adf2acc613837001f09af3baa ]
ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
set to 1, gratuitous arp requests are sent when the device is brought up.
The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
event; add it.
Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c7976f5272486e4ff406014c4b43e2fa3b70b052 ]
In the ieee80211_setup_sdata() we check if the interface type is valid
and, if not, call BUG(). This should never happen, but if there is
something wrong with the code, it will not be caught until the bug
happens when an interface is being set up. Calling BUG() is too
extreme for this and a WARN_ON() would be better used instead. Change
that.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit be8f8284cd897af2482d4e54fbc2bdfc15557259 ]
Currently it is possible to add or update socket policies, but
not clear them. Therefore, once a socket policy has been applied,
the socket cannot be used for unencrypted traffic.
This patch allows (privileged) users to clear socket policies by
passing in a NULL pointer and zero length argument to the
{IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both
the incoming and outgoing policies being cleared.
The simple approach taken in this patch cannot clear socket
policies in only one direction. If desired this could be added
in the future, for example by continuing to pass in a length of
zero (which currently is guaranteed to return EMSGSIZE) and
making the policy be a pointer to an integer that contains one
of the XFRM_POLICY_{IN,OUT} enum values.
An alternative would have been to interpret the length as a
signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the
input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output
policy.
Tested: https://android-review.googlesource.com/539816
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d2891c4d071d807f01cc911dc42a68f4568d65cf ]
When adding 6lowpan devices very rapidly we sometimes see a crash:
[23122.306615] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0-43-arm64 #1 Debian 4.9.9.linaro.43-1
[23122.315400] Hardware name: HiKey Development Board (DT)
[23122.320623] task: ffff800075443080 task.stack: ffff800075484000
[23122.326551] PC is at expire_timers+0x70/0x150
[23122.330907] LR is at run_timer_softirq+0xa0/0x1a0
[23122.335616] pc : [<ffff000008142dd8>] lr : [<ffff000008142f58>] pstate: 600001c5
This was due to add_peer_chan() unconditionally initializing the
lowpan_btle_dev->notify_peers delayed work structure, even if the
lowpan_btle_dev passed into add_peer_chan() had previously been
initialized.
Normally, this would go unnoticed as the delayed work timer is set for
100 msec, however when calling add_peer_chan() faster than 100 msec it
clears out a previously queued delay work causing the crash above.
To fix this, let add_peer_chan() know when a new lowpan_btle_dev is passed
in so that it only performs the delay work initialization when needed.
Signed-off-by: Michael Scott <michael.scott@linaro.org>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 27bfbc21a0c0f711fa5382de026c7c0700c9ea28 ]
There is a race condition between a thread calling bt_accept_dequeue()
and a different thread calling bt_accept_unlink(). Protection against
concurrency is implemented using sk locking. However, sk locking causes
serialisation of the bt_accept_dequeue() and bt_accept_unlink() threads.
This serialisation can cause bt_accept_dequeue() to obtain the sk from the
parent list but becomes blocked waiting for the sk lock held by the
bt_accept_unlink() thread. bt_accept_unlink() unlinks sk and this thread
releases the sk lock unblocking bt_accept_dequeue() which potentially runs
bt_accept_unlink() again on the same sk causing a crash. The attempt to
double unlink the same sk from the parent list can cause a NULL pointer
dereference crash due to bt_sk(sk)->parent becoming NULL on the first
unlink, followed by the second unlink trying to execute
bt_sk(sk)->parent->sk_ack_backlog-- in bt_accept_unlink() which crashes.
When sk is in the parent list, bt_sk(sk)->parent will be not be NULL.
When sk is removed from the parent list, bt_sk(sk)->parent is set to
NULL. Therefore, add a defensive check for bt_sk(sk)->parent not being
NULL to ensure that sk is still in the parent list after the sk lock has
been taken in bt_accept_dequeue(). If bt_sk(sk)->parent is detected as
being NULL then restart the loop so that the loop variables are refreshed
to use the latest values. This is necessary as list_for_each_entry_safe()
is not thread safe so causing a risk of an infinite loop occurring as sk
could point to itself.
In addition, in bt_accept_dequeue() increase the sk reference count to
protect against early freeing of sk. Early freeing can be possible if the
bt_accept_unlink() thread calls l2cap_sock_kill() or rfcomm_sock_kill()
functions before bt_accept_dequeue() gets the sk lock.
For test purposes, the probability of failure can be increased by putting
a msleep of 1 second in bt_accept_dequeue() between getting the sk and
waiting for the sk lock. This exposes the fact that the loop
list_for_each_entry_safe(p, n, &bt_sk(parent)->accept_q) is not safe from
threads that unlink sk from the list in parallel with the loop which can
cause sk to become stale within the loop.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit add641e7dee31b36aee83412c29e39dd1f5e0c9c ]
after act_csum computes the checksum on skbs carrying GSO TCP/UDP packets,
subsequent segmentation fails because skb_needs_check(skb, true) returns
true. Because of that, skb_warn_bad_offload() is invoked and the following
message is displayed:
WARNING: CPU: 3 PID: 28 at net/core/dev.c:2553 skb_warn_bad_offload+0xf0/0xfd
<...>
[<ffffffff8171f486>] skb_warn_bad_offload+0xf0/0xfd
[<ffffffff8161304c>] __skb_gso_segment+0xec/0x110
[<ffffffff8161340d>] validate_xmit_skb+0x12d/0x2b0
[<ffffffff816135d2>] validate_xmit_skb_list+0x42/0x70
[<ffffffff8163c560>] sch_direct_xmit+0xd0/0x1b0
[<ffffffff8163c760>] __qdisc_run+0x120/0x270
[<ffffffff81613b3d>] __dev_queue_xmit+0x23d/0x690
[<ffffffff81613fa0>] dev_queue_xmit+0x10/0x20
Since GSO is able to compute checksum on individual segments of such skbs,
we can simply skip mangling the packet.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a3a5129e122709306cfa6409781716c2933df99b ]
Consider the following situation which has been found in a test setup:
Gateway B has claimed client C and gateway A has the same backbone
network as B. C sends a broad- or multicast to B and directly after
this packet decides to send another packet to A due to a better TQ
value. B will forward the broad-/multicast into the backbone as it is
the responsible gw and after that A will claim C as it has been
chosen by C as the best gateway. If it now happens that A claims C
before it has received the broad-/multicast forwarded by B (due to
backbone topology or due to some delay in B when forwarding the
packet) we get a critical situation: in the current code A will
immediately unclaim C when receiving the multicast due to the
roaming client scenario although the position of C has not changed
in the mesh. If this happens the multi-/broadcast forwarded by B
will be sent back into the mesh by A and we have looping packets
until one of the gateways claims C again.
In order to prevent this, unclaiming of a client due to the roaming
client scenario is only done after a certain time is expired after
the last claim of the client. 100 ms are used here, which should be
slow enough for big backbones and slow gateways but fast enough not
to break the roaming client use case.
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Andreas Pape <apape@phoenixcontact.com>
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 88997e4208aea117627898e5f6f9801cf3cd42d2 ]
wanted_features is a set of features which have to be enabled if a
hardware allows that.
Currently when a vlan device is created, its wanted_features is set to
current features of its base device.
The problem is that the base device can get new features and they are
not propagated to vlan-s of this device.
If we look at bonding devices, they doesn't have this problem and this
patch suggests to fix this issue by the same way how it works for bonding
devices.
We meet this problem, when we try to create a vlan device over a bonding
device. When a system are booting, real devices require time to be
initialized, so bonding devices created without slaves, then vlan
devices are created and only then ethernet devices are added to the
bonding device. As a result we have vlan devices with disabled
scatter-gather.
* create a bonding device
$ ip link add bond0 type bond
$ ethtool -k bond0 | grep scatter
scatter-gather: off
tx-scatter-gather: off [requested on]
tx-scatter-gather-fraglist: off [requested on]
* create a vlan device
$ ip link add link bond0 name bond0.10 type vlan id 10
$ ethtool -k bond0.10 | grep scatter
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off
* Add a slave device to bond0
$ ip link set dev eth0 master bond0
And now we can see that the bond0 device has got the scatter-gather
feature, but the bond0.10 hasn't got it.
[root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
[root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off
With this patch the vlan device will get all new features from the
bonding device.
Here is a call trace how features which are set in this patch reach
dev->wanted_features.
register_netdevice
vlan_dev_init
...
dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
NETIF_F_ALL_FCOE;
dev->features |= dev->hw_features;
...
dev->wanted_features = dev->features & dev->hw_features;
__netdev_update_features(dev);
vlan_dev_fix_features
...
Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5080f39e8c72e01cf37e8359023e7018e2a4901e ]
I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:
1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across
machines
The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html
There appear to be two underlying bugs causing these effects:
- The first issue causes long delays when the rate is slow and no
delay is configured (e.g., "rate 512kbit"). This is because SKBs are
not orphaned when no delay is configured, so orphaning does not
occur until *after* the rate-induced delay has been applied. For
this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
dramatically increases the measured bandwidth.
- The second issue is that rate-induced delays are not correctly
applied, allowing SKB delays to occur in parallel. The indended
approach is to compute the delay for an SKB and to add this delay to
the end of the current queue. However, the code does not detect
existing SKBs in the queue due to improperly testing sch->q.qlen,
which is nonzero even when packets exist only in the
rbtree. Consequently, new SKBs do not wait for the current queue to
empty. When packet delays vary significantly (e.g., if packet sizes
are different), then this also causes unintended reordering.
I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.
Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ae0ac0ed6fcf5af3be0f63eb935f483f44a402d2 upstream.
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.
This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.
As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f28e15bacedd444608e25421c72eb2cf4527c9ca upstream.
Keeps some noise away from a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d31eef5176df06f218201bc9c0ce40babb41660 upstream.
On SMP we overload the packet counter (unsigned long) to contain
percpu offset. Hide this from callers and pass xt_counters address
instead.
Preparation patch to allocate the percpu counters in page-sized batch
chunks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b078556aecd791b0e5cb3a59f4c3a14273b52121 upstream.
l4proto->manip_pkt() can cause reallocation of skb head so pointer
to the ipv6 header must be reloaded.
Reported-and-tested-by: <syzbot+10005f4292fc9cc89de7@syzkaller.appspotmail.com>
Fixes: 58a317f1061c89 ("netfilter: ipv6: add IPv6 NAT support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c4585a2823edf4d1326da44d1524ecbfda26bb37 upstream.
ebt_among is special, it has a dynamic match size and is exempt
from the central size checks.
Therefore it must check that the size of the match structure
provided from userspace is sane by making sure em->match_size
is at least the minimum size of the expected structure.
The module has such a check, but its only done after accessing
a structure that might be out of bounds.
tested with: ebtables -A INPUT ... \
--among-dst fe:fe:fe:fe:fe:fe
--among-dst fe:fe:fe:fe:fe:fe --among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fb,fe:fe:fe:fe:fc:fd,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe
--among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fa,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe,fe:fe:fe:fe:fe:fe
Reported-by: <syzbot+fe0b19af568972814355@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b71812168571fa55e44cdd0254471331b9c4c4c6 upstream.
We need to make sure the offsets are not out of range of the
total size.
Also check that they are in ascending order.
The WARN_ON triggered by syzkaller (it sets panic_on_warn) is
changed to also bail out, no point in continuing parsing.
Briefly tested with simple ruleset of
-A INPUT --limit 1/s' --log
plus jump to custom chains using 32bit ebtables binary.
Reported-by: <syzbot+845a53d13171abf8bf29@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cfc2c740533368b96e2be5e0a4e8c3cace7d9814 upstream.
We had one report from syzkaller [1]
First issue is that INIT_WORK() should be done before mod_timer()
or we risk timer being fired too soon, even with a 1 second timer.
Second issue is that we need to reject too big info->timeout
to avoid overflows in msecs_to_jiffies(info->timeout * 1000), or
risk looping, if result after overflow is 0.
[1]
WARNING: CPU: 1 PID: 5129 at kernel/workqueue.c:1444 __queue_work+0xdf4/0x1230 kernel/workqueue.c:1444
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 5129 Comm: syzkaller159866 Not tainted 4.16.0-rc1+ #230
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x211/0x2d0 lib/bug.c:184
fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:988
RIP: 0010:__queue_work+0xdf4/0x1230 kernel/workqueue.c:1444
RSP: 0018:ffff8801db507538 EFLAGS: 00010006
RAX: ffff8801aeb46080 RBX: ffff8801db530200 RCX: ffffffff81481404
RDX: 0000000000000100 RSI: ffffffff86b42640 RDI: 0000000000000082
RBP: ffff8801db507758 R08: 1ffff1003b6a0de5 R09: 000000000000000c
R10: ffff8801db5073f0 R11: 0000000000000020 R12: 1ffff1003b6a0eb6
R13: ffff8801b1067ae0 R14: 00000000000001f8 R15: dffffc0000000000
queue_work_on+0x16a/0x1c0 kernel/workqueue.c:1488
queue_work include/linux/workqueue.h:488 [inline]
schedule_work include/linux/workqueue.h:546 [inline]
idletimer_tg_expired+0x44/0x60 net/netfilter/xt_IDLETIMER.c:116
call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2d7/0xb85 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:541 [inline]
smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:777 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x5e/0xba kernel/locking/spinlock.c:184
RSP: 0018:ffff8801c20173c8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
RAX: dffffc0000000000 RBX: 0000000000000282 RCX: 0000000000000006
RDX: 1ffffffff0d592cd RSI: 1ffff10035d68d23 RDI: 0000000000000282
RBP: ffff8801c20173d8 R08: 1ffff10038402e47 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8820e5c8
R13: ffff8801b1067ad8 R14: ffff8801aea7c268 R15: ffff8801aea7c278
__debug_object_init+0x235/0x1040 lib/debugobjects.c:378
debug_object_init+0x17/0x20 lib/debugobjects.c:391
__init_work+0x2b/0x60 kernel/workqueue.c:506
idletimer_tg_create net/netfilter/xt_IDLETIMER.c:152 [inline]
idletimer_tg_checkentry+0x691/0xb00 net/netfilter/xt_IDLETIMER.c:213
xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:850
check_target net/ipv6/netfilter/ip6_tables.c:533 [inline]
find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:575
translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:744
do_replace net/ipv6/netfilter/ip6_tables.c:1160 [inline]
do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1686
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt+0x10b/0x130 net/ipv6/ipv6_sockglue.c:927
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2976
SYSC_setsockopt net/socket.c:1850 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1829
do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit db57ccf0f2f4624b4c4758379f8165277504fbd7 upstream.
syzbot reported a division by 0 bug in the netfilter nat code:
divide error: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88
RSP: 0018:ffff8801b2466778 EFLAGS: 00010246
RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c
RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac
RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900
R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c
FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30
get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362
nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406
nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124
redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34
ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41
nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302
nf_nat_ipv6_local_fn+0x33/0x5d0
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407
ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69
nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
nf_hook include/linux/netfilter.h:243 [inline]
NF_HOOK include/linux/netfilter.h:286 [inline]
ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277
inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
dccp_connect+0x369/0x670 net/dccp/output.c:564
dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946
__inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620
inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684
SYSC_connect+0x213/0x4a0 net/socket.c:1639
SyS_connect+0x24/0x30 net/socket.c:1620
do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x441c69
RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69
RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590
R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000
Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b
45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48
c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01
RIP: nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778
The problem is that currently we don't have any check on the
configured port range. A port range == -1 triggers the bug, while
other negative values may require a very long time to complete the
following loop.
This commit addresses the issue swapping the two ends on negative
ranges. The check is performed in nf_nat_l4proto_unique_tuple() since
the nft nat loads the port values from nft registers at runtime.
v1 -> v2: use the correct 'Fixes' tag
v2 -> v3: update commit message, drop unneeded READ_ONCE()
Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack")
Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 10414014bc085aac9f787a5890b33b5605fbcfc4 upstream.
syzbot reported that xt_LED may try to use the ledinternal->timer
without previously initializing it:
------------[ cut here ]------------
kernel BUG at kernel/time/timer.c:958!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 1826 Comm: kworker/1:2 Not tainted 4.15.0+ #306
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:__mod_timer kernel/time/timer.c:958 [inline]
RIP: 0010:mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102
RSP: 0018:ffff8801d24fe9f8 EFLAGS: 00010293
RAX: ffff8801d25246c0 RBX: ffff8801aec6cb50 RCX: ffffffff816052c6
RDX: 0000000000000000 RSI: 00000000fffbd14b RDI: ffff8801aec6cb68
RBP: ffff8801d24fec98 R08: 0000000000000000 R09: 1ffff1003a49fd6c
R10: ffff8801d24feb28 R11: 0000000000000005 R12: dffffc0000000000
R13: ffff8801d24fec70 R14: 00000000fffbd14b R15: ffff8801af608f90
FS: 0000000000000000(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000206d6fd0 CR3: 0000000006a22001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
led_tg+0x1db/0x2e0 net/netfilter/xt_LED.c:75
ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
ip6table_raw_hook+0x65/0x80 net/ipv6/netfilter/ip6table_raw.c:42
nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
nf_hook.constprop.27+0x3f6/0x830 include/linux/netfilter.h:243
NF_HOOK include/linux/netfilter.h:286 [inline]
ndisc_send_skb+0xa51/0x1370 net/ipv6/ndisc.c:491
ndisc_send_ns+0x38a/0x870 net/ipv6/ndisc.c:633
addrconf_dad_work+0xb9e/0x1320 net/ipv6/addrconf.c:4008
process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
worker_thread+0x223/0x1990 kernel/workqueue.c:2247
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429
Code: 85 2a 0b 00 00 4d 8b 3c 24 4d 85 ff 75 9f 4c 8b bd 60 fd ff ff e8 bb
57 10 00 65 ff 0d 94 9a a1 7e e9 d9 fc ff ff e8 aa 57 10 00 <0f> 0b e8 a3
57 10 00 e9 14 fb ff ff e8 99 57 10 00 4c 89 bd 70
RIP: __mod_timer kernel/time/timer.c:958 [inline] RSP: ffff8801d24fe9f8
RIP: mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: ffff8801d24fe9f8
---[ end trace f661ab06f5dd8b3d ]---
The ledinternal struct can be shared between several different
xt_LED targets, but the related timer is currently initialized only
if the first target requires it. Fix it by unconditionally
initializing the timer struct.
v1 -> v2: call del_timer_sync() unconditionally, too.
Fixes: 268cb38e1802 ("netfilter: x_tables: add LED trigger target")
Reported-by: syzbot+10c98dc5725c6c8fc7fb@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 57ebd808a97d7c5b1e1afb937c2db22beba3c1f8 upstream.
The rationale for removing the check is only correct for rulesets
generated by ip(6)tables.
In iptables, a jump can only occur to a user-defined chain, i.e.
because we size the stack based on number of user-defined chains we
cannot exceed stack size.
However, the underlying binary format has no such restriction,
and the validation step only ensures that the jump target is a
valid rule start point.
IOW, its possible to build a rule blob that has no user-defined
chains but does contain a jump.
If this happens, no jump stack gets allocated and crash occurs
because no jumpstack was allocated.
Fixes: 7814b6ec6d0d6 ("netfilter: xtables: don't save/restore jumpstack offset")
Reported-by: syzbot+e783f671527912cd9403@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3968523f855050b8195134da951b87c20bd66130 upstream.
mpls_label_ok() validates that the 'platform_label' array index from a
userspace netlink message payload is valid. Under speculation the
mpls_label_ok() result may not resolve in the CPU pipeline until after
the index is used to access an array element. Sanitize the index to zero
to prevent userspace-controlled arbitrary out-of-bounds speculation, a
precursor for a speculative execution side channel vulnerability.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4:
- mpls_label_ok() doesn't take an extack parameter
- Drop change in mpls_getroute()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b7b386f42f079b25b942c756820e36c6bd09b2ca upstream.
mpls_route_add and mpls_route_del have the same checks on the label.
Move to a helper. Avoid duplicate extack messages in the next patch.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c ]
When SCTP makes INIT or INIT_ACK packet the total chunk length
can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
transmitting these packets, e.g. the crash on sending INIT_ACK:
[ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
put:120156 head:000000007aa47635 data:00000000d991c2de
tail:0x1d640 end:0xfec0 dev:<NULL>
...
[ 597.976970] ------------[ cut here ]------------
[ 598.033408] kernel BUG at net/core/skbuff.c:104!
[ 600.314841] Call Trace:
[ 600.345829] <IRQ>
[ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.436934] skb_put+0x16c/0x200
[ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp]
[ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp]
[ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp]
[ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp]
[ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp]
[ 601.233575] sctp_do_sm+0x182/0x560 [sctp]
[ 601.284328] ? sctp_has_association+0x70/0x70 [sctp]
[ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp]
[ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp]
...
Here the chunk size for INIT_ACK packet becomes too big, mostly
because of the state cookie (INIT packet has large size with
many address parameters), plus additional server parameters.
Later this chunk causes the panic in skb_put_data():
skb_packet_transmit()
sctp_packet_pack()
skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
'nskb' (head skb) was previously allocated with packet->size
from u16 'chunk->chunk_hdr->length'.
As suggested by Marcelo we should check the chunk's length in
_sctp_make_chunk() before trying to allocate skb for it and
discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN.
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2 ]
When going through the bind address list in sctp_v6_get_dst() and
the previously found address is better ('matchlen > bmatchlen'),
the code continues to the next iteration without releasing currently
held destination.
Fix it by releasing 'bdst' before continue to the next iteration, and
instead of introducing one more '!IS_ERR(bdst)' check for dst_release(),
move the already existed one right after ip6_dst_lookup_flow(), i.e. we
shouldn't proceed further if we get an error for the route lookup.
Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 350c9f484bde93ef229682eedd98cd5f74350f7f ]
BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :
/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;
But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)
What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.
So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.
To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs
Only tcp_tso_segs() has to enforce the gso_max_segs limit.
Tested:
ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast
Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
691 (ss -temoi shows cwnd is stuck around 6 )
667
651
631
517
After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
1733 (ss -temoi shows cwnd is around 386 )
1778
1746
1781
1718
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 93c62c45ed5fad1b87e3a45835b251cd68de9c46 ]
All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send
both parts of the iov[] buffer, but one of them does not. Fix it so that
it does.
Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc
header included, but no payload.
Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|