linux-toradex.git/net/sched, branch v4.4.97

net: sched: fix NULL pointer dereference when action calls some targets

2017-08-30T08:19:21+00:00

[ Upstream commit 4f8a881acc9d1adaf1e552349a0b1df28933a04c ]

Some targets' checkentry hooks dereference par.entryinfo to inspect the
entry. But when a sched action calls xt_check_target, par.entryinfo is
set to NULL, which causes a kernel panic when calling those targets.

It can be reproduced with:
  # tc qd add dev eth1 ingress handle ffff:
  # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
    -j ECN --ecn-tcp-remove

It could also crash the kernel when using the CLUSTERIP or TPROXY target.

For now there is no proper value for par.entryinfo in ipt_init_target,
but it cannot be set to NULL. This patch avoids all these panics by
setting it to an ipt_entry object with all members zeroed.
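
A minimal userspace sketch of the idea, not the kernel code: the
struct and function names are hypothetical stand-ins for ipt_entry,
xt_tgchk_param and a target's checkentry hook, and the rule that a
zero protocol is rejected is invented for the illustration. The point
is only that the hook gets a zeroed object to read, so it can return
an error instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's ipt_entry. */
struct entry_sketch {
    unsigned char proto;      /* 0 means "no protocol set" in this sketch */
    unsigned char invflags;
};

/* Hypothetical stand-in for xt_tgchk_param. */
struct tgchk_sketch {
    const void *entryinfo;    /* what a checkentry hook may dereference */
};

/* A checkentry-like hook that reads the entry it is given. */
static int checkentry_sketch(const struct tgchk_sketch *par)
{
    const struct entry_sketch *e = par->entryinfo;

    assert(e != NULL);               /* a NULL here is the reported panic */
    return e->proto == 0 ? -22 : 0;  /* invented rule: reject with -EINVAL */
}

/* Caller side: hand the hook a zeroed entry instead of NULL. */
static int init_target_sketch(void)
{
    struct entry_sketch e;
    struct tgchk_sketch par;

    memset(&e, 0, sizeof(e));
    memset(&par, 0, sizeof(par));
    par.entryinfo = &e;
    return checkentry_sketch(&par);
}
```

In this sketch init_target_sketch() fails cleanly with an error code;
with par.entryinfo left NULL the hook would fault instead.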

Note that this issue has been there since the very beginning.

Signed-off-by: Xin Long 
Acked-by: Pablo Neira Ayuso 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net_sched/sfq: update hierarchical backlog when drop packet

2017-08-30T08:19:19+00:00

[ Upstream commit 325d5dc3f7e7c2840b65e4a2988c082c2c0025c5 ]

When sfq_enqueue() drops the head packet or a packet from another queue,
it has to update the backlog at the upper qdiscs too.
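
As a rough illustration of "update the backlog at the upper qdiscs",
the sketch below walks a parent chain and subtracts the dropped packet
from every ancestor. The struct layout and the function name are
assumptions made for this example; it models the accounting only, not
the kernel's qdisc tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical qdisc node: only the fields this sketch needs. */
struct qdisc_sketch {
    struct qdisc_sketch *parent;  /* NULL at the root */
    unsigned int qlen;            /* packets queued at or below this node */
    unsigned int backlog;         /* bytes queued at or below this node */
};

/* Subtract a dropped packet from every ancestor of q. */
static void tree_reduce_backlog_sketch(struct qdisc_sketch *q,
                                       unsigned int pkts, unsigned int bytes)
{
    struct qdisc_sketch *p;

    for (p = q->parent; p != NULL; p = p->parent) {
        assert(p->qlen >= pkts && p->backlog >= bytes);
        p->qlen -= pkts;
        p->backlog -= bytes;
    }
}
```

The leaf is expected to have adjusted its own counters already; only
the ancestors are touched here.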

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target

2017-08-13T02:29:08+00:00

[ Upstream commit 96d9703050a0036a3360ec98bb41e107c90664fe ]

Commit 55917a21d0cc ("netfilter: x_tables: add context to know if
extension runs from nft_compat") introduced a member nft_compat to
xt_tgchk_param structure.

But it didn't set its value for ipt_init_target. With an unexpected
value in par.nft_compat, some targets' checkentry may return an
unexpected result.

This patch sets all its fields to 0 and only initializes the
non-zero fields in ipt_init_target.

v1->v2:
  Following Wang Cong's suggestion, fix it by setting all its fields to
  0 and only initializing the non-zero fields.
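
The "zero everything, then set the non-zero fields" pattern might look
like the sketch below. The struct is a hypothetical reduction of
xt_tgchk_param, and the family value is a placeholder; only the
pattern itself is taken from the text above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parameter block; nft_compat mirrors the field named above. */
struct tgchk_params_sketch {
    const char *table;
    unsigned int hook_mask;
    int family;
    int nft_compat;   /* added later; easy to miss at old call sites */
};

static void fill_params_sketch(struct tgchk_params_sketch *par,
                               const char *table, unsigned int hook)
{
    assert(par != NULL);
    memset(par, 0, sizeof(*par));   /* every field, including future ones */
    par->table = table;             /* then only the non-zero fields */
    par->hook_mask = hook;
    par->family = 2;                /* placeholder value for this sketch */
}
```

A field added to the struct later starts out as 0 at this call site
without any further change.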

Fixes: 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat")
Suggested-by: Cong Wang 
Signed-off-by: Xin Long 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: sched: Fix one possible panic when no destroy callback

2017-07-21T05:44:54+00:00

commit c1a4872ebfb83b1af7144f7b29ac8c4b344a12a8 upstream.

When a qdisc fails to init, qdisc_create invokes the destroy callback
to clean up. But there is no check that the callback actually exists, so
it causes a panic for qdiscs that have no destroy callback, such as
codel, fq, and so on.

Take codel as an example:
When a malicious user constructs an invalid netlink msg, it causes
codel_init->codel_change->nla_parse_nested to fail.
The kernel then invokes the destroy callback directly, but qdisc codel
doesn't define one, which results in a panic.

Now add a check for destroy to avoid the possible panic.
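
A small sketch of the guarded cleanup, with a hypothetical ops table in
place of struct Qdisc_ops. The helper names and the error value are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; destroy is optional, as it is for codel. */
struct qdisc_ops_sketch {
    int  (*init)(void *priv);
    void (*destroy)(void *priv);   /* may be NULL */
};

static int destroy_calls;

static int failing_init(void *priv) { (void)priv; return -22; /* -EINVAL */ }
static void counting_destroy(void *priv) { (void)priv; destroy_calls++; }

/* Create path: on init failure, clean up only if a destroy hook exists. */
static int create_sketch(const struct qdisc_ops_sketch *ops, void *priv)
{
    int err;

    assert(ops != NULL && ops->init != NULL);
    err = ops->init(priv);
    if (err != 0 && ops->destroy != NULL)
        ops->destroy(priv);
    return err;
}
```

With an ops table whose destroy member is NULL, create_sketch() just
returns the init error; without the NULL test it would call through a
NULL function pointer.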

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Gao Feng 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net_sched: fix error recovery at qdisc creation

2017-07-21T05:44:54+00:00

commit 87b60cfacf9f17cf71933c6e33b66e68160af71d upstream.

Dmitry reported uses after free in qdisc code [1]

The problem here is that ops->init() can return an error.

qdisc_create_dflt() then calls ops->destroy(),
while qdisc_create() does _not_ call it.

Four qdiscs chose to call their own ops->destroy(), assuming their caller
would not.

This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdiscs to avoid a double free.
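
The ownership rule can be sketched as follows: init reports the error
and leaves cleanup to its caller, so the resource is released exactly
once. All names here are hypothetical and the allocation is a stand-in
for whatever a real qdisc sets up.

```c
#include <assert.h>
#include <stdlib.h>

struct priv_sketch {
    int *table;     /* resource allocated by init */
    int frees;      /* counts how often the table was freed */
};

static void destroy_sketch(struct priv_sketch *p)
{
    if (p->table != NULL) {
        free(p->table);
        p->table = NULL;
        p->frees++;
    }
}

/* init reports failure but leaves cleanup to its caller. */
static int init_sketch(struct priv_sketch *p, int bad_config)
{
    p->table = malloc(16 * sizeof(*p->table));
    if (p->table == NULL)
        return -12;             /* -ENOMEM */
    if (bad_config)
        return -22;             /* -EINVAL; no destroy call here */
    return 0;
}

/* The one place that undoes a failed init. */
static int create_owned_sketch(struct priv_sketch *p, int bad_config)
{
    int err = init_sketch(p, bad_config);

    if (err != 0)
        destroy_sketch(p);
    assert(err == 0 || p->table == NULL);
    return err;
}
```

If init_sketch() also called destroy_sketch() on its error path, the
caller's cleanup would run on already released state, which is the
double free the patch avoids.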

[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
 ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
 ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x6c/0x98 lib/dump_stack.c:51
 [] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [] print_address_description mm/kasan/report.c:196 [inline]
 [] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
 [] kasan_report mm/kasan/report.c:305 [inline]
 [] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
 [] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
 [] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
 [] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
 [] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
 [] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
 [] __dev_open+0x221/0x320 net/core/dev.c:1403
 [] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
 [] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
 [] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
 [] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
 [] sock_do_ioctl+0x99/0xb0 net/socket.c:879
 [] sock_ioctl+0x2a0/0x390 net/socket.c:958
 [] vfs_ioctl fs/ioctl.c:44 [inline]
 [] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
 [] SYSC_ioctl fs/ioctl.c:626 [inline]
 [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
 [] entry_SYSCALL_64_fastpath+0x12/0x17

Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net_sched: close another race condition in tcf_mirred_release()

2017-05-03T04:19:49+00:00

commit dc327f8931cb9d66191f489eb9a852fc04530546 upstream.

We saw the following extra refcount release on veth device:

  kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1

Since we heavily use mirred action to redirect packets to veth, I think
this is caused by the following race condition:

CPU0:
tcf_mirred_release(): (in RCU callback)
	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);

CPU1:
mirred_device_event():
        spin_lock_bh(&mirred_list_lock);
        list_for_each_entry(m, &mirred_list, tcfm_list) {
                if (rcu_access_pointer(m->tcfm_dev) == dev) {
                        dev_put(dev);
                        /* Note : no rcu grace period necessary, as
                         * net_device are already rcu protected.
                         */
                        RCU_INIT_POINTER(m->tcfm_dev, NULL);
                }
        }
        spin_unlock_bh(&mirred_list_lock);

CPU0:
tcf_mirred_release():
        spin_lock_bh(&mirred_list_lock);
        list_del(&m->tcfm_list);
        spin_unlock_bh(&mirred_list_lock);
        if (dev)               // <======== Still refers to the old m->tcfm_dev
                dev_put(dev);  // <======== dev_put() is called on it again

The action init code path is good because it is impossible to modify
an action that is being removed.

So, fix this by moving everything under the spinlock.
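
A userspace sketch of "everything under the lock", using a pthread
mutex in place of mirred_list_lock. The types and function names are
hypothetical; the sketch only shows that the release path reads the
device pointer while holding the same lock the event path uses to
clear it, so the reference is dropped once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct dev_sketch { int refcnt; };
struct mirred_sketch { struct dev_sketch *dev; };

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void dev_put_sketch(struct dev_sketch *dev)
{
    assert(dev->refcnt > 0);   /* going below zero is the reported bug */
    dev->refcnt--;
}

/* Device-event path: drop the reference and clear the pointer. */
static void device_event_sketch(struct mirred_sketch *m,
                                struct dev_sketch *dev)
{
    pthread_mutex_lock(&list_lock);
    if (m->dev == dev) {
        dev_put_sketch(dev);
        m->dev = NULL;
    }
    pthread_mutex_unlock(&list_lock);
}

/* Release path: read the pointer only while holding the lock. */
static void release_sketch(struct mirred_sketch *m)
{
    struct dev_sketch *dev;

    pthread_mutex_lock(&list_lock);
    dev = m->dev;              /* NULL if the event path ran first */
    m->dev = NULL;
    if (dev != NULL)
        dev_put_sketch(dev);
    pthread_mutex_unlock(&list_lock);
}
```

Whichever path runs first drops the reference; the other one finds
NULL and does nothing.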

Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path")
Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
Acked-by: Jamal Hadi Salim 
Signed-off-by: David S. Miller 
Cc: Julia Lawall 
Signed-off-by: Greg Kroah-Hartman

net sched actions: decrement module reference count after table flush.

2017-03-22T11:04:18+00:00

[ Upstream commit edb9d1bff4bbe19b8ae0e71b1f38732591a9eeb2 ]

When tc actions are loaded as a module and no actions have been installed,
flushing them removes the actions from memory but does not decrement the
module's reference count, so the module cannot be unloaded.

The following is an example with the GACT action:

% sudo modprobe act_gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  1
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....

After the fix:
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module                  Size  Used by
act_gact               16384  3
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
%
% sudo tc actions flush action gact
% lsmod
Module                  Size  Used by
act_gact               16384  0
% sudo rmmod act_gact
% lsmod
Module                  Size  Used by
%
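
The accounting behind the transcripts above can be modelled roughly as
below. This is an assumption-laden sketch, not the kernel flush code:
it supposes that looking up the action kind pins the module once and
that each installed action holds one reference, and all names are
hypothetical.

```c
#include <assert.h>

struct module_sketch { int refcnt; };

static void module_get_sketch(struct module_sketch *m) { m->refcnt++; }

static void module_put_sketch(struct module_sketch *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Flush all actions of one kind; returns how many were removed. */
static int flush_sketch(struct module_sketch *mod, int *installed)
{
    int removed;

    module_get_sketch(mod);          /* the lookup pins the module */
    removed = *installed;
    while (*installed > 0) {
        (*installed)--;
        module_put_sketch(mod);      /* each action held one reference */
    }
    /* Drop the lookup reference on every path, including removed == 0. */
    module_put_sketch(mod);
    return removed;
}
```

Skipping the final put when nothing was removed would leave the count
one higher after each empty flush, which matches the "Used by" column
growing in the first transcript.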

Fixes: f97017cdefef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

act_connmark: avoid crashing on malformed nlattrs with null parms

2017-03-22T11:04:16+00:00

[ Upstream commit 52491c7607c5527138095edf44c53169dc1ddb82 ]

tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
is set, resulting in a null pointer dereference when trying to access it.
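
The missing check amounts to testing the parsed attribute slot before
using it. In the sketch below the attribute enum, the parameter struct
and the function name are hypothetical; only the shape of the check is
taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

enum { ATTR_UNSPEC_SKETCH, ATTR_PARMS_SKETCH, ATTR_MAX_SKETCH };

struct parms_sketch { unsigned int index; unsigned short zone; };

/* tb holds one pointer per attribute type; absent attributes stay NULL. */
static int connmark_init_sketch(const struct parms_sketch *const *tb,
                                unsigned short *zone_out)
{
    const struct parms_sketch *parm;

    assert(tb != NULL && zone_out != NULL);
    if (tb[ATTR_PARMS_SKETCH] == NULL)
        return -22;             /* -EINVAL: required attribute missing */

    parm = tb[ATTR_PARMS_SKETCH];
    *zone_out = parm->zone;
    return 0;
}
```

A malformed request that omits the attribute is rejected with an error
instead of being dereferenced.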

[501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[501099.043039] IP: [] tcf_connmark_init+0x8b/0x180 [act_connmark]
...
[501099.044334] Call Trace:
[501099.044345]  [] ? tcf_action_init_1+0x198/0x1b0
[501099.044363]  [] ? tcf_action_init+0xb0/0x120
[501099.044380]  [] ? tcf_exts_validate+0xc4/0x110
[501099.044398]  [] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417]  [] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436]  [] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454]  [] ? security_capable+0x41/0x60
[501099.044471]  [] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490]  [] ? rtnl_newlink+0x870/0x870
[501099.044507]  [] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524]  [] ? rtnetlink_rcv+0x24/0x30
[501099.044541]  [] ? netlink_unicast+0x184/0x230
[501099.044558]  [] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576]  [] ? sock_sendmsg+0x30/0x40
[501099.044592]  [] ? SYSC_sendto+0xd3/0x150
[501099.044608]  [] ? __do_page_fault+0x2d1/0x510
[501099.044626]  [] ? system_call_fast_compare_end+0xc/0x9b

Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action")
Signed-off-by: Étienne Noss 
Signed-off-by: Victorien Molle 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net, sched: fix soft lockup in tc_classify

2017-01-15T12:41:34+00:00

[ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ]

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus the kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

The fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
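
The replay problem can be reproduced in miniature. The sketch below is
a single-threaded model with hypothetical names: the "other" argument
stands for the tp that thread B links while thread A has dropped the
lock, and the goto stands for the -EAGAIN replay. Priority-ordered
insertion and the destruction of A's unlinked tp are not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct tp_sketch { struct tp_sketch *next; int prio; };

static void request_sketch(struct tp_sketch **chain, struct tp_sketch *fresh,
                           struct tp_sketch *other, int prio)
{
    struct tp_sketch **back, *tp;
    int tp_created;

replay:
    tp_created = 0;   /* cleared on every pass, including replays */

    /* Look for an existing tp with the same priority. */
    for (back = chain; (tp = *back) != NULL; back = &tp->next)
        if (tp->prio == prio)
            break;

    if (tp == NULL) {
        tp = fresh;
        tp->prio = prio;
        tp->next = NULL;
        tp_created = 1;
    }

    if (other != NULL) {
        /* Simulated -EAGAIN: while the lock was dropped, another
         * request linked its own tp with the same priority. */
        other->prio = prio;
        other->next = *chain;
        *chain = other;
        other = NULL;
        goto replay;
    }

    if (tp_created) {
        tp->next = *back;
        *back = tp;
    }
    assert(tp->next != tp);   /* a self-loop is the reported lockup */
}
```

If tp_created were initialized only once, before the replay label, the
second pass would find the other request's tp, still see
tp_created == 1, and set tp->next to tp itself.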

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein 
Signed-off-by: Daniel Borkmann 
Cc: Cong Wang 
Acked-by: Eric Dumazet 
Tested-by: Shahar Klein 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net/sched: pedit: make sure that offset is valid

2016-12-10T18:07:23+00:00

[ Upstream commit 95c2027bfeda21a28eb245121e6a249f38d0788e ]

Add a validation function to make sure offset is valid:
1. Not below skb head (could happen when offset is negative).
2. Validate both 'offset' and 'at'.
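
A bounds check of this kind might look like the sketch below. It is
written against plain integers rather than an skb: headroom stands for
the bytes available before the data pointer and data_len for the
packet length, and both names, like the function name, are assumptions
of this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Offsets are relative to the start of the packet data. */
static bool offset_valid_sketch(int headroom, int data_len, int offset)
{
    assert(headroom >= 0 && data_len >= 0);

    if (offset > 0 && offset > data_len)
        return false;       /* past the end of the packet */
    if (offset < 0 && offset < -headroom)
        return false;       /* below the start of the buffer */
    return true;
}
```

The same helper would be applied to both values named in point 2, so
that neither 'offset' nor 'at' can point outside the buffer.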

Signed-off-by: Amir Vadai 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman