<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/sched, branch v3.14.31</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: Use netlink_ns_capable to verify the permissions of netlink messages</title>
<updated>2014-06-26T19:15:38+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-04-23T21:29:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=be0ef855baab7248d0fc71cdf78a47fcfd3708f1'/>
<id>be0ef855baab7248d0fc71cdf78a47fcfd3708f1</id>
<content type='text'>
[ Upstream commit 90f62cf30a78721641e08737bda787552428061e ]

It is possible to pass a netlink socket to a more privileged
executable and then fool that executable into writing data to the
socket that happens to be a valid netlink message, making it do
something the privileged executable did not intend to do.

To keep this from happening, replace bare capable and ns_capable calls
with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
which act the same as the previous calls except that they also verify
that the opener of the socket had the desired permissions.

Reported-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 90f62cf30a78721641e08737bda787552428061e ]

It is possible to pass a netlink socket to a more privileged
executable and then fool that executable into writing data to the
socket that happens to be a valid netlink message, making it do
something the privileged executable did not intend to do.

To keep this from happening, replace bare capable and ns_capable calls
with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
which act the same as the previous calls except that they also verify
that the opener of the socket had the desired permissions.

Reported-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: fix an oops in tcindex filter</title>
<updated>2014-05-31T20:20:39+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-05-19T19:15:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1ff9c00a7a14af5c848d00eb264cb4e6c9cb30a6'/>
<id>1ff9c00a7a14af5c848d00eb264cb4e6c9cb30a6</id>
<content type='text'>
[ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ]

Kelly reported the following crash:

        IP: [&lt;ffffffff817a993d&gt;] tcf_action_exec+0x46/0x90
        PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
        Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
        CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
        RIP: 0010:[&lt;ffffffff817a993d&gt;]  [&lt;ffffffff817a993d&gt;] tcf_action_exec+0x46/0x90
        RSP: 0018:ffff8800d21b9b90  EFLAGS: 00010283
        RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
        RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
        RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
        R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
        R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
        FS:  00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
        Stack:
         ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
         ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
         ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
        Call Trace:
         [&lt;ffffffff817c55bb&gt;] tcindex_classify+0x88/0x9b
         [&lt;ffffffff817a7f7d&gt;] tc_classify_compat+0x3e/0x7b
         [&lt;ffffffff817a7fdf&gt;] tc_classify+0x25/0x9f
         [&lt;ffffffff817b0e68&gt;] htb_enqueue+0x55/0x27a
         [&lt;ffffffff817b6c2e&gt;] dsmark_enqueue+0x165/0x1a4
         [&lt;ffffffff81775642&gt;] __dev_queue_xmit+0x35e/0x536
         [&lt;ffffffff8177582a&gt;] dev_queue_xmit+0x10/0x12
         [&lt;ffffffff818f8ecd&gt;] packet_sendmsg+0xb26/0xb9a
         [&lt;ffffffff810b1507&gt;] ? __lock_acquire+0x3ae/0xdf3
         [&lt;ffffffff8175cf08&gt;] __sock_sendmsg_nosec+0x25/0x27
         [&lt;ffffffff8175d916&gt;] sock_aio_write+0xd0/0xe7
         [&lt;ffffffff8117d6b8&gt;] do_sync_write+0x59/0x78
         [&lt;ffffffff8117d84d&gt;] vfs_write+0xb5/0x10a
         [&lt;ffffffff8117d96a&gt;] SyS_write+0x49/0x7f
         [&lt;ffffffff8198e212&gt;] system_call_fastpath+0x16/0x1b

This is because we memcpy struct tcindex_filter_result, which contains
struct tcf_exts, and the struct list_head embedded in it cannot simply
be copied. This is a regression introduced by commit
33be627159913b094bb578 (net_sched: act: use standard struct list_head).
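
As a rough illustration of why a plain copy of a list head breaks, here
is a minimal user-space sketch in Python, not kernel code. The ListHead
class and the list_empty() helper are hypothetical stand-ins for struct
list_head and its helpers, and copy.copy() plays the role of memcpy:

```python
import copy


class ListHead:
    """Toy model of a circular, doubly linked list head (assumption:
    mirrors the idea of struct list_head, not its real layout)."""

    def __init__(self):
        # An empty list head points at itself in both directions.
        self.next = self
        self.prev = self


def list_empty(head):
    """A list is empty when the head's next pointer is the head itself."""
    return head.next is head


orig = ListHead()

# Shallow copy: the fields are copied as they are, like memcpy would do.
clone = copy.copy(orig)

# The clone still points at the original head instead of at itself,
# so it no longer looks like an empty list and walks from it never
# come back to the clone.
print(list_empty(orig))
print(list_empty(clone))
print(clone.next is orig)
```

In this model the copied head is only usable after it has been
initialised again, which matches the approach of initialising the
temporary result instead of copying it.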

It's not very easy to fix it as the code is a mess:

       if (old_r)
               memcpy(&amp;cr, r, sizeof(cr));
       else {
               memset(&amp;cr, 0, sizeof(cr));
               tcf_exts_init(&amp;cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
       }
       ...
       tcf_exts_change(tp, &amp;cr.exts, &amp;e);
       ...
       memcpy(r, &amp;cr, sizeof(cr));

Since there is no need to copy struct tcf_exts, the above code should
be equivalent to the following after this change:

        tcindex_filter_result_init(&amp;cr);
        if (old_r)
                cr.res = r-&gt;res;
        ...
        if (old_r)
                tcf_exts_change(tp, &amp;r-&gt;exts, &amp;e);
        else
                tcf_exts_change(tp, &amp;cr.exts, &amp;e);
        ...
        r-&gt;res = cr.res;

It also fixes other places that zero structs containing struct tcf_exts.

Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
Reported-by: Kelly Anderson &lt;kelly@xilka.com&gt;
Tested-by: Kelly Anderson &lt;kelly@xilka.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ]

Kelly reported the following crash:

        IP: [&lt;ffffffff817a993d&gt;] tcf_action_exec+0x46/0x90
        PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
        Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
        CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
        RIP: 0010:[&lt;ffffffff817a993d&gt;]  [&lt;ffffffff817a993d&gt;] tcf_action_exec+0x46/0x90
        RSP: 0018:ffff8800d21b9b90  EFLAGS: 00010283
        RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
        RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
        RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
        R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
        R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
        FS:  00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
        Stack:
         ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
         ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
         ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
        Call Trace:
         [&lt;ffffffff817c55bb&gt;] tcindex_classify+0x88/0x9b
         [&lt;ffffffff817a7f7d&gt;] tc_classify_compat+0x3e/0x7b
         [&lt;ffffffff817a7fdf&gt;] tc_classify+0x25/0x9f
         [&lt;ffffffff817b0e68&gt;] htb_enqueue+0x55/0x27a
         [&lt;ffffffff817b6c2e&gt;] dsmark_enqueue+0x165/0x1a4
         [&lt;ffffffff81775642&gt;] __dev_queue_xmit+0x35e/0x536
         [&lt;ffffffff8177582a&gt;] dev_queue_xmit+0x10/0x12
         [&lt;ffffffff818f8ecd&gt;] packet_sendmsg+0xb26/0xb9a
         [&lt;ffffffff810b1507&gt;] ? __lock_acquire+0x3ae/0xdf3
         [&lt;ffffffff8175cf08&gt;] __sock_sendmsg_nosec+0x25/0x27
         [&lt;ffffffff8175d916&gt;] sock_aio_write+0xd0/0xe7
         [&lt;ffffffff8117d6b8&gt;] do_sync_write+0x59/0x78
         [&lt;ffffffff8117d84d&gt;] vfs_write+0xb5/0x10a
         [&lt;ffffffff8117d96a&gt;] SyS_write+0x49/0x7f
         [&lt;ffffffff8198e212&gt;] system_call_fastpath+0x16/0x1b

This is because we memcpy struct tcindex_filter_result, which contains
struct tcf_exts, and the struct list_head embedded in it cannot simply
be copied. This is a regression introduced by commit
33be627159913b094bb578 (net_sched: act: use standard struct list_head).

It's not very easy to fix it as the code is a mess:

       if (old_r)
               memcpy(&amp;cr, r, sizeof(cr));
       else {
               memset(&amp;cr, 0, sizeof(cr));
               tcf_exts_init(&amp;cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
       }
       ...
       tcf_exts_change(tp, &amp;cr.exts, &amp;e);
       ...
       memcpy(r, &amp;cr, sizeof(cr));

Since there is no need to copy struct tcf_exts, the above code should
be equivalent to the following after this change:

        tcindex_filter_result_init(&amp;cr);
        if (old_r)
                cr.res = r-&gt;res;
        ...
        if (old_r)
                tcf_exts_change(tp, &amp;r-&gt;exts, &amp;e);
        else
                tcf_exts_change(tp, &amp;cr.exts, &amp;e);
        ...
        r-&gt;res = cr.res;

It also fixes other places that zero structs containing struct tcf_exts.

Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
Reported-by: Kelly Anderson &lt;kelly@xilka.com&gt;
Tested-by: Kelly Anderson &lt;kelly@xilka.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: lock imbalance in hhf qdisc</title>
<updated>2014-05-31T20:20:36+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-05-01T16:23:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ab7ba76731a124ef9a70b07004c63b09e260b0d4'/>
<id>ab7ba76731a124ef9a70b07004c63b09e260b0d4</id>
<content type='text'>
[ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ]

hhf_change() takes the sch_tree_lock and releases it on the normal
path, but not in the error cases. Fix the missed case here.

To reproduce, try a command like this:

# tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000
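
The general shape of such a lock imbalance can be sketched in user
space as follows. This is a hypothetical Python model, not the kernel
code and not the actual patch: change_buggy(), change_fixed() and the
quantum check are made-up stand-ins, and threading.Lock stands in for
the qdisc tree lock.

```python
import threading

EINVAL = 22  # errno value used for "invalid argument"


def change_buggy(lock, quantum):
    """Takes the lock but forgets to release it on the error path."""
    lock.acquire()
    if quantum == 0:
        return -EINVAL  # bug: returns with the lock still held
    lock.release()
    return 0


def change_fixed(lock, quantum):
    """Releases the lock on every return path."""
    lock.acquire()
    try:
        if quantum == 0:
            return -EINVAL
        return 0
    finally:
        lock.release()


lock = threading.Lock()

change_buggy(lock, 0)
print(lock.locked())  # the error path leaked the lock

lock.release()        # clean up by hand before the next call
change_fixed(lock, 0)
print(lock.locked())  # the lock is free again after the error
```

An alternative with the same effect is to validate the arguments
before taking the lock, so that no error return happens while it is
held.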

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ]

hhf_change() takes the sch_tree_lock and releases it on the normal
path, but not in the error cases. Fix the missed case here.

To reproduce, try a command like this:

# tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: fq: do not hold qdisc lock while allocating memory</title>
<updated>2014-03-10T20:17:52+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-03-07T06:57:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2818fa0fa068bcbc87d6bd9064e3c1f72d6fcc2a'/>
<id>2818fa0fa068bcbc87d6bd9064e3c1f72d6fcc2a</id>
<content type='text'>
Resizing the fq hash table allocates memory while holding the qdisc
spinlock, with BH disabled.

This is definitely not good, as the allocation might sleep.

We can drop the lock and take it only when needed; we hold RTNL, so no
other changes can happen at the same time.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Resizing the fq hash table allocates memory while holding the qdisc
spinlock, with BH disabled.

This is definitely not good, as the allocation might sleep.

We can drop the lock and take it only when needed; we hold RTNL, so no
other changes can happen at the same time.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: move the sanity test in qdisc_list_add()</title>
<updated>2014-03-10T19:44:21+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-03-08T16:01:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=37314363cd65d19c71bea5f222e5108c93dc3c78'/>
<id>37314363cd65d19c71bea5f222e5108c93dc3c78</id>
<content type='text'>
The WARN_ON(root == &amp;noop_qdisc) added in qdisc_list_add()
can trigger under normal conditions when devices are not up.
It should be done only right before the list_add_tail() call.

Fixes: e57a784d8cae4 ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
Reported-by: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Tested-by: Mirco Tischler &lt;mt-ml@gmx.de&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The WARN_ON(root == &amp;noop_qdisc) added in qdisc_list_add()
can trigger under normal conditions when devices are not up.
It should be done only right before the list_add_tail() call.

Fixes: e57a784d8cae4 ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
Reported-by: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Tested-by: Mirco Tischler &lt;mt-ml@gmx.de&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sch_tbf: Fix potential memory leak in tbf_change().</title>
<updated>2014-02-27T17:53:50+00:00</updated>
<author>
<name>Hiroaki SHIMODA</name>
<email>shimoda.hiroaki@gmail.com</email>
</author>
<published>2014-02-26T12:43:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=724b9e1d75ab3401aaa081bd4efb440c1b3509db'/>
<id>724b9e1d75ab3401aaa081bd4efb440c1b3509db</id>
<content type='text'>
The allocated child qdisc is not freed in error conditions.
Defer the allocation until after the user configuration has been
found to be valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Cc: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The allocated child qdisc is not freed in error conditions.
Defer the allocation until after the user configuration has been
found to be valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Cc: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: Cleanup PIE comments</title>
<updated>2014-02-13T23:29:58+00:00</updated>
<author>
<name>Vijay Subramanian</name>
<email>vijaynsu@cisco.com</email>
</author>
<published>2014-02-13T02:58:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=219e288e8900fac65211e0a23e2a1037fd521af1'/>
<id>219e288e8900fac65211e0a23e2a1037fd521af1</id>
<content type='text'>
Fix an incorrect comment reported by Norbert Kiesel. Edit another comment to
add more details. Also add references to the algorithm (IETF draft and paper)
at the top of the file.

Signed-off-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
CC: Mythili Prabhu &lt;mysuryan@cisco.com&gt;
CC: Norbert Kiesel &lt;nkiesel@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix an incorrect comment reported by Norbert Kiesel. Edit another comment to
add more details. Also add references to the algorithm (IETF draft and paper)
at the top of the file.

Signed-off-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
CC: Mythili Prabhu &lt;mysuryan@cisco.com&gt;
CC: Norbert Kiesel &lt;nkiesel@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add and use skb_gso_transport_seglen()</title>
<updated>2014-01-27T06:38:23+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-01-26T09:58:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=de960aa9ab4decc3304959f69533eef64d05d8e8'/>
<id>de960aa9ab4decc3304959f69533eef64d05d8e8</id>
<content type='text'>
This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by an upcoming IP forwarding path patch.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by an upcoming IP forwarding path patch.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sch_htb: let skb-&gt;priority refer to non-leaf class</title>
<updated>2014-01-23T01:39:48+00:00</updated>
<author>
<name>Harry Mason</name>
<email>harry.mason@smoothwall.net</email>
</author>
<published>2014-01-17T13:22:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=29824310ce3e320726a45475a38fd0c3f0eaad60'/>
<id>29824310ce3e320726a45475a38fd0c3f0eaad60</id>
<content type='text'>
If the class in skb-&gt;priority is not a leaf, apply filters from the
selected class, not the qdisc. This lets netfilter or user space
partially classify the packet.

Signed-off-by: Harry Mason &lt;harry.mason@smoothwall.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the class in skb-&gt;priority is not a leaf, apply filters from the
selected class, not the qdisc. This lets netfilter or user space
partially classify the packet.

Signed-off-by: Harry Mason &lt;harry.mason@smoothwall.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>reciprocal_divide: update/correction of the algorithm</title>
<updated>2014-01-22T07:17:20+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2014-01-22T01:29:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=809fa972fd90ff27225294b17a027e908b2d7b7a'/>
<id>809fa972fd90ff27225294b17a027e908b2d7b7a</id>
<content type='text'>
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4809fa5
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using the Granlund and Montgomery method, so that
future use is also safe and without any non-obvious side effects.
Known problems with the old implementation were that division by 1
always returned 0, and some off-by-ones when the dividend and divisor
were very large. This seemed not to be problematic for its
current users, as far as we can tell. Eric Dumazet checked the
slab usage; we cannot say so with certainty in the case of flex_array.
Still, in order to fix that, we propose an extension of the
original implementation from commit 6a2d7a955d8d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5] by Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d are in
the u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1&lt;&lt;32)*((1&lt;&lt;l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')&gt;&gt;32
  q = (t+((n-t)&gt;&gt;sh_1))&gt;&gt;sh_2
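
The pseudocode above can be tried out with a small user-space model.
The following is a minimal Python sketch under stated assumptions, not
the kernel implementation: the function names are reused only for
readability, the tuple return value is a made-up convenience, and
shifts are written as integer division by powers of two. Python
integers do not overflow, so the fixed-width concerns of the C code
are not modelled.

```python
def reciprocal_value(d):
    """Precompute (m, sh_1, sh_2) for a divisor d with 1 = d or larger,
    up to 2**32 - 1. (d - 1).bit_length() equals ceil(log2(d))."""
    l = (d - 1).bit_length()
    m = (2**32 * (2**l - d)) // d + 1
    sh_1 = min(l, 1)
    sh_2 = max(l - 1, 0)
    return m, sh_1, sh_2


def reciprocal_divide(n, rv):
    """Compute n // d from the precomputed values, for n in u32 range."""
    m, sh_1, sh_2 = rv
    t = (n * m) // 2**32                   # high 32 bits of n * m
    return (t + (n - t) // 2**sh_1) // 2**sh_2


# Division by 1 was one of the known problem cases of the old code.
rv_one = reciprocal_value(1)
print(reciprocal_divide(12345, rv_one) == 12345)

# Spot check against plain integer division for a few divisors.
for d in (3, 7, 10, 2**31 + 1, 2**32 - 1):
    rv = reciprocal_value(d)
    for n in (0, 1, d - 1, d, d + 1, 2**32 - 1):
        assert reciprocal_divide(n, rv) == n // d
```

The precomputation is done once per divisor; each later division then
costs one multiplication, one subtraction, one addition and two shifts.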

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki &lt;darkjames-ws@darkjames.pl&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Austin S Hemmelgarn &lt;ahferroin7@gmail.com&gt;
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross &lt;jesse@nicira.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Andy Gospodarek &lt;andy@greyhouse.net&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Jay Vosburgh &lt;fubar@us.ibm.com&gt;
Cc: Jakub Zawadzki &lt;darkjames-ws@darkjames.pl&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4809fa5
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using the Granlund and Montgomery method, so that
future use is also safe and without any non-obvious side effects.
Known problems with the old implementation were that division by 1
always returned 0, and some off-by-ones when the dividend and divisor
were very large. This seemed not to be problematic for its
current users, as far as we can tell. Eric Dumazet checked the
slab usage; we cannot say so with certainty in the case of flex_array.
Still, in order to fix that, we propose an extension of the
original implementation from commit 6a2d7a955d8d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5] by Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d are in
the u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1&lt;&lt;32)*((1&lt;&lt;l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')&gt;&gt;32
  q = (t+((n-t)&gt;&gt;sh_1))&gt;&gt;sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki &lt;darkjames-ws@darkjames.pl&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Austin S Hemmelgarn &lt;ahferroin7@gmail.com&gt;
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross &lt;jesse@nicira.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Andy Gospodarek &lt;andy@greyhouse.net&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Jay Vosburgh &lt;fubar@us.ibm.com&gt;
Cc: Jakub Zawadzki &lt;darkjames-ws@darkjames.pl&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
