<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net, branch v3.2.64</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>l2tp: fix race while getting PMTU on PPP pseudo-wire</title>
<updated>2014-11-05T20:27:49+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>g.nault@alphalink.fr</email>
</author>
<published>2014-09-03T12:12:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=544bd1bf08f824dc972832fba6d92b34a5a93f69'/>
<id>544bd1bf08f824dc972832fba6d92b34a5a93f69</id>
<content type='text'>
commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb upstream.

Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.

The dst_mtu(__sk_dst_get(tunnel-&gt;sock)) call was racy. __sk_dst_get()
could return NULL if tunnel-&gt;sock-&gt;sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:

[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[&lt;ffffffffa049db88&gt;]  [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005]  [&lt;ffffffffa049da80&gt;] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005]  [&lt;ffffffff81109b57&gt;] ? might_fault+0x9e/0xa5
[ 1937.664005]  [&lt;ffffffff81109b0e&gt;] ? might_fault+0x55/0xa5
[ 1937.664005]  [&lt;ffffffff8114c566&gt;] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005]  [&lt;ffffffff81309196&gt;] SYSC_connect+0x87/0xb1
[ 1937.664005]  [&lt;ffffffff813e56f7&gt;] ? sysret_check+0x1b/0x56
[ 1937.664005]  [&lt;ffffffff8107590d&gt;] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005]  [&lt;ffffffff81213dee&gt;] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005]  [&lt;ffffffff8114c262&gt;] ? spin_lock+0x9/0xb
[ 1937.664005]  [&lt;ffffffff813092b4&gt;] SyS_connect+0x9/0xb
[ 1937.664005]  [&lt;ffffffff813e56d2&gt;] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 &lt;48&gt; 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP  [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005]  RSP &lt;ffff8800c43c7de8&gt;
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---

Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault &lt;g.nault@alphalink.fr&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb upstream.

Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.

The dst_mtu(__sk_dst_get(tunnel-&gt;sock)) call was racy. __sk_dst_get()
could return NULL if tunnel-&gt;sock-&gt;sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:

[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[&lt;ffffffffa049db88&gt;]  [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005]  [&lt;ffffffffa049da80&gt;] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005]  [&lt;ffffffff81109b57&gt;] ? might_fault+0x9e/0xa5
[ 1937.664005]  [&lt;ffffffff81109b0e&gt;] ? might_fault+0x55/0xa5
[ 1937.664005]  [&lt;ffffffff8114c566&gt;] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005]  [&lt;ffffffff81309196&gt;] SYSC_connect+0x87/0xb1
[ 1937.664005]  [&lt;ffffffff813e56f7&gt;] ? sysret_check+0x1b/0x56
[ 1937.664005]  [&lt;ffffffff8107590d&gt;] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005]  [&lt;ffffffff81213dee&gt;] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005]  [&lt;ffffffff8114c262&gt;] ? spin_lock+0x9/0xb
[ 1937.664005]  [&lt;ffffffff813092b4&gt;] SyS_connect+0x9/0xb
[ 1937.664005]  [&lt;ffffffff813e56d2&gt;] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 &lt;48&gt; 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP  [&lt;ffffffffa049db88&gt;] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005]  RSP &lt;ffff8800c43c7de8&gt;
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---

Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault &lt;g.nault@alphalink.fr&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack</title>
<updated>2014-11-05T20:27:49+00:00</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2014-07-10T06:24:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7756f3a8792b7696f3e7aed47671359af8bf549e'/>
<id>7756f3a8792b7696f3e7aed47671359af8bf549e</id>
<content type='text'>
commit 2627b7e15c5064ddd5e578e4efd948d48d531a3f upstream.

commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack")
added second ip_vs_conn_drop_conntrack call instead of just adding
the needed check. As result, the first call still can cause
crash on netns exit. Remove it.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Hans Schillstrom &lt;hans@schillstrom.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2627b7e15c5064ddd5e578e4efd948d48d531a3f upstream.

commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack")
added second ip_vs_conn_drop_conntrack call instead of just adding
the needed check. As result, the first call still can cause
crash on netns exit. Remove it.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Hans Schillstrom &lt;hans@schillstrom.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sctp: fix remote memory pressure from excessive queueing</title>
<updated>2014-11-05T20:27:48+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2014-10-09T20:55:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3a8c709ba4cf6fe86f5069c71325029d412bcf1e'/>
<id>3a8c709ba4cf6fe86f5069c71325029d412bcf1e</id>
<content type='text'>
commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------&gt;
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------&gt;

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -&gt; sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------&gt;
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------&gt;

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -&gt; sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sctp: fix panic on duplicate ASCONF chunks</title>
<updated>2014-11-05T20:27:48+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2014-10-09T20:55:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a3c6f2e051b608181aff9345481e586b2d54fc9'/>
<id>9a3c6f2e051b608181aff9345481e586b2d54fc9</id>
<content type='text'>
commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b -----------------&gt;

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk-&gt;list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b -----------------&gt;

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk-&gt;list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks</title>
<updated>2014-11-05T20:27:48+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2014-10-09T20:55:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aa001b043dde50e2856fe9460bc819d2a70dc309'/>
<id>aa001b043dde50e2856fe9460bc819d2a70dc309</id>
<content type='text'>
commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:&lt;NULL&gt;
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 &lt;IRQ&gt;
 [&lt;ffffffff8144fb1c&gt;] skb_put+0x5c/0x70
 [&lt;ffffffffa01ea1c3&gt;] sctp_addto_chunk+0x63/0xd0 [sctp]
 [&lt;ffffffffa01eadaf&gt;] sctp_process_asconf+0x1af/0x540 [sctp]
 [&lt;ffffffff8152d025&gt;] ? _read_unlock_bh+0x15/0x20
 [&lt;ffffffffa01e0038&gt;] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [&lt;ffffffffa01e3751&gt;] sctp_do_sm+0x71/0x1210 [sctp]
 [&lt;ffffffff8147645d&gt;] ? fib_rules_lookup+0xad/0xf0
 [&lt;ffffffffa01e6b22&gt;] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [&lt;ffffffffa01e8393&gt;] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [&lt;ffffffffa01ee986&gt;] sctp_inq_push+0x56/0x80 [sctp]
 [&lt;ffffffffa01fcc42&gt;] sctp_rcv+0x982/0xa10 [sctp]
 [&lt;ffffffffa01d5123&gt;] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [&lt;ffffffff8148bdc9&gt;] ? nf_iterate+0x69/0xb0
 [&lt;ffffffff81496d10&gt;] ? ip_local_deliver_finish+0x0/0x2d0
 [&lt;ffffffff8148bf86&gt;] ? nf_hook_slow+0x76/0x120
 [&lt;ffffffff81496d10&gt;] ? ip_local_deliver_finish+0x0/0x2d0
 [&lt;ffffffff81496ded&gt;] ip_local_deliver_finish+0xdd/0x2d0
 [&lt;ffffffff81497078&gt;] ip_local_deliver+0x98/0xa0
 [&lt;ffffffff8149653d&gt;] ip_rcv_finish+0x12d/0x440
 [&lt;ffffffff81496ac5&gt;] ip_rcv+0x275/0x350
 [&lt;ffffffff8145c88b&gt;] __netif_receive_skb+0x4ab/0x750
 [&lt;ffffffff81460588&gt;] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------&gt;

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param-&gt;param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2:
 - Adjust context
 - sctp_sf_violation_paramlen() doesn't take a struct net * parameter]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:&lt;NULL&gt;
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 &lt;IRQ&gt;
 [&lt;ffffffff8144fb1c&gt;] skb_put+0x5c/0x70
 [&lt;ffffffffa01ea1c3&gt;] sctp_addto_chunk+0x63/0xd0 [sctp]
 [&lt;ffffffffa01eadaf&gt;] sctp_process_asconf+0x1af/0x540 [sctp]
 [&lt;ffffffff8152d025&gt;] ? _read_unlock_bh+0x15/0x20
 [&lt;ffffffffa01e0038&gt;] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [&lt;ffffffffa01e3751&gt;] sctp_do_sm+0x71/0x1210 [sctp]
 [&lt;ffffffff8147645d&gt;] ? fib_rules_lookup+0xad/0xf0
 [&lt;ffffffffa01e6b22&gt;] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [&lt;ffffffffa01e8393&gt;] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [&lt;ffffffffa01ee986&gt;] sctp_inq_push+0x56/0x80 [sctp]
 [&lt;ffffffffa01fcc42&gt;] sctp_rcv+0x982/0xa10 [sctp]
 [&lt;ffffffffa01d5123&gt;] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [&lt;ffffffff8148bdc9&gt;] ? nf_iterate+0x69/0xb0
 [&lt;ffffffff81496d10&gt;] ? ip_local_deliver_finish+0x0/0x2d0
 [&lt;ffffffff8148bf86&gt;] ? nf_hook_slow+0x76/0x120
 [&lt;ffffffff81496d10&gt;] ? ip_local_deliver_finish+0x0/0x2d0
 [&lt;ffffffff81496ded&gt;] ip_local_deliver_finish+0xdd/0x2d0
 [&lt;ffffffff81497078&gt;] ip_local_deliver+0x98/0xa0
 [&lt;ffffffff8149653d&gt;] ip_rcv_finish+0x12d/0x440
 [&lt;ffffffff81496ac5&gt;] ip_rcv+0x275/0x350
 [&lt;ffffffff8145c88b&gt;] __netif_receive_skb+0x4ab/0x750
 [&lt;ffffffff81460588&gt;] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] -------------&gt;
  &lt;----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO --------------------&gt;
  &lt;-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------&gt;

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param-&gt;param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2:
 - Adjust context
 - sctp_sf_violation_paramlen() doesn't take a struct net * parameter]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: reuse ip6_frag_id from ip6_ufo_append_data</title>
<updated>2014-11-05T20:27:47+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2014-02-21T01:55:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8db33010af3020af7f4904b2dfffc9841ffc42e4'/>
<id>8db33010af3020af7f4904b2dfffc9841ffc42e4</id>
<content type='text'>
commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream.

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream.

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: reallocate addrconf router for ipv6 address when lo device up</title>
<updated>2014-11-05T20:27:47+00:00</updated>
<author>
<name>chenweilong</name>
<email>chenweilong@huawei.com</email>
</author>
<published>2014-08-06T08:18:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a0a8667a54a43250f962bf778c031c50c31e9142'/>
<id>a0a8667a54a43250f962bf778c031c50c31e9142</id>
<content type='text'>
It fix the bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951

The patch can't be applied directly, as it' used the function introduced
by "commit 94e187c0" ip6_rt_put(), that patch can't be applied directly
either.

====================

From: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;

commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 upstream.

This commit don't have a stable tag, but it fix the bug
no reply after loopback down-up.It's very worthy to be
applied to stable 3.4 kernels.

The bug is 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951


CC: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
CC: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Reported-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[weilong: s/ip6_rt_put/dst_release]
Signed-off-by: Chen Weilong &lt;chenweilong@huawei.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It fix the bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951

The patch can't be applied directly, as it' used the function introduced
by "commit 94e187c0" ip6_rt_put(), that patch can't be applied directly
either.

====================

From: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;

commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 upstream.

This commit don't have a stable tag, but it fix the bug
no reply after loopback down-up.It's very worthy to be
applied to stable 3.4 kernels.

The bug is 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951


CC: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
CC: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Reported-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[weilong: s/ip6_rt_put/dst_release]
Signed-off-by: Chen Weilong &lt;chenweilong@huawei.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: disable bh while doing route gc</title>
<updated>2014-11-05T20:27:47+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>mleitner@redhat.com</email>
</author>
<published>2014-10-13T17:03:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4715883ba814db9635baf74e378580bd27a534bd'/>
<id>4715883ba814db9635baf74e378580bd27a534bd</id>
<content type='text'>
Further tests revealed that after moving the garbage collector to a work
queue and protecting it with a spinlock may leave the system prone to
soft lockups if bottom half gets very busy.

It was reproced with a set of firewall rules that REJECTed packets. If
the NIC bottom half handler ends up running on the same CPU that is
running the garbage collector on a very large cache, the garbage
collector will not be able to do its job due to the amount of work
needed for handling the REJECTs and also won't reschedule.

The fix is to disable bottom half during the garbage collecting, as it
already was in the first place (most calls to it came from softirqs).

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Further tests revealed that after moving the garbage collector to a work
queue and protecting it with a spinlock may leave the system prone to
soft lockups if bottom half gets very busy.

It was reproced with a set of firewall rules that REJECTed packets. If
the NIC bottom half handler ends up running on the same CPU that is
running the garbage collector on a very large cache, the garbage
collector will not be able to do its job due to the amount of work
needed for handling the REJECTs and also won't reschedule.

The fix is to disable bottom half during the garbage collecting, as it
already was in the first place (most calls to it came from softirqs).

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: avoid parallel route cache gc executions</title>
<updated>2014-11-05T20:27:47+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>mleitner@redhat.com</email>
</author>
<published>2014-08-14T19:44:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ad5ca98f54c3b6604a7bb72059973a85deb0b779'/>
<id>ad5ca98f54c3b6604a7bb72059973a85deb0b779</id>
<content type='text'>
When rt_intern_hash() has to deal with neighbour cache overflowing,
it triggers the route cache garbage collector in an attempt to free
some references on neighbour entries.

Such call cannot be done async but should also not run in parallel with
an already-running one, so that they don't collapse fighting over the
hash lock entries.

This patch thus blocks parallel executions with spinlocks:
- A call from worker and from rt_intern_hash() are not the same, and
cannot be merged, thus they will wait each other on rt_gc_lock.
- Calls to gc from rt_intern_hash() may happen in parallel but we must
wait for it to finish in order to try again. This dedup and
synchrinozation is then performed by the locking just before calling
__do_rt_garbage_collect().

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When rt_intern_hash() has to deal with neighbour cache overflowing,
it triggers the route cache garbage collector in an attempt to free
some references on neighbour entries.

Such call cannot be done async but should also not run in parallel with
an already-running one, so that they don't collapse fighting over the
hash lock entries.

This patch thus blocks parallel executions with spinlocks:
- A call from worker and from rt_intern_hash() are not the same, and
cannot be merged, thus they will wait each other on rt_gc_lock.
- Calls to gc from rt_intern_hash() may happen in parallel but we must
wait for it to finish in order to try again. This dedup and
synchrinozation is then performed by the locking just before calling
__do_rt_garbage_collect().

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: move route garbage collector to work queue</title>
<updated>2014-11-05T20:27:46+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>mleitner@redhat.com</email>
</author>
<published>2014-08-14T19:44:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6c383b3a565dbf07cf6665e5da535f8668479453'/>
<id>6c383b3a565dbf07cf6665e5da535f8668479453</id>
<content type='text'>
Currently the route garbage collector gets called by dst_alloc() if it
have more entries than the threshold. But it's an expensive call, that
don't really need to be done by then.

Another issue with current way is that it allows running the garbage
collector with the same start parameters on multiple CPUs at once, which
is not optimal. A system may even soft lockup if the cache is big enough
as the garbage collectors will be fighting over the hash lock entries.

This patch thus moves the garbage collector to run asynchronously on a
work queue, much similar to how rt_expire_check runs.

There is one condition left that allows multiple executions, which is
handled by the next patch.

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the route garbage collector gets called by dst_alloc() if it
have more entries than the threshold. But it's an expensive call, that
don't really need to be done by then.

Another issue with current way is that it allows running the garbage
collector with the same start parameters on multiple CPUs at once, which
is not optimal. A system may even soft lockup if the cache is big enough
as the garbage collectors will be fighting over the hash lock entries.

This patch thus moves the garbage collector to run asynchronously on a
work queue, much similar to how rt_expire_check runs.

There is one condition left that allows multiple executions, which is
handled by the next patch.

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
