Age | Commit message (Collapse) | Author |
|
commit 735ce972fbc8a65fb17788debd7bbe7b4383cc62 upstream
As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.
Therefore, enforce an appropriate limit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
netfilter: nf_conntrack_h323: fix memory leak in module initialization error path
Upstream commit 8a548868db62422113104ebc658065e3fe976951
Properly free h323_buffer when helper registration fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
netfilter: nf_conntrack_h323: fix module unload crash
Upstream commit a56b8f81580761c65e4d8d0c04ac1cb7a788bdf1
The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:
CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000004 c00a67f0
$ 4 : 802a5ad0 81657e00 00000000 00000000
$ 8 : 00000008 801461c8 00000000 80570050
$12 : 819b0280 819b04b0 00000006 00000000
$16 : 802a5a60 80000000 80b46000 80321010
$20 : 00000000 00000004 802a5ad0 00000001
$24 : 00000000 802257a8
$28 : 802a4000 802a59e8 00000004 801d4e7c
Hi : 0000000b
Lo : 00506320
epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P
ra : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403 KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
...
Call Trace:
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4f8c>] nf_hook_slow+0x9c/0x1a4
One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.
Reported-by: liannan <liannan@twsz.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
Upstream commit ceeff7541e5a4ba8e8d97ffbae32b3f283cb7a3f
When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:
BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
[<c038e728>] nla_parse+0x5c/0xb0
[<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
[<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
[<c0119aee>] update_curr+0x3d/0x52
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
...
Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875
Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 507b06d0622480f8026d49a94f86068bb0fd6ed6 upstream
Otherwise userspace has no idea the IBSS creation succeeded.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ upstream commit: 8aca6cb1179ed9bef9351028c8d8af852903eae2 ]
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.
I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).
This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ upstream commit: 79d44516b4b178ffb6e2159c75584cfcfc097914 ]
If receiver consumes segments successfully only in-order, FRTO
fallback to conventional recovery produces RTO loop because
FRTO's forward transmissions will always get dropped and need to
be resent, yet by default they're not marked as lost (which are
the only segments we will retransmit in CA_Loss).
Price to pay about this is occassionally unnecessarily
retransmitting the forward transmission(s). SACK blocks help
a bit to avoid this, so it's mainly a concern for NewReno case
though SACK is not fully immune either.
This change has a side-effect of fixing SACKFRTO problem where
it didn't have snd_nxt of the RTO time available anymore when
fallback become necessary (this problem would have only occured
when RTO would occur for two or more segments and ECE arrives
in step 3; no need to figure out how to fix that unless the
TODO item of selective behavior is considered in future).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 62ab22278308a40bcb7f4079e9719ab8b7fe11b5 ]
Note: there's actually another bug in FRTO's SACK variant, which
is the causing failure in NewReno case because of the error
that's fixed here. I'll fix the SACK case separately (it's
a separate bug really, though related, but in order to fix that
I need to audit tp->snd_nxt usage a bit).
There were two places where SACK variant of FRTO is getting
incorrectly used even if SACK wasn't negotiated by the TCP flow.
This leads to incorrect setting of frto_highmark with NewReno
if a previous recovery was interrupted by another RTO.
An eventual fallback to conventional recovery then incorrectly
considers one or couple of segments as forward transmissions
though they weren't, which then are not LOST marked during
fallback making them "non-retransmittable" until the next RTO.
In a bad case, those segments are really lost and are the only
one left in the window. Thus TCP needs another RTO to continue.
The next FRTO, however, could again repeat the same events
making the progress of the TCP flow extremely slow.
In order for these events to occur at all, FRTO must occur
again in FRTOs step 3 while the key segments must be lost as
well, which is not too likely in practice. It seems to most
frequently with some small devices such as network printers
that *seem* to accept TCP segments only in-order. In cases
were key segments weren't lost, things get automatically
resolved because those wrongly marked segments don't need to be
retransmitted in order to continue.
I found a reproducer after digging up relevant reports (few
reports in total, none at netdev or lkml I know of), some
cases seemed to indicate middlebox issues which seems now
to be a false assumption some people had made. Bugzilla
#10063 _might_ be related. Damon L. Chesser <damon@damtek.com>
had a reproducable case and was kind enough to tcpdump it
for me. With the tcpdump log it was quite trivial to figure
out.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: a1c1f281b84a751fdb5ff919da3b09df7297619f ]
It seems that commit 009a2e3e4ec ("[TCP] FRTO: Improve
interoperability with other undo_marker users") run into
another land-mine which caused fallback to conventional
recovery to break:
1. Cumulative ACK arrives after FRTO retransmission
2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp
which should be kept like in CA_Loss state it would be
3. undo_marker change allowed tcp_packet_delayed to return
true because of the cleared retrans_stamp once FRTO is
terminated causing LossUndo to occur, which means all loss
markings FRTO made are reverted.
This means that the conventional recovery basically recovered
one loss per RTT, which is not that efficient. It was quite
unobvious that the undo_marker change broken something like
this, I had a quite long session to track it down because of
the non-intuitiviness of the bug (luckily I had a trivial
reproducer at hand and I was also able to learn to use kprobes
in the process as well :-)).
This together with the NewReno+FRTO fix and FRTO in-order
workaround this fixes Damon's problems, this and the first
mentioned are enough to fix Bugzilla #10063.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Sebastian Hyrwall <zibbe@cisko.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: a6604471db5e7a33474a7f16c64d6b118fae3e74 ]
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
1) DSACK in the middle of previous SACK block must be generated.
2) In order to take that particular branch, part or all of the
DSACKed segment must already be SACKed so that we have that
in cache in the first place.
3) The new info must be top enough so that fackets_out will be
updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.
It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.
Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).
This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.
This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 246eb2af060fc32650f07203c02bdc0456ad76c7 ]
This fixes inappropriately large cwnd growth on sender-limited flows
when GSO is enabled, limiting cwnd growth to 64k.
[ Backport to 2.6.25 by replacing sk->sk_gso_max_size with 65536 -DaveM ]
Signed-off-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: ce447eb91409225f8a488f6b7b2a1bdf7b2d884f ]
This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow
up to tcp_max_burst() even when sk_can_gso() is false, or when
sysctl_tcp_tso_win_divisor != 0.
Signed-off-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 7d227cd235c809c36c847d6a597956ad9e9d2bae ]
We are seeing an issue with TCP in handling an ICMP frag needed
message that is received after net.ipv4.tcp_retries1 retransmits.
The default value of retries1 is 3. So if the path mtu changes
and ICMP frag needed is lost for the first 3 retransmits or if
it gets delayed until 3 retransmits are done, TCP doesn't update
MSS correctly and continues to retransmit the orginal message
until it timesout after tcp_retries2 retransmits.
I am seeing this issue even with the latest 2.6.25.4 kernel.
In tcp_retransmit_timer(), when retransmits counter exceeds
tcp_retries1 value, the dst cache entry of the socket is reset.
At this time, if we receive an ICMP frag needed message, the
dst entry gets updated with the new MTU, but the TCP sockets
dst_cache entry remains NULL.
So the next time when we try to retransmit after the ICMP frag
needed is received, tcp_retransmit_skb() gets called. Here the
cur_mss value is calculated at the start of the routine with
a NULL sk_dst_cache. Instead we should call tcp_current_mss after
the rebuild_header that caches the dst entry with the updated mtu.
Also the rebuild_header should be called before tcp_fragment
so that skb is fragmented if the mss goes down.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 81d85346b3fcd8b3167eac8b5fb415a210bd4345 ]
Commit 30688a9 ([VLAN]: Handle vlan devices net namespace changing)
changed the device notifier to special-case notifications for VLAN
devices, effectively disabling state propagation to underlying VLAN
devices. This is needed for layered VLANs though, so restore the
original behaviour.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 1ac06e0306d0192a7a4d9ea1c9e06d355ce7e7d3 ]
Because the IPsec output function xfrm_output_resume does its
own dst_output call it should always call __ip_local_output
instead of ip_local_output as the latter may invoke dst_output
directly. Otherwise the return values from nf_hook and dst_output
may clash as they both use the value 1 but for different purposes.
When that clash occurs this can cause a packet to be used after
it has been freed which usually leads to a crash. Because the
offending value is only returned from dst_output with qdiscs
such as HTB, this bug is normally not visible.
Thanks to Marco Berizzi for his perseverance in tracking this
down.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: f2df824948d559ea818e03486a8583e42ea6ab37 ]
cls_api should return ENOENT when the requested classifier doesn't
exist.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 0e91796eb46e29edc791131c832a2232bcaed9dd ]
Am I just being particularly dim today, or can the call to
dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never
happen?
We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the
condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be
true.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ Upstream commit: 3f91bd420a955803421f2db17b2e04aacfbb2bb8 ]
Both copy_to_ and _from_user return the number of bytes, that failed to
reach their destination, not the 0/-EXXX values.
Based on patch from Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
commit 537d59af73d894750cff14f90fe2b6d77fbab15b in mainline
There's logic in __rfcomm_dlc_close:
rfcomm_dlc_lock(d);
d->state = BT_CLOSED;
d->state_changed(d, err);
rfcomm_dlc_unlock(d);
In rfcomm_dev_state_change, it's possible that rfcomm_dev_put try to
take the dlc lock, then we will deadlock.
Here fixed it by unlock dlc before rfcomm_dev_get in
rfcomm_dev_state_change.
why not unlock just before rfcomm_dev_put? it's because there's
another problem. rfcomm_dev_get/rfcomm_dev_del will take
rfcomm_dev_lock, but in rfcomm_dev_add the lock order is :
rfcomm_dev_lock --> dlc lock
so I unlock dlc before the taken of rfcomm_dev_lock.
Actually it's a regression caused by commit
1905f6c736cb618e07eca0c96e60e3c024023428 ("bluetooth :
__rfcomm_dlc_close lock fix"), the dlc state_change could be two
callbacks : rfcomm_sk_state_change and rfcomm_dev_state_change. I
missed the rfcomm_sk_state_change that time.
Thanks Arjan van de Ven <arjan@linux.intel.com> for the effort in
commit 4c8411f8c115def968820a4df6658ccfd55d7f1a ("bluetooth: fix
locking bug in the rfcomm socket cleanup handling") but he missed the
rfcomm_dev_state_change lock issue.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 7dccf1f4e1696c79bff064c3770867cc53cbc71c ]
in net/bluetooth/rfcomm/sock.c, rfcomm_sk_state_change() does the
following operation:
if (parent && sock_flag(sk, SOCK_ZAPPED)) {
/* We have to drop DLC lock here, otherwise
* rfcomm_sock_destruct() will dead lock. */
rfcomm_dlc_unlock(d);
rfcomm_sock_kill(sk);
rfcomm_dlc_lock(d);
}
}
which is fine, since rfcomm_sock_kill() will call sk_free() which will call
rfcomm_sock_destruct() which takes the rfcomm_dlc_lock()... so far so good.
HOWEVER, this assumes that the rfcomm_sk_state_change() function always gets
called with the rfcomm_dlc_lock() taken. This is the case for all but one
case, and in that case where we don't have the lock, we do a double unlock
followed by an attempt to take the lock, which due to underflow isn't
going anywhere fast.
This patch fixes this by moving the stragling case inside the lock, like
the other usages of the same call are doing in this code.
This was found with the help of the www.kerneloops.org project, where this
deadlock was observed 51 times at this point in time:
http://www.kerneloops.org/search.php?search=rfcomm_sock_destruct
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ Upstream commit: 7dccf1f4e1696c79bff064c3770867cc53cbc71c ]
There is only one function in AX25 calling skb_append(), and it really
looks suspicious: appends skb after previously enqueued one, but in
the meantime this previous skb could be removed from the queue.
This patch Fixes it the simple way, so this is not fully compatible with
the current method, but testing hasn't shown any problems.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
[ upstream commit: 4da5105687e0993a3bbdcffd89b2b94d9377faab ]
This propagates the xfrm_user fix made in commit
bcf0dda8d2408fe1c1040cdec5a98e5fcad2ac72 ("[XFRM]: xfrm_user: fix
selector family initialization")
Based upon a bug report from, and tested by, Alan Swanson.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
nf_ct_frag6_gather()
upstream commit: b9c698964614f71b9c8afeca163a945b4c2e2d20
[ 63.531438] =================================
[ 63.531520] [ INFO: inconsistent lock state ]
[ 63.531520] 2.6.26-rc4 #7
[ 63.531520] ---------------------------------
[ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[ 63.531520] {softirq-on-W} state was registered at:
[ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080
[ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0
[ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40
[ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...
According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
ESTABLISHED state
upstream commit: d2ee3f2c4b1db1320c1efb4dcaceeaf6c7e6c2d3
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4
Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
nf_conntrack_expect_init()
upstream commit: 12293bf91126ad253a25e2840b307fdc7c2754c3
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 01b7a314291b2ef56ad718ee1374a1bac4768b29
Using iptables 1.3.8 with kernel 2.6.25, rules which include '-m
iprange' don't automatically pull in xt_iprange module. Below patch
adds module aliases to fix that. Patch against latest -git, but seems
like a good candidate for -stable also.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5
- Don't trust a length which is greater than the working buffer.
An invalid length could cause overflow when calculating buffer size
for decoding oid.
- An oid length of zero is invalid and allows for an off-by-one error when
decoding oid because the first subid actually encodes first 2 subids.
- A primitive encoding may not have an indefinite length.
Thanks to Wei Wang from McAfee for report.
Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
[NETFILTER]: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
From: Arnaud Ebalard <arno@natisbad.org>
Upstream commit 9a732ed6d:
While reinjecting *bigger* modified versions of IPv6 packets using
libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too)
but I get the following on recents kernels (2.6.25, trace below is
against today's net-2.6 git tree):
skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0
------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT
Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000)
Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80
f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60
c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c
Call Trace:
[<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
[<c04cdbfc>] ? skb_put+0x3c/0x40
[<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
[<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160
[<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160
[<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160
[<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0
[<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30
[<c04f8c73>] ? netlink_unicast+0x243/0x2b0
[<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70
[<c04f9406>] ? netlink_sendmsg+0x1c6/0x270
[<c04c8244>] ? sock_sendmsg+0xc4/0xf0
[<c011970d>] ? set_next_entity+0x1d/0x50
[<c0133a80>] ? autoremove_wake_function+0x0/0x40
[<c0118f9e>] ? __wake_up_common+0x3e/0x70
[<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280
[<c011d308>] ? __wake_up+0x68/0x70
[<c02cea47>] ? copy_from_user+0x37/0x70
[<c04cfd7c>] ? verify_iovec+0x2c/0x90
[<c04c837a>] ? sys_sendmsg+0x10a/0x230
[<c011967a>] ? __dequeue_entity+0x2a/0xa0
[<c011970d>] ? set_next_entity+0x1d/0x50
[<c0345397>] ? pty_write+0x47/0x60
[<c033d59b>] ? tty_default_put_char+0x1b/0x20
[<c011d2e9>] ? __wake_up+0x49/0x70
[<c033df99>] ? tty_ldisc_deref+0x39/0x90
[<c033ff20>] ? tty_write+0x1a0/0x1b0
[<c04c93af>] ? sys_socketcall+0x7f/0x260
[<c0102ff9>] ? sysenter_past_esp+0x6a/0x91
[<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0
=======================
Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0
EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8
Looking at the code, I ended up in nfq_mangle() function (called by
nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to
the increased size of data passed to the function. AFAICT, it should ask
for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the
resulting sk_buff has not enough space to support the skb_put(skb, diff)
call a few lines later, this results in the call to skb_over_panic().
The patch below asks for allocation of a copy with enough space for
mangled packet and the same amount of headroom as old sk_buff. While
looking at how the regression appeared (e2b58a67), I noticed the same
pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects
those locations too.
Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things
are ok (2.6.25 and today's net-2.6 git tree).
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[NETFILTER]: nf_conntrack: padding breaks conntrack hash on ARM
Upstream commit 443a70d50:
commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()"
results in ARM platforms hashing uninitialised padding. This padding
doesn't exist on other architectures.
Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure
everything is initialised. There were only 4 bytes that
NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM).
Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: c2ab7ac225e29006b7117d6a9fe8f3be8d98b0c2 ]
The tx packet counting and the local loopback of CAN frames should
only happen in the case that the CAN frame has been enqueued to the
netdevice tx queue successfully.
Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 19443178fbfbf40db15c86012fc37df1a44ab857 ]
dccp_feat_change() validates length and on error is returning 1.
This happens to work since call chain is checking for 0 == success,
but this is returned to userspace, so make it a real error value.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 2ad17defd596ca7e8ba782d5fc6950ee0e99513c ]
Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556
where conn templates with protocol=IPPROTO_IP can oops backup box.
Result from ip_vs_proto_get() should be checked because
protocol value can be invalid or unsupported in backup. But
for valid message we should not fail for templates which use
IPPROTO_IP. Also, add checks to validate message limits and
connection state. Show state NONE for templates using IPPROTO_IP.
Fix tested and confirmed by L0op8ack <l0op8ack@hotmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 3ba08b00e0d8413d79be9cab8ec085ceb6ae6fd6 ]
There is lack of removing a class from the event queue while changing
from parent to leaf which can cause corruption of this rb tree. This
patch fixes a bug introduced by my patch: "sch_htb: turn intermediate
classes into leaves" commit: 160d5e10f87b1dc88fd9b84b31b1718e0fd76398.
Many thanks to Jan 'yanek' Bortl for finding a way to reproduce this
rare bug and narrowing the test case, which made possible proper
diagnosing.
This patch is recommended for all kernels starting from 2.6.20.
Reported-and-tested-by: Jan 'yanek' Bortl <yanek@ya.bofh.cz>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 27a27a2158f4fe56a29458449e880a52ddee3dc4 ]
Flowlabel text format was not correct and thus ambiguous.
For example, 0x00123 or 0x01203 are formatted as 0x123.
This is not what audit tools want.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 36ca34cc3b8335eb1fe8bd9a1d0a2592980c3f02 ]
Noticed by Paul Marks <paul@pmarks.net>.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: c5d18e984a313adf5a1a4ae69e0b1d93cf410229 ]
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.
The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.
After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.
The following patch does exactly that.
This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 653252c2302cdf2dfbca66a7e177f7db783f9efa ]
I found some places, that erroneously return the value obtained from
the copy_to_user() call: if some amount of bytes were not able to get
to the user (this is what this one returns) the proper behavior is to
return the -EFAULT error, not that number itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 43837b1e6c5aef803d57009a68db18df13e64892 ]
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xfbbd/3683 is leaving the kernel with locks still held!
1 lock held by xfbbd/3683:
#0: (sk_lock-AF_ROSE){--..}, at: [<c8cd1eb3>] rose_connect+0x73/0x420 [rose]
INFO: task xfbbd:3683 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfbbd D 00000246 0 3683 3669
c6965ee0 00000092 c02c5c40 00000246 c0f6b5f0 c0f6b5c0 c0f6b5f0 c0f6b5c0
c0f6b614 c6965f18 c024b74b ffffffff c06ba070 00000000 00000000 00000001
c6ab07c0 c012d450 c0f6b634 c0f6b634 c7b5bf10 c0d6004c c7b5bf10 c6965f40
Call Trace:
[<c024b74b>] lock_sock_nested+0x6b/0xd0
[<c012d450>] ? autoremove_wake_function+0x0/0x40
[<c02488f1>] sock_fasync+0x41/0x150
[<c0249e69>] sock_close+0x19/0x40
[<c0175d54>] __fput+0xb4/0x170
[<c0176018>] fput+0x18/0x20
[<c017300e>] filp_close+0x3e/0x70
[<c01744e9>] sys_close+0x69/0xb0
[<c0103bda>] sysenter_past_esp+0x5f/0xa5
=======================
INFO: lockdep is turned off.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: c9c1014b2bd014c7ec037bbb6f58818162fdb265 ]
ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
held. This bogus warnings when running in atomic context, which
f.e. happens when adding secondary unicast addresses through
macvlan or vlan or when synchronizing multicast addresses from
wireless devices.
Mid-term we might want to consider moving all address updates
to process context since the locking seems overly complicated,
for now just fix the bogus warning by changing ASSERT_RTNL to
use mutex_is_locked().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[ Upstream commit: 8d390efd903485923419584275fd0c2aa4c94183 ]
tcp_probe has a bounds-checking bug that causes many programs (less,
python) to crash reading /proc/net/tcp_probe. When it outputs a log
line to the reader, it only checks if that line alone will fit in the
reader's buffer, rather than that line and all the previous lines it
has already written.
tcpprobe_read also returns the wrong value if copy_to_user fails--it
just passes on the return value of copy_to_user (number of bytes not
copied), which makes a failure look like a success.
This patch fixes the buffer overflow and sets the return value to
-EFAULT if copy_to_user fails.
Patch is against latest net-2.6; tested briefly and seems to fix the
crashes in less and python.
Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Add return value indication to tcp_prune_ofo_queue().
PS3: gelic: fix the oops on the broken IE returned from the hypervisor
b43legacy: fix DMA mapping leakage
mac80211: remove message on receiving unexpected unencrypted frames
Update rt2x00 MAINTAINERS entry
Add rfkill to MAINTAINERS file
rfkill: Fix device type check when toggling states
b43legacy: Fix usage of struct device used for DMAing
ssb: Fix usage of struct device used for DMAing
MAINTAINERS: move to generic repository for iwlwifi
b43legacy: fix initvals loading on bcm4303
rtl8187: Add missing priv->vif assignments
netconsole: only set CON_PRINTBUFFER if the user specifies a netconsole
[CAN]: Update documentation of struct sockaddr_can
MAINTAINERS: isdn4linux@listserv.isdn4linux.de is subscribers-only
[TCP]: Fix never pruned tcp out-of-order queue.
[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop
|
|
Describe debug parameters with their names (and not their values).
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Returns non-zero if tp->out_of_order_queue was seen non-empty.
This allows tcp_try_rmem_schedule() to return early.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some people are getting this message a lot, and we have traced it to
broken access points that much too often send completely empty frames
(all bytes zeroed, which they shouldn't do at all.)
Since we cannot do anything about such frames in any case except the
special case where we're debugging an AP, just remove the message.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
rfkill_switch_all() is supposed to only switch all the interfaces of a
given type, but does not actually do this; instead, it just switches
everything currently in the same state.
Add the necessary type check in.
(This fixes a bug I've been seeing while developing an rfkill laptop
driver, with both bluetooth and wireless simultaneously changing state
after only pressing either KEY_WLAN or KEY_BLUETOOTH).
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
tcp_prune_queue() doesn't prune an out-of-order queue at all.
Therefore sk_rmem_schedule() can fail but the out-of-order queue isn't
pruned . This can lead to tcp deadlock state if the next two
conditions are held:
1. There are a sequence hole between last received in
order segment and segments enqueued to the out-of-order queue.
2. Size of all segments in the out-of-order queue is more than tcp_mem[2].
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC_H_MAJ(parentid) for root classes is the same as for ingress, and if
ingress qdisc is created qdisc_lookup() returns its pointer (without
ingress NULL is returned). After this all qdisc_lookups give the same,
and we get endless loop. (I don't know how this could hide for so long
- it should trigger with every leaf class deleted if it's qdisc isn't
empty.)
After this fix qdisc_lookup() is omitted both for ingress and root
parents, but looking for root is only wasting a little time here...
Many thanks to Enrico Demarin for finding a test for catching this
bug, which probably bothered quite a lot of admins.
Reported-by: Enrico Demarin <enrico@superclick.com>,
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
[BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
[NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
[IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
[IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
[IPV6]: Use appropriate sock tclass setting for routing lookup.
[IPV6]: IPv6 extension header structures need to be packed.
[IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
[NET]: Return more appropriate error from eth_validate_addr().
[ISDN]: Do not validate ISDN net device address prior to interface-up
[NET]: Fix kernel-doc for skb_segment
[SOCK] sk_stamp: should be initialized to ktime_set(-1L, 0)
net: check for underlength tap writes
net: make struct tun_struct private to tun.c
[SCTP]: IPv4 vs IPv6 addresses mess in sctp_inet[6]addr_event.
[SCTP]: Fix compiler warning about const qualifiers
[SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK
[SCTP]: Add check for hmac_algo parameter in sctp_verify_param()
[NET_SCHED] cls_u32: refcounting fix for u32_delete()
[DCCP]: Fix skb->cb conflicts with IP
[AX25]: Potential ax25_uid_assoc-s leaks on module unload.
...
|
|
The bridge netfilter code attaches a fake dst_entry with a pointer to a
fake net_device structure to skbs it passes up to IPv4 netfilter. This
leads to crashes when the skb is passed to __ip_route_output_key when
dereferencing the namespace pointer.
Since bridging can currently only operate in the init_net namespace,
the easiest fix for now is to initialize the nd_net pointer of the
fake net_device struct to &init_net.
Should fix bugzilla 10323: http://bugzilla.kernel.org/show_bug.cgi?id=10323
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
_entry_put
Consider we are putting a clusterip_config entry with the "entries"
count == 1, and on the other CPU there's a clusterip_config_find_get
in progress:
CPU1: CPU2:
clusterip_config_entry_put: clusterip_config_find_get:
if (atomic_dec_and_test(&c->entries)) {
/* true */
read_lock_bh(&clusterip_lock);
c = __clusterip_config_find(clusterip);
/* found - it's still in list */
...
atomic_inc(&c->entries);
read_unlock_bh(&clusterip_lock);
write_lock_bh(&clusterip_lock);
list_del(&c->list);
write_unlock_bh(&clusterip_lock);
...
dev_put(c->dev);
Oops! We have an entry returned by the clusterip_config_find_get,
which is a) not in list b) has a stale dev pointer.
The problems will happen when the CPU2 will release the entry - it
will remove it from the list for the 2nd time, thus spoiling it, and
will put a stale dev pointer.
The fix is to make atomic_dec_and_test under the clusterip_lock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|