summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2008-02-06INET: Fix netdev renaming and inet address labelsMark McLoughlin
[INET]: Fix netdev renaming and inet address labels [ Upstream commit: 44344b2a85f03326c7047a8c861b0c625c674839 ] When re-naming an interface, the previous secondary address labels get lost e.g. $> brctl addbr foo $> ip addr add 192.168.0.1 dev foo $> ip addr add 192.168.0.2 dev foo label foo:00 $> ip addr show dev foo | grep inet inet 192.168.0.1/32 scope global foo inet 192.168.0.2/32 scope global foo:00 $> ip link set foo name bar $> ip addr show dev bar | grep inet inet 192.168.0.1/32 scope global bar inet 192.168.0.2/32 scope global bar:2 Turns out to be a simple thinko in inetdev_changename() - clearly we want to look at the address label, rather than the device name, for a suffix to retain. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-06IPV4: ip_gre: set mac_header correctly in receive pathTimo Teras
[IPV4] ip_gre: set mac_header correctly in receive path [ Upstream commit: 1d0691674764098304ae4c63c715f5883b4d3784 ] mac_header update in ipgre_recv() was incorrectly changed to skb_reset_mac_header() when it was introduced. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-06IPV4 ROUTE: ip_rt_dump() is unecessary slowEric Dumazet
[IPV4] ROUTE: ip_rt_dump() is unecessary slow [ Upstream commit: d8c9283089287341c85a0a69de32c2287a990e71 ] I noticed "ip route list cache x.y.z.t" can be *very* slow. While strace-ing -T it I also noticed that first part of route cache is fetched quite fast : recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041> while the part at the end of the table is more expensive: recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848> The following patch corrects this performance/latency problem, removing quadratic behavior. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14TCP: illinois: Incorrect beta usageStephen Hemminger
[TCP] illinois: Incorrect beta usage [ Upstream commit: a357dde9df33f28611e6a3d4f88265e39bcc8880 ] Lachlan Andrew observed that my TCP-Illinois implementation uses the beta value incorrectly: The parameter beta in the paper specifies the amount to decrease *by*: that is, on loss, W <- W - beta*W but in tcp_illinois_ssthresh() uses beta as the amount to decrease *to*: W <- beta*W This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network, hurting performance. Note: since the base beta value is .5, it has no impact on a congested network. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14IPV4: Remove bogus ifdef mess in arp_processAdrian Bunk
[IPV4]: Remove bogus ifdef mess in arp_process [ Upstream commit: 3660019e5f96fd9a8b7d4214a96523c0bf7b676d ] The #ifdef's in arp_process() were not only a mess, they were also wrong in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or CONFIG_NETDEV_10000=y) cases. Since they are not required this patch removes them. Also removed are some #ifdef's around #include's that caused compile errors after this change. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14NET: Corrects a bug in ip_rt_acct_read()Eric Dumazet
[NET]: Corrects a bug in ip_rt_acct_read() [ Upstream commit: 483b23ffa3a5f44767038b0a676d757e0668437e ] It seems that stats of cpu 0 are counted twice, since for_each_possible_cpu() is looping on all possible cpus, including 0 Before percpu conversion of ip_rt_acct, we should also remove the assumption that CPU 0 is online (or even possible) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14TCP: MTUprobe: fix potential sk_send_head corruptionIlpo Järvinen
[TCP] MTUprobe: fix potential sk_send_head corruption [ Upstream commit: 6e42141009ff18297fe19d19296738b742f861db ] When the abstraction functions got added, conversion here was made incorrectly. As a result, the skb may end up pointing to skb which got included to the probe skb and then was freed. For it to trigger, however, skb_transmit must fail sending as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14TCP: Problem bug with sysctl_tcp_congestion_control functionSam Jansen
[TCP]: Problem bug with sysctl_tcp_congestion_control function [ Upstream commit: 5487796f0c9475586277a0a7a91211ce5746fa6a ] sysctl_tcp_congestion_control seems to have a bug that prevents it from actually calling the tcp_set_default_congestion_control function. This is not so apparent because it does not return an error and generally the /proc interface is used to configure the default TCP congestion control algorithm. This is present in 2.6.18 onwards and probably earlier, though I have not inspected 2.6.15--2.6.17. sysctl_tcp_congestion_control calls sysctl_string and expects a successful return code of 0. In such a case it actually sets the congestion control algorithm with tcp_set_default_congestion_control. Otherwise, it returns the value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string returned 0 on success. However, sysctl_string was updated to return 1 on success around about 2.6.15 and sysctl_tcp_congestion_control was not updated. Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy converts this return code to '0', so the caller never notices the error. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14nf_nat: fix memset errorLi Zefan
This patch fixes an incorrect memset in the NAT code, causing misbehaviour when unloading and reloading the NAT module. Applies to stable-2.6.22 and stable-2.6.23. Please apply, thanks. [NETFILTER]: nf_nat: fix memset error Upstream commit e0bf9cf15fc30d300b7fbd821c6bc975531fab44 The size passing to memset is the size of a pointer. Fixes misbehaviour when unloading and reloading the NAT module. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-21Fix crypto_alloc_comp() error checking.Herbert Xu
[IPSEC]: Fix crypto_alloc_comp error checking [ Upstream commit: 4999f3621f4da622e77931b3d33ada6c7083c705 ] The function crypto_alloc_comp returns an errno instead of NULL to indicate error. So it needs to be tested with IS_ERR. This is based on a patch by Vicenç Beltran Querol. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16TCP: Make sure write_queue_from does not begin with NULL ptr (CVE-2007-5501)Ilpo Järvinen
patch 96a2d41a3e495734b63bff4e5dd0112741b93b38 in mainline. NULL ptr can be returned from tcp_write_queue_head to cached_skb and then assigned to skb if packets_out was zero. Without this, system is vulnerable to a carefully crafted ACKs which obviously is remotely triggerable. Besides, there's very little that needs to be done in sacktag if there weren't any packets outstanding, just skipping the rest doesn't hurt. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-02Fix TCP MD5 on big-endian.David Miller
changeset f8ab18d2d987a59ccbf0495032b2aef05b730037 in mainline. Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-02Fix TCP's ->fastpath_cnt_hit handling.Ilpo Järvinen
changeset 48611c47d09023d9356e78550d1cadb8d61da9c8 in mainline. When only GSO skb was partially ACKed, no hints are reset, therefore fastpath_cnt_hint must be tweaked too or else it can corrupt fackets_out. The corruption to occur, one must have non-trivial ACK/SACK sequence, so this bug is not very often that harmful. There's a fackets_out state reset in TCP because fackets_out is known to be inaccurate and that fixes the issue eventually anyway. In case there was also at least one skb that got fully ACKed, the fastpath_skb_hint is set to NULL which causes a recount for fastpath_cnt_hint (the old value won't be accessed anymore), thus it can safely be decremented without additional checking. Reported by Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-26Fix TCP DSACK cwnd handlingIlpo Järvinen
commit 49ff4bb4cd4c04acf8f9e3d3ec2148305a1db445 in mainline. [TCP]: DSACK signals data receival, be conservative In case a DSACK is received, it's better to lower cwnd as it's a sign of data receival. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-26Handle snd_una in tcp_cwnd_down()Ilpo Järvinen
commit 6ee8009e38006da81d2a53da1aaa27365552553e in mainline Subject: [PATCH 1/1] [TCP]: Also handle snd_una changes in tcp_cwnd_down tcp_cwnd_down must check for it too as it should be conservative in case of collapse stuff and also when receiver is trying to lie (though it wouldn't be successful anyway). Note: - Separated also is_dupack and do_lost in fast_retransalert * Much cleaner look-and-feel now * This time it really fixes cumulative ACK + many new SACK blocks recovery entry (I claimed this fixes with last patch but it wasn't). TCP will now call tcp_update_scoreboard regardless of is_dupack when in recovery as long as there is enough fackets_out. - Introduce FLAG_SND_UNA_ADVANCED * Some prior_snd_una arguments are unnecessary after it - Added helper FLAG_ANY_PROGRESS to avoid long FLAG...|FLAG... constructs This is a reduced version of a mainline patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-26Fix IPSEC AH4 options handlingNick Bowler
commit 8ee4f391831cb96916a8e8a05f04b1c1d7dd30d8 in mainline. In testing our ESP/AH offload hardware, I discovered an issue with how AH handles mutable fields in IPv4. RFC 4302 (AH) states the following on the subject: For IPv4, the entire option is viewed as a unit; so even though the type and length fields within most options are immutable in transit, if an option is classified as mutable, the entire option is zeroed for ICV computation purposes. The current implementation does not zero the type and length fields, resulting in authentication failures when communicating with hosts that do (i.e. FreeBSD). I have tested record route and timestamp options (ping -R and ping -T) on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one router. In the presence of these options, the FreeBSD and Linux hosts (with the patch or with the hardware) can communicate. The Windows XP host simply fails to accept these packets with or without the patch. I have also been trying to test source routing options (using traceroute -g), but haven't had much luck getting this option to work *without* AH, let alone with. Signed-off-by: Nick Bowler <nbowler@ellipticsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-26Fix inet_diag OOPS.Patrick McHardy
commit 0a9c73014415d2a84dac346c1e12169142a6ad37 in mainline [INET_DIAG]: Fix oops in netlink_rcv_skb netlink_run_queue() doesn't handle multiple processes processing the queue concurrently. Serialize queue processing in inet_diag to fix a oops in netlink_rcv_skb caused by netlink_run_queue passing a NULL for the skb. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054 [349587.500454] printing eip: [349587.500457] c03318ae [349587.500459] *pde = 00000000 [349587.500464] Oops: 0000 [#1] [349587.500466] PREEMPT SMP [349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa [349587.500483] CPU: 0 [349587.500485] EIP: 0060:[<c03318ae>] Not tainted VLI [349587.500487] EFLAGS: 00010246 (2.6.22.3 #1) [349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e [349587.500506] eax: 00000000 ebx: 00000000 ecx: c148d2a0 edx: c0398819 [349587.500510] esi: 00000000 edi: c0398819 ebp: c7a21c8c esp: c7a21c80 [349587.500517] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [349587.500521] Process oidentd (pid: 17943, ti=c7a20000 task=cee231c0 task.ti=c7a20000) [349587.500527] Stack: 00000000 c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 0000004c [349587.500542] f736f000 c7a21cb4 c03988e3 00000001 f7c8ba00 c7a21cc4 c03312a5 0000004c [349587.500558] f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fffffff [349587.500578] Call Trace: [349587.500581] [<c010361a>] show_trace_log_lvl+0x1c/0x33 [349587.500591] [<c01036d4>] show_stack_log_lvl+0x8d/0xaa [349587.500595] [<c010390e>] show_registers+0x1cb/0x321 [349587.500604] [<c0103bff>] die+0x112/0x1e1 [349587.500607] [<c01132d2>] do_page_fault+0x229/0x565 [349587.500618] [<c03c8d3a>] error_code+0x72/0x78 [349587.500625] [<c0331962>] netlink_run_queue+0x40/0x76 [349587.500632] [<c03988e3>] inet_diag_rcv+0x1f/0x2c [349587.500639] [<c03312a5>] netlink_data_ready+0x57/0x59 [349587.500643] [<c0330681>] netlink_sendskb+0x24/0x45 [349587.500651] [<c03307c6>] netlink_unicast+0x100/0x116 [349587.500656] [<c0330f83>] netlink_sendmsg+0x1c2/0x280 [349587.500664] [<c02fcce9>] sock_sendmsg+0xba/0xd5 [349587.500671] [<c02fe4d1>] sys_sendmsg+0x17b/0x1e8 [349587.500676] [<c02fe92d>] sys_socketcall+0x230/0x24d [349587.500684] [<c01028d2>] syscall_call+0x7/0xb [349587.500691] ======================= [349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 <8b> 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f Reported by Athanasius <link@miggy.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-26Fix device address listing for ipv4.Stephen Hemminger
commit 596e41509550447b030f7b16adaeb0138ab585a8 in mainline Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876 Not all ips are shown by "ip addr show" command when IPs number assigned to an interface is more than 60-80 (in fact it depends on broadcast/label etc presence on each address). Steps to reproduce: It's terribly simple to reproduce: # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done # ip addr show this will _not_ show all IPs. Looks like the problem is in netlink/ipv4 message processing. This is fix from bug submitter, it looks correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-30TCP: Fix TCP handling of SACK in bidirectional flows.Ilpo Järvinen
It's possible that new SACK blocks that should trigger new LOST markings arrive with new data (which previously made is_dupack false). In addition, I think this fixes a case where we get a cumulative ACK with enough SACK blocks to trigger the fast recovery (is_dupack would be false there too). I'm not completely pleased with this solution because readability of the code is somewhat questionable as 'is_dupack' in SACK case is no longer about dupacks only but would mean something like 'lost_marker_work_todo' too... But because of Eifel stuff done in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to the if statement which seems attractive solution. Nevertheless, I didn't like adding another variable just for that either... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-30TCP: Fix TCP rate-halving on bidirectional flows.Ilpo Järvinen
Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the packets in flight remains unchanged. Recoveries in a bidirectional flows suffer quite badly because of this, both NewReno and SACK are affected. After this patch, rate halving is performed for ACK only if packets in flight was supposedly changed too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-30TCP: Do not autobind ports for TCP socketsDavid Miller
[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evegniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-09Netfilter: Fix logging regressionPatrick McHardy
[NETFILTER]: Fix logging regression Loading one of the LOG target fails if a different target has already registered itself as backend for the same family. This can affect the ipt_LOG and ipt_ULOG modules when both are loaded. Reported and tested by: <t.artem@mailcity.com> Upstream-commit: 7e2acc7e Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-09nf_conntrack: don't track locally generated special ICMP errorYasuyuki Kozakai
[NETFILTER]: nf_conntrack: don't track locally generated special ICMP error The conntrack assigned to locally generated ICMP error is usually the one assigned to the original packet which has caused the error. But if the original packet is handled as invalid by nf_conntrack, no conntrack is assigned to the original packet. Then nf_ct_attach() cannot assign any conntrack to the ICMP error packet. In that case the current nf_conntrack_icmp assigns appropriate conntrack to it. But the current code mistakes the direction of the packet. As a result, NAT code mistakes the address to be mangled. To fix the bug, this changes nf_conntrack_icmp not to assign conntrack to such ICMP error. Actually no address is necessary to be mangled in this case. Spotted by Jordan Russell. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Upstream commit ID: 130e7a83d7ec8c5c673225e0fa8ea37b1ed507a5 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-09TCP FRTO retransmit bug fixIlpo Järvinen
[TCP]: Verify the presence of RETRANS bit when leaving FRTO For yet unknown reason, something cleared SACKED_RETRANS bit underneath FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-06-23[TCP] tcp_read_sock: Allow recv_actor() return return negative error value.Jens Axboe
tcp_read_sock() currently assumes that the recv_actor() only returns number of bytes copied. For network splice receive, we may have to return an error in some cases. So allow the actor to return a negative error value. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-18[IPVS]: Fix state variable on failure to start ipvs threadsNeil Horman
ip_vs currently fails to reset its ip_vs_sync_state variable if the sync thread fails to start properly. The result is that the kernel will report a running daemon when their actuall is none. If you issue the following commands: 1. ipvsadm --start-daemon master --mcast-interface bla 2. ipvsadm -L --daemon 3. ipvsadm --stop-daemon master Assuming that bla is not an actual interface, step 2 should return no data, but instead returns: $ ipvsadm -L --daemon master sync daemon (mcast=bla, syncid=0) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-15[TCP]: Fix logic breakage due to DSACK separationIlpo Järvinen
Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty of breaking DSACK counting, which should be done only for the SACK block reported by the DSACK instead of every SACK block that is received along with DSACK information. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-15[TCP]: Congestion control API RTT sampling fixIlpo Järvinen
Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT sampling of congestion control modules. Inaccurate timestamps could be fed to them without providing any way for them to identify such cases. Previously RTT sampler was called only if FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate timestamps nicely. In addition, the new behavior could give an invalid timestamp (zero) to RTT sampler if only skbs with TCPCB_RETRANS were ACKed. This solves both problems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-14[TCP]: Add missing break to TCP option parsing codeIlpo Järvinen
This flaw does not affect any behavior (currently). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-13[TCP]: Set initial_ssthresh default to zero in Cubic and BIC.David S. Miller
Because of the current default of 100, Cubic and BIC perform very poorly compared to standard Reno. In the worst case, this change makes Cubic and BIC as aggressive as Reno. So this change should be very safe. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-12[TCP]: Fix left_out setting during FRTOIlpo Järvinen
Without FRTO, the tcp_try_to_open is never called with lost_out > 0 (see tcp_time_to_recover). However, when FRTO is enabled, the !tp->lost condition is not used until end of FRTO because that way TCP avoids premature entry to fast recovery during FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-12[TCP]: Disable TSO if MD5SIG is enabled.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08[CIPSO]: Fix several unaligned kernel accesses in the CIPSO engine.Paul Moore
IPv4 options are not very well aligned within the packet and the format of a CIPSO option is even worse. The result is that the CIPSO engine in the kernel does a few unaligned accesses when parsing and validating incoming packets with CIPSO options attached which generate error messages on certain alignment sensitive platforms. This patch fixes this by marking these unaligned accesses with the get_unaliagned() macro. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08[NetLabel]: consolidate the struct socket/sock handling to just struct sockPaul Moore
The current NetLabel code has some redundant APIs which allow both "struct socket" and "struct sock" types to be used; this may have made sense at some point but it is wasteful now. Remove the functions that operate on sockets and convert the callers. Not only does this make the code smaller and more consistent but it pushes the locking burden up to the caller which can be more intelligent about the locks. Also, perform the same conversion (socket to sock) on the SELinux/NetLabel glue code where it make sense. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08[IPV4]: Do not remove idev when addresses are clearedHerbert Xu
Now that we create idev before addresses are added, it no longer makes sense to remove them when addresses are all deleted. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[UDP]: Revert 2-pass hashing changes.David S. Miller
This reverts changesets: 6aaf47fa48d3c44280810b1b470261d340e4ed87 b7b5f487ab39bc10ed0694af35651a03d9cb97ff de34ed91c4ffa4727964a832c46e624dd1495cf5 fc038410b4b1643766f8033f4940bcdb1dace633 There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[NETFILTER]: ip_tables: fix compat related crashDmitry Mishin
check_compat_entry_size_and_hooks iterates over the matches and calls compat_check_calc_match, which loads the match and calculates the compat offsets, but unlike the non-compat version, doesn't call ->checkentry yet. On error however it calls cleanup_matches, which in turn calls ->destroy, which can result in crashes if the destroy function (validly) expects to only get called after the checkentry function. Add a compat_release_match function that only drops the module reference on error and rename compat_check_calc_match to compat_find_calc_match to reflect the fact that it doesn't call the checkentry function. Reported by Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[NETFILTER]: nf_conntrack: fix helper module unload racesPatrick McHarrdy
When a helper module is unloaded all conntracks refering to it have their helper pointer NULLed out, leading to lots of races. In most places this can be fixed by proper use of RCU (they do already check for != NULL, but in a racy way), additionally nf_conntrack_expect_related needs to bail out when no helper is present. Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy and not worth fixing. Signed-off-by: Patrick McHarrdy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[NETLINK]: Mark netlink policies constPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[TCP] tcp_probe: Attach printf attribute properly to printl().David S. Miller
GCC doesn't like the way Stephen initially did it: net/ipv4/tcp_probe.c:83: warning: empty declaration Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[TCP]: Use LIMIT_NETDEBUG in tcp_retransmit_timer().Eric Dumazet
LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0 >/proc/sys/net/core/warnings). The "TCP: Treason uncloaked!" message can use this facility. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[IPV4]: Restore old behaviour of default config valuesHerbert Xu
Previously inet devices were only constructed when addresses are added (or rarely in ipmr). Therefore the default config values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default config values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before but only for those that have not been explicitly set by the user since the inet device's creation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[IPV4]: Add default config support after inetdev_initHerbert Xu
Previously once inetdev_init has been called on a device any changes made to ipv4_devconf_dflt would have no effect on that device's configuration. This creates a problem since we have moved the point where inetdev_init is called from when an address is added to where the device is registered. This patch is the first half of a set that tries to mimic the old behaviour while still calling inetdev_init. It propagates any changes to ipv4_devconf_dflt to those devices that have not had the corresponding attribute set. The next patch will forcibly set all values at the point where inetdev_init was previously called. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[IPV4]: Convert IPv4 devconf to an arrayHerbert Xu
This patch converts the ipv4_devconf config members (everything except sysctl) to an array. This allows easier manipulation which will be needed later on to provide better management of default config values. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[IPV4]: Only panic if inetdev_init fails for loopbackHerbert Xu
When I made the inetdev_init call work on all devices I incorrectly left in the panic call as well. It is obviously undesirable to panic on an allocation failure for a normal network device. This patch moves the panic call under the loopback if clause. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ackPatrick McHardy
A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply. Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03[ICMP]: Fix icmp_errors_use_inbound_ifaddr sysctlPatrick McHardy
Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is sent after the packet passed through ip_output(), an address from the outgoing interface is chosen as ICMP source address since skb->dev doesn't point to the incoming interface anymore. Fix this by doing an interface lookup on rt->dst.iif and using that device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDPWei Dong
Signed-off-by: Wei Dong <weidong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03[TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)Ilpo Järvinen
The code used to ignore GSO completely, passing either way too small or zero pkts_acked when GSO skb or part of it got ACKed. In addition, there is no need to calculate the value in the loop but simple arithmetics after the loop is sufficient. There is no need to handle SYN case specially because congestion control modules are not yet initialized when FLAG_SYN_ACKED is set. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03[TCP]: Use default 32768-61000 outgoing port range in all cases.Mark Glines
This diff changes the default port range used for outgoing connections, from "use 32768-61000 in most cases, but use N-4999 on small boxes (where N is a multiple of 1024, depending on just *how* small the box is)" to just "use 32768-61000 in all cases". I don't believe there are any drawbacks to this change, and it keeps outgoing connection ports farther away from the mess of IANA-registered ports. Signed-off-by: Mark Glines <mark@glines.org> Signed-off-by: David S. Miller <davem@davemloft.net>