summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2008-10-31net: replace NIPQUAD() in net/*/Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31net: replace NIPQUAD() in net/netfilter/Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31net: replace NIPQUAD() in net/ipv4/ net/ipv6/Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31net: replace NIPQUAD() in net/ipv4/netfilter/Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add peek emulation for non-work-conserving qdiscs.Jarek Poplawski
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until dequeuing. This is mainly for compatibility reasons not to break some strange configs because peeking is expected for non-work-conserving parent qdiscs to query work-conserving child qdiscs. This implementation requires using qdisc_dequeue_peeked() wrapper instead of directly calling qdisc->dequeue() for all qdiscs ever querried with qdisc->ops->peek() or qdisc_peek_dequeued(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Use qdisc->ops->peek() instead of ->dequeue() & ->requeue()Jarek Poplawski
Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair. After this patch the only remaining user of qdisc->ops->requeue() is netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add qdisc->ops->peek() implementation.Jarek Poplawski
Add qdisc->ops->peek() implementation for work-conserving qdiscs. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: sch_generic: Add generic qdisc->ops->peek() implementation.Jarek Poplawski
With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs.Patrick McHardy
From: Patrick McHardy <kaber@trash.net> Just as a demonstration how easy adding a peek operation to the work-conserving qdiscs actually is. It doesn't need to keep or change any internal state in many cases thanks to the guarantee that the packet will either be dequeued or, if another packet arrives, the upper qdisc will immediately ->peek again to reevaluate the state. (This is only slightly modified Patrick's patch.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31xfrm: C99 for xfrm_dev_notifierAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/p54/p54common.c
2008-10-31xfrm: do not leak ESRCH to user spacefernando@oss.ntt.co
I noticed that, under certain conditions, ESRCH can be leaked from the xfrm layer to user space through sys_connect. In particular, this seems to happen reliably when the kernel fails to resolve a template either because the AF_KEY receive buffer being used by racoon is full or because the SA entry we are trying to use is in XFRM_STATE_EXPIRED state. However, since this could be a transient issue it could be argued that EAGAIN would be more appropriate. Besides this error code is not even documented in the man page for sys_connect (as of man-pages 3.07). Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30Merge branch 'master' of git://git.infradead.org/users/pcmoore/lblnet-2.6David S. Miller
2008-10-30netfilter: nf_conntrack_proto_gre: switch to register_pernet_gen_subsys()Alexey Dobriyan
register_pernet_gen_device() can't be used is nf_conntrack_pptp module is also used (compiled in or loaded). Right now, proto_gre_net_exit() is called before nf_conntrack_pptp_net_exit(). The former shutdowns and frees GRE piece of netns, however the latter absolutely needs it to flush keymap. Oops is inevitable. Switch to shiny new register_pernet_gen_subsys() to get correct ordering in netns ops list. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsysAlexey Dobriyan
netns ops which are registered with register_pernet_gen_device() are shutdown strictly before those which are registered with register_pernet_subsys(). Sometimes this leads to opposite (read: buggy) shutdown ordering between two modules. Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules which aren't elite enough for entry in struct net, and which can't use register_pernet_gen_device(). PPTP conntracking module is such one. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30udp: Should use spin_lock_bh()/spin_unlock_bh() in udp_lib_unhash()Eric Dumazet
Spotted by Alexander Beregalov Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix potential race in put_rpccred() SUNRPC: Fix rpcauth_prune_expired NFS: Convert nfs_attr_generation_counter into an atomic_long SUNRPC: Respond promptly to server TCP resets
2008-10-30netlabel: Fix compilation warnings in net/netlabel/netlabel_addrlist.cManish Katiyar
Enable netlabel auditing functions only when CONFIG_AUDIT is set Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-29netlabel: Fix compiler warnings in netlabel_mgmt.cPaul Moore
Fix the compiler warnings below, thanks to Andrew Morton for finding them. net/netlabel/netlabel_mgmt.c: In function `netlbl_mgmt_listentry': net/netlabel/netlabel_mgmt.c:268: warning: 'ret_val' might be used uninitialized in this function Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-29cipso: unsigned buf_len cannot be negativeroel kluin
unsigned buf_len cannot be negative Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-29net: replace %p6 with %pI6Harvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29net: replace %#p6 format specifier with %pi6Harvey Harrison
gcc warns when using the # modifier with the %p format specifier, so we can't use this to omit the colons when needed, introduces %pi6 instead. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: introduce sk_for_each_rcu_safenext()Eric Dumazet
Corey Minyard found a race added in commit 271b72c7fa82c2c7a795bc16896149933110672d (udp: RCU handling for Unicast packets.) "If the socket is moved from one list to another list in-between the time the hash is calculated and the next field is accessed, and the socket has moved to the end of the new list, the traversal will not complete properly on the list it should have, since the socket will be on the end of the new list and there's not a way to tell it's on a new list and restart the list traversal. I think that this can be solved by pre-fetching the "next" field (with proper barriers) before checking the hash." This patch corrects this problem, introducing a new sk_for_each_rcu_safenext() macro. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: udp_get_next() should use spin_unlock_bh()Eric Dumazet
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: calculate udp_mem based on low memory instead of all memoryEric Dumazet
This patch mimics commit 57413ebc4e0f1e471a3b4db4aff9a85c083d090e (tcp: calculate tcp_mem based on low memory instead of all memory) The udp_mem array which contains limits on the total amount of memory used by UDP sockets is calculated based on nr_all_pages. On a 32 bits x86 system, we should base this on the number of lowmem pages. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: RCU handling for Unicast packets.Eric Dumazet
Goals are : 1) Optimizing handling of incoming Unicast UDP frames, so that no memory writes should happen in the fast path. Note: Multicasts and broadcasts still will need to take a lock, because doing a full lockless lookup in this case is difficult. 2) No expensive operations in the socket bind/unhash phases : - No expensive synchronize_rcu() calls. - No added rcu_head in socket structure, increasing memory needs, but more important, forcing us to use call_rcu() calls, that have the bad property of making sockets structure cold. (rcu grace period between socket freeing and its potential reuse make this socket being cold in CPU cache). David did a previous patch using call_rcu() and noticed a 20% impact on TCP connection rates. Quoting Cristopher Lameter : "Right. That results in cacheline cooldown. You'd want to recycle the object as they are cache hot on a per cpu basis. That is screwed up by the delayed regular rcu processing. We have seen multiple regressions due to cacheline cooldown. The only choice in cacheline hot sensitive areas is to deal with the complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU." - Because udp sockets are allocated from dedicated kmem_cache, use of SLAB_DESTROY_BY_RCU can help here. Theory of operation : --------------------- As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()), special attention must be taken by readers and writers. Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed, reused, inserted in a different chain or in worst case in the same chain while readers could do lookups in the same time. In order to avoid loops, a reader must check each socket found in a chain really belongs to the chain the reader was traversing. If it finds a mismatch, lookup must start again at the begining. This *restart* loop is the reason we had to use rdlock for the multicast case, because we dont want to send same message several times to the same socket. We use RCU only for fast path. Thus, /proc/net/udp still takes spinlocks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: introduce struct udp_table and multiple spinlocksEric Dumazet
UDP sockets are hashed in a 128 slots hash table. This hash table is protected by *one* rwlock. This rwlock is readlocked each time an incoming UDP message is handled. This rwlock is writelocked each time a socket must be inserted in hash table (bind time), or deleted from this table (close time) This is not scalable on SMP machines : 1) Even in read mode, lock() and unlock() are atomic operations and must dirty a contended cache line, shared by all cpus. 2) A writer might be starved if many readers are 'in flight'. This can happen on a machine with some NIC receiving many UDP messages. User process can be delayed a long time at socket creation/dismantle time. This patch prepares RCU migration, by introducing 'struct udp_table and struct udp_hslot', and using one spinlock per chain, to reduce contention on central rwlock. Introducing one spinlock per chain reduces latencies, for port randomization on heavily loaded UDP servers. This also speedup bindings to specific ports. udp_lib_unhash() was uninlined, becoming to big. Some cleanups were done to ease review of following patch (RCUification of UDP Unicast lookups) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final usersHarvey Harrison
Open code NIP6_FMT in the one call inside sscanf and one user of NIP6() that could use %p6 in the netfilter code. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28vlan: propogate ethtool speed valuesStephen Hemminger
This enables more ethtool information. The speed and settings of the underlying device are propagated up. This makes services like SNMP that use ethtool to get speed setting, work when managing a vlan, without adding silly heurtistics into SNMP daemon. For the driver info, just use existing driver strings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net, misc: replace uses of NIP6_FMT with %p6Harvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: replace uses of NIP6_FMT with %p6Harvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28netfilter: replace uses of NIP6_FMT with %p6Harvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: replace all current users of NIP6_SEQFMT with %#p6Harvey Harrison
The define in kernel.h can be done away with at a later time. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28xfrm: Notify changes in UDP encapsulation via netlinkMartin Willi
Add new_mapping() implementation to the netlink xfrm_mgr to notify address/port changes detected in UDP encapsulated ESP packets. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: don't use INIT_RCU_HEADAlexey Dobriyan
call_rcu() will unconditionally rewrite RCU head anyway. Applies to struct neigh_parms struct neigh_table struct net struct cipso_v4_doi struct in_ifaddr struct in_device rt->u.dst Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: reduce structures when XFRM=nAlexey Dobriyan
ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28pktgen: fix multiple queue warningJesse Brandeburg
when testing the new pktgen module with multiple queues and ixgbe with: pgset "flag QUEUE_MAP_CPU" I found that I was getting errors in dmesg like: pktgen: WARNING: QUEUE_MAP_CPU disabled because CPU count (8) exceeds number <4>pktgen: WARNING: of tx queues (8) on eth15 you'll note, 8 really doesn't exceed 8. This patch seemed to fix the logic errors and also the attempts at limiting line length in printk (which didn't work anyway) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28SUNRPC: Fix potential race in put_rpccred()Trond Myklebust
We have to be careful when we try to unhash the credential in put_rpccred(), because we're not holding the credcache lock, so the call to rpcauth_unhash_cred() may fail if someone else has looked the cred up, and obtained a reference to it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28SUNRPC: Fix rpcauth_prune_expiredTrond Myklebust
We need to make sure that we don't remove creds from the cred_unused list if they are still under the moratorium, or else they will never get garbage collected. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28SUNRPC: Respond promptly to server TCP resetsTrond Myklebust
If the server sends us an RST error while we're in the TCP_ESTABLISHED state, then that will not result in a state change, and so the RPC client ends up hanging forever (see http://bugzilla.kernel.org/show_bug.cgi?id=11154) We can intercept the reset by setting up an sk->sk_error_report callback, which will then allow us to initiate a proper shutdown and retry... We also make sure that if the send request receives an ECONNRESET, then we shutdown too... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28netlink: constify struct nlattr * arg to parsing functionsPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27netns: Coexist with the sysfs limitations v2Eric W. Biederman
To make testing of the network namespace simpler allow the network namespace code and the sysfs code to be compiled and run at the same time. To do this only virtual devices are allowed in the additional network namespaces and those virtual devices are not placed in the kobject tree. Since virtual devices don't actually do anything interesting hardware wise that needs device management there should be no loss in keeping them out of the kobject tree and by implication sysfs. The gain in ease of testing and code coverage should be significant. Changelog: v2: As pointed out by Benjamin Thery it only makes sense to call device_rename in the initial network namespace for now. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Tested-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27net: convert print_mac to %pMJohannes Berg
This converts pretty much everything to print_mac. There were a few things that had conflicts which I have just dropped for now, no harm done. I've built an allyesconfig with this and looked at the files that weren't built very carefully, but it's a huge patch. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27mac80211: convert to %pM away from print_macJohannes Berg
Also remove a few stray DECLARE_MAC_BUF that were no longer used at all. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27net: implement emergency route cache rebulds when gc_elasticity is exceededNeil Horman
This is a patch to provide on demand route cache rebuilding. Currently, our route cache is rebulid periodically regardless of need. This introduced unneeded periodic latency. This patch offers a better approach. Using code provided by Eric Dumazet, we compute the standard deviation of the average hash bucket chain length while running rt_check_expire. Should any given chain length grow to larger that average plus 4 standard deviations, we trigger an emergency hash table rebuild for that net namespace. This allows for the common case in which chains are well behaved and do not grow unevenly to not incur any latency at all, while those systems (which may be being maliciously attacked), only rebuild when the attack is detected. This patch take 2 other factors into account: 1) chains with multiple entries that differ by attributes that do not affect the hash value are only counted once, so as not to unduly bias system to rebuilding if features like QOS are heavily used 2) if rebuilding crosses a certain threshold (which is adjustable via the added sysctl in this patch), route caching is disabled entirely for that net namespace, since constant rebuilding is less efficient that no caching at all Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27mac80211: correct warnings in minstrel rate control algorithmJohn W. Linville
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-27RFKILL: fix input layer initialisationDmitry Baryshkov
Initialise correctly last fields, so tasks can be actually executed. On some architectures the initial jiffies value is not zero, so later all rfkill incorrectly decides that rfkill_*.last is in future. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-26syncookies: fix inclusion of tcp options in syn-ackFlorian Westphal
David Miller noticed that commit 33ad798c924b4a1afad3593f2796d465040aadd5 '(tcp: options clean up') did not move the req->cookie_ts check. This essentially disabled commit 4dfc2817025965a2fc78a18c50f540736a6b5c24 '[Syncookies]: Add support for TCP options via timestamps.'. This restores the original logic. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-26Phonet: do not reply to indication reset packetsRemi Denis-Courmont
This fixes a potential error packet loop. Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-26wireless: fix regression caused by regulatory config optionArjan van de Ven
The default for the regulatory compatibility option is wrong; if you picked the default you ended up with a non-functional wifi system (at least I did on Fedora 9 with iwl4965). I don't think even the October 2008 releases of the various distros has the new userland so clearly the default is wrong, and also we can't just go about deleting this in 2.6.29... Change the default to "y" and also adjust the config text a little to reflect this. This patch fixes regression #11859 With thanks to Johannes Berg for the diagnostics Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>