summaryrefslogtreecommitdiff
path: root/net/core/neighbour.c
AgeCommit message (Collapse)Author
2013-09-14neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico
[ Upstream commit 63134803a6369dcf7dddf7f0d5e37b9566b308d2 ] dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28neighbour: fix a race in neigh_destroy()Eric Dumazet
[ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ] There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28net: Fix skb_under_panic oops in neigh_resolve_outputramesh.nagappa@gmail.com
[ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ] The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: Shawn Lu <shawn.lu@ericsson.com> Reviewed-by: Robert Coulson <robert.coulson@ericsson.com> Reviewed-by: Billie Alsup <billie.alsup@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16net: neighbour: fix neigh_dump_info()Eric Dumazet
[ Upstream commit 4bd6683bd400c8b1d2ad544bb155d86a5d10f91c ] Denys found out "ip neigh" output was truncated to about 54 neighbours. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/sfc/rx.c Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change the rx_buf->is_page boolean into a set of u16 flags, and another to adjust how ->ip_summed is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21neighbour: Fixed race condition at tbl->nhtMichel Machado
When the fixed race condition happens: 1. While function neigh_periodic_work scans the neighbor hash table pointed by field tbl->nht, it unlocks and locks tbl->lock between buckets in order to call cond_resched. 2. Assume that function neigh_periodic_work calls cond_resched, that is, the lock tbl->lock is available, and function neigh_hash_grow runs. 3. Once function neigh_hash_grow finishes, and RCU calls neigh_hash_free_rcu, the original struct neigh_hash_table that function neigh_periodic_work was using doesn't exist anymore. 4. Once back at neigh_periodic_work, whenever the old struct neigh_hash_table is accessed, things can go badly. Signed-off-by: Michel Machado <michel@digirati.com.br> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30net: Allow ipv6 proxies and arp proxies be shown with iproute2Tony Zelenoff
Add ability to return neighbour proxies list to caller if it sent full ndmsg structure and has NTF_PROXY flag set. Before this patch (and before iproute2 patches): $ ip neigh add proxy 2001::1 dev eth0 $ ip -6 neigh show $ After it and with applied iproute2 patches: $ ip neigh add proxy 2001::1 dev eth0 $ ip -6 neigh show 2001::1 dev eth0 proxy $ Compatibility with old versions of iproute2 is not broken, kernel checks for incoming structure size and properly works if old structure is came. [v2] * changed comments style. * removed useless line with continue and curly bracket. * changed incoming message size check from equal to more or equal. CC: davem@davemloft.net CC: kuznet@ms2.inr.ac.ru CC: netdev@vger.kernel.org CC: xemul@parallels.com Signed-off-by: Tony Zelenoff <antonz@parallels.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28ipv6: Use universal hash for NDISC.David S. Miller
In order to perform a proper universal hash on a vector of integers, we have to use different universal hashes on each vector element. Which means we need 4 different hash randoms for ipv6. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19Revert "net: Remove unused neighbour layer ops."David S. Miller
This reverts commit 5c3ddec73d01a1fae9409c197078cb02c42238c3. S390 qeth driver actually still uses the setup ops. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13net: Remove unused neighbour layer ops.David S. Miller
It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller
To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
2011-11-30neigh: Add device constructor/destructor capability.David Miller
If the neigh entry has device private state, it will need constructor/destructor ops. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Add infrastructure for allocating device neigh privates.David Miller
netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Get rid of neigh_table->kmem_cachepDavid Miller
We are going to alloc for device specific private areas for neighbour entries, and in order to do that we have to move away from the fixed allocation size enforced by using neigh_table->kmem_cachep As a nice side effect we can now use kfree_rcu(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv4/inet_diag.c
2011-11-25netns: fix proxy ARP entries listing on a netnsJorge Boncompte [DTI2]
Skip entries from foreign network namespaces. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14neigh: new unresolved queue limitsEric Dumazet
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit : > From: David Miller <davem@davemloft.net> > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST) > > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Wed, 09 Nov 2011 12:14:09 +0100 > > > >> unres_qlen is the number of frames we are able to queue per unresolved > >> neighbour. Its default value (3) was never changed and is responsible > >> for strange drops, especially if IP fragments are used, or multiple > >> sessions start in parallel. Even a single tcp flow can hit this limit. > > ... > > > > Ok, I've applied this, let's see what happens :-) > > Early answer, build fails. > > Please test build this patch with DECNET enabled and resubmit. The > decnet neigh layer still refers to the removed ->queue_len member. > > Thanks. Ouch, this was fixed on one machine yesterday, but not the other one I used this morning, sorry. [PATCH V5 net-next] neigh: new unresolved queue limits unres_qlen is the number of frames we are able to queue per unresolved neighbour. Its default value (3) was never changed and is responsible for strange drops, especially if IP fragments are used, or multiple sessions start in parallel. Even a single tcp flow can hit this limit. $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data. 8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01neigh: Kill bogus SMP protected debugging message.David S. Miller
Whatever situations make this state legitimate when SMP also would be legitimate when !SMP and f.e. preemption is enabled. This is dubious enough that we should just delete it entirely. If we want to add debugging for neigh timer races, better more thorough mechanisms are needed. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19neigh: fix rcu splat in neigh_update()roy.qing.li@gmail.com
when use dst_get_neighbour to get neighbour, we need rcu_read_lock to protect, since dst_get_neighbour uses rcu_dereference. The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com> [ 105.612095] [ 105.612096] =================================================== [ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ] [ 105.612101] --------------------------------------------------- [ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check() without protection! [ 105.612105] [ 105.612106] other info that might help us debug this: [ 105.612106] [ 105.612108] [ 105.612108] rcu_scheduler_active = 1, debug_locks = 0 [ 105.612110] 1 lock held by dnsmasq/2618: [ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>] rtnl_lock+0x17/0x20 [ 105.612120] [ 105.612121] stack backtrace: [ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41 [ 105.612125] Call Trace: [ 105.612129] [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0 [ 105.612132] [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0 [ 105.612135] [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220 [ 105.612139] [<ffffffff81639298>] arp_req_set+0xb8/0x230 [ 105.612142] [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310 [ 105.612146] [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60 [ 105.612150] [<ffffffff8163fb75>] inet_ioctl+0x85/0x90 [ 105.612154] [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70 [ 105.612157] [<ffffffff815b55d3>] sock_ioctl+0x73/0x280 [ 105.612162] [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570 [ 105.612165] [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0 [ 105.612168] [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80 [ 105.612172] [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-22Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
2011-08-24arp: fix rcu lockdep splat in arp_process()Eric Dumazet
Dave Jones reported a lockdep splat triggered by an arp_process() call from parp_redo(). Commit faa9dcf793be (arp: RCU changes) is the origin of the bug, since it assumed arp_process() was called under rcu_read_lock(), which is not true in this particular path. Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in neigh_proxy_process() to take care of IPv6 side too. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/inetdevice.h:209 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by setfiles/2123: #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff8114cbc4>] walk_component+0x1ef/0x3e8 #1: (&isec->lock){+.+.+.}, at: [<ffffffff81204bca>] inode_doinit_with_dentry+0x3f/0x41f #2: (&tbl->proxy_timer){+.-...}, at: [<ffffffff8106a803>] run_timer_softirq+0x157/0x372 #3: (class){+.-...}, at: [<ffffffff8141f256>] neigh_proxy_process +0x36/0x103 stack backtrace: Pid: 2123, comm: setfiles Tainted: G W 3.1.0-0.rc2.git7.2.fc16.x86_64 #1 Call Trace: <IRQ> [<ffffffff8108ca23>] lockdep_rcu_dereference+0xa7/0xaf [<ffffffff8146a0b7>] __in_dev_get_rcu+0x55/0x5d [<ffffffff8146a751>] arp_process+0x25/0x4d7 [<ffffffff8146ac11>] parp_redo+0xe/0x10 [<ffffffff8141f2ba>] neigh_proxy_process+0x9a/0x103 [<ffffffff8106a8c4>] run_timer_softirq+0x218/0x372 [<ffffffff8106a803>] ? run_timer_softirq+0x157/0x372 [<ffffffff8141f220>] ? neigh_stat_seq_open+0x41/0x41 [<ffffffff8108f2f0>] ? mark_held_locks+0x6d/0x95 [<ffffffff81062bb6>] __do_softirq+0x112/0x25a [<ffffffff8150d27c>] call_softirq+0x1c/0x30 [<ffffffff81010bf5>] do_softirq+0x4b/0xa2 [<ffffffff81062f65>] irq_exit+0x5d/0xcf [<ffffffff8150dc11>] smp_apic_timer_interrupt+0x7c/0x8a [<ffffffff8150baf3>] apic_timer_interrupt+0x73/0x80 <EOI> [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158 [<ffffffff814fc285>] ? __slab_free+0x30/0x24c [<ffffffff814fc283>] ? __slab_free+0x2e/0x24c [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81130cb0>] kfree+0x108/0x131 [<ffffffff81204e74>] inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204fc6>] selinux_d_instantiate+0x1c/0x1e [<ffffffff81200f4f>] security_d_instantiate+0x21/0x23 [<ffffffff81154625>] d_instantiate+0x5c/0x61 [<ffffffff811563ca>] d_splice_alias+0xbc/0xd2 [<ffffffff811b17ff>] ext4_lookup+0xba/0xeb [<ffffffff8114bf1e>] d_alloc_and_lookup+0x45/0x6b [<ffffffff8114cbea>] walk_component+0x215/0x3e8 [<ffffffff8114cdf8>] lookup_last+0x3b/0x3d [<ffffffff8114daf3>] path_lookupat+0x82/0x2af [<ffffffff8110fc53>] ? might_fault+0xa5/0xac [<ffffffff8110fc0a>] ? might_fault+0x5c/0xac [<ffffffff8114c564>] ? getname_flags+0x31/0x1ca [<ffffffff8114dd48>] do_path_lookup+0x28/0x97 [<ffffffff8114df2c>] user_path_at+0x59/0x96 [<ffffffff811467ad>] ? cp_new_stat+0xf7/0x10d [<ffffffff811469a6>] vfs_fstatat+0x44/0x6e [<ffffffff811469ee>] vfs_lstat+0x1e/0x20 [<ffffffff81146b3d>] sys_newlstat+0x1a/0x33 [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158 [<ffffffff812535fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8150af82>] system_call_fastpath+0x16/0x1b Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12neigh: reduce arp latencyEric Dumazet
Remove the artificial HZ latency on arp resolution. Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets send the ARP message immediately. Before patch : # arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms After patch : $ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17net: Abstract dst->neighbour accesses behind helpers.David S. Miller
dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17neigh: Pass neighbour entry to output ops.David S. Miller
This will get us closer to being able to do "neigh stuff" completely independent of the underlying dst_entry for protocols (ipv4/ipv6) that wish to do so. We will also be able to make dst entries neigh-less. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16neigh: Kill ndisc_ops->queue_xmitDavid S. Miller
It is always dev_queue_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16neigh: Kill hh_cache->hh_outputDavid S. Miller
It's just taking on one of two possible values, either neigh_ops->output or dev_queue_xmit(). And this is purely depending upon whether nud_state has NUD_CONNECTED set or not. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16neigh: Kill neigh_ops->hh_outputDavid S. Miller
It's always dev_queue_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16neigh: Simply destroy handling wrt. hh_cache.David S. Miller
Now that hh_cache entries are embedded inside of neighbour entries, their lifetimes and accesses are now synchronous to that of the encompassing neighbour object. Therefore we don't need to hook up the blackhole op to hh_output on destroy. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14net: Embed hh_cache inside of struct neighbour.David S. Miller
Now that there is a one-to-one correspondance between neighbour and hh_cache entries, we no longer need: 1) dynamic allocation 2) attachment to dst->hh 3) refcounting Initialization of the hh_cache entry is indicated by hh_len being non-zero, and such initialization is always done with the neighbour's lock held as a writer. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-13net: Kill support for multiple hh_cache entries per neighbourDavid S. Miller
This never, ever, happens. Neighbour entries are always tied to one address family, and therefore one set of dst_ops, and therefore one dst_ops->protocol "hh_type" value. This capability was blindly imported by Alexey Kuznetsov when he wrote the neighbour layer. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-13net: Push protocol type directly down to header_ops->cache()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-11ipv4: Use universal hash for ARP.David S. Miller
We need to make sure the multiplier is odd. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-11neigh: Store hash shift instead of mask.David S. Miller
And mask the hash function result by simply shifting down the "->hash_shift" most significant bits. Currently which bits we use is arbitrary since jhash produces entropy evenly across the whole hash function result. But soon we'll be using universal hashing functions, and in those cases more entropy exists in the higher bits than the lower bits, because they use multiplies. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose
The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-01-20neigh: __rcu annotationsEric Dumazet
fix some minor issues and sparse (__rcu) warnings Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-19net: kill unused macrosShan Wei
These macros never be used, so remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21net/neighbour: cancel_delayed_work() + flush_scheduled_work() -> ↵Tejun Heo
cancel_delayed_work_sync() flush_scheduled_work() is going away. Prepare for it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11neigh: Protect neigh->ha[] with a seqlockEric Dumazet
Add a seqlock in struct neighbour to protect neigh->ha[], and avoid dirtying neighbour in stress situation (many different flows / dsts) Dirtying takes place because of read_lock(&n->lock) and n->used writes. Switching to a seqlock, and writing n->used only on jiffies changes permits less dirtying. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11neigh: speedup neigh_hh_init()Eric Dumazet
When a new dst is used to send a frame, neigh_resolve_output() tries to associate an struct hh_cache to this dst, calling neigh_hh_init() with the neigh rwlock write locked. Most of the time, hh_cache is already known and linked into neighbour, so we find it and increment its refcount. This patch changes the logic so that we call neigh_hh_init() with neighbour lock read locked only, so that fast path can be run in parallel by concurrent cpus. This brings part of the speedup we got with commit c7d4426a98a5f (introduce DST_NOCACHE flag) for non cached dsts, even for cached ones, removing one of the contention point that routers hit on multiqueue enabled machines. Further improvements would need to use a seqlock instead of an rwlock to protect neigh->ha[], to not dirty neigh too often and remove two atomic ops. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06neigh: RCU conversion of struct neighbourEric Dumazet
This is the second step for neighbour RCU conversion. (first was commit d6bf7817 : RCU conversion of neigh hash table) neigh_lookup() becomes lockless, but still take a reference on found neighbour. (no more read_lock()/read_unlock() on tbl->lock) struct neighbour gets an additional rcu_head field and is freed after an RCU grace period. Future work would need to eventually not take a reference on neighbour for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to use a noref bit like we did for skb->_dst. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05net neigh: RCU conversion of neigh hash tableEric Dumazet
David This is the first step for RCU conversion of neigh code. Next patches will convert hash_buckets[] and "struct neighbour" to RCU protected objects. Thanks [PATCH net-next] net neigh: RCU conversion of neigh hash table Instead of storing hash_buckets, hash_mask and hash_rnd in "struct neigh_table", a new structure is defined : struct neigh_hash_table { struct neighbour **hash_buckets; unsigned int hash_mask; __u32 hash_rnd; struct rcu_head rcu; }; And "struct neigh_table" has an RCU protected pointer to such a neigh_hash_table. This means the signature of (*hash)() function changed: We need to add a third parameter with the actual hash_rnd value, since this is not anymore a neigh_table field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05net neigh: neigh_delete() and neigh_add() changesEric Dumazet
neigh_delete() and neigh_add() dont need to touch device refcount, we hold RTNL when calling them, so device cannot disappear under us. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03net: introduce DST_NOCACHE flagEric Dumazet
While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-23net: return operator cleanupEric Dumazet
Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-14net/core: neighbour update OopsDoug Kehn
When configuring DMVPN (GRE + openNHRP) and a GRE remote address is configured a kernel Oops is observed. The obserseved Oops is caused by a NULL header_ops pointer (neigh->dev->header_ops) in neigh_update_hhs() when void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *) = neigh->dev->header_ops->cache_update; is executed. The dev associated with the NULL header_ops is the GRE interface. This patch guards against the possibility that header_ops is NULL. This Oops was first observed in kernel version 2.6.26.8. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28net: fix __neigh_event_send()Eric Dumazet
commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot where an skb is enqueued, with a possibly not refcounted dst entry. __neigh_event_send() inserts skb into arp_queue, so we must make sure dst entry is refcounted, or dst entry can be freed by garbage collector after caller exits from rcu protected section. Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-10net: Annotates neigh_invalidate()Eric Dumazet
Annotates neigh_invalidate() with __releases() and __acquires() for sparse sake. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16net neigh: Decouple per interface neighbour table controls from binary sysctlsEric W. Biederman
Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl. Don't pass the binary sysctl path for neighour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-23neigh: simplify seq_file codeAlexey Dobriyan
Simpily pass 'struct neigh_table' with seq_file private pointer, and save one dereference. Proc entry itself isn't interesting. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>