summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2009-09-15net: net_assign_generic() fixEric Dumazet
[ Upstream commit 144586301f6af5ae5943a002f030d8c626fa4fdd ] memcpy() should take into account size of pointers, not only number of pointers to copy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11net: fix skb_seq_read returning wrong offset/length for page frag dataThomas Chenault
[ Upstream commit 995b337952cdf7e05d288eede580257b632a8343 ] When called with a consumed value that is less than skb_headlen(skb) bytes into a page frag, skb_seq_read() incorrectly returns an offset/length relative to skb->data. Ensure that data which should come from a page frag does. Signed-off-by: Thomas Chenault <thomas_chenault@dell.com> Tested-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11pktgen: do not access flows[] beyond its lengthFlorian Westphal
[ Upstream commit 5b5f792a6a9a2f9ae812d151ed621f72e99b1725 ] typo -- pkt_dev->nflows is for stats only, the number of concurrent flows is stored in cflows. Reported-By: Vladimir Ivashchenko <hazard@francoudi.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16net: Kill skb_truesize_check(), it only catches false-positives.David S. Miller
[ Upstream commit 92a0acce186cde8ead56c6915d9479773673ea1a ] A long time ago we had bugs, primarily in TCP, where we would modify skb->truesize (for TSO queue collapsing) in ways which would corrupt the socket memory accounting. skb_truesize_check() was added in order to try and catch this error more systematically. However this debugging check has morphed into a Frankenstein of sorts and these days it does nothing other than catch false-positives. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16net: amend the fix for SO_BSDCOMPAT gsopt infoleakEugene Teo
[ Upstream commit 50fee1dec5d71b8a14c1b82f2f42e16adc227f8b ] The fix for CVE-2009-0676 (upstream commit df0bca04) is incomplete. Note that the same problem of leaking kernel memory will reappear if someone on some architecture uses struct timeval with some internal padding (for example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to leak the padded bytes to userspace. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix data corruption when splicing from sockets.Jarek Poplawski
[ Upstream commit 8b9d3728977760f6bd1317c4420890f73695354e ] The trick in socket splicing where we try to convert the skb->data into a page based reference using virt_to_page() does not work so well. The idea is to pass the virt_to_page() reference via the pipe buffer, and refcount the buffer using a SKB reference. But if we are splicing from a socket to a socket (via sendpage) this doesn't work. The from side processing will grab the page (and SKB) references. The sendpage() calls will grab page references only, return, and then the from side processing completes and drops the SKB ref. The page based reference to skb->data is not enough to keep the kmalloc() buffer backing it from being reused. Yet, that is all that the socket send side has at this point. This leads to data corruption if the skb->data buffer is reused by SLAB before the send side socket actually gets the TX packet out to the device. The fix employed here is to simply allocate a page and copy the skb->data bytes into that page. This will hurt performance, but there is no clear way to fix this properly without a copy at the present time, and it is important to get rid of the data corruption. With fixes from Herbert Xu. Tested-by: Willy Tarreau <w@1wt.eu> Foreseen-by: Changli Gao <xiaosuo@gmail.com> Diagnosed-by: Willy Tarreau <w@1wt.eu> Reported-by: Willy Tarreau <w@1wt.eu> Fixed-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix OOPS in skb_seq_read().Shyam Iyer
[ Upstream commit 71b3346d182355f19509fadb8fe45114a35cc499 ] It oopsd for me in skb_seq_read. addr2line said it was linux-2.6/net/core/skbuff.c:2228, which is this line: while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) { I added some printks in there and it looks like we hit this: } else if (st->root_skb == st->cur_skb && skb_shinfo(st->root_skb)->frag_list) { st->cur_skb = skb_shinfo(st->root_skb)->frag_list; st->frag_idx = 0; goto next_skb; } Actually I did some testing and added a few printks and found that the st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null. This caused the kernel panic. if (abs_offset < block_limit) { - *data = st->cur_skb->data + abs_offset; + *data = st->cur_skb->data + (abs_offset - st->stepped_offset); I enabled the debug_tcp and with a few printks found that the code did not go to the next_skb label and could find that the sequence being followed was this - It hit this if condition - if (st->cur_skb->next) { st->cur_skb = st->cur_skb->next; st->frag_idx = 0; goto next_skb; And so, now the st pointer is shifted to the next skb whereas actually it should have hit the second else if first since the data is in the frag_list. else if (st->root_skb == st->cur_skb && skb_shinfo(st->root_skb)->frag_list) { st->cur_skb = skb_shinfo(st->root_skb)->frag_list; goto next_skb; } Reversing the two conditions the attached patch fixes the issue for me on top of Herbert's patches. Signed-off-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix frag_list handling in skb_seq_readHerbert Xu
[ Upstream commit 95e3b24cfb4ec0479d2c42f7a1780d68063a542a ] The frag_list handling was broken in skb_seq_read: 1) We didn't add the stepped offset when looking at the head are of fragments other than the first. 2) We didn't take the stepped offset away when setting the data pointer in the head area. 3) The frag index wasn't reset. This patch fixes both issues. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2Clément Lecigne
[ Upstream commit df0bca049d01c0ee94afb7cd5dfd959541e6c8da ] In function sock_getsockopt() located in net/core/sock.c, optval v.val is not correctly initialized and directly returned in userland in case we have SO_BSDCOMPAT option set. This dummy code should trigger the bug: int main(void) { unsigned char buf[4] = { 0, 0, 0, 0 }; int len; int sock; sock = socket(33, 2, 2); getsockopt(sock, 1, SO_BSDCOMPAT, &buf, &len); printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]); close(sock); } Here is a patch that fix this bug by initalizing v.val just after its declaration. Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-06net: fix packet socket delivery in rx irq handlerPatrick McHardy
commit 9b22ea560957de1484e6b3e8538f7eef202e3596 upstream. The changes to deliver hardware accelerated VLAN packets to packet sockets (commit bc1d0411) caused a warning for non-NAPI drivers. The __vlan_hwaccel_rx() function is called directly from the drivers RX function, for non-NAPI drivers that means its still in RX IRQ context: [ 27.779463] ------------[ cut here ]------------ [ 27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81() ... [ 27.782520] [<c0264755>] netif_nit_deliver+0x5b/0x75 [ 27.782590] [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162 [ 27.782664] [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1] [ 27.782738] [<c0155b17>] handle_IRQ_event+0x23/0x51 [ 27.782808] [<c015692e>] handle_edge_irq+0xc2/0x102 [ 27.782878] [<c0105fd5>] do_IRQ+0x4d/0x64 Split hardware accelerated VLAN reception into two parts to fix this: - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN device lookup, then calls netif_receive_skb()/netif_rx() - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb() in softirq context, performs the real reception and delivery to packet sockets. Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-18net: eliminate warning from NETIF_F_UFO on bridgeStephen Hemminger
Based on commit b63365a2d60268a3988285d6c3c6003d7066f93a upstream, but drastically cut down for 2.6.27.y The bridge device always causes a warning because when it is first created it has the no checksum flag set along with all the segmentation/fragmentation offload bits. The code in register_netdevice incorrectly checks for only hardware checksum bit and ignores no checksum bit. Similar code is already in 2.6.28: commit b63365a2d60268a3988285d6c3c6003d7066f93a net: Fix disjunct computation of netdev features Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Cc: David Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-11-07net: Fix recursive descent in __scm_destroy().David Miller
commit f8d570a4745835f2238a33b537218a1bb03fc671 and 3b53fbf4314594fa04544b02b2fc6e607912da18 upstream (because once wasn't good enough...) __scm_destroy() walks the list of file descriptors in the scm_fp_list pointed to by the scm_cookie argument. Those, in turn, can close sockets and invoke __scm_destroy() again. There is nothing which limits how deeply this can occur. The idea for how to fix this is from Linus. Basically, we do all of the fput()s at the top level by collecting all of the scm_fp_list objects hit by an fput(). Inside of the initial __scm_destroy() we keep running the list until it is empty. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-07net: Fix netdev_run_todo dead-lockHerbert Xu
Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: only invoke dev->change_rx_flags when device is UPPatrick McHardy
Jesper Dangaard Brouer <hawk@comx.dk> reported a bug when setting a VLAN device down that is in promiscous mode: When the VLAN device is set down, the promiscous count on the real device is decremented by one by vlan_dev_stop(). When removing the promiscous flag from the VLAN device afterwards, the promiscous count on the real device is decremented a second time by the vlan_change_rx_flags() callback. The root cause for this is that the ->change_rx_flags() callback is invoked while the device is down. The synchronization is meant to mirror the behaviour of the ->set_rx_mode callbacks, meaning the ->open function is responsible for doing a full sync on open, the ->close() function is responsible for doing full cleanup on ->stop() and ->change_rx_flags() is meant to do incremental changes while the device is UP. Only invoke ->change_rx_flags() while the device is UP to provide the intended behaviour. Tested-by: Jesper Dangaard Brouer <jdb@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20netdev: simple_tx_hash shouldn't hash inside fragmentsAlexander Duyck
Currently simple_tx_hash is hashing inside of udp fragments. As a result packets are getting getting sent to all queues when they shouldn't be. This causes a serious performance regression which can be seen by sending UDP frames larger than mtu on multiqueue devices. This change will make it so that fragments are hashed only as IP datagrams w/o any protocol information. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-07pkt_sched: Fix qdisc state in net_tx_action()Jarek Poplawski
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc is neither ran nor rescheduled, which may cause endless loop in dev_deactivate(). Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-19pkt_sched: Prevent livelock in TX queue running.David S. Miller
If dev_deactivate() is trying to quiesce the queue, it is theoretically possible for another cpu to livelock trying to process that queue. This happens because dev_deactivate() grabs the queue spinlock as it checks the queue state, whereas net_tx_action() does a trylock and reschedules the qdisc if it hits the lock. This breaks the livelock by adding a check on __QDISC_STATE_DEACTIVATED to net_tx_action() when the trylock fails. Based upon feedback from Herbert Xu and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-18Revert "pkt_sched: Protect gen estimators under est_lock."David S. Miller
This reverts commit d4766692e72422f3b0f0e9ac6773d92baad07d51. qdisc_destroy() now runs in RTNL fully again, so this change is no longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17pkt_sched: Fix missed RCU unlock in dev_queue_xmit()David S. Miller
Noticed by Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17net: Change handling of the __QDISC_STATE_SCHED flag in net_tx_action().Jarek Poplawski
Change handling of the __QDISC_STATE_SCHED flag in net_tx_action() to enable proper control in dev_deactivate(). Now, if this flag is seen as unset under root_lock means a qdisc can't be netif_scheduled. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17pkt_sched: Add 'deactivated' state.David S. Miller
This new state lets dev_deactivate() mark a qdisc as having been deactivated. dev_queue_xmit() and ing_filter() check for this bit and do not try to process the qdisc if the bit is set. dev_deactivate() polls the qdisc after setting the bit, waiting for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear. This isn't perfect yet, but subsequent changesets will make it so. This part is just one piece of the puzzle. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-15net: skb_copy_datagram_from_iovec()Rusty Russell
There's an skb_copy_datagram_iovec() to copy out of a paged skb, but nothing the other way around (because we don't do that). We want to allocate big skbs in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-15net: Preserve netfilter attributes in skb_gso_segment using __copy_skb_headerHerbert Xu
skb_gso_segment didn't preserve some attributes in the original skb such as the netfilter fields. This was harmless until they were used which is the case for packets going through lo. This patch makes it call __copy_skb_header which also picks up some other missing attributes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13pkt_sched: Protect gen estimators under est_lock.Jarek Poplawski
gen_kill_estimator() required rtnl_lock() protection, but since it is moved to an RCU callback __qdisc_destroy() let's use est_lock instead. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13pktgen: prevent pktgen from using bad tx queueAndrew Gallatin
With the new multi-queue transmit code, it is possible to accidentally make pktgen pick a non-existing tx queue simply by using a stale script to drive pktgen. Access to this non-existing tx queue will then trigger a bad memory access and kill the machine. For example, setting "queue_map_max 2" will cause my machine to die when accessing a garbage spinlock in the non-existing tx queue: BUG: spinlock bad magic on CPU#0, kpktgend_0/564 lock: ffff88001ddf6718, .magic: ffffffff, .owner: /-1, .owner_cpu: 0 Pid: 564, comm: kpktgend_0 Not tainted 2.6.27-rc3 #35 Call Trace: [<ffffffff803a1228>] spin_bug+0xa4/0xac [<ffffffff803a1253>] _raw_spin_lock+0x23/0x123 [<ffffffff8055b06f>] _spin_lock_bh+0x17/0x1b [<ffffffff804cb57d>] pktgen_thread_worker+0xa97/0x1002 [<ffffffff8022874d>] ? finish_task_switch+0x38/0x97 [<ffffffff80242077>] ? autoremove_wake_function+0x0/0x36 [<ffffffff80242077>] ? autoremove_wake_function+0x0/0x36 [<ffffffff804caae6>] ? pktgen_thread_worker+0x0/0x1002 [<ffffffff80241a40>] kthread+0x44/0x6d [<ffffffff8020c399>] child_rip+0xa/0x11 [<ffffffff802419fc>] ? kthread+0x0/0x6d [<ffffffff8020c38f>] ? child_rip+0x0/0x11 The attached patch adds some sanity checking to prevent these sorts of configuration errors. Signed-off-by: Andrew Gallatin <gallatin@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-07pktgen: multiqueue etc.Robert Olsson
Sofar far pktgen have had a restriction to only use one device per kernel thread. With the new multiqueue architecture this is no longer adequate. The patch below is an effort to remove this by in pktgen configuration adding a tag to the device name a la eth0@0 etc. The tag is used for usual device config just as before. Also a new flag is introduced to mirror queue_map with sending threads smp_processor_id() QUEUE_MAP_CPU. An example: We use 4 CPU's to send to one 10g interface (eth0) and we use the new tagging to send a mix of packet sizes, 64, 576 and 1500 bytes. Also we use TX queues according to smp_processor_id() PGDEV=/proc/net/pktgen/kpktgend_0 pgset "add_device eth0@0" PGDEV=/proc/net/pktgen/kpktgend_1 pgset "add_device eth0@1" PGDEV=/proc/net/pktgen/kpktgend_2 pgset "add_device eth0@2" PGDEV=/proc/net/pktgen/kpktgend_3 pgset "add_device eth0@3" .... PGDEV=/proc/net/pktgen/eth0@0 pgset "pkt_size 64" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@1 pgset "pkt_size 572" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@2 pgset "pkt_size 1496" PGDEV=/proc/net/pktgen/eth0@3 pgset "pkt_size 1496" pgset "flag QUEUE_MAP_CPU" Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-07net/core: Allow receive on active slaves.Joe Eykholt
If a packet_type specifies an active slave to bonding and not just any interface, allow it to receive frames that came in on that interface. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-07net/core: Allow certain receives on inactive slave.Joe Eykholt
Allow a packet_type that specifies the exact device to receive even on an inactive bonding slave devices. This is important for some L2 protocols such as LLDP and FCoE. This can eventually be used for the bonding special cases as well. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-07net/core: Uninline skb_bond().Joe Eykholt
Otherwise subsequent changes need multiple return values. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-05pktgen: mac countRobert Olsson
dst_mac_count and src_mac_count patch from Eneas Hunguana We have sent one mac address to much. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-05pktgen: random flow Robert Olsson
Random flow generation has not worked. This fixes it. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04net_sched: Add qdisc __NET_XMIT_BYPASS flagJarek Poplawski
Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03net: eliminate refcounting in backlog queueStephen Hemminger
Avoid the overhead of atomic increment/decrement on each received packet. This helps performance of non-NAPI devices (like loopback). Use cleanup function to walk queue on each cpu and clean out any left over packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03net: use software GSO for SG+CSUM capable netdevicesLennert Buytenhek
If a netdevice does not support hardware GSO, allowing the stack to use GSO anyway and then splitting the GSO skb into MSS-sized pieces as it is handed to the netdevice for transmitting is likely still a win as far as throughput and/or CPU usage are concerned, since it reduces the number of trips through the output path. This patch enables the use of GSO on any netdevice that supports SG. If a GSO skb is then sent to a netdevice that supports SG but does not support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take care of doing the necessary GSO segmentation in software. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03net: fix missing pneigh entries in the neighbor seq_file codeChris Larson
When pneigh entries exist, but the user's read buffer isn't sufficient to hold them all, one of the pneigh entries will be missing from the results. In neigh_get_idx_any, the number of elements which neigh_get_idx encountered is not correctly subtracted from the position number before the call to pneigh_get_idx. neigh_get_idx reduces the position by 1 for each call to neigh_get_next, but it does not reduce it by one for the first element (neigh_get_first). The patch alters the neigh_get_idx and pneigh_get_idx functions to subtract one from pos, for the first element, when pos is non-zero. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03net: in the first call to neigh_seq_next, call neigh_get_first, not ↵Chris Larson
neigh_get_idx. neigh_seq_next won't be called both with *pos > 0 && v == SEQ_START_TOKEN, so there's no point calling neigh_get_idx when we're on the start token, just call neigh_get_first directly. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-02pkt_sched: Use qdisc_lock() on already sampled root qdisc.David S. Miller
Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31netdev: Fix lockdep warnings in multiqueue configurations.David S. Miller
When support for multiple TX queues were added, the netif_tx_lock() routines we converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-30pkt_sched: Fix OOPS on ingress qdisc add.David S. Miller
Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-29mac80211: partially fix skb->cb useJohannes Berg
This patch fixes mac80211 to not use the skb->cb over the queue step from virtual interfaces to the master. The patch also, for now, disables aggregation because that would still require requeuing, will fix that in a separate patch. There are two other places (software requeue and powersaving stations) where requeue can happen, but that is not currently used by any drivers/not possible to use respectively. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON
2008-07-25net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25printk ratelimiting rewriteDave Young
All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: sch_sfq: dump a real number of flows atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups netfilter: make security table depend on NETFILTER_ADVANCED tcp: Clear probes_out more aggressively in tcp_ack(). e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call net: Update entry in af_family_clock_key_strings netdev: Remove warning from __netif_schedule(). sky2: don't stop queue on shutdown
2008-07-23Merge branch 'cpus4096-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23net: Update entry in af_family_clock_key_stringsOliver Hartkopp
In the merge phase of the CAN subsystem the af_family_clock_key_strings[] have been added to sock.c in commit 443aef0eddfa44c158d1b94ebb431a70638fcab4 (lockdep: fixup sk_callback_lock annotation). This trivial patch adds the missing name for address family 29 (AF_CAN). Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23netdev: Remove warning from __netif_schedule().David S. Miller
It isn't helping anything and we aren't going to be able to change all the drivers that do queue wakeups in strange situations. Just letting a noop_qdisc get scheduled will work because when qdisc_run() executes via net_tx_work() it will simply find no packets pending when it makes the ->dequeue() call in qdisc_restart. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits) I/OAT: I/OAT version 3.0 support I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT: Add watchdog/reset functionality to ioatdma iop_adma: cleanup iop_chan_xor_slot_count iop_adma: document how to calculate the minimum descriptor pool size iop_adma: directly reclaim descriptors on allocation failure async_tx: make async_tx_test_ack a boolean routine async_tx: remove depend_tx from async_tx_sync_epilog async_tx: export async_tx_quiesce async_tx: fix handling of the "out of descriptor" condition in async_xor async_tx: ensure the xor destination buffer remains dma-mapped async_tx: list_for_each_entry_rcu() cleanup dmaengine: Driver for the Synopsys DesignWare DMA controller dmaengine: Add slave DMA interface dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap dmaengine: Add dma_client parameter to device_alloc_chan_resources dmatest: Simple DMA memcpy test client dmaengine: DMA engine driver for Marvell XOR engine iop-adma: fix platform driver hotplug/coldplug dmaengine: track the number of clients using a channel ... Fixed up conflict in drivers/dca/dca-sysfs.c manually
2008-07-22I/OAT: tcp_dma_copybreak default value dependent on I/OAT versionMaciej Sosnowski
I/OAT DMA performance tuning showed different optimal values of tcp_dma_copybreak for different I/OAT versions (4096 for 1.2 and 2048 for 2.0). This patch lets ioatdma driver set tcp_dma_copybreak value according to these results. [dan.j.williams@intel.com: remove some ifdefs] Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-22netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep.David S. Miller
The new address list lock needs to handle the same device layering issues that the _xmit_lock one does. This integrates work done by Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>