summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits) qlcnic: adding co maintainer ixgbe: add support for active DA cables ixgbe: dcb, do not tag tc_prio_control frames ixgbe: fix ixgbe_tx_is_paused logic ixgbe: always enable vlan strip/insert when DCB is enabled ixgbe: remove some redundant code in setting FCoE FIP filter ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp ixgbe: fix header len when unsplit packet overflows to data buffer ipv6: Never schedule DAD timer on dead address ipv6: Use POSTDAD state ipv6: Use state_lock to protect ifa state ipv6: Replace inet6_ifaddr->dead with state cxgb4: notify upper drivers if the device is already up when they load cxgb4: keep interrupts available when the ports are brought down cxgb4: fix initial addition of MAC address cnic: Return SPQ credit to bnx2x after ring setup and shutdown. cnic: Convert cnic_local_flags to atomic ops. can: Fix SJA1000 command register writes on SMP systems bridge: fix build for CONFIG_SYSFS disabled ARCNET: Limit com20020 PCI ID matches for SOHARD cards ... Fix up various conflicts with pcmcia tree drivers/net/ {pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and wireless/orinoco/spectrum_cs.c} and feature removal (Documentation/feature-removal-schedule.txt). Also fix a non-content conflict due to pm_qos_requirement getting renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
2010-05-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits) vlynq: make whole Kconfig-menu dependant on architecture add descriptive comment for TIF_MEMDIE task flag declaration. EEPROM: max6875: Header file cleanup EEPROM: 93cx6: Header file cleanup EEPROM: Header file cleanup agp: use NULL instead of 0 when pointer is needed rtc-v3020: make bitfield unsigned PCI: make bitfield unsigned jbd2: use NULL instead of 0 when pointer is needed cciss: fix shadows sparse warning doc: inode uses a mutex instead of a semaphore. uml: i386: Avoid redefinition of NR_syscalls fix "seperate" typos in comments cocbalt_lcdfb: correct sections doc: Change urls for sparse Powerpc: wii: Fix typo in comment i2o: cleanup some exit paths Documentation/: it's -> its where appropriate UML: Fix compiler warning due to missing task_struct declaration UML: add kernel.h include to signal.c ...
2010-05-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: PM QOS update fix Freezer / cgroup freezer: Update stale locking comments PM / platform_bus: Allow runtime PM by default i2c: Fix bus-level power management callbacks PM QOS update PM / Hibernate: Fix block_io.c printk warning PM / Hibernate: Group swap ops PM / Hibernate: Move the first_sector out of swsusp_write PM / Hibernate: Separate block_io PM / Hibernate: Snapshot cleanup FS / libfs: Implement simple_write_to_buffer PM / Hibernate: document open(/dev/snapshot) side effects PM / Runtime: Add sysfs debug files PM: Improve device power management document PM: Update device power management document PM: Allow runtime_suspend methods to call pm_schedule_suspend() PM: pm_wakeup - switch to using bool
2010-05-19Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: (45 commits) Revert "nfsd4: distinguish expired from stale stateids" nfsd: safer initialization order in find_file() nfs4: minor callback code simplification, comment NFSD: don't report compiled-out versions as present nfsd4: implement reclaim_complete nfsd4: nfsd4_destroy_session must set callback client under the state lock nfsd4: keep a reference count on client while in use nfsd4: mark_client_expired nfsd4: introduce nfs4_client.cl_refcount nfsd4: refactor expire_client nfsd4: extend the client_lock to cover cl_lru nfsd4: use list_move in move_to_confirmed nfsd4: fold release_session into expire_client nfsd4: rename sessionid_lock to client_lock nfsd4: fix bare destroy_session null dereference nfsd4: use local variable in nfs4svc_encode_compoundres nfsd: further comment typos sunrpc: centralise most calls to svc_xprt_received nfsd4: fix unlikely race in session replay case nfsd4: fix filehandle comment ...
2010-05-19Merge branch 'nfs-for-2.6.35' of ↵Linus Torvalds
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (78 commits) SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired SUNRPC: Reorder the struct rpc_task fields SUNRPC: Remove the 'tk_magic' debugging field SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst NFS: Don't call iput() in nfs_access_cache_shrinker NFS: Clean up nfs_access_zap_cache() NFS: Don't run nfs_access_cache_shrinker() when the mask is GFP_NOFS SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired() SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS NFS: Read requests can use GFP_KERNEL. NFS: Clean up nfs_create_request() NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls NFSv4: Don't use GFP_KERNEL allocations in state recovery SUNRPC: Fix xs_setup_bc_tcp() SUNRPC: Replace jiffies-based metrics with ktime-based metrics ktime: introduce ktime_to_ms() SUNRPC: RPC metrics and RTT estimator should use same RTT value NFS: Calldata for nfs4_renew_done() NFS: Squelch compiler warning in nfs_add_server_stats() ...
2010-05-19Merge branch 'bkl/procfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: sunrpc: Include missing smp_lock.h procfs: Kill the bkl in ioctl procfs: Push down the bkl from ioctl procfs: Use generic_file_llseek in /proc/vmcore procfs: Use generic_file_llseek in /proc/kmsg procfs: Use generic_file_llseek in /proc/kcore procfs: Kill BKL in llseek on proc base
2010-05-18ipv6: Never schedule DAD timer on dead addressHerbert Xu
This patch ensures that all places that schedule the DAD timer look at the address state in a safe manner before scheduling the timer. This ensures that we don't end up with pending timers after deleting an address. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18ipv6: Use POSTDAD stateHerbert Xu
This patch makes use of the new POSTDAD state. This prevents a race between DAD completion and failure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18ipv6: Use state_lock to protect ifa stateHerbert Xu
This patch makes use of the new state_lock to synchronise between updates to the ifa state. This fixes the issue where a remotely triggered address deletion (through DAD failure) coincides with a local administrative address deletion, causing certain actions to be performed twice incorrectly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18ipv6: Replace inet6_ifaddr->dead with stateHerbert Xu
This patch replaces the boolean dead flag on inet6_ifaddr with a state enum. This allows us to roll back changes when deleting an address according to whether DAD has completed or not. This patch only adds the state field and does not change the logic. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18bridge: fix build for CONFIG_SYSFS disabledRandy Dunlap
Fix build when CONFIG_SYSFS is not enabled: net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name' Note: dev->name == sysfs_name except when change name is in progress, and we are protected from that by RTNL mutex. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Remove unnecessary returns from void function()sJoe Perches
This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net sched: cleanup and rate limit warningstephen hemminger
If the user has a bad classification configuration, and gets a packet that goes through too many steps. Chances are more packets will arrive, and the message spew will overrun syslog because it is not rate limited. And because it is not tagged with appropriate priority it can't not be screened. Added the qdisc to the message to try and give some more context when the message does arrive. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17pfkey: add severity to printkstephen hemminger
Put severity level on pfkey printk messages Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17xfrm: add severity to printkstephen hemminger
Serious oh sh*t messages converted to WARN(). Add KERN_NOTICE severity to the unknown policy type messages. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net sched: printk message severitystephen hemminger
The previous patch encourage me to go look at all the messages in the network scheduler and fix them. Many messages were missing any severity level. Some serious ones that should never happen were turned into WARN(), and the random noise messages that were handled changed to pr_debug(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net/caif: Use kzallocJulia Lawall
Use kzalloc rather than the combination of kmalloc and memset. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,size,flags; statement S; @@ -x = kmalloc(size,flags); +x = kzalloc(size,flags); if (x == NULL) S -memset(x, 0, size); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17sctp: fix append error cause to ERROR chunk correctlyWei Yongjun
commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 sctp: Fix skb_over_panic resulting from multiple invalid \ parameter errors (CVE-2010-1173) (v4) cause 'error cause' never be add the the ERROR chunk due to some typo when check valid length in sctp_init_cause_fixed(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Add netlink support for virtual port management (was iovnl)Scott Feldman
Add new netdev ops ndo_{set|get}_vf_port to allow setting of port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/ RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF (added to end of IFLA_cmd list). These are both nested atrtibutes using this layout: [IFLA_NUM_VF] [IFLA_VF_PORTS] [IFLA_VF_PORT] [IFLA_PORT_*], ... [IFLA_VF_PORT] [IFLA_PORT_*], ... ... [IFLA_PORT_SELF] [IFLA_PORT_*], ... These attributes are design to be set and get symmetrically. VF_PORTS is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV device. PORT_SELF is for the PF of the SR-IOV device, in case it wants to also have a port-profile, or for the case where the VF==PF, like in enic patch 2/2 of this patch set. A port-profile is used to configure/enable the external switch virtual port backing the netdev interface, not to configure the host-facing side of the netdev. A port-profile is an identifier known to the switch. How port- profiles are installed on the switch or how available port-profiles are made know to the host is outside the scope of this patch. There are two types of port-profiles specs in the netlink msg. The first spec is for 802.1Qbg (pre-)standard, VDP protocol. The second spec is for devices that run a similar protocol as VDP but in firmware, thus hiding the protocol details. In either case, the specs have much in common and makes sense to define the netlink msg as the union of the two specs. For example, both specs have a notition of associating/deassociating a port-profile. And both specs require some information from the hypervisor manager, such as client port instance ID. The general flow is the port-profile is applied to a host netdev interface using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the switch, and the switch virtual port backing the host netdev interface is configured/enabled based on the settings defined by the port-profile. What those settings comprise, and how those settings are managed is again outside the scope of this patch, since this patch only deals with the first step in the flow. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Introduce skb_tunnel_rx() helperEric Dumazet
skb rxhash should be cleared when a skb is handled by a tunnel before being delivered again, so that correct packet steering can take place. There are other cleanups and accounting that we can factorize in a new helper, skb_tunnel_rx() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17tcp: tcp_synack_options() fix Eric Dumazet
Commit 33ad798c924b4a (tcp: options clean up) introduced a problem if MD5+SACK+timestamps were used in initial SYN message. Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP sessions, but since 40 bytes of tcp options space are not enough to store all the bits needed, we chose to disable timestamps in this case. We send a SYN-ACK _without_ timestamp option, but socket has timestamps enabled and all further outgoing messages contain a TS block, all with the initial timestamp of the remote peer. Fix is to really disable timestamps option for the whole session. Reported-by: Bijay Singh <Bijay.Singh@guavus.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17ipv6: fix the bug of address checkStephen Hemminger
The duplicate address check code got broken in the conversion to hlist (2.6.35). The earlier patch did not fix the case where two addresses match same hash value. Use two exit paths, rather than depending on state of loop variables (from macro). Based on earlier fix by Shan Wei. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2010-05-17net_sched: sch_hfsc: fix classification loopsPatrick McHardy
When attaching filters to a class pointing to a class higher up in the hierarchy, classification may enter an endless loop. Currently this is prevented for filters that are already resolved, but not for filters resolved at runtime. Only allow filters to point downwards in the hierarchy, similar to what CBQ does. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17tbf: stop wanton destruction of children (v2)stephen hemminger
Several netem users use TBF for rate control. But every time the parameters of TBF are changed it destroys the child qdisc, requiring reconfigation. Better to just keep child qdisc and just notify it of changed limit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Remove unnecessary semicolons after switch statementsJoe Perches
Also added an explicit break; to avoid a fallthrough in net/ipv4/tcp_input.c Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17X25: Remove bkl in sockoptsandrew hendry
Removes the BKL in x25 setsock and getsockopts. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17X25: Move accept approve flag to bitfieldandrew hendry
Moves the x25 accept approve flag from char into bitfield. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17X25: Move interrupt flag to bitfieldandrew hendry
Moves the x25 interrupt flag from char into bitfield. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17X25: Move qbit flag to bitfieldandrew hendry
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: No dst refcounting in ip_queue_xmit()Eric Dumazet
TCP outgoing packets can avoid two atomic ops, and dirtying of previously higly contended cache line using new refdst infrastructure. Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE Note 2: UDP packets dsts are built before ip_queue_xmit(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Use ip_route_input_noref() in input pathEric Dumazet
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per incoming packet. Note: loopback is excluded from this optimization in ip_rcv_finish() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: implements ip_route_input_noref()Eric Dumazet
ip_route_input() is the version returning a refcounted dst, while ip_route_input_noref() returns a non refcounted one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: add a noref bit on skb dstEric Dumazet
Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are catched. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set an notrefcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17rps: avoid one atomic in enqueue_to_backlogEric Dumazet
If CONFIG_SMP=y, then we own a queue spinlock, we can avoid the atomic test_and_set_bit() from napi_schedule_prep(). We now have same number of atomic ops per netif_rx() calls than with pre-RPS kernel. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17ipv6 addrlabel: permit deletion of labels assigned to removed devFlorian Westphal
as addrlabels with an interface index are left alone when the interface gets removed this results in addrlabels that can no longer be removed. Restrict validation of index to adding new addrlabels. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
2010-05-17Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-05-16Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/linux/if_link.h
2010-05-17sunrpc: Include missing smp_lock.hFrederic Weisbecker
Now that cache_ioctl_procfs() calls the bkl explicitly, we need to include the relevant header as well. This fixes the following build error: net/sunrpc/cache.c: In function 'cache_ioctl_procfs': net/sunrpc/cache.c:1355: error: implicit declaration of function 'lock_kernel' net/sunrpc/cache.c:1359: error: implicit declaration of function 'unlock_kernel' Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-17procfs: Push down the bkl from ioctlFrederic Weisbecker
Push down the bkl from procfs's ioctl main handler to its users. Only three procfs users implement an ioctl (non unlocked) handler. Turn them into unlocked_ioctl and push down the Devil inside. v2: PDE(inode)->data doesn't need to be under bkl v3: And don't forget to git-add the result v4: Use wrappers to pushdown instead of an invasive and error prone handlers surgery. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Kacur <jkacur@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com>
2010-05-16rtnetlink: make SR-IOV VF interface symmetricChris Wright
Now we have a set of nested attributes: IFLA_VFINFO_LIST (NESTED) IFLA_VF_INFO (NESTED) IFLA_VF_MAC IFLA_VF_VLAN IFLA_VF_TX_RATE This allows a single set to operate on multiple attributes if desired. Among other things, it means a dump can be replayed to set state. The current interface has yet to be released, so this seems like something to consider for 2.6.34. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16sctp: delete active ICMP proto unreachable timer when free transportWei Yongjun
transport may be free before ICMP proto unreachable timer expire, so we should delete active ICMP proto unreachable timer when transport is going away. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16net: congestion notifications are not dropped packetsEric Dumazet
vlan/macvlan start_xmit() can inform caller of congestion with NET_XMIT_CN return value. This doesnt mean packet was dropped. Increment normal stat counters instead of tx_dropped. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16net: Introduce sk_route_nocapsEric Dumazet
TCP-MD5 sessions have intermittent failures, when route cache is invalidated. ip_queue_xmit() has to find a new route, calls sk_setup_caps(sk, &rt->u.dst), destroying the sk->sk_route_caps &= ~NETIF_F_GSO_MASK that MD5 desperately try to make all over its way (from tcp_transmit_skb() for example) So we send few bad packets, and everything is fine when tcp_transmit_skb() is called again for this socket. Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a socket field, sk_route_nocaps, containing bits to mask on sk_route_caps. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16tcp: fix MD5 (RFC2385) supportEric Dumazet
TCP MD5 support uses percpu data for temporary storage. It currently disables preemption so that same storage cannot be reclaimed by another thread on same cpu. We also have to make sure a softirq handler wont try to use also same context. Various bug reports demonstrated corruptions. Fix is to disable preemption and BH. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15net: Consistent skb timestampingEric Dumazet
With RPS inclusion, skb timestamping is not consistent in RX path. If netif_receive_skb() is used, its deferred after RPS dispatch. If netif_rx() is used, its done before RPS dispatch. This can give strange tcpdump timestamps results. I think timestamping should be done as soon as possible in the receive path, to get meaningful values (ie timestamps taken at the time packet was delivered by NIC driver to our stack), even if NAPI already can defer timestamping a bit (RPS can help to reduce the gap) Tom Herbert prefer to sample timestamps after RPS dispatch. In case sampling is expensive (HPET/acpi_pm on x86), this makes sense. Let admins switch from one mode to another, using a new sysctl, /proc/sys/net/core/netdev_tstamp_prequeue Its default value (1), means timestamps are taken as soon as possible, before backlog queueing, giving accurate timestamps. Setting a 0 value permits to sample timestamps when processing backlog, after RPS dispatch, to lower the load of the pre-RPS cpu. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15xfrm: fix policy unreferencing on larval dropTimo Teras
I mistakenly had the error path to use num_pols to decide how many policies we need to drop (cruft from earlier patch set version which did not handle socket policies right). This is wrong since normally we do not keep explicit references (instead we hold reference to the cache entry which holds references to policies). drop_pols is set to num_pols if we are holding the references, so use that. Otherwise we eventually BUG_ON inside xfrm_policy_destroy due to premature policy deletion. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15net: adjust handle_macvlan to pass port struct to hookJiri Pirko
Now there's null check here and also again in the hook. Looking at bridge bits which are simmilar, port structure is rcu_dereferenced right away in handle_bridge and passed to hook. Looks nicer. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15net: reserve ports for applications using fixed port numbersAmerigo Wang
(Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>