summaryrefslogtreecommitdiff
path: root/net/ipv4/devinet.c
AgeCommit message (Collapse)Author
2013-01-06ipv4: fix NULL checking in devinet_ioctl()Xi Wang
The NULL pointer check `!ifa' should come before its first use. [ Bug origin : commit fd23c3b31107e2fc483301ee923d8a1db14e53f4 (ipv4: Add hash table of interface addresses) in linux-2.6.39 ] Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04netconf: advertise mc_forwarding statusNicolas Dichtel
This patch advertise the MC_FORWARDING status for IPv4 and IPv6. This field is readonly, only multicast engine in the kernel updates it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Enable a userns root rtnl calls that are safe for unprivilged usersEric W. Biederman
- Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Enable some sysctls that are safe for the userns rootEric W. Biederman
- Enable the per device ipv4 sysctls: net/ipv4/conf/<if>/forwarding net/ipv4/conf/<if>/mc_forwarding net/ipv4/conf/<if>/accept_redirects net/ipv4/conf/<if>/secure_redirects net/ipv4/conf/<if>/shared_media net/ipv4/conf/<if>/rp_filter net/ipv4/conf/<if>/send_redirects net/ipv4/conf/<if>/accept_source_route net/ipv4/conf/<if>/accept_local net/ipv4/conf/<if>/src_valid_mark net/ipv4/conf/<if>/proxy_arp net/ipv4/conf/<if>/medium_id net/ipv4/conf/<if>/bootp_relay net/ipv4/conf/<if>/log_martians net/ipv4/conf/<if>/tag net/ipv4/conf/<if>/arp_filter net/ipv4/conf/<if>/arp_announce net/ipv4/conf/<if>/arp_ignore net/ipv4/conf/<if>/arp_accept net/ipv4/conf/<if>/arp_notify net/ipv4/conf/<if>/proxy_arp_pvlan net/ipv4/conf/<if>/disable_xfrm net/ipv4/conf/<if>/disable_policy net/ipv4/conf/<if>/force_igmp_version net/ipv4/conf/<if>/promote_secondaries net/ipv4/conf/<if>/route_localnet - Enable the global ipv4 sysctl: net/ipv4/ip_forward - Enable the per device ipv6 sysctls: net/ipv6/conf/<if>/forwarding net/ipv6/conf/<if>/hop_limit net/ipv6/conf/<if>/mtu net/ipv6/conf/<if>/accept_ra net/ipv6/conf/<if>/accept_redirects net/ipv6/conf/<if>/autoconf net/ipv6/conf/<if>/dad_transmits net/ipv6/conf/<if>/router_solicitations net/ipv6/conf/<if>/router_solicitation_interval net/ipv6/conf/<if>/router_solicitation_delay net/ipv6/conf/<if>/force_mld_version net/ipv6/conf/<if>/use_tempaddr net/ipv6/conf/<if>/temp_valid_lft net/ipv6/conf/<if>/temp_prefered_lft net/ipv6/conf/<if>/regen_max_retry net/ipv6/conf/<if>/max_desync_factor net/ipv6/conf/<if>/max_addresses net/ipv6/conf/<if>/accept_ra_defrtr net/ipv6/conf/<if>/accept_ra_pinfo net/ipv6/conf/<if>/accept_ra_rtr_pref net/ipv6/conf/<if>/router_probe_interval net/ipv6/conf/<if>/accept_ra_rt_info_max_plen net/ipv6/conf/<if>/proxy_ndp net/ipv6/conf/<if>/accept_source_route net/ipv6/conf/<if>/optimistic_dad net/ipv6/conf/<if>/mc_forwarding net/ipv6/conf/<if>/disable_ipv6 net/ipv6/conf/<if>/accept_dad net/ipv6/conf/<if>/force_tllao - Enable the global ipv6 sysctls: net/ipv6/bindv6only net/ipv6/icmp/ratelimit Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Allow userns root to control ipv4Eric W. Biederman
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow creating raw sockets. Allow the SIOCSARP ioctl to control the arp cache. Allow the SIOCSIFFLAG ioctl to allow setting network device flags. Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address. Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address. Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address. Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask. Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting gre tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipip tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipsec virtual tunnel interfaces. Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC, MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing sockets. Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and arbitrary ip options. Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option. Allow setting the IP_TRANSPARENT ipv4 socket option. Allow setting the TCP_REPAIR socket option. Allow setting the TCP_CONGESTION socket option. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Push capable(CAP_NET_ADMIN) into the rtnl methodsEric W. Biederman
- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Don't export sysctls to unprivileged usersEric W. Biederman
In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01rtnl/ipv4: use netconf msg to advertise rp_filter statusNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-28rtnl/ipv4: add support of RTM_GETNETCONFNicolas Dichtel
This message allows to get the devconf for an interface. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-28rtnl/ipv4: use netconf msg to advertise forwarding statusNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-21net: change return values from -EACCES to -EPERMZhao Hongjiang
Change return value from -EACCES to -EPERM when the permission check fails. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18ipv4/route: arg delay is useless in rt_cache_flush()Nicolas Dichtel
Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07ipv4/route: arg delay is useless in rt_cache_flush()Nicolas Dichtel
Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23net: reinstate rtnl in call_netdevice_notifiers()Eric Dumazet
Eric Biederman pointed out that not holding RTNL while calling call_netdevice_notifiers() was racy. This patch is a direct transcription his feedback against commit 0115e8e30d6fc (net: remove delay at device dismantle) Thanks Eric ! Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22net: remove delay at device dismantleEric Dumazet
I noticed extra one second delay in device dismantle, tracked down to a call to dst_dev_event() while some call_rcu() are still in RCU queues. These call_rcu() were posted by rt_free(struct rtable *rt) calls. We then wait a little (but one second) in netdev_wait_allrefs() before kicking again NETDEV_UNREGISTER. As the call_rcu() are now completed, dst_dev_event() can do the needed device swap on busy dst. To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called after a rcu_barrier(), but outside of RTNL lock. Use NETDEV_UNREGISTER_FINAL with care ! Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after IP cache removal. With help from Gao feng Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04ipv4: change inet_addr_hash()Eric Dumazet
Use net_hash_mix(net) instead of hash_ptr(net, 8), and use hash_32() instead of using a serie of XOR Define IN4_ADDR_HSIZE_SHIFT for clarity __ip_dev_find() can perform the net_eq() call only if ifa_local matches the key, to avoid unneeded dereferences. remove inline attributes # size net/ipv4/devinet.o.before net/ipv4/devinet.o text data bss dec hex filename 17471 2545 2048 22064 5630 net/ipv4/devinet.o.before 17335 2545 2048 21928 55a8 net/ipv4/devinet.o Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12ipv4: Add interface option to enable routing of 127.0.0.0/8Thomas Graf
Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debugJoe Perches
Use the current debugging style and enable dynamic_debug. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net ipv4: Convert devinet to use register_net_sysctlEric W. Biederman
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-04-02ipv4: Stop using NLA_PUT*().David S. Miller
These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-22bonding: remove entries for master_ip and vlan_ip and query devices insteadAndy Gospodarek
The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values. commit 917fbdb32f37e9a93b00bb12ee83532982982df3 Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Date: Wed Nov 23 23:37:15 2011 +0000 bonding: only use primary address for ARP That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected. When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then detect as secondary and ignored, but the damage was already done. This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved. While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea. As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all be contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed. I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch. Thanks to Ralf for all his help on this. v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller. v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Ralf Zeidler <ralf.zeidler@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12net: reintroduce missing rcu_assign_pointer() callsEric Dumazet
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01ipv4: flush route cache after change accept_localPeter Pan(潘卫平)
After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0, we should flush route cache, or it will continue receive packets with local source address, which should be dropped. Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTERStephen Hemminger
When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-25IPv4: Send gratuitous ARP for secondary IP addresses alsoZoltan Kiss
If a device event generates gratuitous ARP messages, only primary address is used for sending. This patch iterates through the whole list. Tested with 2 IP addresses configuration on bonding interface. Signed-off-by: Zoltan Kiss <schaman@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose
The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-05-10net: fix two lockdep splatsEric Dumazet
Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump callbacks) switched rtnl protection to RCU, but we forgot to adjust two rcu_dereference() lockdep annotations : inet_get_link_af_size() or inet_fill_link_af() might be called with rcu_read_lock or rtnl held, so use rcu_dereference_rtnl() instead of rtnl_dereference() Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02sysctl: net: call unregister_net_sysctl_table where neededLucian Adrian Grijincu
ctl_table_headers registered with register_net_sysctl_table should have been unregistered with the equivalent unregister_net_sysctl_table Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23ipv4: Fallback to FIB local table in __ip_dev_find().David S. Miller
In commit 9435eb1cf0b76b323019cebf8d16762a50a12a19 ("ipv4: Implement __ip_dev_find using new interface address hash.") we reimplemented __ip_dev_find() so that it doesn't have to do a full FIB table lookup. Instead, it consults a hash table of addresses configured to interfaces. This works identically to the old code in all except one case, and that is for loopback subnets. The old code would match the loopback device for any IP address that falls within a subnet configured to the loopback device. Handle this corner case by doing the FIB lookup. We could implement this via inet_addr_onlink() but: 1) Someone could configure many addresses to loopback and inet_addr_onlink() is a simple list traversal. 2) We know the old code works. Reported-by: Julian Anastasov <ja@ssi.bg> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: optimize route adding on secondary promotionJulian Anastasov
Optimize the calling of fib_add_ifaddr for all secondary addresses after the promoted one to start from their place, not from the new place of the promoted secondary. It will save some CPU cycles because we are sure the promoted secondary was first for the subnet and all next secondaries do not change their place. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: remove the routes on secondary promotionJulian Anastasov
The secondary address promotion relies on fib_sync_down_addr to remove all routes created for the secondary addresses when the old primary address is deleted. It does not happen for cases when the primary address is also in another subnet. Fix that by deleting local and broadcast routes for all secondaries while they are on device list and by faking that all addresses from this subnet are to be deleted. It relies on fib_del_ifaddr being able to ignore the IPs from the concerned subnet while checking for duplication. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-10Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_cmn.c
2011-03-09ipv4: Fix erroneous uses of ifa_address.David S. Miller
In usual cases ifa_address == ifa_local, but in the case where SIOCSIFDSTADDR sets the destination address on a point-to-point link, ifa_address gets set to that destination address. Therefore we should use ifa_local when we want the local interface address. There were two cases where the selection was done incorrectly: 1) When devinet_ioctl() does matching, it checks ifa_address even though gifconf correct reported ifa_local to the user 2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using ifa_address instead of ifa_local. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03ipv4: Fix __ip_dev_find() to use ifa_local instead of ifa_address.David S. Miller
Reported-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c
2011-02-18ipv4: Implement __ip_dev_find using new interface address hash.David S. Miller
Much quicker than going through the FIB tables. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18ipv4: Add hash table of interface addresses.David S. Miller
This will be used to optimize __ip_dev_find() and friends. With help from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.Ian Campbell
NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link notification while NETDEV_UP/NETDEV_CHANGEADDR generate link notifications as a sort of side effect. In the later cases the sysctl option is present because link notification events can have undesired effects e.g. if the link is flapping. I don't think this applies in the case of an explicit request from a driver. This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we could add a new sysctl for this case which defaults to on. This change causes Xen post-migration ARP notifications (which cause switches to relearn their MAC tables etc) to be sent by default. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12ipv4: Don't pre-seed hoplimit metric.David S. Miller
Always go through a new ip4_dst_hoplimit() helper, just like ipv6. This allowed several simplifications: 1) The interim dst_metric_hoplimit() can go as it's no longer userd. 2) The sysctl_ip_default_ttl entry no longer needs to use ipv4_doint_and_flush, since the sysctl is not cached in routing cache metrics any longer. 3) ipv4_doint_and_flush no longer needs to be exported and therefore can be marked static. When ipv4_doint_and_flush_strategy was removed some time ago, the external declaration in ip.h was mistakenly left around so kill that off too. We have to move the sysctl_ip_default_ttl declaration into ipv4's route cache definition header net/route.h, because currently net/ip.h (where the declaration lives now) has a back dependency on net/route.h Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06net: kill an RCU warning in inet_fill_link_af()Eric Dumazet
commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfeb8c (rtnl: make link af-specific updates atomic) used incorrect __in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU warnings. Switch to __in_dev_get_rtnl(), wich is more appropriate, since we hold RTNL. Based on a report and initial patch from Amerigo Wang. Reported-by: Amerigo Wang <amwang@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27rtnl: make link af-specific updates atomicThomas Graf
As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17ipv4: AF_INET link address familyThomas Graf
Implements the AF_INET link address family exposing the per device configuration settings via netlink using the attribute IFLA_INET_CONF. The format of IFLA_INET_CONF differs depending on the direction the attribute is sent. The attribute sent by the kernel consists of a u32 array, basically a 1:1 copy of in_device->cnf.data[]. The attribute expected by the kernel must consist of a sequence of nested u32 attributes, each representing a change request, e.g. [IFLA_INET_CONF] = { [IPV4_DEVCONF_FORWARDING] = 1, [IPV4_DEVCONF_NOXFRM] = 0, } libnl userspace API documentation and example available from: http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-19inet: RCU changes in inetdev_by_index()Eric Dumazet
Convert inetdev_by_index() to not increment in_dev refcount. Callers hold RCU or RTNL, and should not decrement in_dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-15ipv4: ip_ptr cleanupsEric Dumazet
dev->ip_ptr is protected by rtnl and rcu. Yet some places dont use appropriate primitives and/or locking rules. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>