linux-toradex.git/drivers/net/bonding, branch v3.7

bonding: fix race condition in bonding_store_slaves_active

2012-11-29T18:13:15+00:00

There is a race between bonding_store_slaves_active() and the slave
 manipulation functions. The bond_for_each_slave use in
 bonding_store_slaves_active() is not protected by any synchronization
 mechanism, so a NULL pointer dereference is easy to trigger.
 Fixed by acquiring the bond->lock for the slave walk.

 v2: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: Jay Vosburgh 
Signed-off-by: David S. Miller

bonding: make arp_ip_target parameter checks consistent with sysfs

2012-11-29T18:13:15+00:00

The module can be loaded with arp_ip_target="255.255.255.255", which
 makes the target impossible to remove because the sysfs function rejects
 that value, so we make the parameter checks consistent with sysfs.
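
A minimal userspace sketch of the idea, not the driver code: both the
module-parameter path and the sysfs path would call one shared predicate,
so a value accepted at load time can always be removed later. The helper
name toy_arp_target_is_usable() is hypothetical, and rejecting 0.0.0.0 in
addition to 255.255.255.255 is an assumption; the text above only names
the broadcast address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical helper (not a kernel function): returns true if `text`
 * parses as a dotted-quad IPv4 address that may be used as an ARP target. */
bool toy_arp_target_is_usable(const char *text)
{
    struct in_addr addr;

    /* Reject anything that is not a well-formed IPv4 address. */
    if (inet_pton(AF_INET, text, &addr) != 1)
        return false;

    /* 255.255.255.255 is the value named in the commit message. */
    if (addr.s_addr == htonl(INADDR_BROADCAST))
        return false;

    /* Assumption: the all-zero address is treated as unusable too. */
    if (addr.s_addr == htonl(INADDR_ANY))
        return false;

    return true;
}
```

Using the same predicate in both places is what keeps the two checks from
drifting apart again.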

 v2: Fix formatting
 v3: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: Jay Vosburgh 
Signed-off-by: David S. Miller

bonding: fix miimon and arp_interval delayed work race conditions

2012-11-29T18:13:15+00:00

First, three observations that will be used later.
Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
 This usage is wrong because the pending bit is cleared just before the
 work's fn is executed and if the function re-arms itself we might end up
 with the work still running. It's safe to call cancel_delayed_work_sync()
 even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
 Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called
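
Observation 1 can be replayed with a single-threaded toy model. This is
only a sketch under stated assumptions: struct toy_work and every toy_*
function are hypothetical stand-ins, not the kernel workqueue API, and the
"synchronous cancel" here merely models the property that neither a queued
nor a running instance is left behind.

```c
#include <stdbool.h>

/* Toy stand-in for a self-re-arming delayed work item (not a kernel type). */
struct toy_work {
    bool pending;   /* queued, handler not yet started */
    bool running;   /* handler currently executing */
    bool rearm;     /* handler re-queues itself when it finishes */
};

void toy_queue(struct toy_work *w)
{
    w->pending = true;
}

/* The worker picks the item up: pending is cleared before the handler runs. */
void toy_handler_start(struct toy_work *w)
{
    w->pending = false;
    w->running = true;
}

/* The handler finishes and, if allowed, re-arms itself. */
void toy_handler_finish(struct toy_work *w)
{
    w->running = false;
    if (w->rearm)
        w->pending = true;
}

/* The pattern from Observation 1: cancel only when the pending bit is set. */
void toy_cancel_if_pending(struct toy_work *w)
{
    if (w->pending)
        w->pending = false;
}

/* Model of a synchronous cancel: block re-arming, let a running handler
 * finish, then drop anything still queued. */
void toy_cancel_sync(struct toy_work *w)
{
    w->rearm = false;
    if (w->running)
        toy_handler_finish(w);
    w->pending = false;
}

/* Replay the interleaving where the cancel arrives while the handler is
 * mid-run. Returns true if the work is still queued afterwards. */
bool toy_still_queued_after_cancel(bool use_sync)
{
    struct toy_work w = { .rearm = true };

    toy_queue(&w);
    toy_handler_start(&w);          /* pending bit is now clear */
    if (use_sync)
        toy_cancel_sync(&w);
    else
        toy_cancel_if_pending(&w);  /* sees pending == false, does nothing */
    if (w.running)
        toy_handler_finish(&w);     /* handler re-arms itself */
    return w.pending;
}
```

In this model toy_still_queued_after_cancel(false) returns true (the work
survives the conditional cancel), while toy_still_queued_after_cancel(true)
returns false.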

Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval()
 Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK()
 Since the works are not protected by anything between INIT_DELAYED_WORK()
 and calls to (en/de)queue it is possible for races between the following
 functions:
 (races are also possible between the calls to INIT_DELAYED_WORK()
  and workqueue code)
 bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
			  bond_open(), enqueued functions
 bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
				bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all()

Bugs 1 and 2 are fixed by moving all work initializations into
bond_open(), which by Obs. 2 and Obs. 3, and the fact that we make sure
that all works are cancelled in bond_close(), is guaranteed not to have
any work enqueued.
Also, the RTNL lock is now acquired in bonding_store_miimon/arp_interval
so they can't race with bond_close() and bond_open(). The opposing work
is cancelled only if the IFF_UP flag is set, and in that case it is
cancelled unconditionally. If the interface is down, the opposing work
is already cancelled, so there is no need to cancel it again. This way
we don't need new synchronization for the bonding workqueue. These bugs
(and fixes) are tied together and belong in the same patch.
Note: I have left 1 line intentionally over 80 characters (84) because I
      didn't like how it looks broken down. If you'd prefer it otherwise,
      then simply break it.

 v2: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: Jay Vosburgh 
Signed-off-by: David S. Miller

bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices.

2012-11-21T16:50:31+00:00

This patch sets the bond's gso_max_size and gso_max_segs to the lowest values found among its slave devices, both on enslave and on detach.
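
The recomputation can be sketched in userspace as a simple minimum over
the slaves. The types struct toy_slave and struct toy_limits and the
function toy_bond_gso_limits() are hypothetical names for illustration,
and starting from caller-supplied defaults when there are no slaves is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slave GSO limits (not kernel structures). */
struct toy_slave {
    uint32_t gso_max_size;
    uint16_t gso_max_segs;
};

struct toy_limits {
    uint32_t gso_max_size;
    uint16_t gso_max_segs;
};

/* Return the lowest gso_max_size and gso_max_segs among `n` slaves,
 * starting from `defaults` (assumed to be the values used when the
 * bond has no slaves). Meant to be re-run on every enslave and detach. */
struct toy_limits toy_bond_gso_limits(const struct toy_slave *slaves,
                                      size_t n, struct toy_limits defaults)
{
    struct toy_limits out = defaults;

    for (size_t i = 0; i < n; i++) {
        if (slaves[i].gso_max_size < out.gso_max_size)
            out.gso_max_size = slaves[i].gso_max_size;
        if (slaves[i].gso_max_segs < out.gso_max_segs)
            out.gso_max_segs = slaves[i].gso_max_segs;
    }
    return out;
}
```

Recomputing from scratch on detach, rather than adjusting incrementally,
lets the limits rise again once the most restrictive slave is gone.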

Signed-off-by: Sarveshwar Bandi 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

bonding: fix second off-by-one error

2012-11-01T15:53:44+00:00

Fix off-by-one error: IFNAMSIZ == 16, and when this
code gets executed we write a NUL byte where we should not.

How to reproduce:
 with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently)
 modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode;
 echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/active_slave;
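
A userspace sketch of this class of bug, assuming the name is read with a
width-limited sscanf() into a 16-byte buffer (the commit text does not
show the code, so this is an assumption). TOY_IFNAMSIZ and
toy_parse_ifname() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ 16  /* mirrors IFNAMSIZ == 16 from the commit message */

/* Copy the first whitespace-delimited token of `input` into `name`.
 * Returns the stored length, or -1 if no token was found. */
int toy_parse_ifname(const char *input, char name[TOY_IFNAMSIZ])
{
    /* "%15s" stores at most 15 characters plus the terminating NUL,
     * i.e. 16 bytes in total. A width of 16 would store up to 17 bytes
     * and put the NUL one byte past the end of `name`. */
    if (sscanf(input, "%15s", name) != 1)
        return -1;
    return (int)strlen(name);
}
```

With the 16-character input from the reproduction steps above, this
version stores 15 characters and keeps the NUL inside the buffer.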

Signed-off-by: Nikolay Aleksandrov 

Note: Sorry for the second patch but I missed this one while checking
      the file. You can squash them into one patch.
Signed-off-by: David S. Miller

bonding: fix off-by-one error

2012-11-01T15:53:43+00:00

Fix off-by-one error: IFNAMSIZ == 16, and when this
code gets executed we write a NUL byte where we should not.

How to reproduce:
 with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently)
 modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode;
 echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/primary;

Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller

vlan: fix bond/team enslave of vlan challenged slave/port

2012-10-16T18:41:46+00:00

In vlan_uses_dev(), check the number of vlan devs rather than the
existence of vlan_info. The reason is that vlan id 0 is present by
default without a corresponding vlan dev on it, which prevented a vlan
challenged dev from being enslaved.
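
The difference between the two checks can be sketched as follows. This is
a simplified model, not the kernel code: struct toy_vlan_info and both
functions are hypothetical, and the structure is reduced to the single
counter that matters here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the per-device vlan bookkeeping. */
struct toy_vlan_info {
    unsigned int nr_vlan_devs;  /* number of vlan devices on top of the dev */
};

/* Old-style check: the mere existence of the bookkeeping counts as "used".
 * This is true even when only vlan id 0 is registered with no vlan dev. */
bool toy_vlan_uses_dev_old(const struct toy_vlan_info *info)
{
    return info != NULL;
}

/* New-style check: the device is "used" only if vlan devs actually exist. */
bool toy_vlan_uses_dev_new(const struct toy_vlan_info *info)
{
    return info != NULL && info->nr_vlan_devs > 0;
}
```

For a device that only carries the default vlan id 0 entry, the old check
reports true and blocks the enslave, while the new check reports false.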

Reported-by: Jon Stanley 
Signed-off-by: Jiri Pirko 
Signed-off-by: David S. Miller

bonding: set qdisc_tx_busylock to avoid LOCKDEP splat

2012-10-04T19:53:48+00:00

If a qdisc is installed on a bonding device, it's possible to get the
following lockdep splat under stress:

 =============================================
 [ INFO: possible recursive locking detected ]
 3.6.0+ #211 Not tainted
 ---------------------------------------------
 ping/4876 is trying to acquire lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830

 but task is already holding lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by ping/4876:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [] raw_sendmsg+0x600/0xc30
  #1:  (rcu_read_lock_bh){.+....}, at: [] ip_finish_output+0x12d/0x870
  #2:  (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0x830
  #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830
  #4:  (&bond->lock){++.?..}, at: [] bond_start_xmit+0x31/0x4b0 [bonding]
  #5:  (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0x830

 stack backtrace:
 Pid: 4876, comm: ping Not tainted 3.6.0+ #211
 Call Trace:
  [] __lock_acquire+0x715/0x1b80
  [] ? mark_held_locks+0x9b/0x100
  [] lock_acquire+0x92/0x1d0
  [] ? dev_queue_xmit+0xe1/0x830
  [] _raw_spin_lock+0x3c/0x50
  [] ? dev_queue_xmit+0xe1/0x830
  [] ? rcu_read_lock_bh_held+0x5d/0x90
  [] dev_queue_xmit+0xe1/0x830
  [] ? netdev_pick_tx+0x570/0x570
  [] bond_start_xmit+0x1da/0x4b0 [bonding]
  [] dev_hard_start_xmit+0x240/0x6b0
  [] sch_direct_xmit+0xfe/0x2a0
  [] dev_queue_xmit+0x199/0x830
  [] ? netdev_pick_tx+0x570/0x570
  [] ip_finish_output+0x5df/0x870
  [] ? ip_finish_output+0x12d/0x870
  [] ip_output+0x54/0xf0
  [] ip_local_out+0x28/0x90
  [] ip_send_skb+0x14/0x50
  [] ip_push_pending_frames+0x32/0x40
  [] raw_sendmsg+0x93a/0xc30
  [] ? selinux_file_send_sigiotask+0x1f0/0x1f0
  [] ? __lock_is_held+0x54/0x80
  [] ? inet_recvmsg+0x220/0x220
  [] ? __lock_is_held+0x54/0x80
  [] inet_sendmsg+0x125/0x240
  [] ? inet_recvmsg+0x220/0x220
  [] sock_sendmsg+0xab/0xe0
  [] ? lock_release_non_nested+0xa0/0x2e0
  [] ? lock_release_non_nested+0xa0/0x2e0
  [] __sys_sendmsg+0x37c/0x390
  [] ? fsnotify+0x2ca/0x7e0
  [] ? fsnotify+0x88/0x7e0
  [] ? put_ldisc+0x56/0xd0
  [] ? fget_light+0x3da/0x510
  [] sys_sendmsg+0x44/0x80
  [] system_call_fastpath+0x16/0x1b

Avoid this problem by using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet 
Cc: Jay Vosburgh 
Cc: Andy Gospodarek 
Signed-off-by: David S. Miller

bonding: add some slack to arp monitoring time limits

2012-08-31T20:37:12+00:00

Currently, all the time limits in the bonding ARP monitor are in
multiples of arp_interval -- the time interval at which the ARP
monitor is periodically scheduled.

With a fast network round-trip and a little scheduling latency
of the ARP monitor work, a limit of n*delta_in_ticks may
effectively mean (n-1)*delta_in_ticks.

This is fatal in the case of n==1 (the link will stay down
forever) and makes the behaviour non-deterministic in all the
other cases.

Add a delta_in_ticks/2 time slack to all the time limits.
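
The effect of the slack can be sketched with plain tick arithmetic. This
is a userspace illustration only: toy_within_limit() is a hypothetical
helper, and the tick values in the note below are made-up numbers, not
measurements.

```c
#include <stdbool.h>

/* Returns true if `now` is no more than n intervals after `last`,
 * optionally allowing an extra half interval of slack.
 * Unsigned subtraction keeps the comparison valid across counter wrap. */
bool toy_within_limit(unsigned long now, unsigned long last,
                      unsigned long n, unsigned long delta_in_ticks,
                      bool with_slack)
{
    unsigned long limit = n * delta_in_ticks;

    if (with_slack)
        limit += delta_in_ticks / 2;

    return now - last <= limit;
}
```

Take delta_in_ticks = 100 and n = 1. If a reply arrives at tick 1001 and
the next monitor run is delayed slightly to tick 1103, then 102 ticks have
elapsed: the check fails without slack and passes with it.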

Signed-off-by: Jiri Bohac 
Signed-off-by: David S. Miller

bonding: support for IPv6 transmit hashing

2012-08-23T05:49:30+00:00

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.
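
A userspace sketch of the hash as described in the paragraph above. The
function name and signature are hypothetical; the addresses are passed as
four 32-bit words each, the final reduction modulo the slave count is an
assumption about how the hash is turned into a slave index, and the guard
for a zero slave count exists only to keep the sketch safe to call.

```c
#include <stdint.h>

/* Hypothetical helper: pick a slave index for an IPv6 flow.
 * saddr/daddr hold the 128-bit addresses as four 32-bit words;
 * words 1..3 are the "bottom three quads". */
unsigned int toy_xmit_hash_ipv6(const uint32_t saddr[4],
                                const uint32_t daddr[4],
                                uint8_t src_mac_last, uint8_t dst_mac_last,
                                unsigned int slave_count)
{
    uint32_t h;

    if (slave_count == 0)
        return 0;  /* assumption: nothing to balance across */

    /* XOR the bottom three quads of source and destination together. */
    h = (saddr[1] ^ daddr[1]) ^ (saddr[2] ^ daddr[2]) ^ (saddr[3] ^ daddr[3]);

    /* Fold every byte of the result into the bottom byte. */
    h ^= (h >> 24) ^ (h >> 16) ^ (h >> 8);

    /* Mix in the last bytes of the MAC addresses, then reduce. */
    return (h ^ dst_mac_last ^ src_mac_last) % slave_count;
}
```

Because every step is an XOR, swapping source and destination (addresses
and MAC bytes together) yields the same slave index.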

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
i.e., they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham 
Signed-off-by: John Eaglesham 
Signed-off-by: David S. Miller