linux-toradex.git/drivers/infiniband/ulp/ipoib, branch v3.13

infiniband: make sure the src net is infiniband when create new link

2014-01-04T01:38:56+00:00

When we create a new IPoIB link on top of a non-InfiniBand device, e.g.
`ip link add link em1 type ipoib pkey 0x8001`, we get a NULL pointer
dereference because other devices, such as Ethernet, do not have a
struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by making sure the source net device is InfiniBand when
creating a new link.
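
The guard can be sketched in user space as follows. This is a minimal
illustration, not the actual patch: `struct fake_netdev`,
`new_child_link_check()` and `example_usage()` are hypothetical names, and
the `ARPHRD_*` values are copied from `<linux/if_arp.h>` so that the block
compiles without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hardware type values as defined in <linux/if_arp.h>. */
#define ARPHRD_ETHER       1
#define ARPHRD_INFINIBAND 32

/* Hypothetical stand-in for the few struct net_device fields used here. */
struct fake_netdev {
	const char *name;
	unsigned short type;	/* ARPHRD_* hardware type */
};

/*
 * Sketch of the guard: refuse to create an IPoIB child on top of a
 * parent that is missing or is not an InfiniBand device, so that no
 * later step dereferences IB-specific state that does not exist.
 */
static int new_child_link_check(const struct fake_netdev *parent)
{
	if (parent == NULL || parent->type != ARPHRD_INFINIBAND)
		return -ENODEV;
	return 0;
}

/* Usage: an Ethernet parent is rejected, an InfiniBand parent passes. */
static void example_usage(void)
{
	struct fake_netdev em1 = { "em1", ARPHRD_ETHER };
	struct fake_netdev ib0 = { "ib0", ARPHRD_INFINIBAND };

	assert(new_child_link_check(&em1) == -ENODEV);
	assert(new_child_link_check(&ib0) == 0);
}
```

Rejecting the request at link creation keeps the failure at the point
where the bad parent is supplied, before any of the calls in the code path
above are reached.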

Signed-off-by: Hangbin Liu 
Signed-off-by: David S. Miller

IPoIB: lower NAPI weight

2013-11-08T22:42:50+00:00

Since commit 82dc3c63c692 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt 
Signed-off-by: Roland Dreier

IPoIB: Start multicast join process only on active ports

2013-11-08T22:42:49+00:00

The driver starts the mcast_join task whenever the netdev interface is
UP, regardless of the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.
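
The early return and the later retry can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the
driver code: `enum fake_port_state`, `struct fake_port`,
`mcast_join_task()` and `on_port_active_event()` are hypothetical
stand-ins, and the real port state enum has more values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of port states. */
enum fake_port_state { FAKE_PORT_DOWN, FAKE_PORT_INIT, FAKE_PORT_ACTIVE };

struct fake_port {
	enum fake_port_state state;
	int joins_sent;		/* number of join requests issued */
};

/*
 * Sketch of the join task: return early while the port is not ACTIVE,
 * because a join sent now would only fail. Returns true if a join
 * request was issued.
 */
static bool mcast_join_task(struct fake_port *port)
{
	assert(port != NULL);
	if (port->state != FAKE_PORT_ACTIVE)
		return false;	/* retried later from the event path */
	port->joins_sent++;
	return true;
}

/* Sketch of the event path: a port-active event reruns the task. */
static void on_port_active_event(struct fake_port *port)
{
	port->state = FAKE_PORT_ACTIVE;
	(void)mcast_join_task(port);
}
```

In this model no join is attempted while the port is down, and exactly one
is issued once the port-active event arrives.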

Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Add path query flushing in ipoib_ib_dev_cleanup

2013-11-08T22:42:49+00:00

The path_rec_completion() callback may be invoked asynchronously even
in the middle of the "driver uninit" process.  This can lead to
scheduling a task that tries to touch members of the priv object that
are no longer valid.  For example, ipoib_cm_create_tx_qp() can attempt
to create a QP with no valid priv->pd object.

The following crash is one of the results:
RIP: 0010:[]  [] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[] ? get_random_bytes+0x20/0x30
[] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[] worker_thread+0x170/0x2a0
[] ? autoremove_wake_function+0x0/0x40
[] ? worker_thread+0x0/0x2a0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze 
Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Fix usage of uninitialized multicast objects

2013-11-08T22:42:49+00:00

The driver should avoid calling ib_sa_free_multicast() on the mcast->mc
object until its initialization has finished.  Otherwise we can crash
when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.
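
A user-space sketch of that rule follows. It is an illustration under
stated assumptions: `struct fake_mcast`, its flags, `start_join()` and
`flush_mcast()` are hypothetical names, and the completion object is
reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a multicast group entry. */
struct fake_mcast {
	bool join_started;	/* set only after the join call has returned */
	bool join_done;		/* stands in for the completion object */
	bool freed;		/* stands in for releasing the join handle */
};

/* Stand-in for the join call: the flag is set once the handle exists. */
static void start_join(struct fake_mcast *m)
{
	m->join_started = true;
	m->join_done = true;	/* in this sketch the join completes at once */
}

/*
 * Sketch of the flush: entries that never started a join have no valid
 * handle and nothing to wait for, so they are skipped. Returns the
 * number of entries that were waited on and released.
 */
static size_t flush_mcast(struct fake_mcast *entries, size_t n)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		if (!entries[i].join_started)
			continue;
		/* The real code would block on the completion here. */
		assert(entries[i].join_done);
		entries[i].freed = true;
		released++;
	}
	return released;
}
```

Only entries that went through `start_join()` are touched by the flush;
the rest are left alone because their handle was never initialized.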

Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Avoid flushing the driver workqueue on dev_down

2013-11-08T22:42:49+00:00

The driver should not flush the whole workqueue when only one work item
(the pkey poll work) needs to be cancelled.  Use
cancel_delayed_work_sync() instead.

Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()

2013-11-08T22:42:49+00:00

When an ipoib interface goes down it takes all of its children down
with it, while holding a mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, which flushes the workqueue.  Sometimes an
__ipoib_ib_dev_flush() work item is waiting in the workqueue and, when
invoked, tries to take the same mutex, which leads to a deadlock, as
seen below.

The solution is to switch from a mutex to an rw-semaphore.

The deadlock:
[11028.165303]  [] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [] schedule_timeout+0x215/0x2e0
[11028.178465]  [] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [] wait_for_common+0x123/0x180
[11028.192491]  [] ? default_wake_function+0x0/0x20
[11028.199504]  [] wait_for_completion+0x1d/0x20
[11028.206224]  [] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [] ? wq_barrier_func+0x0/0x20
[11028.219375]  [] flush_workqueue+0x54/0x80
[11028.225712]  [] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [] dev_close+0x71/0xc0
[11028.254447]  [] dev_change_flags+0xa1/0x1d0
[11028.261062]  [] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [] dev_close+0x71/0xc0
[11028.273922]  [] dev_change_flags+0xa1/0x1d0
[11028.280452]  [] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [] inet_ioctl+0x88/0xa0
[11028.292633]  [] sock_ioctl+0x7a/0x280
[11028.298576]  [] vfs_ioctl+0x22/0xa0
[11028.304326]  [] ? unmap_region+0x110/0x130
[11028.310756]  [] do_vfs_ioctl+0x84/0x580
[11028.316897]  [] sys_ioctl+0x81/0xa0

and

[11028.017533]  [] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [] mutex_lock+0x2b/0x50
[11028.044910]  [] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [] worker_thread+0x170/0x2a0
[11028.076068]  [] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [] ? worker_thread+0x0/0x2a0
[11028.089709]  [] kthread+0x96/0xa0
[11028.095266]  [] child_rip+0xa/0x20
[11028.100921]  [] ? kthread+0x0/0xa0
[11028.106573]  [] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Change CM skb memory allocation to be non-atomic during init

2013-11-08T22:42:48+00:00

Change CM skb memory allocation to use GFP_KERNEL when possible.

During device init there's no need to use GFP_ATOMIC when allocating
memory for the CM skbs -- use GFP_KERNEL instead.

Signed-off-by: Tal Alon 
Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Fix crash in dev_open error flow

2013-11-08T22:42:48+00:00

If napi has never been enabled when calling ipoib_ib_dev_stop, a
kernel crash occurs, because the verbs layer completion handler
(ipoib_ib_completion) calls napi_schedule unconditionally.

If the napi structure passed in the napi_schedule call has not
been initialized, the kernel will crash.

The cleanest solution is to simply enable napi before calling
ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then
immediately disables napi).

Signed-off-by: Jack Morgenstein 
Signed-off-by: Erez Shitrit 
Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier

IPoIB: Fix race in deleting ipoib_neigh entries

2013-08-13T18:57:37+00:00

In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().
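
The "delete only once" idea can be sketched in single-threaded user-space
code. This is an illustration under stated assumptions: it models neither
RCU nor priv->lock, the lookup table is a tiny array, NULL stands in for
the kernel's list poison values, and `struct fake_neigh`, `table_insert()`
and `neigh_free()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *prev, *next; };

static void list_init(struct node *head) { head->prev = head->next = head; }

static void list_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and poison the node; a second call on it would crash. */
static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

#define TABLE_SIZE 4

struct fake_neigh {
	struct node list;	/* membership in the owning path's list */
	unsigned int key;
};

/* Stand-in for the device's neigh lookup table. */
static struct fake_neigh *table[TABLE_SIZE];

static void table_insert(struct fake_neigh *n) { table[n->key % TABLE_SIZE] = n; }

/*
 * Sketch of the fixed free routine: unlink from the path list only if
 * this call is the one that removed the entry from the lookup table.
 * A caller holding a stale pointer gets false and touches nothing.
 */
static bool neigh_free(struct fake_neigh *n)
{
	struct fake_neigh **slot;

	assert(n != NULL);
	slot = &table[n->key % TABLE_SIZE];
	if (*slot != n)
		return false;	/* already removed by an earlier call */
	*slot = NULL;
	list_del(&n->list);	/* happens exactly once per entry */
	return true;
}
```

Tying the list removal to the successful table removal means a second
caller with a stale pointer never reaches `list_del()` on a poisoned node.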

Signed-off-by: Jim Foraker 
Reviewed-by: Or Gerlitz 
Reviewed-by: Jack Wang 
Reviewed-by: Shlomo Pongratz 
Signed-off-by: Roland Dreier