<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/infiniband/ulp/ipoib, branch v3.14.4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next</title>
<updated>2014-01-23T07:24:13+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2014-01-23T07:24:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8f399921ea9a562bc8221258c4b8a7bd69577939'/>
<id>8f399921ea9a562bc8221258c4b8a7bd69577939</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Report operstate consistently when brought up without a link</title>
<updated>2014-01-23T07:01:05+00:00</updated>
<author>
<name>Michal Schmidt</name>
<email>mschmidt@redhat.com</email>
</author>
<published>2014-01-17T18:47:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=437708c44395a11e474fb33b4fd7f29483118e51'/>
<id>437708c44395a11e474fb33b4fd7f29483118e51</id>
<content type='text'>
After booting without a working link, "ip link" shows:

 5: mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbbd92 ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Acked-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After booting without a working link, "ip link" shows:

 5: mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbbd92 ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Acked-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/core: Add flow steering support for IPoIB UD traffic</title>
<updated>2014-01-14T22:06:50+00:00</updated>
<author>
<name>Matan Barak</name>
<email>matanb@mellanox.com</email>
</author>
<published>2013-11-07T13:25:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=90f1d1b41b70474bf73d07d4300196901cd81718'/>
<id>90f1d1b41b70474bf73d07d4300196901cd81718</id>
<content type='text'>
When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering.  This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a manner similar to Ethernet RAW_PACKET QPs.

The hint is provided through a new QP creation flag called NETIF_QP.

Signed-off-by: Matan Barak &lt;matanb@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering.  This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a manner similar to Ethernet RAW_PACKET QPs.

The hint is provided through a new QP creation flag called NETIF_QP.

Signed-off-by: Matan Barak &lt;matanb@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>infiniband: make sure the src net is infiniband when create new link</title>
<updated>2014-01-04T01:38:56+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2014-01-03T03:33:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0d68fc4f1210f8caea2bdd68f99dc6da35ee3740'/>
<id>0d68fc4f1210f8caea2bdd68f99dc6da35ee3740</id>
<content type='text'>
When we create a new infiniband link on a non-infiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`, we get a NULL pointer dereference
because other devices such as Ethernet don't have a struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by making sure the src net is infiniband when creating a new link.

Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we create a new infiniband link on a non-infiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`, we get a NULL pointer dereference
because other devices such as Ethernet don't have a struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by making sure the src net is infiniband when creating a new link.

Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: lower NAPI weight</title>
<updated>2013-11-08T22:42:50+00:00</updated>
<author>
<name>Michal Schmidt</name>
<email>mschmidt@redhat.com</email>
</author>
<published>2013-08-21T16:50:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f1a38671c55b5cbda77dbbda8b4651224c50cd7'/>
<id>7f1a38671c55b5cbda77dbbda8b4651224c50cd7</id>
<content type='text'>
Since commit 82dc3c63c692 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 82dc3c63c692 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Start multicast join process only on active ports</title>
<updated>2013-11-08T22:42:49+00:00</updated>
<author>
<name>Erez Shitrit</name>
<email>erezsh@mellanox.com</email>
</author>
<published>2013-10-16T14:37:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94232d9ce81755c4b0c1536648442383442b27e0'/>
<id>94232d9ce81755c4b0c1536648442383442b27e0</id>
<content type='text'>
The driver starts the mcast_join task whenever the netdev interface is
UP, regardless of the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The driver starts the mcast_join task whenever the netdev interface is
UP, regardless of the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Add path query flushing in ipoib_ib_dev_cleanup</title>
<updated>2013-11-08T22:42:49+00:00</updated>
<author>
<name>Erez Shitrit</name>
<email>erezsh@mellanox.com</email>
</author>
<published>2013-10-16T14:37:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a39c52ab887fdcefae1d7f467fb0621f30833c84'/>
<id>a39c52ab887fdcefae1d7f467fb0621f30833c84</id>
<content type='text'>
The path_rec_completion() callback may be invoked asynchronously even
in the middle of the "driver uninit" process.  This can lead to scheduling
a task that tries to touch members of the priv object that are no
longer valid.  For example, ipoib_cm_create_tx_qp() can attempt to
create a QP with no valid priv-&gt;pd object.

The following crash is one of the results:
RIP: 0010:[&lt;ffffffffa021bb47&gt;]  [&lt;ffffffffa021bb47&gt;] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[&lt;ffffffff81309ef0&gt;] ? get_random_bytes+0x20/0x30
[&lt;ffffffffa021be2a&gt;] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[&lt;ffffffffa021f765&gt;] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[&lt;ffffffffa021f550&gt;] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[&lt;ffffffff8108b2b0&gt;] worker_thread+0x170/0x2a0
[&lt;ffffffff81090bf0&gt;] ? autoremove_wake_function+0x0/0x40
[&lt;ffffffff8108b140&gt;] ? worker_thread+0x0/0x2a0
[&lt;ffffffff81090886&gt;] kthread+0x96/0xa0
[&lt;ffffffff8100c14a&gt;] child_rip+0xa/0x20
[&lt;ffffffff810907f0&gt;] ? kthread+0x0/0xa0
[&lt;ffffffff8100c140&gt;] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze &lt;markuze@mellanox.com&gt;
Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The path_rec_completion() callback may be invoked asynchronously even
in the middle of the "driver uninit" process.  This can lead to scheduling
a task that tries to touch members of the priv object that are no
longer valid.  For example, ipoib_cm_create_tx_qp() can attempt to
create a QP with no valid priv-&gt;pd object.

The following crash is one of the results:
RIP: 0010:[&lt;ffffffffa021bb47&gt;]  [&lt;ffffffffa021bb47&gt;] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[&lt;ffffffff81309ef0&gt;] ? get_random_bytes+0x20/0x30
[&lt;ffffffffa021be2a&gt;] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[&lt;ffffffffa021f765&gt;] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[&lt;ffffffffa021f550&gt;] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[&lt;ffffffff8108b2b0&gt;] worker_thread+0x170/0x2a0
[&lt;ffffffff81090bf0&gt;] ? autoremove_wake_function+0x0/0x40
[&lt;ffffffff8108b140&gt;] ? worker_thread+0x0/0x2a0
[&lt;ffffffff81090886&gt;] kthread+0x96/0xa0
[&lt;ffffffff8100c14a&gt;] child_rip+0xa/0x20
[&lt;ffffffff810907f0&gt;] ? kthread+0x0/0xa0
[&lt;ffffffff8100c140&gt;] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze &lt;markuze@mellanox.com&gt;
Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix usage of uninitialized multicast objects</title>
<updated>2013-11-08T22:42:49+00:00</updated>
<author>
<name>Erez Shitrit</name>
<email>erezsh@mellanox.com</email>
</author>
<published>2013-10-16T14:37:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a9c8ba588495547d1598f1b83d5eb086bef65e4b'/>
<id>a9c8ba588495547d1598f1b83d5eb086bef65e4b</id>
<content type='text'>
The driver should avoid calling ib_sa_free_multicast on the mcast-&gt;mc
object until it finishes its initialization state.  Otherwise we can
crash when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The driver should avoid calling ib_sa_free_multicast on the mcast-&gt;mc
object until it finishes its initialization state.  Otherwise we can
crash when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Avoid flushing the driver workqueue on dev_down</title>
<updated>2013-11-08T22:42:49+00:00</updated>
<author>
<name>Erez Shitrit</name>
<email>erezsh@mellanox.com</email>
</author>
<published>2013-10-16T14:37:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aede25011fddf559dcf216d86975187e3f64b109'/>
<id>aede25011fddf559dcf216d86975187e3f64b109</id>
<content type='text'>
The driver should not flush the whole workqueue when only one work item
(the pkey poll one) needs to be cancelled.  Use cancel_delayed_work_sync()
instead.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The driver should not flush the whole workqueue when only one work item
(the pkey poll one) needs to be cancelled.  Use cancel_delayed_work_sync()
instead.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()</title>
<updated>2013-11-08T22:42:49+00:00</updated>
<author>
<name>Erez Shitrit</name>
<email>erezsh@mellanox.com</email>
</author>
<published>2013-10-16T14:37:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f47944cc2dba3c7e6f753b81e9f713f4d12bdd5a'/>
<id>f47944cc2dba3c7e6f753b81e9f713f4d12bdd5a</id>
<content type='text'>
When an ipoib interface goes down, it takes all of its children down
with it while holding a mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, which flushes the workqueue.
Sometimes an __ipoib_dev_flush() work item is waiting in the workqueue
and, when invoked, tries to take the same mutex, which leads to a
deadlock, as seen below.

The solution is to switch from the mutex to an rw-sem.

The deadlock:
[11028.165303]  [&lt;ffffffff812b0977&gt;] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [&lt;ffffffff814eaac5&gt;] schedule_timeout+0x215/0x2e0
[11028.178465]  [&lt;ffffffff8105a5c3&gt;] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [&lt;ffffffff814ea743&gt;] wait_for_common+0x123/0x180
[11028.192491]  [&lt;ffffffff8105fa40&gt;] ? default_wake_function+0x0/0x20
[11028.199504]  [&lt;ffffffff814ea85d&gt;] wait_for_completion+0x1d/0x20
[11028.206224]  [&lt;ffffffff8108b4f1&gt;] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [&lt;ffffffff8108b5a0&gt;] ? wq_barrier_func+0x0/0x20
[11028.219375]  [&lt;ffffffff8108bfc4&gt;] flush_workqueue+0x54/0x80
[11028.225712]  [&lt;ffffffffa05a0576&gt;] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [&lt;ffffffffa059ccea&gt;] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [&lt;ffffffffa059849a&gt;] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [&lt;ffffffff8142adf1&gt;] dev_close+0x71/0xc0
[11028.254447]  [&lt;ffffffff8142a631&gt;] dev_change_flags+0xa1/0x1d0
[11028.261062]  [&lt;ffffffffa059851b&gt;] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [&lt;ffffffff8142adf1&gt;] dev_close+0x71/0xc0
[11028.273922]  [&lt;ffffffff8142a631&gt;] dev_change_flags+0xa1/0x1d0
[11028.280452]  [&lt;ffffffff8148f20b&gt;] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [&lt;ffffffff814903b8&gt;] inet_ioctl+0x88/0xa0
[11028.292633]  [&lt;ffffffff8141591a&gt;] sock_ioctl+0x7a/0x280
[11028.298576]  [&lt;ffffffff81189012&gt;] vfs_ioctl+0x22/0xa0
[11028.304326]  [&lt;ffffffff81140540&gt;] ? unmap_region+0x110/0x130
[11028.310756]  [&lt;ffffffff811891b4&gt;] do_vfs_ioctl+0x84/0x580
[11028.316897]  [&lt;ffffffff81189731&gt;] sys_ioctl+0x81/0xa0

and

[11028.017533]  [&lt;ffffffff8105a5c3&gt;] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [&lt;ffffffff8100bb8e&gt;] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [&lt;ffffffff814eb2ae&gt;] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [&lt;ffffffff814eb14b&gt;] mutex_lock+0x2b/0x50
[11028.044910]  [&lt;ffffffffa059f7e7&gt;] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [&lt;ffffffffa059fa00&gt;] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [&lt;ffffffffa059fa17&gt;] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [&lt;ffffffff8108b120&gt;] worker_thread+0x170/0x2a0
[11028.076068]  [&lt;ffffffff81090990&gt;] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [&lt;ffffffff8108afb0&gt;] ? worker_thread+0x0/0x2a0
[11028.089709]  [&lt;ffffffff81090626&gt;] kthread+0x96/0xa0
[11028.095266]  [&lt;ffffffff8100c0ca&gt;] child_rip+0xa/0x20
[11028.100921]  [&lt;ffffffff81090590&gt;] ? kthread+0x0/0xa0
[11028.106573]  [&lt;ffffffff8100c0c0&gt;] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an ipoib interface goes down, it takes all of its children down
with it while holding a mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, which flushes the workqueue.
Sometimes an __ipoib_dev_flush() work item is waiting in the workqueue
and, when invoked, tries to take the same mutex, which leads to a
deadlock, as seen below.

The solution is to switch from the mutex to an rw-sem.

The deadlock:
[11028.165303]  [&lt;ffffffff812b0977&gt;] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [&lt;ffffffff814eaac5&gt;] schedule_timeout+0x215/0x2e0
[11028.178465]  [&lt;ffffffff8105a5c3&gt;] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [&lt;ffffffff814ea743&gt;] wait_for_common+0x123/0x180
[11028.192491]  [&lt;ffffffff8105fa40&gt;] ? default_wake_function+0x0/0x20
[11028.199504]  [&lt;ffffffff814ea85d&gt;] wait_for_completion+0x1d/0x20
[11028.206224]  [&lt;ffffffff8108b4f1&gt;] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [&lt;ffffffff8108b5a0&gt;] ? wq_barrier_func+0x0/0x20
[11028.219375]  [&lt;ffffffff8108bfc4&gt;] flush_workqueue+0x54/0x80
[11028.225712]  [&lt;ffffffffa05a0576&gt;] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [&lt;ffffffffa059ccea&gt;] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [&lt;ffffffffa059849a&gt;] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [&lt;ffffffff8142adf1&gt;] dev_close+0x71/0xc0
[11028.254447]  [&lt;ffffffff8142a631&gt;] dev_change_flags+0xa1/0x1d0
[11028.261062]  [&lt;ffffffffa059851b&gt;] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [&lt;ffffffff8142adf1&gt;] dev_close+0x71/0xc0
[11028.273922]  [&lt;ffffffff8142a631&gt;] dev_change_flags+0xa1/0x1d0
[11028.280452]  [&lt;ffffffff8148f20b&gt;] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [&lt;ffffffff814903b8&gt;] inet_ioctl+0x88/0xa0
[11028.292633]  [&lt;ffffffff8141591a&gt;] sock_ioctl+0x7a/0x280
[11028.298576]  [&lt;ffffffff81189012&gt;] vfs_ioctl+0x22/0xa0
[11028.304326]  [&lt;ffffffff81140540&gt;] ? unmap_region+0x110/0x130
[11028.310756]  [&lt;ffffffff811891b4&gt;] do_vfs_ioctl+0x84/0x580
[11028.316897]  [&lt;ffffffff81189731&gt;] sys_ioctl+0x81/0xa0

and

[11028.017533]  [&lt;ffffffff8105a5c3&gt;] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [&lt;ffffffff8100bb8e&gt;] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [&lt;ffffffff814eb2ae&gt;] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [&lt;ffffffff814eb14b&gt;] mutex_lock+0x2b/0x50
[11028.044910]  [&lt;ffffffffa059f7e7&gt;] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [&lt;ffffffffa059fa00&gt;] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [&lt;ffffffffa059fa17&gt;] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [&lt;ffffffff8108b120&gt;] worker_thread+0x170/0x2a0
[11028.076068]  [&lt;ffffffff81090990&gt;] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [&lt;ffffffff8108afb0&gt;] ? worker_thread+0x0/0x2a0
[11028.089709]  [&lt;ffffffff81090626&gt;] kthread+0x96/0xa0
[11028.095266]  [&lt;ffffffff8100c0ca&gt;] child_rip+0xa/0x20
[11028.100921]  [&lt;ffffffff81090590&gt;] ? kthread+0x0/0xa0
[11028.106573]  [&lt;ffffffff8100c0c0&gt;] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit &lt;erezsh@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
