<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/infiniband/ulp/ipoib, branch v3.4.32</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>IPoIB: Fix use-after-free of multicast object</title>
<updated>2012-10-07T15:32:28+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2012-08-30T07:01:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=350f3edb0a05c340853c75f4c5007a72ebd9e32f'/>
<id>350f3edb0a05c340853c75f4c5007a72ebd9e32f</id>
<content type='text'>
commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  Since ipoib_stop() is called with the
RTNL already held, that workqueue can't be flushed from the context of
ipoib_stop() without deadlocking.

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which deletes all the multicast
entries.  This happens without any synchronization with a possibly
running instance of ipoib_mcast_join_task() for the same ipoib device,
leading to a crash due to a NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().
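
The ordering this fix depends on (flush pending work before freeing
the objects that work dereferences) can be sketched in userspace C.
This is a hypothetical, single-threaded model, not IPoIB code: struct
toy_mcast, the toy workqueue, toy_queue_work(), toy_flush(),
toy_join_task() and toy_stop() are invented for illustration, and the
RTNL deadlock itself is not modelled.  In the kernel, a
flush_workqueue() call would play the role of toy_flush().

<![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ipoib multicast entry. */
struct toy_mcast {
	int join_count;
};

/* Single-threaded toy workqueue: pending items only run when flushed. */
#define TOY_MAX_WORK 8

struct toy_work {
	void (*fn)(struct toy_mcast *);
	struct toy_mcast *arg;
};

struct toy_wq {
	struct toy_work items[TOY_MAX_WORK];
	int pending;
};

/* Queue a work item; returns 0 on success, -1 if the toy queue is full. */
static int toy_queue_work(struct toy_wq *wq, void (*fn)(struct toy_mcast *),
			  struct toy_mcast *arg)
{
	if (wq->pending == TOY_MAX_WORK)
		return -1;
	wq->items[wq->pending].fn = fn;
	wq->items[wq->pending].arg = arg;
	wq->pending++;
	return 0;
}

/* Run every pending item and report how many ran. */
static int toy_flush(struct toy_wq *wq)
{
	int ran = wq->pending;

	for (int i = 0; i < ran; i++)
		wq->items[i].fn(wq->items[i].arg);
	wq->pending = 0;
	return ran;
}

/* Models the join task: it dereferences the multicast entry. */
static void toy_join_task(struct toy_mcast *mcast)
{
	mcast->join_count++;
}

/* Stop path: flush first, then free, so no queued task sees freed memory. */
static int toy_stop(struct toy_wq *wq, struct toy_mcast **mcast)
{
	int ran = toy_flush(wq);

	free(*mcast);
	*mcast = NULL;
	return ran;
}
```
]]>

Swapping the flush and the free in toy_stop() would let a queued
toy_join_task() write through a freed pointer, which is the shape of
the crash described above.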

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  Since ipoib_stop() is called with the
RTNL already held, that workqueue can't be flushed from the context of
ipoib_stop() without deadlocking.

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which deletes all the multicast
entries.  This happens without any synchronization with a possibly
running instance of ipoib_mcast_join_task() for the same ipoib device,
leading to a crash due to a NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>IB: Change CQE "csum_ok" field to a bit flag</title>
<updated>2012-03-08T20:34:27+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@mellanox.com</email>
</author>
<published>2012-01-11T17:03:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d927d505c59a0c7353343174e6225c43c61fba6d'/>
<id>d927d505c59a0c7353343174e6225c43c61fba6d</id>
<content type='text'>
Use a bit in wc_flags rather than a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64-bit machines; it stays at 56 bytes because of
padding.  However, it will allow more fields to be added in the future
without enlarging the struct.  It will also let us take a unified
approach with future libibverbs checksum offload reporting, because
adding a bit flag doesn't break the library ABI.
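
A minimal sketch of the bit-flag approach, assuming hypothetical
TOY_WC_* flag values and a stripped-down struct toy_wc; the real names
and bit positions in include/rdma/ib_verbs.h may differ.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical completion flags; bit positions are illustrative only. */
enum toy_wc_flags {
	TOY_WC_GRH        = 1 << 0,
	TOY_WC_WITH_IMM   = 1 << 1,
	TOY_WC_IP_CSUM_OK = 1 << 2,	/* replaces a separate int csum_ok */
};

/* Stripped-down completion entry: one flags word holds every boolean. */
struct toy_wc {
	int wc_flags;
};

static void toy_set_csum_ok(struct toy_wc *wc, bool ok)
{
	if (ok)
		wc->wc_flags |= TOY_WC_IP_CSUM_OK;
	else
		wc->wc_flags &= ~TOY_WC_IP_CSUM_OK;
}

static bool toy_csum_ok(const struct toy_wc *wc)
{
	return (wc->wc_flags & TOY_WC_IP_CSUM_OK) != 0;
}
```
]]>

Because a new flag only claims a previously unused bit of an existing
field, the structure layout seen by other code does not change, which
is the ABI property the message relies on.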

This patch was suggested during a conversation with Liran Liss
&lt;liranl@mellanox.com&gt;.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Reviewed-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a bit in wc_flags rather than a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64-bit machines; it stays at 56 bytes because of
padding.  However, it will allow more fields to be added in the future
without enlarging the struct.  It will also let us take a unified
approach with future libibverbs checksum offload reporting, because
adding a bit flag doesn't break the library ABI.

This patch was suggested during a conversation with Liran Liss
&lt;liranl@mellanox.com&gt;.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Reviewed-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Stop lying about hard_header_len and use skb-&gt;cb to stash LL addresses</title>
<updated>2012-02-08T23:26:54+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2012-02-07T14:51:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=936d7de3d736e0737542641269436f4b5968e9ef'/>
<id>936d7de3d736e0737542641269436f4b5968e9ef</id>
<content type='text'>
Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb-&gt;cb
between its header_ops.create method and its .ndo_start_xmit
method.  Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack.  This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.
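
A userspace sketch of stashing the link-layer address in a control
block between the two driver hooks.  The 48-byte cb size and the
20-byte IPoIB address length match the kernel, but struct
toy_qdisc_cb, struct toy_ipoib_cb and the toy_* functions are
hypothetical stand-ins, not the IPoIB implementation.  memcpy() with
offsetof() is used instead of casting cb to a struct pointer to stay
clear of strict-aliasing problems in userspace.

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CB_SIZE    48	/* size of the sk_buff cb[] scratch area */
#define TOY_HWADDR_LEN 20	/* IPoIB LL address: 4 bytes with QPN, 16-byte GID */

struct toy_skb {
	unsigned char cb[TOY_CB_SIZE];	/* per-layer scratch space */
};

/* Hypothetical qdisc-owned prefix of cb; the real layout differs. */
struct toy_qdisc_cb {
	unsigned int pkt_len;
	unsigned char data[16];
};

/* Driver layout: leave the qdisc prefix alone, stash the address after it. */
struct toy_ipoib_cb {
	struct toy_qdisc_cb qdisc_cb;
	unsigned char hwaddr[TOY_HWADDR_LEN];
};

_Static_assert(sizeof(struct toy_ipoib_cb) <= TOY_CB_SIZE,
	       "driver control block must fit in cb[]");

/* header_ops.create analogue: record the destination, push no header bytes. */
static void toy_hard_header(struct toy_skb *skb, const unsigned char *daddr)
{
	memcpy(skb->cb + offsetof(struct toy_ipoib_cb, hwaddr), daddr,
	       TOY_HWADDR_LEN);
}

/* .ndo_start_xmit analogue: read the stashed destination back. */
static void toy_start_xmit(const struct toy_skb *skb, unsigned char *daddr)
{
	memcpy(daddr, skb->cb + offsetof(struct toy_ipoib_cb, hwaddr),
	       TOY_HWADDR_LEN);
}
```
]]>

Since nothing is prepended to the packet data, hard_header_len no
longer has to account for a pseudo-header.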

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb-&gt;cb
between its header_ops.create method and its .ndo_start_xmit
method.  Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack.  This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c</title>
<updated>2011-12-05T20:20:20+00:00</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-12-02T16:52:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=17e6abeec4cb8df1e33ea0e2b889586c731a68be'/>
<id>17e6abeec4cb8df1e33ea0e2b889586c731a68be</id>
<content type='text'>
Reduce the number of dst_get_neighbour_noref() calls within a single
call chain, primarily by passing the neighbour pointer down to the
helper functions.

Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet.  We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.
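
The resulting control flow in the transmit path can be sketched as
follows.  All types and functions here are hypothetical
simplifications; in the kernel the lookup is dst_get_neighbour_noref()
and a dropped packet is also freed.

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>

enum toy_path { TOY_SENT, TOY_DROPPED, TOY_NO_DST_PATH };

struct toy_neigh { int id; };
struct toy_dst   { struct toy_neigh *neigh; };	/* NULL: no neighbour */
struct toy_pkt   { struct toy_dst *dst; };	/* NULL: no dst attached */
struct toy_stats { unsigned long tx_dropped; };

/* Helper receives the neighbour instead of looking it up again. */
static enum toy_path toy_unicast_send(const struct toy_neigh *n)
{
	(void)n;	/* a real driver would use n to pick the path */
	return TOY_SENT;
}

static enum toy_path toy_start_xmit(const struct toy_pkt *pkt,
				    struct toy_stats *stats)
{
	if (pkt->dst) {
		/* One lookup per call chain; the pointer is passed down. */
		struct toy_neigh *n = pkt->dst->neigh;

		if (!n) {
			/* dst without a neighbour: count and drop. */
			stats->tx_dropped++;
			return TOY_DROPPED;
		}
		return toy_unicast_send(n);
	}
	/* Only packets with no dst take the ARP/RARP/multicast path. */
	return TOY_NO_DST_PATH;
}
```
]]>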

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduce the number of dst_get_neighbour_noref() calls within a single
call chain, primarily by passing the neighbour pointer down to the
helper functions.

Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet.  We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.</title>
<updated>2011-12-05T20:20:19+00:00</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-12-02T16:52:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2721745501a26d0dc3b88c0d2f3aa11471891388'/>
<id>2721745501a26d0dc3b88c0d2f3aa11471891388</id>
<content type='text'>
To reflect the fact that no reference is taken on the
resulting neighbour entry.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To reflect the fact that no reference is taken on the
resulting neighbour entry.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2011-12-02T18:49:21+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-12-02T18:49:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3613118eb30a589d971e4eccbbb2a1314f5dfd4'/>
<id>b3613118eb30a589d971e4eccbbb2a1314f5dfd4</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>neigh: Add infrastructure for allocating device neigh privates.</title>
<updated>2011-11-30T23:46:43+00:00</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-07-25T00:01:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=596b9b68ef118f7409afbc78487263e08ef96261'/>
<id>596b9b68ef118f7409afbc78487263e08ef96261</id>
<content type='text'>
netdev-&gt;neigh_priv_len records the private area length.

This will trigger for neigh_table objects which set tbl-&gt;entry_size
to zero, and the first instances of this will be forthcoming.
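
One way a per-device private area can share a single allocation with
a generic neighbour entry is sketched below, with priv_len playing the
role of netdev-&gt;neigh_priv_len.  The struct, the alignment choice
and the helper names are assumptions for illustration, not the
kernel's neigh_alloc().

<![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Alignment for the private area (assumed: the alignment of long long). */
#define TOY_PRIV_ALIGN sizeof(long long)
#define TOY_ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))

/* Hypothetical generic neighbour entry. */
struct toy_neigh {
	size_t entry_size;		/* aligned size of this generic part */
	unsigned char primary_key[4];	/* e.g. an IPv4 address */
};

/* One allocation: generic entry, rounded up, followed by priv_len bytes. */
static struct toy_neigh *toy_neigh_alloc(size_t priv_len)
{
	size_t entry = TOY_ALIGN(sizeof(struct toy_neigh), TOY_PRIV_ALIGN);
	struct toy_neigh *n = calloc(1, entry + priv_len);

	if (n)
		n->entry_size = entry;
	return n;
}

/* The device-private area starts right after the aligned generic part. */
static void *toy_neigh_priv(struct toy_neigh *n)
{
	return (char *)n + n->entry_size;
}
```
]]>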

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
netdev-&gt;neigh_priv_len records the private area length.

This will trigger for neigh_table objects which set tbl-&gt;entry_size
to zero, and the first instances of this will be forthcoming.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next</title>
<updated>2011-11-30T02:01:53+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-11-30T02:01:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a493f1a24a496711d96b91c4dc0a1bd35eb6954b'/>
<id>a493f1a24a496711d96b91c4dc0a1bd35eb6954b</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>IB: Fix RCU lockdep splats</title>
<updated>2011-11-29T21:37:11+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-29T21:31:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=580da35a31f91a594f3090b7a2c39b85cb051a12'/>
<id>580da35a31f91a594f3090b7a2c39b85cb051a12</id>
<content type='text'>
Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France &lt;tsi@ualberta.ca&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France &lt;tsi@ualberta.ca&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/ipoib: Prevent hung task or softlockup processing multicast response</title>
<updated>2011-11-29T21:20:02+00:00</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-11-21T13:43:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3874397c0bdec3c21ce071711cd105165179b8eb'/>
<id>3874397c0bdec3c21ce071711cd105165179b8eb</id>
<content type='text'>
The following can occur with ipoib when processing a multicast response:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[&lt;ffffffff814ddb27&gt;]  [&lt;ffffffff814ddb27&gt;] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [&lt;ffffffffa032c247&gt;] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffffa03283d4&gt;] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [&lt;ffffffffa03286fc&gt;] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [&lt;ffffffff8141e758&gt;] ? dev_hard_start_xmit+0x2c8/0x3f0
    [&lt;ffffffff81439d0a&gt;] ? sch_direct_xmit+0x15a/0x1c0
    [&lt;ffffffff81423098&gt;] ? dev_queue_xmit+0x388/0x4d0
    [&lt;ffffffffa032d6b7&gt;] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [&lt;ffffffffa032dab8&gt;] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [&lt;ffffffffa02a0946&gt;] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [&lt;ffffffffa015f01e&gt;] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [&lt;ffffffffa00f6c93&gt;] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [&lt;ffffffffa02a0f9b&gt;] ? join_handler+0xeb/0x200 [ib_sa]
    [&lt;ffffffffa029e4fc&gt;] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [&lt;ffffffffa029e79c&gt;] ? recv_handler+0x3c/0x70 [ib_sa]
    [&lt;ffffffffa01603a4&gt;] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [&lt;ffffffffa015fb60&gt;] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [&lt;ffffffff81088830&gt;] ? worker_thread+0x170/0x2a0
    [&lt;ffffffff8108e160&gt;] ? autoremove_wake_function+0x0/0x40
    [&lt;ffffffff810886c0&gt;] ? worker_thread+0x0/0x2a0
    [&lt;ffffffff8108ddf6&gt;] ? kthread+0x96/0xa0
    [&lt;ffffffff8100c1ca&gt;] ? child_rip+0xa/0x20

Coinciding with the stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() logs the address handle
creation failure but otherwise continues:

                ah = ipoib_create_ah(dev, priv-&gt;pd, &amp;av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast-&gt;pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast-&gt;ah) {
                if (skb_queue_len(&amp;mcast-&gt;pkt_queue) &lt; IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&amp;mcast-&gt;pkt_queue, skb);
                else {
                        ++dev-&gt;stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop.  Because that loop never sees a
non-NULL ah and does nothing to resolve the failure, it keeps spinning,
which sets the stage for the "hung" task diagnostic.

There are GFP_ATOMIC allocations in the provider routines, so this
failure is possible and should be dealt with.

The failure was induced by a test involving a host SM on the same
server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error,
which flushes the queued mcast packets.  Nothing is done to unwind
the QP attached state, so that subsequent sends from above will retry
the join.
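
The change in behaviour can be modelled with a packet count standing
in for mcast-&gt;pkt_queue.  Everything here is hypothetical: the
toy_* names, the has_ah flag, and the -EAGAIN return value (the
message above does not say which error code is used).  For brevity the
sketch drains the backlog itself, whereas in the patch the error
return is what leads to the flush.

<![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical multicast group; the packet queue is reduced to a count. */
struct toy_mcast {
	bool has_ah;		/* did address handle creation succeed? */
	int queued;		/* packets waiting in the toy pkt_queue */
	unsigned long tx_dropped;
};

/* ipoib_mcast_send() analogue: with no AH the packet goes back on the queue. */
static void toy_mcast_send(struct toy_mcast *m)
{
	if (!m->has_ah) {
		m->queued++;	/* requeued: the old loop could spin on this */
		return;
	}
	/* a real driver would post the send here */
}

/* Fixed join_finish analogue: bail out with an error when the AH is missing. */
static int toy_join_finish(struct toy_mcast *m, bool ah_ok)
{
	m->has_ah = ah_ok;
	if (!m->has_ah) {
		m->tx_dropped += (unsigned long)m->queued;	/* drop the backlog */
		m->queued = 0;
		return -EAGAIN;	/* illustrative error code */
	}
	while (m->queued > 0) {
		m->queued--;
		toy_mcast_send(m);	/* never requeues once has_ah is set */
	}
	return 0;
}
```
]]>

Without the early return, a missing AH would make every send put the
packet straight back on the queue, so the drain loop would never end.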

Reviewed-by: Ram Vepa &lt;ram.vepa@qlogic.com&gt;
Reviewed-by: Gary Leshner &lt;gary.leshner@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following can occur with ipoib when processing a multicast response:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[&lt;ffffffff814ddb27&gt;]  [&lt;ffffffff814ddb27&gt;] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [&lt;ffffffffa032c247&gt;] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffffa03283d4&gt;] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [&lt;ffffffffa03286fc&gt;] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [&lt;ffffffff8141e758&gt;] ? dev_hard_start_xmit+0x2c8/0x3f0
    [&lt;ffffffff81439d0a&gt;] ? sch_direct_xmit+0x15a/0x1c0
    [&lt;ffffffff81423098&gt;] ? dev_queue_xmit+0x388/0x4d0
    [&lt;ffffffffa032d6b7&gt;] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [&lt;ffffffffa032dab8&gt;] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [&lt;ffffffffa02a0946&gt;] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [&lt;ffffffffa015f01e&gt;] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [&lt;ffffffffa00f6c93&gt;] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [&lt;ffffffffa02a0f9b&gt;] ? join_handler+0xeb/0x200 [ib_sa]
    [&lt;ffffffffa029e4fc&gt;] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [&lt;ffffffffa029e79c&gt;] ? recv_handler+0x3c/0x70 [ib_sa]
    [&lt;ffffffffa01603a4&gt;] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [&lt;ffffffffa015fb60&gt;] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [&lt;ffffffff81088830&gt;] ? worker_thread+0x170/0x2a0
    [&lt;ffffffff8108e160&gt;] ? autoremove_wake_function+0x0/0x40
    [&lt;ffffffff810886c0&gt;] ? worker_thread+0x0/0x2a0
    [&lt;ffffffff8108ddf6&gt;] ? kthread+0x96/0xa0
    [&lt;ffffffff8100c1ca&gt;] ? child_rip+0xa/0x20

Coinciding with the stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() logs the address handle
creation failure but otherwise continues:

                ah = ipoib_create_ah(dev, priv-&gt;pd, &amp;av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast-&gt;pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast-&gt;ah) {
                if (skb_queue_len(&amp;mcast-&gt;pkt_queue) &lt; IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&amp;mcast-&gt;pkt_queue, skb);
                else {
                        ++dev-&gt;stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop.  Because that loop never sees a
non-NULL ah and does nothing to resolve the failure, it keeps spinning,
which sets the stage for the "hung" task diagnostic.

There are GFP_ATOMIC allocations in the provider routines, so this
failure is possible and should be dealt with.

The failure was induced by a test involving a host SM on the same
server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error,
which flushes the queued mcast packets.  Nothing is done to unwind
the QP attached state, so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa &lt;ram.vepa@qlogic.com&gt;
Reviewed-by: Gary Leshner &lt;gary.leshner@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
