<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs, branch v3.12.16</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>fs/proc/proc_devtree.c: remove empty /proc/device-tree when no openfirmware exists.</title>
<updated>2014-03-31T12:22:31+00:00</updated>
<author>
<name>Dave Jones</name>
<email>davej@redhat.com</email>
</author>
<published>2014-01-23T23:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d05cdce4fb86af309b91cc1e055e995f79c9a72'/>
<id>6d05cdce4fb86af309b91cc1e055e995f79c9a72</id>
<content type='text'>
commit c1d867a54d426b45da017fbe8e585f8a3064ce8d upstream.

Distribution kernels might want to build in support for /proc/device-tree
for kernels that might end up running on hardware that doesn't support
openfirmware.  This results in an empty /proc/device-tree existing.
Remove it if the OFW root node doesn't exist.

This situation actually confuses grub2, resulting in install failures.
grub2 sees the /proc/device-tree and picks the wrong install target cf.
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/annotate/4300/util/grub-install.in#L311
grub should be more robust, but still, leaving an empty proc dir seems
pointless.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=818378.

Signed-off-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Josh Boyer &lt;jwboyer@fedoraproject.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c1d867a54d426b45da017fbe8e585f8a3064ce8d upstream.

Distribution kernels might want to build in support for /proc/device-tree
for kernels that might end up running on hardware that doesn't support
openfirmware.  This results in an empty /proc/device-tree existing.
Remove it if the OFW root node doesn't exist.

This situation actually confuses grub2, resulting in install failures.
grub2 sees the /proc/device-tree and picks the wrong install target cf.
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/annotate/4300/util/grub-install.in#L311
grub should be more robust, but still, leaving an empty proc dir seems
pointless.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=818378.

Signed-off-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Josh Boyer &lt;jwboyer@fedoraproject.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFSv4: Fix another nfs4_sequence corruptor</title>
<updated>2014-03-26T08:44:13+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@primarydata.com</email>
</author>
<published>2014-03-22T14:00:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dec2e6e6bd7cc728a9b3091075dd17aef6b4c33c'/>
<id>dec2e6e6bd7cc728a9b3091075dd17aef6b4c33c</id>
<content type='text'>
commit b7e63a1079b266866a732cf699d8c4d61391bbda upstream.

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b7e63a1079b266866a732cf699d8c4d61391bbda upstream.

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bio-integrity: Fix bio_integrity_verify segment start bug</title>
<updated>2014-03-24T08:45:06+00:00</updated>
<author>
<name>Nicholas Bellinger</name>
<email>nab@linux-iscsi.org</email>
</author>
<published>2014-01-22T04:32:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7cbcb219e4113e10ce4b036118992abdbc4a8273'/>
<id>7cbcb219e4113e10ce4b036118992abdbc4a8273</id>
<content type='text'>
commit 5837c80e870bc3b12ac6a98cdc9ce7a9522a8fb6 upstream.

This patch addresses a bug in bio_integrity_verify() code that has
been causing DIF READ verify operations to be silently skipped.

The issue is that bio-&gt;bi_idx will have been incremented within
bio_advance() code in the normal blk_update_request() -&gt;
req_bio_endio() completion path, and bio_integrity_verify() is
using bio_for_each_segment() which starts the bio segment walk
at the current bio-&gt;bi_idx.

So instead use bio_for_each_segment_all() to always start the bio
segment walk from zero, regardless of the current bio-&gt;bi_idx
value after bio_advance() has been called.

(Context change for v3.10.y -&gt; v3.13.y code - nab)

Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Nicholas Bellinger &lt;nab@linux-iscsi.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5837c80e870bc3b12ac6a98cdc9ce7a9522a8fb6 upstream.

This patch addresses a bug in bio_integrity_verify() code that has
been causing DIF READ verify operations to be silently skipped.

The issue is that bio-&gt;bi_idx will have been incremented within
bio_advance() code in the normal blk_update_request() -&gt;
req_bio_endio() completion path, and bio_integrity_verify() is
using bio_for_each_segment() which starts the bio segment walk
at the current bio-&gt;bi_idx.

So instead use bio_for_each_segment_all() to always start the bio
segment walk from zero, regardless of the current bio-&gt;bi_idx
value after bio_advance() has been called.

(Context change for v3.10.y -&gt; v3.13.y code - nab)

Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Nicholas Bellinger &lt;nab@linux-iscsi.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/proc/base.c: fix GPF in /proc/$PID/map_files</title>
<updated>2014-03-22T21:01:59+00:00</updated>
<author>
<name>Artem Fetishev</name>
<email>artem_fetishev@epam.com</email>
</author>
<published>2014-03-10T22:49:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ccd13098cecd44d9b0694270100a700c46d52c28'/>
<id>ccd13098cecd44d9b0694270100a700c46d52c28</id>
<content type='text'>
commit 70335abb2689c8cd5df91bf2d95a65649addf50b upstream.

The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev &lt;artem_fetishev@epam.com&gt;
Signed-off-by: Aleksandr Terekhov &lt;aleksandr_terekhov@epam.com&gt;
Reported-by: &lt;wiebittewas@gmail.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 70335abb2689c8cd5df91bf2d95a65649addf50b upstream.

The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev &lt;artem_fetishev@epam.com&gt;
Signed-off-by: Aleksandr Terekhov &lt;aleksandr_terekhov@epam.com&gt;
Reported-by: &lt;wiebittewas@gmail.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid</title>
<updated>2014-03-22T21:01:58+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@primarydata.com</email>
</author>
<published>2014-03-05T13:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ff3918ebdc2d35c0e57b6aa8c32098f40ab83bd0'/>
<id>ff3918ebdc2d35c0e57b6aa8c32098f40ab83bd0</id>
<content type='text'>
commit e1253be0ece1a95a02c7f5843194877471af8179 upstream.

When nfs4_set_rw_stateid() can fails by returning EIO to indicate that
the stateid is completely invalid, then it makes no sense to have it
trigger a retry of the READ or WRITE operation. Instead, we should just
have it fall through and attempt a recovery.

This fixes an infinite loop in which the client keeps replaying the same
bad stateid back to the server.

Reported-by: Andy Adamson &lt;andros@netapp.com&gt;
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e1253be0ece1a95a02c7f5843194877471af8179 upstream.

When nfs4_set_rw_stateid() can fails by returning EIO to indicate that
the stateid is completely invalid, then it makes no sense to have it
trigger a retry of the READ or WRITE operation. Instead, we should just
have it fall through and attempt a recovery.

This fixes an infinite loop in which the client keeps replaying the same
bad stateid back to the server.

Reported-by: Andy Adamson &lt;andros@netapp.com&gt;
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Fix a delegation callback race</title>
<updated>2014-03-22T21:01:58+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@primarydata.com</email>
</author>
<published>2014-03-03T03:03:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=793c6f9509f87e2918a551e88605d833cacc9546'/>
<id>793c6f9509f87e2918a551e88605d833cacc9546</id>
<content type='text'>
commit 755a48a7a4eb05b9c8424e3017d947b2961a60e0 upstream.

The clean-up in commit 36281caa839f ended up removing a NULL pointer check
that is needed in order to prevent an Oops in
nfs_async_inode_return_delegation().

Reported-by: "Yan, Zheng" &lt;zheng.z.yan@intel.com&gt;
Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com
Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation)
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 755a48a7a4eb05b9c8424e3017d947b2961a60e0 upstream.

The clean-up in commit 36281caa839f ended up removing a NULL pointer check
that is needed in order to prevent an Oops in
nfs_async_inode_return_delegation().

Reported-by: "Yan, Zheng" &lt;zheng.z.yan@intel.com&gt;
Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com
Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation)
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2 syncs the wrong range...</title>
<updated>2014-03-22T21:01:48+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-02-10T20:18:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=313bb2423954fc861816e6c06a513df6334b66d8'/>
<id>313bb2423954fc861816e6c06a513df6334b66d8</id>
<content type='text'>
commit 1b56e98990bcdbb20b9fab163654b9315bf158e8 upstream.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1b56e98990bcdbb20b9fab163654b9315bf158e8 upstream.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: fix quota file corruption</title>
<updated>2014-03-22T21:01:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2014-03-03T23:38:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5a131c1d03f6a7bcb81101994038a4945a47c32e'/>
<id>5a131c1d03f6a7bcb81101994038a4945a47c32e</id>
<content type='text'>
commit 15c34a760630ca2c803848fba90ca0646a9907dd upstream.

Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Goldwyn Rodrigues &lt;rgoldwyn@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Reviewed-by: Mark Fasheh &lt;mfasheh@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 15c34a760630ca2c803848fba90ca0646a9907dd upstream.

Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Goldwyn Rodrigues &lt;rgoldwyn@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Reviewed-by: Mark Fasheh &lt;mfasheh@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix mountpoint reference leakage in linkat</title>
<updated>2014-03-14T00:53:45+00:00</updated>
<author>
<name>Oleg Drokin</name>
<email>green@linuxhacker.ru</email>
</author>
<published>2014-01-31T20:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=03cb0a7b03eda410fb0f7e03843e60c3318a38e7'/>
<id>03cb0a7b03eda410fb0f7e03843e60c3318a38e7</id>
<content type='text'>
commit d22e6338db7f613dd4f6095c190682fcc519e4b7 upstream.

Recent changes to retry on ESTALE in linkat
(commit 442e31ca5a49e398351b2954b51f578353fdf210)
introduced a mountpoint reference leak and a small memory
leak in case a filesystem link operation returns ESTALE
which is pretty normal for distributed filesystems like
lustre, nfs and so on.
Free old_path in such a case.

[AV: there was another missing path_put() nearby - on the previous
goto retry]

[js: the second path_put is not in 3.12 yet, hunk removed]

Signed-off-by: Oleg Drokin: &lt;green@linuxhacker.ru&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d22e6338db7f613dd4f6095c190682fcc519e4b7 upstream.

Recent changes to retry on ESTALE in linkat
(commit 442e31ca5a49e398351b2954b51f578353fdf210)
introduced a mountpoint reference leak and a small memory
leak in case a filesystem link operation returns ESTALE
which is pretty normal for distributed filesystems like
lustre, nfs and so on.
Free old_path in such a case.

[AV: there was another missing path_put() nearby - on the previous
goto retry]

[js: the second path_put is not in 3.12 yet, hunk removed]

Signed-off-by: Oleg Drokin: &lt;green@linuxhacker.ru&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll: do not take the nested ep-&gt;mtx on EPOLL_CTL_DEL</title>
<updated>2014-03-12T12:25:42+00:00</updated>
<author>
<name>Jason Baron</name>
<email>jbaron@akamai.com</email>
</author>
<published>2014-01-02T20:58:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=230a558c94d9f2411381bab5a5aa5be91ba53ff9'/>
<id>230a558c94d9f2411381bab5a5aa5be91ba53ff9</id>
<content type='text'>
commit 4ff36ee94d93ddb4b7846177f1118d9aa33408e2 upstream.

The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
commmit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").

The acquistion of the ep-&gt;mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (Specifically, the check for '!list_empty(&amp;f.file-&gt;f_ep_links')

However, by simply not acquiring the lock, we do not serialize behind
the ep-&gt;mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.

The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
its operating upon.  The reason we don't need to do lock ordering in the
add path, is that we are already are holding the global 'epmutex'
whenever we do the double lock.  Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.

Signed-off-by: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Cc: Eric Wong &lt;normalperson@yhbt.net&gt;
Cc: Nelson Elhage &lt;nelhage@nelhage.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4ff36ee94d93ddb4b7846177f1118d9aa33408e2 upstream.

The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
commmit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").

The acquistion of the ep-&gt;mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (Specifically, the check for '!list_empty(&amp;f.file-&gt;f_ep_links')

However, by simply not acquiring the lock, we do not serialize behind
the ep-&gt;mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.

The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
its operating upon.  The reason we don't need to do lock ordering in the
add path, is that we are already are holding the global 'epmutex'
whenever we do the double lock.  Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.

Signed-off-by: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Cc: Eric Wong &lt;normalperson@yhbt.net&gt;
Cc: Nelson Elhage &lt;nelhage@nelhage.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
</feed>
