<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/futex.c, branch v2.6.19.3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>[PATCH] schedule removal of FUTEX_FD</title>
<updated>2006-11-03T20:27:58+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2006-11-03T06:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=19c6b6ed3f597a583f58e3fc99256cc01ae8c394'/>
<id>19c6b6ed3f597a583f58e3fc99256cc01ae8c394</id>
<content type='text'>
Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it
shouldn't).

Add a warning printk, give any remaining users six months to migrate off it.

Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it
shouldn't).

Add a warning printk, give any remaining users six months to migrate off it.

Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] __user annotations: futex</title>
<updated>2006-10-10T22:37:22+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2006-10-10T21:46:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba46df984b8e8114c3cf19c51670fab084bd4196'/>
<id>ba46df984b8e8114c3cf19c51670fab084bd4196</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] file: modify struct fown_struct to use a struct pid</title>
<updated>2006-10-02T14:57:14+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2006-10-02T09:17:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=609d7fa9565c754428d2520cac2accc9052e1245'/>
<id>609d7fa9565c754428d2520cac2accc9052e1245</id>
<content type='text'>
File handles can be requested to send sigio and sigurg to processes.  By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
File handles can be requested to send sigio and sigurg to processes.  By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] sys_get_robust_list(): don't take tasklist_lock</title>
<updated>2006-09-29T16:18:18+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2006-09-29T09:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aaa2a97eb9c0e91d7abc66bf76811a9599fdb3ee'/>
<id>aaa2a97eb9c0e91d7abc66bf76811a9599fdb3ee</id>
<content type='text'>
use rcu locks for find_task_by_pid().

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
use rcu locks for find_task_by_pid().

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] futex_find_get_task(): don't take tasklist_lock</title>
<updated>2006-09-29T16:18:18+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2006-09-29T09:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d359b549bf3d7f42f0084918a4816ea4572e507c'/>
<id>d359b549bf3d7f42f0084918a4816ea4572e507c</id>
<content type='text'>
It is ok to do find_task_by_pid() + get_task_struct() under
rcu_read_lock(), we cand drop tasklist_lock.

Note that testing of -&gt;exit_state is racy with or without tasklist anyway.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is ok to do find_task_by_pid() + get_task_struct() under
rcu_read_lock(), we cand drop tasklist_lock.

Note that testing of -&gt;exit_state is racy with or without tasklist anyway.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Use the correct restart option for futex_lock_pi</title>
<updated>2006-09-08T17:22:50+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2006-09-08T16:47:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c5780e976e19faff345fcef4a01db87108b51a44'/>
<id>c5780e976e19faff345fcef4a01db87108b51a44</id>
<content type='text'>
The current implementation of futex_lock_pi returns -ERESTART_RESTARTBLOCK
in case that the lock operation has been interrupted by a signal.  This
results in a return of -EINTR to userspace in case there is an handler for
the signal.  This is wrong, because userspace expects that the lock
function does not return in any case of signal delivery.

This was not caught by my insufficient test case, but triggered a nasty
userspace problem in an high load application scenario.  Unfortunately also
glibc does not check for this invalid return value.

Using -ERSTARTNOINTR makes sure, that the interrupted syscall is restarted.
 The restart block related code can be safely removed, as the possible
timeout argument is an absolute time value.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation of futex_lock_pi returns -ERESTART_RESTARTBLOCK
in case that the lock operation has been interrupted by a signal.  This
results in a return of -EINTR to userspace in case there is an handler for
the signal.  This is wrong, because userspace expects that the lock
function does not return in any case of signal delivery.

This was not caught by my insufficient test case, but triggered a nasty
userspace problem in an high load application scenario.  Unfortunately also
glibc does not check for this invalid return value.

Using -ERSTARTNOINTR makes sure, that the interrupted syscall is restarted.
 The restart block related code can be safely removed, as the possible
timeout argument is an absolute time value.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] futex_find_get_task(): remove an obscure EXIT_ZOMBIE check</title>
<updated>2006-08-27T18:01:30+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2006-08-27T08:23:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d015baebba44613ef59ddffeae2114fa4ede7104'/>
<id>d015baebba44613ef59ddffeae2114fa4ede7104</id>
<content type='text'>
futex_find_get_task:

	if (p-&gt;state == EXIT_ZOMBIE || p-&gt;exit_state == EXIT_ZOMBIE)
		return NULL;

I can't understand this.  First, p-&gt;state can't be EXIT_ZOMBIE.  The
-&gt;exit_state check looks strange too.  Sub-threads or tasks whose -&gt;parent
ignores SIGCHLD go directly to EXIT_DEAD state (I am ignoring a ptrace
case).  Why EXIT_DEAD tasks should be ok?  Yes, EXIT_ZOMBIE is more
important (a task may stay zombie for a long time), but this doesn't mean
we should explicitely ignore other EXIT_XXX states.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
futex_find_get_task:

	if (p-&gt;state == EXIT_ZOMBIE || p-&gt;exit_state == EXIT_ZOMBIE)
		return NULL;

I can't understand this.  First, p-&gt;state can't be EXIT_ZOMBIE.  The
-&gt;exit_state check looks strange too.  Sub-threads or tasks whose -&gt;parent
ignores SIGCHLD go directly to EXIT_DEAD state (I am ignoring a ptrace
case).  Why EXIT_DEAD tasks should be ok?  Yes, EXIT_ZOMBIE is more
important (a task may stay zombie for a long time), but this doesn't mean
we should explicitely ignore other EXIT_XXX states.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] futex_handle_fault always fails</title>
<updated>2006-08-14T19:54:29+00:00</updated>
<author>
<name>john stultz</name>
<email>johnstul@us.ibm.com</email>
</author>
<published>2006-08-14T06:24:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e579dcbf23604cb33c08b5c3c3ac06ca36e7c683'/>
<id>e579dcbf23604cb33c08b5c3c3ac06ca36e7c683</id>
<content type='text'>
We found this issue last week w/ the -RT kernel, but it seems the same
issue is in mainline as well.

Basically it is possible for futex_unlock_pi to return without actually
freeing the lock.  This is due to buggy logic in the use of
futex_handle_fault() and its attempt argument in a failure case.

Looking at futex.c the logic is as follows:

1) In futex_unlock_pi() we start w/ ret=0 and we go down to the first
   futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT.  We then
   jump to the pi_faulted label.

2) From pi_faulted: We increment attempt, unlock the sem and hit the
   retry label.

3) From the retry label, with ret still zero, we again hit EFAULT on the
   first futex_atomic_cmpxchg_inatomic(), and again goto the pi_faulted
   label.

4) Again from pi_faulted: we increment attempt and enter the
   conditional, where we call futex_handle_fault.

5) futex_handle_fault fails, and we goto the out_unlock_release_sem
   label.

6) From out_unlock_release_sem we return, and since ret is still zero,
   we return without error, while never actually unlocking the lock.

Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should probably
be setting ret=-EFAULT before jumping to pi_faulted: However in our case
this doesn't really affect anything, as the glibc we're using ignores the
error value from futex_unlock_pi().

Issue #2: Look at futex_handle_fault(), its first conditional will return
-EFAULT if attempt is &gt;= 2.  However, from the "if(attempt++)
futex_handle_fault(attempt)" logic above, we'll *never* call
futex_handle_fault when attempt is less then two.  So we never get a chance
to even try to fault the page in.

The following patch addresses these two issues by 1) Always setting ret to
-EFAULT if futex_handle_fault fails, and 2) Removing the = in
futex_handle_fault's (attempt &gt;= 2) check.

I'm really not sure this is the right fix, but wanted to bring it up so
folks knew the issue is alive and well in the current -git tree.  From
looking at the git logs the logic was first introduced (then later copied
to other places) in the following commit almost a year ago:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823

Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We found this issue last week w/ the -RT kernel, but it seems the same
issue is in mainline as well.

Basically it is possible for futex_unlock_pi to return without actually
freeing the lock.  This is due to buggy logic in the use of
futex_handle_fault() and its attempt argument in a failure case.

Looking at futex.c the logic is as follows:

1) In futex_unlock_pi() we start w/ ret=0 and we go down to the first
   futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT.  We then
   jump to the pi_faulted label.

2) From pi_faulted: We increment attempt, unlock the sem and hit the
   retry label.

3) From the retry label, with ret still zero, we again hit EFAULT on the
   first futex_atomic_cmpxchg_inatomic(), and again goto the pi_faulted
   label.

4) Again from pi_faulted: we increment attempt and enter the
   conditional, where we call futex_handle_fault.

5) futex_handle_fault fails, and we goto the out_unlock_release_sem
   label.

6) From out_unlock_release_sem we return, and since ret is still zero,
   we return without error, while never actually unlocking the lock.

Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should probably
be setting ret=-EFAULT before jumping to pi_faulted: However in our case
this doesn't really affect anything, as the glibc we're using ignores the
error value from futex_unlock_pi().

Issue #2: Look at futex_handle_fault(), its first conditional will return
-EFAULT if attempt is &gt;= 2.  However, from the "if(attempt++)
futex_handle_fault(attempt)" logic above, we'll *never* call
futex_handle_fault when attempt is less then two.  So we never get a chance
to even try to fault the page in.

The following patch addresses these two issues by 1) Always setting ret to
-EFAULT if futex_handle_fault fails, and 2) Removing the = in
futex_handle_fault's (attempt &gt;= 2) check.

I'm really not sure this is the right fix, but wanted to bring it up so
folks knew the issue is alive and well in the current -git tree.  From
looking at the git logs the logic was first introduced (then later copied
to other places) in the following commit almost a year ago:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823

Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] bug in futex unqueue_me</title>
<updated>2006-08-06T15:57:46+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntrae@de.ibm.com</email>
</author>
<published>2006-08-05T19:13:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e91467ecd1ef381377fd327c0ded922835ec52ab'/>
<id>e91467ecd1ef381377fd327c0ded922835ec52ab</id>
<content type='text'>
This patch adds a barrier() in futex unqueue_me to avoid aliasing of two
pointers.

On my s390x system I saw the following oops:

Unable to handle kernel pointer dereference at virtual kernel address
0000000000000000
Oops: 0004 [#1]
CPU:    0    Not tainted
Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8)
Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30)
Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000
           0000000000000000 0000000000000000 000001fe00002028 00000000000c091f
           000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0
           00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8
Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04
Call Trace:
([&lt;0000000000144f90&gt;] unqueue_me+0x40/0xe4)
 [&lt;0000000000145a0c&gt;] do_futex+0x33c/0xc40
 [&lt;000000000014643e&gt;] sys_futex+0x12e/0x144
 [&lt;000000000010bb00&gt;] sysc_noemu+0x10/0x16
 [&lt;000002000003741c&gt;] 0x2000003741c

The code in question is:

static int unqueue_me(struct futex_q *q)
{
        int ret = 0;
        spinlock_t *lock_ptr;

        /* In the common case we don't take the spinlock, which is nice. */
 retry:
        lock_ptr = q-&gt;lock_ptr;
        if (lock_ptr != 0) {
                spin_lock(lock_ptr);
		/*
                 * q-&gt;lock_ptr can change between reading it and
                 * spin_lock(), causing us to take the wrong lock.  This
                 * corrects the race condition.
[...]

and my compiler (gcc 4.1.0) makes the following out of it:

00000000000003c8 &lt;unqueue_me&gt;:
     3c8:       eb bf f0 70 00 24       stmg    %r11,%r15,112(%r15)
     3ce:       c0 d0 00 00 00 00       larl    %r13,3ce &lt;unqueue_me+0x6&gt;
                        3d0: R_390_PC32DBL      .rodata+0x2a
     3d4:       a7 f1 1e 00             tml     %r15,7680
     3d8:       a7 84 00 01             je      3da &lt;unqueue_me+0x12&gt;
     3dc:       b9 04 00 ef             lgr     %r14,%r15
     3e0:       a7 fb ff d0             aghi    %r15,-48
     3e4:       b9 04 00 b2             lgr     %r11,%r2
     3e8:       e3 e0 f0 98 00 24       stg     %r14,152(%r15)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
		/* write q-&gt;lock_ptr in r12 */
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       a7 84 00 4b             je      48e &lt;unqueue_me+0xc6&gt;
		/* if r12 is zero then jump over the code.... */
     3fc:       e3 20 b0 28 00 04       lg      %r2,40(%r11)
		/* write q-&gt;lock_ptr in r2 */
     402:       c0 e5 00 00 00 00       brasl   %r14,402 &lt;unqueue_me+0x3a&gt;
                        404: R_390_PC32DBL      _spin_lock+0x2
		/* use r2 as parameter for spin_lock */

So the code becomes more or less:
if (q-&gt;lock_ptr != 0) spin_lock(q-&gt;lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)

Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       b9 04 00 2c             lgr     %r2,%r12
     3fc:       a7 84 00 48             je      48c &lt;unqueue_me+0xc4&gt;
     400:       c0 e5 00 00 00 00       brasl   %r14,400 &lt;unqueue_me+0x38&gt;
                        402: R_390_PC32DBL      _spin_lock+0x2

As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q-&gt;lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q-&gt;lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.

Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@timesys.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntrae@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds a barrier() in futex unqueue_me to avoid aliasing of two
pointers.

On my s390x system I saw the following oops:

Unable to handle kernel pointer dereference at virtual kernel address
0000000000000000
Oops: 0004 [#1]
CPU:    0    Not tainted
Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8)
Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30)
Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000
           0000000000000000 0000000000000000 000001fe00002028 00000000000c091f
           000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0
           00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8
Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04
Call Trace:
([&lt;0000000000144f90&gt;] unqueue_me+0x40/0xe4)
 [&lt;0000000000145a0c&gt;] do_futex+0x33c/0xc40
 [&lt;000000000014643e&gt;] sys_futex+0x12e/0x144
 [&lt;000000000010bb00&gt;] sysc_noemu+0x10/0x16
 [&lt;000002000003741c&gt;] 0x2000003741c

The code in question is:

static int unqueue_me(struct futex_q *q)
{
        int ret = 0;
        spinlock_t *lock_ptr;

        /* In the common case we don't take the spinlock, which is nice. */
 retry:
        lock_ptr = q-&gt;lock_ptr;
        if (lock_ptr != 0) {
                spin_lock(lock_ptr);
		/*
                 * q-&gt;lock_ptr can change between reading it and
                 * spin_lock(), causing us to take the wrong lock.  This
                 * corrects the race condition.
[...]

and my compiler (gcc 4.1.0) makes the following out of it:

00000000000003c8 &lt;unqueue_me&gt;:
     3c8:       eb bf f0 70 00 24       stmg    %r11,%r15,112(%r15)
     3ce:       c0 d0 00 00 00 00       larl    %r13,3ce &lt;unqueue_me+0x6&gt;
                        3d0: R_390_PC32DBL      .rodata+0x2a
     3d4:       a7 f1 1e 00             tml     %r15,7680
     3d8:       a7 84 00 01             je      3da &lt;unqueue_me+0x12&gt;
     3dc:       b9 04 00 ef             lgr     %r14,%r15
     3e0:       a7 fb ff d0             aghi    %r15,-48
     3e4:       b9 04 00 b2             lgr     %r11,%r2
     3e8:       e3 e0 f0 98 00 24       stg     %r14,152(%r15)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
		/* write q-&gt;lock_ptr in r12 */
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       a7 84 00 4b             je      48e &lt;unqueue_me+0xc6&gt;
		/* if r12 is zero then jump over the code.... */
     3fc:       e3 20 b0 28 00 04       lg      %r2,40(%r11)
		/* write q-&gt;lock_ptr in r2 */
     402:       c0 e5 00 00 00 00       brasl   %r14,402 &lt;unqueue_me+0x3a&gt;
                        404: R_390_PC32DBL      _spin_lock+0x2
		/* use r2 as parameter for spin_lock */

So the code becomes more or less:
if (q-&gt;lock_ptr != 0) spin_lock(q-&gt;lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)

Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       b9 04 00 2c             lgr     %r2,%r12
     3fc:       a7 84 00 48             je      48c &lt;unqueue_me+0xc4&gt;
     400:       c0 e5 00 00 00 00       brasl   %r14,400 &lt;unqueue_me+0x38&gt;
                        402: R_390_PC32DBL      _spin_lock+0x2

As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q-&gt;lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q-&gt;lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.

Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@timesys.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntrae@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] pi-futex: robust-futex exit</title>
<updated>2006-07-29T04:02:00+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2006-07-29T03:17:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3f2ddeac718c768fdac4b7fe69d465172f788a8'/>
<id>e3f2ddeac718c768fdac4b7fe69d465172f788a8</id>
<content type='text'>
Fix robust PI-futexes to be properly unlocked on unexpected exit.

For this to work the kernel has to know whether a futex is a PI or a
non-PI one, because the semantics are different.  Since the space in
relevant glibc data structures is extremely scarce, the best solution is
to encode the 'PI' information in bit 0 of the robust list pointer.
Existing (non-PI) glibc robust futexes have this bit always zero, so the
ABI is kept.  New glibc with PI-robust-futexes will set this bit.

Further fixes from Thomas Gleixner &lt;tglx@linutronix.de&gt;

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix robust PI-futexes to be properly unlocked on unexpected exit.

For this to work the kernel has to know whether a futex is a PI or a
non-PI one, because the semantics are different.  Since the space in
relevant glibc data structures is extremely scarce, the best solution is
to encode the 'PI' information in bit 0 of the robust list pointer.
Existing (non-PI) glibc robust futexes have this bit always zero, so the
ABI is kept.  New glibc with PI-robust-futexes will set this bit.

Further fixes from Thomas Gleixner &lt;tglx@linutronix.de&gt;

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
