<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/futex.c, branch v5.6-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>futex: Unbreak futex hashing</title>
<updated>2020-03-09T21:33:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-03-08T18:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8d67743653dce5a0e7aa500fcccb237cde7ad88e'/>
<id>8d67743653dce5a0e7aa500fcccb237cde7ad88e</id>
<content type='text'>
The recent futex inode life time fix changed the ordering of the futex key
union struct members, but forgot to adjust the hash function accordingly,

As a result the hashing omits the leading 64bit and even hashes beyond the
futex key causing a bad hash distribution which led to a ~100% performance
regression.

Hand in the futex key pointer instead of a random struct member and make
the size calculation based of the struct offset.

Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
Reported-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Decoded-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The recent futex inode life time fix changed the ordering of the futex key
union struct members, but forgot to adjust the hash function accordingly,

As a result the hashing omits the leading 64bit and even hashes beyond the
futex key causing a bad hash distribution which led to a ~100% performance
regression.

Hand in the futex key pointer instead of a random struct member and make
the size calculation based of the struct offset.

Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
Reported-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Decoded-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Fix inode life-time issue</title>
<updated>2020-03-06T10:06:15+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-03-04T10:28:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8019ad13ef7f64be44d4f892af9c840179009254'/>
<id>8019ad13ef7f64be44d4f892af9c840179009254</id>
<content type='text'>
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Fix kernel-doc notation warning</title>
<updated>2020-01-09T12:23:40+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-12-09T04:26:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=51bfb1d11d6daf095addf9fe8471c20992caae0b'/>
<id>51bfb1d11d6daf095addf9fe8471c20992caae0b</id>
<content type='text'>
Fix a kernel-doc warning in kernel/futex.c by adding notation
for @ret.

../kernel/futex.c:1187: warning: Function parameter or member 'ret' not described in 'wait_for_owner_exiting'

Fixes: 3ef240eaff36 ("futex: Prevent exit livelock")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/223be78c-f3c8-52df-836d-c5fb8e7907e9@infradead.org

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a kernel-doc warning in kernel/futex.c by adding notation
for @ret.

../kernel/futex.c:1187: warning: Function parameter or member 'ret' not described in 'wait_for_owner_exiting'

Fixes: 3ef240eaff36 ("futex: Prevent exit livelock")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/223be78c-f3c8-52df-836d-c5fb8e7907e9@infradead.org

</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Prevent exit livelock</title>
<updated>2019-11-20T08:40:38+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3ef240eaff36b8119ac9e2ea17cbf41179c930ba'/>
<id>3ef240eaff36b8119ac9e2ea17cbf41179c930ba</id>
<content type='text'>
Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &amp;lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			-&gt;	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk-&gt;futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --&gt; wakes parent
  sys_futex()
    loop infinite because
    tsk-&gt;futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk-&gt;futex_state = FUTEX_STATE_EXITING;

  --&gt; preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff8624 .. ba31c1a48538
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e71 ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Stable Team &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &amp;lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			-&gt;	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk-&gt;futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --&gt; wakes parent
  sys_futex()
    loop infinite because
    tsk-&gt;futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk-&gt;futex_state = FUTEX_STATE_EXITING;

  --&gt; preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff8624 .. ba31c1a48538
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e71 ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Stable Team &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Provide distinct return value when owner is exiting</title>
<updated>2019-11-20T08:40:10+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ac31c7ff8624409ba3c4901df9237a616c187a5d'/>
<id>ac31c7ff8624409ba3c4901df9237a616c187a5d</id>
<content type='text'>
attach_to_pi_owner() returns -EAGAIN for various cases:

 - Owner task is exiting
 - Futex value has changed

The caller drops the held locks (hash bucket, mmap_sem) and retries the
operation. In case of the owner task exiting this can result in a live
lock.

As a preparatory step for seperating those cases, provide a distinct return
value (EBUSY) for the owner exiting case.

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.935606117@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
attach_to_pi_owner() returns -EAGAIN for various cases:

 - Owner task is exiting
 - Futex value has changed

The caller drops the held locks (hash bucket, mmap_sem) and retries the
operation. In case of the owner task exiting this can result in a live
lock.

As a preparatory step for seperating those cases, provide a distinct return
value (EBUSY) for the owner exiting case.

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.935606117@linutronix.de


</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Add mutex around futex exit</title>
<updated>2019-11-20T08:40:10+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3f186d974826847a07bc7964d79ec4eded475ad9'/>
<id>3f186d974826847a07bc7964d79ec4eded475ad9</id>
<content type='text'>
The mutex will be used in subsequent changes to replace the busy looping of
a waiter when the futex owner is currently executing the exit cleanup to
prevent a potential live lock.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.845798895@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The mutex will be used in subsequent changes to replace the busy looping of
a waiter when the futex owner is currently executing the exit cleanup to
prevent a potential live lock.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.845798895@linutronix.de


</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Provide state handling for exec() as well</title>
<updated>2019-11-20T08:40:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=af8cbda2cfcaa5515d61ec500498d46e9a8247e2'/>
<id>af8cbda2cfcaa5515d61ec500498d46e9a8247e2</id>
<content type='text'>
exec() attempts to handle potentially held futexes gracefully by running
the futex exit handling code like exit() does.

The current implementation has no protection against concurrent incoming
waiters. The reason is that the futex state cannot be set to
FUTEX_STATE_DEAD after the cleanup because the task struct is still active
and just about to execute the new binary.

While its arguably buggy when a task holds a futex over exec(), for
consistency sake the state handling can at least cover the actual futex
exit cleanup section. This provides state consistency protection accross
the cleanup. As the futex state of the task becomes FUTEX_STATE_OK after the
cleanup has been finished, this cannot prevent subsequent attempts to
attach to the task in case that the cleanup was not successfull in mopping
up all leftovers.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.753355618@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
exec() attempts to handle potentially held futexes gracefully by running
the futex exit handling code like exit() does.

The current implementation has no protection against concurrent incoming
waiters. The reason is that the futex state cannot be set to
FUTEX_STATE_DEAD after the cleanup because the task struct is still active
and just about to execute the new binary.

While its arguably buggy when a task holds a futex over exec(), for
consistency sake the state handling can at least cover the actual futex
exit cleanup section. This provides state consistency protection accross
the cleanup. As the futex state of the task becomes FUTEX_STATE_OK after the
cleanup has been finished, this cannot prevent subsequent attempts to
attach to the task in case that the cleanup was not successfull in mopping
up all leftovers.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.753355618@linutronix.de


</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Sanitize exit state handling</title>
<updated>2019-11-20T08:40:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a8e991b91aca9e20705d434677ac013974e0e30'/>
<id>4a8e991b91aca9e20705d434677ac013974e0e30</id>
<content type='text'>
Instead of having a smp_mb() and an empty lock/unlock of task::pi_lock move
the state setting into to the lock section.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.645603214@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of having a smp_mb() and an empty lock/unlock of task::pi_lock move
the state setting into to the lock section.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.645603214@linutronix.de


</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Mark the begin of futex exit explicitly</title>
<updated>2019-11-20T08:40:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=18f694385c4fd77a09851fd301236746ca83f3cb'/>
<id>18f694385c4fd77a09851fd301236746ca83f3cb</id>
<content type='text'>
Instead of relying on PF_EXITING use an explicit state for the futex exit
and set it in the futex exit function. This moves the smp barrier and the
lock/unlock serialization into the futex code.

As with the DEAD state this is restricted to the exit path as exec
continues to use the same task struct.

This allows to simplify that logic in a next step.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of relying on PF_EXITING use an explicit state for the futex exit
and set it in the futex exit function. This moves the smp barrier and the
lock/unlock serialization into the futex code.

As with the DEAD state this is restricted to the exit path as exec
continues to use the same task struct.

This allows to simplify that logic in a next step.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de


</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Set task::futex_state to DEAD right after handling futex exit</title>
<updated>2019-11-20T08:40:08+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f24f22435dcc11389acc87e5586239c1819d217c'/>
<id>f24f22435dcc11389acc87e5586239c1819d217c</id>
<content type='text'>
Setting task::futex_state in do_exit() is rather arbitrarily placed for no
reason. Move it into the futex code.

Note, this is only done for the exit cleanup as the exec cleanup cannot set
the state to FUTEX_STATE_DEAD because the task struct is still in active
use.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.439511191@linutronix.de


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Setting task::futex_state in do_exit() is rather arbitrarily placed for no
reason. Move it into the futex code.

Note, this is only done for the exit cleanup as the exec cleanup cannot set
the state to FUTEX_STATE_DEAD because the task struct is still in active
use.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.439511191@linutronix.de


</pre>
</div>
</content>
</entry>
</feed>
