<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/ipc/sem.c, branch v3.0.16</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID</title>
<updated>2011-08-05T04:58:41+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2011-07-26T00:11:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e73ff29041b5f8991ef81669a1a9f0553d14766a'/>
<id>e73ff29041b5f8991ef81669a1a9f0553d14766a</id>
<content type='text'>
commit d694ad62bf539dbb20a0899ac2a954555f9e4a83 upstream.

If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed.  This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.

The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142

Reported-by: Yuriy Yevtukhov &lt;yuriy@ucoz.com&gt;
Reported-by: Harald Laabs &lt;kernel@dasr.de&gt;
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d694ad62bf539dbb20a0899ac2a954555f9e4a83 upstream.

If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed.  This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.

The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142

Reported-by: Yuriy Yevtukhov &lt;yuriy@ucoz.com&gt;
Reported-by: Harald Laabs &lt;kernel@dasr.de&gt;
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix common misspellings</title>
<updated>2011-03-31T14:26:23+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@profusion.mobi</email>
</author>
<published>2011-03-31T01:57:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=25985edcedea6396277003854657b5f3cb31a628'/>
<id>25985edcedea6396277003854657b5f3cb31a628</id>
<content type='text'>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>userns: user namespaces: convert several capable() calls</title>
<updated>2011-03-24T02:47:08+00:00</updated>
<author>
<name>Serge E. Hallyn</name>
<email>serge@hallyn.com</email>
</author>
<published>2011-03-23T23:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b0e77598f87107001a00b8a4ece9c95e4254ccc4'/>
<id>b0e77598f87107001a00b8a4ece9c95e4254ccc4</id>
<content type='text'>
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
	Jan 11: Use task_ns_capable() in place of sched_capable().
	Jan 11: Use nsown_capable() as suggested by Bastian Blank.
	Jan 11: Clarify (hopefully) some logic in futex and sched.c
	Feb 15: use ns_capable for ipc, not nsown_capable
	Feb 23: let copy_ipcs handle setting ipc_ns-&gt;user_ns
	Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Daniel Lezcano &lt;daniel.lezcano@free.fr&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
	Jan 11: Use task_ns_capable() in place of sched_capable().
	Jan 11: Use nsown_capable() as suggested by Bastian Blank.
	Jan 11: Clarify (hopefully) some logic in futex and sched.c
	Feb 15: use ns_capable for ipc, not nsown_capable
	Feb 23: let copy_ipcs handle setting ipc_ns-&gt;user_ns
	Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Daniel Lezcano &lt;daniel.lezcano@free.fr&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sys_semctl: fix kernel stack leakage</title>
<updated>2010-10-01T17:50:58+00:00</updated>
<author>
<name>Dan Rosenberg</name>
<email>drosenberg@vsecurity.com</email>
</author>
<published>2010-09-30T22:15:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=982f7c2b2e6a28f8f266e075d92e19c0dd4c6e56'/>
<id>982f7c2b2e6a28f8f266e075d92e19c0dd4c6e56</id>
<content type='text'>
The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.

Signed-off-by: Dan Rosenberg &lt;dan.j.rosenberg@gmail.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.

Signed-off-by: Dan Rosenberg &lt;dan.j.rosenberg@gmail.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: bugfix for semop() not reporting successful operation</title>
<updated>2010-07-20T23:25:40+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2010-07-20T20:24:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c61284e99191b2284fb74dae6961d4d09e4e59e8'/>
<id>c61284e99191b2284fb74dae6961d4d09e4e59e8</id>
<content type='text'>
The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma-&gt;sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task.  Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Luca Tettamanti &lt;kronos.it@gmail.com&gt;
Tested-by: Luca Tettamanti &lt;kronos.it@gmail.com&gt;
Reported-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Maciej Rutecki &lt;maciej.rutecki@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma-&gt;sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task.  Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Luca Tettamanti &lt;kronos.it@gmail.com&gt;
Tested-by: Luca Tettamanti &lt;kronos.it@gmail.com&gt;
Reported-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Maciej Rutecki &lt;maciej.rutecki@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: use ERR_CAST</title>
<updated>2010-05-27T16:12:49+00:00</updated>
<author>
<name>Julia Lawall</name>
<email>julia@diku.dk</email>
</author>
<published>2010-05-26T21:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4de85cd6d6018825e19f76f1208775f23ecc393f'/>
<id>4de85cd6d6018825e19f76f1208775f23ecc393f</id>
<content type='text'>
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// &lt;smpl&gt;
@@
type T;
T x;
identifier f;
@@

T f (...) { &lt;+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+&gt; }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// &lt;/smpl&gt;

Signed-off-by: Julia Lawall &lt;julia@diku.dk&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// &lt;smpl&gt;
@@
type T;
T x;
identifier f;
@@

T f (...) { &lt;+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+&gt; }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// &lt;/smpl&gt;

Signed-off-by: Julia Lawall &lt;julia@diku.dk&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: update description of the implementation</title>
<updated>2010-05-27T16:12:49+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2010-05-26T21:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c5cf6359ad1d322c16e159011247341849cc0d3a'/>
<id>c5cf6359ad1d322c16e159011247341849cc0d3a</id>
<content type='text'>
ipc/sem.c begins with a 15 year old description about bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

	for(i=0;i&lt;sma-&gt;sem_nsems;i++) spin_lock(sma-&gt;sem_lock);
	&lt;do stuff&gt;
	for(i=0;i&lt;sma-&gt;sem_nsems;i++) spin_unlock(sma-&gt;sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ipc/sem.c begins with a 15 year old description about bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

	for(i=0;i&lt;sma-&gt;sem_nsems;i++) spin_lock(sma-&gt;sem_lock);
	&lt;do stuff&gt;
	for(i=0;i&lt;sma-&gt;sem_nsems;i++) spin_unlock(sma-&gt;sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: move wake_up_process out of the spinlock section</title>
<updated>2010-05-27T16:12:49+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2010-05-26T21:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0a2b9d4c79671b05956806ede5d054e03ae56280'/>
<id>0a2b9d4c79671b05956806ede5d054e03ae56280</id>
<content type='text'>
The wake-up part of semtimedop() consists out of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held.  This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem-&gt;sem_otime to one place: It does not make
sense to set the last modify time multiple times.

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The wake-up part of semtimedop() consists out of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held.  This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem-&gt;sem_otime to one place: It does not make
sense to set the last modify time multiple times.

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: optimize update_queue() for bulk wakeup calls</title>
<updated>2010-05-27T16:12:49+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2010-05-26T21:43:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fd5db42254518fbf241dc454e918598fbe494fa2'/>
<id>fd5db42254518fbf241dc454e918598fbe494fa2</id>
<content type='text'>
The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can enter this worst case.  The patch series fix
  that.

  Note that the benchmark app doesn't expose the problem, it just should
  be fixed: Real world apps might do the wake-ups in another order than
  perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
  significantly larger than necessary.

  The patch series fixes that.  This change is responsible for the main
  improvement.

- The cacheline with the spinlock is also used for a variable that is
  read in the hot path (sem_base) and for a variable that is unnecessarily
  written to multiple times (sem_otime).  The last step of the series
  cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows to perform multiple operations on all
semaphores in the array as atomic operations.  After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case.  For some cases, it is simple to avoid the O(N^2).

The patch adds a detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
the patch breaks that.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can enter this worst case.  The patch series fix
  that.

  Note that the benchmark app doesn't expose the problem, it just should
  be fixed: Real world apps might do the wake-ups in another order than
  perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
  significantly larger than necessary.

  The patch series fixes that.  This change is responsible for the main
  improvement.

- The cacheline with the spinlock is also used for a variable that is
  read in the hot path (sem_base) and for a variable that is unnecessarily
  written to multiple times (sem_otime).  The last step of the series
  cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows to perform multiple operations on all
semaphores in the array as atomic operations.  After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case.  For some cases, it is simple to avoid the O(N^2).

The patch adds a detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
the patch breaks that.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc: remove unreachable code in sem.c</title>
<updated>2009-12-16T15:20:10+00:00</updated>
<author>
<name>Amerigo Wang</name>
<email>amwang@redhat.com</email>
</author>
<published>2009-12-16T00:47:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e5cc9c7b1a3e7ae4d700d9fce168fb597bcfe9b6'/>
<id>e5cc9c7b1a3e7ae4d700d9fce168fb597bcfe9b6</id>
<content type='text'>
This line is unreachable, remove it.

[akpm@linux-foundation.org: remove unneeded initialisation of `err']
Signed-off-by: WANG Cong &lt;amwang@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This line is unreachable, remove it.

[akpm@linux-foundation.org: remove unneeded initialisation of `err']
Signed-off-by: WANG Cong &lt;amwang@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
