linux-toradex.git/ipc, branch v2.6.35-rc5

mqueue doesn't need make_bad_inode()

2010-06-04T21:16:27+00:00

It never hashes them anyway and does final iput() immediately
afterwards.  With ->drop_inode() being generic_delete_inode()...

Signed-off-by: Al Viro

drop unused dentry argument to ->fsync

2010-05-28T02:05:02+00:00

Signed-off-by: Christoph Hellwig 
Signed-off-by: Al Viro

ipc/sem.c: use ERR_CAST

2010-05-27T16:12:49+00:00

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
purpose of the operation clearer; the latter looks like a no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
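
For readers without a kernel tree at hand, the idiom can be sketched in
userspace C.  This is only an illustration: the my_* helpers are stand-ins
written for this example (they imitate the kernel helpers but are not
copied from them), and the two struct types and lookup functions are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MY_MAX_ERRNO 4095

/* Userspace stand-ins for ERR_PTR/PTR_ERR/IS_ERR/ERR_CAST (assumed names). */
static inline void *my_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long my_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int my_is_err(const void *ptr)
{
	/* Error codes live in the last MY_MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MY_MAX_ERRNO;
}

static inline void *my_err_cast(const void *ptr)
{
	/* Same bits, different pointer type: no decode/encode round trip. */
	return (void *)ptr;
}

/* Hypothetical types: the outer object embeds the inner one as first member. */
struct toy_perm { int id; };
struct toy_array { struct toy_perm perm; };

static struct toy_array the_array = { { 1 } };

static struct toy_perm *toy_lookup_perm(int id)
{
	if (id != 1)
		return my_err_ptr(-EINVAL);
	return &the_array.perm;
}

struct toy_array *toy_lookup_array(int id)
{
	struct toy_perm *p = toy_lookup_perm(id);

	if (my_is_err(p))
		return my_err_cast(p);	/* was: my_err_ptr(my_ptr_err(p)) */
	return (struct toy_array *)p;
}

/* Returns 0 if the error code survives the cast unchanged. */
int toy_err_cast_demo(void)
{
	assert(!my_is_err(toy_lookup_array(1)));
	assert(my_is_err(toy_lookup_array(2)));
	assert(my_ptr_err(toy_lookup_array(2)) == -EINVAL);
	return 0;
}
```

The cast form states the intent directly: an error value is passed on
under a different pointer type.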

Signed-off-by: Julia Lawall 
Cc: Manfred Spraul 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

ipc/sem.c: update description of the implementation

2010-05-27T16:12:49+00:00

ipc/sem.c begins with a 15-year-old description of bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top-level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
	
	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.
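
The trade-off in this TODO can be sketched in userspace C.  Everything
below is an assumption made for illustration: the toy_* names are
hypothetical, the per-semaphore lock is a C11 atomic_flag rather than a
kernel spinlock, and the code does not reflect how ipc/sem.c is laid out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical layout: every semaphore carries its own lock. */
struct toy_sem {
	atomic_flag lock;
	int semval;
};

struct toy_sem_array {
	size_t sem_nsems;
	struct toy_sem *sem_base;
};

static void toy_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Cheap path: a single-semaphore operation touches only one lock. */
void toy_add(struct toy_sem_array *sma, size_t idx, int delta)
{
	toy_spin_lock(&sma->sem_base[idx].lock);
	sma->sem_base[idx].semval += delta;
	toy_spin_unlock(&sma->sem_base[idx].lock);
}

/* Expensive path: a semctl()-like whole-array read takes every lock,
 * always in index order so that two such callers cannot deadlock. */
int toy_sum_all(struct toy_sem_array *sma)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < sma->sem_nsems; i++)
		toy_spin_lock(&sma->sem_base[i].lock);
	for (i = 0; i < sma->sem_nsems; i++)
		sum += sma->sem_base[i].semval;
	for (i = 0; i < sma->sem_nsems; i++)
		toy_spin_unlock(&sma->sem_base[i].lock);
	return sum;
}

/* Returns 0 if the single-lock and all-locks paths agree. */
int toy_lock_demo(void)
{
	struct toy_sem sems[3];
	struct toy_sem_array sma = { 3, sems };
	size_t i;

	for (i = 0; i < 3; i++) {
		atomic_flag_clear(&sems[i].lock);
		sems[i].semval = (int)i + 1;	/* 1, 2, 3 */
	}
	toy_add(&sma, 1, 10);			/* 1, 12, 3 */
	assert(toy_sum_all(&sma) == 16);
	return 0;
}
```

The whole-array path costs one lock/unlock pair per semaphore, which is
the price named above.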

Signed-off-by: Manfred Spraul 
Cc: Chris Mason 
Cc: Zach Brown 
Cc: Jens Axboe 
Cc: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

ipc/sem.c: move wake_up_process out of the spinlock section

2010-05-27T16:12:49+00:00

The wake-up part of semtimedop() consists of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held.  This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem->sem_otime to one place: It does not make
sense to set the last modify time multiple times.
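
The reordering can be sketched as a two-phase pattern in userspace C.
This is a simplified model, not the patch itself: the toy_* names are
hypothetical, the lock is a C11 atomic_flag, and toy_wake() merely sets a
flag where the kernel would call wake_up_process().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

struct toy_waiter {
	int need;	/* amount this waiter wants to decrement */
	int pending;	/* still queued? */
	int woken;	/* set by toy_wake() */
};

struct toy_array {
	atomic_flag lock;
	int semval;
	struct toy_waiter *q;
	size_t n;
};

/* Stand-in for wake_up_process(): potentially slow, so keep it unlocked. */
static void toy_wake(struct toy_waiter *w)
{
	w->woken = 1;
}

/* Add delta to the semaphore; returns the number of waiters woken. */
int toy_post(struct toy_array *a, int delta)
{
	struct toy_waiter *to_wake[TOY_MAX_WAITERS];
	size_t nwake = 0, i;

	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		;
	/* Step 1 (locked): identify the tasks and apply their operations. */
	a->semval += delta;
	for (i = 0; i < a->n && nwake < TOY_MAX_WAITERS; i++) {
		struct toy_waiter *w = &a->q[i];

		if (w->pending && w->need <= a->semval) {
			a->semval -= w->need;
			w->pending = 0;
			to_wake[nwake++] = w;
		}
	}
	atomic_flag_clear_explicit(&a->lock, memory_order_release);

	/* Step 2 (unlocked): perform the actual wake-ups. */
	for (i = 0; i < nwake; i++)
		toy_wake(to_wake[i]);
	return (int)nwake;
}

/* Returns 0 if exactly the satisfiable waiters were woken. */
int toy_wake_demo(void)
{
	struct toy_waiter q[3] = { { 1, 1, 0 }, { 2, 1, 0 }, { 5, 1, 0 } };
	struct toy_array a;

	atomic_flag_clear(&a.lock);
	a.semval = 0;
	a.q = q;
	a.n = 3;

	assert(toy_post(&a, 3) == 2);
	assert(q[0].woken && q[1].woken && !q[2].woken);
	assert(q[2].pending && a.semval == 0);
	return 0;
}
```

Only the bookkeeping runs under the lock; the wake-ups happen from a
private list after the lock has been dropped.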

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul 
Cc: Chris Mason 
Cc: Zach Brown 
Cc: Jens Axboe 
Cc: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

ipc/sem.c: optimize update_queue() for bulk wakeup calls

2010-05-27T16:12:49+00:00

The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can enter this worst case.  The patch series fixes
  that.

  Note that the benchmark app doesn't expose this problem, but it should
  be fixed anyway: real-world apps might do the wake-ups in an order other
  than perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
  significantly larger than necessary.

  The patch series fixes that.  This change is responsible for the main
  improvement.

- The cacheline with the spinlock is also used for a variable that is
  read in the hot path (sem_base) and for a variable that is unnecessarily
  written to multiple times (sem_otime).  The last step of the series
  cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows multiple operations on the semaphores of
an array to be performed as one atomic operation.  After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case.  For some cases, it is simple to avoid the O(N^2).

The patch adds detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

This version of the patch also fixes wakeups due to semctl(,,SETALL,),
which the initial version of the patch broke.
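
The O(N^2) worst case and one way to avoid it can be sketched with a toy
model in userspace C.  The model is an assumption made for illustration
and much simpler than update_queue(): there is a single semaphore, every
waiter only wants to decrement it, and all toy_* names are hypothetical.
Under those assumptions a successful waiter can only lower the value, so
waiters that already failed cannot succeed later and one pass suffices.

```c
#include <assert.h>
#include <stddef.h>

struct toy_waiter {
	int need;	/* amount to decrement */
	int done;	/* already completed? */
};

/* Restart from the head after every success: O(N^2) in the worst case. */
long toy_scan_restart(struct toy_waiter *q, size_t n, int *semval, int *woken)
{
	long checks = 0;
	size_t i = 0;

	*woken = 0;
	while (i < n) {
		if (q[i].done) {
			i++;
			continue;
		}
		checks++;
		if (q[i].need <= *semval) {
			*semval -= q[i].need;
			q[i].done = 1;
			(*woken)++;
			i = 0;		/* rescan everything */
		} else {
			i++;
		}
	}
	return checks;
}

/* Decrement-only waiters: a single pass visits each waiter once. */
long toy_scan_single_pass(struct toy_waiter *q, size_t n, int *semval, int *woken)
{
	long checks = 0;
	size_t i;

	*woken = 0;
	for (i = 0; i < n; i++) {
		if (q[i].done)
			continue;
		checks++;
		if (q[i].need <= *semval) {
			*semval -= q[i].need;
			q[i].done = 1;
			(*woken)++;
		}
	}
	return checks;
}

/* Non-FIFO case: four waiters that cannot run sit in front of four that can. */
int toy_update_queue_demo(void)
{
	struct toy_waiter a[8], b[8];
	int val_a = 4, val_b = 4, woken_a, woken_b;
	long checks_a, checks_b;
	size_t i;

	for (i = 0; i < 8; i++) {
		a[i].need = b[i].need = (i < 4) ? 100 : 1;
		a[i].done = b[i].done = 0;
	}
	checks_a = toy_scan_restart(a, 8, &val_a, &woken_a);
	checks_b = toy_scan_single_pass(b, 8, &val_b, &woken_b);

	assert(woken_a == 4 && woken_b == 4);	/* same tasks complete */
	assert(checks_a == 24 && checks_b == 8);
	return 0;
}
```

Both scans wake the same waiters; the restarting scan re-examines the
blocked waiters at the head of the queue after every success.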

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul 
Cc: Chris Mason 
Cc: Zach Brown 
Cc: Jens Axboe 
Cc: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN

2010-05-25T15:07:02+00:00

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.
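
The type point can be illustrated in userspace C.  The TOY_* macros below
are hypothetical and written for this example in the 16-bit-typed style
the second item describes; they are compared against <limits.h> under the
assumption that short is 16 bits wide and int is wider.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical 16-bit-typed limits (uint16_t/int16_t stand in for u16/s16). */
#define TOY_USHRT_MAX	((uint16_t)(~0U))
#define TOY_SHRT_MAX	((int16_t)(TOY_USHRT_MAX >> 1))
#define TOY_SHRT_MIN	((int16_t)(-TOY_SHRT_MAX - 1))

/* Returns 0 if the values match <limits.h> and the types are 16 bits wide. */
int toy_limits_demo(void)
{
	assert(TOY_USHRT_MAX == USHRT_MAX);
	assert(TOY_SHRT_MAX == SHRT_MAX);
	assert(TOY_SHRT_MIN == SHRT_MIN);

	/* The typed variants are 2 bytes wide; the standard SHRT_MIN is an
	 * int expression, so it is wider on platforms with int > 16 bits. */
	assert(sizeof(TOY_SHRT_MIN) == sizeof(int16_t));
	assert(sizeof(TOY_SHRT_MAX) == sizeof(TOY_SHRT_MIN));
	assert(sizeof(SHRT_MIN) == sizeof(int));
	return 0;
}
```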

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan 
Acked-by: WANG Cong 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

2010-05-20T00:11:10+00:00

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Add clocksource_register_hz/khz interface
  posix-cpu-timers: Optimize run_posix_cpu_timers()
  time: Remove xtime_cache
  mqueue: Convert message queue timeout to use hrtimers
  hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
  timers: Introduce the concept of timer slack for legacy timers
  ntp: Remove tickadj
  ntp: Make time_adjust static
  time: Add xtime, wall_to_monotonic to feature-removal-schedule
  timer: Try to survive timer callback preempt_count leak
  timer: Split out timer function call
  timer: Print function name for timer callbacks modifying preemption count
  time: Clean up warp_clock()
  cpu-timers: Avoid iterating over all threads in fastpath_timer_check()
  cpu-timers: Change SIGEV_NONE timer implementation
  cpu-timers: Return correct previous timer reload value
  cpu-timers: Cleanup arm_timer()
  cpu-timers: Simplify RLIMIT_CPU handling

mqueue: fix kernel BUG caused by double free() on mq_open()

2010-05-12T00:33:42+00:00

When mq_open() aborts because the user has reached the maximum amount of
memory that can be allocated to message queues (RLIMIT_MSGQUEUE), we would
try to free the message area twice when bailing out: first in the error
handling code itself, and then again when cleaning up the inode through
delete_inode().
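
The bug class can be sketched in userspace C.  This shows one common way
to avoid such a double free, namely leaving the release to a single
cleanup routine; it is not necessarily how this patch does it.  All toy_*
names are hypothetical and toy_delete_inode() only plays the role of the
inode cleanup.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_msg_frees;	/* counts releases of the message area */

struct toy_info {
	void **messages;
};

/* Single owner of the cleanup: the only place that frees ->messages. */
static void toy_delete_inode(struct toy_info *info)
{
	if (info->messages) {
		free(info->messages);
		toy_msg_frees++;
		info->messages = NULL;
	}
	free(info);
}

/* Returns NULL when the (simulated) per-user limit is exceeded. */
struct toy_info *toy_create(int over_limit)
{
	struct toy_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;
	info->messages = calloc(4, sizeof(void *));
	if (!info->messages || over_limit) {
		/* The buggy shape would free(info->messages) here and then
		 * fall into the cleanup below, which frees it once more. */
		toy_delete_inode(info);
		return NULL;
	}
	return info;
}

/* Returns 0 if each message area is released exactly once. */
int toy_double_free_demo(void)
{
	struct toy_info *ok;

	toy_msg_frees = 0;
	assert(toy_create(1) == NULL);
	assert(toy_msg_frees == 1);

	ok = toy_create(0);
	assert(ok != NULL);
	toy_delete_inode(ok);
	assert(toy_msg_frees == 2);
	return 0;
}
```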

Signed-off-by: André Goddard Rosa 
Cc: Alexey Dobriyan 
Cc: Al Viro 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'linus' into timers/core

2010-05-10T12:20:42+00:00

Reason: Further posix_cpu_timer patches depend on mainline changes

Signed-off-by: Thomas Gleixner