linux-toradex.git/kernel, branch v2.6.26.3

posix-timers: fix posix_timer_event() vs dequeue_signal() race

2008-08-20T18:05:00+00:00

commit ba661292a2bc6ddd305a212b0526e5dc22195fe7 upstream

The bug was reported and analysed by Mark McLoughlin ,
the patch is based on his and Roland's suggestions.

posix_timer_event() always rewrites the pre-allocated siginfo before sending
the signal. Most of the written info is the same all the time, but memset(0)
is very wrong. If ->sigq is queued we can race with collect_signal() which
can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
copy the wrong .si_code/si_tid/etc.

In short, sys_timer_settime() can in fact stop the active timer, or the user
can receive the siginfo with the wrong .si_xxx values.

Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
It would be nice to move the whole sigq->info initialization from send to
create path, but this is not easy to do without uglifying timer_create()
further.

As Roland rightly pointed out, we need more cleanups/fixes here, see the
"FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
it can mask the most bad implications.

Reported-by: Mark McLoughlin 
Signed-off-by: Oleg Nesterov 
Cc: Mark McLoughlin 
Cc: Oliver Pinter 
Cc: Roland McGrath 
Cc: Andrew Morton 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

posix-timers: do_schedule_next_timer: fix the setting of ->si_overrun

2008-08-20T18:05:00+00:00

commit 54da1174922cddd4be83d5a364b2e0fdd693f513 upstream

do_schedule_next_timer() sets info->si_overrun = timr->it_overrun_last,
this discards the already accumulated overruns.

Signed-off-by: Oleg Nesterov 
Cc: Mark McLoughlin 
Cc: Oliver Pinter 
Cc: Roland McGrath 
Cc: Andrew Morton 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

relay: fix "full buffer with exactly full last subbuffer" accounting problem

2008-08-20T18:04:59+00:00

commit 32194450330be327f3b25bf6b66298bd122599e9 upstream

In relay's current read implementation, if the buffer is completely full
but hasn't triggered the buffer-full condition (i.e. the last write
didn't cross the subbuffer boundary) and the last subbuffer is exactly
full, the subbuffer accounting code erroneously finds nothing available.
This patch fixes the problem.

Signed-off-by: Tom Zanussi 
Cc: Eduard - Gabriel Munteanu 
Cc: Pekka Enberg 
Cc: Jens Axboe 
Cc: Mathieu Desnoyers 
Cc: Andrea Righi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

markers: fix markers read barrier for multiple probes

2008-08-01T19:43:11+00:00

commit 5def9a3a22e09c99717f41ab7f07ec9e1a1f3ec8 upstream

Paul pointed out two incorrect read barriers in the marker handler code in
the path where multiple probes are connected.  Those are ordering reads of
"ptype" (single or multi probe marker), "multi" array pointer, and "multi"
array data access.

It should be ordered like this :

read ptype
smp_rmb()
read multi array pointer
smp_read_barrier_depends()
access data referenced by multi array pointer

The code with a single probe connected (optimized case, does not have to
allocate an array) has correct memory ordering.

It applies to kernel 2.6.26.x, 2.6.25.x and linux-next.

Signed-off-by: Mathieu Desnoyers 
Cc: Paul E. McKenney 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

cpusets: fix wrong domain attr updates

2008-08-01T19:43:04+00:00

commit 91cd4d6ef0abb1f65e81f8fe37e7d3c10344e38c upstream

Fix wrong domain attr updates, or we will always update the first sched
domain attr.

Signed-off-by: Miao Xie 
Cc: Hidetoshi Seto 
Cc: Paul Jackson 
Cc: Nick Piggin 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Fix build on COMPAT platforms when CONFIG_EPOLL is disabled

2008-08-01T19:43:03+00:00

commit 5f17156fc55abac476d180e480bedb0f07f01b14 upstream

Add missing cond_syscall() entry for compat_sys_epoll_pwait.

Signed-off-by: Atsushi Nemoto 
Cc: Davide Libenzi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

rcu: fix rcu_try_flip_waitack_needed() to prevent grace-period stall

2008-08-01T19:43:00+00:00

commit d7c0651390b6a03ad53f99faec0ba88109d7191d upstream

The comment was correct -- need to make the code match the comment.
Without this patch, if a CPU goes dynticks idle (and stays there forever)
in just the right phase of preemptible-RCU grace-period processing,
grace periods stall.  The offending sequence of events (courtesy
of Promela/spin, at least after I got the liveness criterion coded
correctly...) is as follows:

o	CPU 0 is in dynticks-idle mode.  Its dynticks_progress_counter
	is (say) 10.

o	CPU 0 takes an interrupt, so rcu_irq_enter() increments CPU 0's
	dynticks_progress_counter to 11.

o	CPU 1 is doing RCU grace-period processing in rcu_try_flip_idle(),
	sees rcu_pending(), so invokes dyntick_save_progress_counter(),
	which in turn takes a snapshot of CPU 0's dynticks_progress_counter
	into CPU 0's rcu_dyntick_snapshot -- now set to 11.  CPU 1 then
	updates the RCU grace-period state to rcu_try_flip_waitack().

o	CPU 0 returns from its interrupt, so rcu_irq_exit() increments
	CPU 0's dynticks_progress_counter to 12.

o	CPU 1 later invokes rcu_try_flip_waitack(), which notices that
	CPU 0 has not yet responded, and hence in turn invokes
	rcu_try_flip_waitack_needed().  This function examines the
	state of CPU 0's dynticks_progress_counter and rcu_dyntick_snapshot
	variables, which it copies to curr (== 12) and snap (== 11),
	respectively.

	Because curr!=snap, the first condition fails.

	Because curr-snap is only 1 and snap is odd, the second
	condition fails.

	rcu_try_flip_waitack_needed() therefore incorrectly concludes
	that it must wait for CPU 0 to explicitly acknowledge the
	counter flip.

o	CPU 0 remains forever in dynticks-idle mode, never taking
	any more hardware interrupts or any NMIs, and never running
	any more tasks.  (Of course, -something- will usually eventually
	happen, which might be why we haven't seen this one in the
	wild.  Still should be fixed!)

Therefore the grace period never ends.  Fix is to make the code match
the comment, as shown below.  With this fix, the above scenario
would be satisfied with curr being even, and allow the grace period
to proceed.

Signed-off-by: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Josh Triplett 
Cc: Dipankar Sarma 
Signed-off-by: Andrew Morton 
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

2008-07-13T18:03:59+00:00

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpusets, hotplug, scheduler: fix scheduler domain breakage

cpusets, hotplug, scheduler: fix scheduler domain breakage

2008-07-13T09:37:02+00:00

Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -> detach_destroy_domains(&cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.

Signed-off-by: Dmitry Adamushko 
Tested-by: Vegard Nossum 
Cc: Paul Menage 
Cc: Max Krasnyansky 
Cc: Paul Jackson 
Cc: Peter Zijlstra 
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds 
Signed-off-by: Ingo Molnar

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

2008-07-10T19:34:55+00:00

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix cpu hotplug, cleanup
  sched: fix cpu hotplug