linux-toradex.git/kernel, branch v2.6.27.3

modules: fix module "notes" kobject leak

2008-10-22T21:20:57+00:00

commit e94320939f44e0cbaccc3f259a5778abced4949c upstream

Fix "notes" kobject leak

It happens on every rmmod when KALLSYMS=y and SYSFS=y.

	# modprobe foo

kobject: 'foo' (ffffffffa00743d0): kobject_add_internal: parent: 'module', set: 'module'
kobject: 'holders' (ffff88017e7c5770): kobject_add_internal: parent: 'foo', set: ''
kobject: 'foo' (ffffffffa00743d0): kobject_uevent_env
kobject: 'foo' (ffffffffa00743d0): fill_kobj_path: path = '/module/foo'
kobject: 'notes' (ffff88017fa9b668): kobject_add_internal: parent: 'foo', set: ''
	  ^^^^^

	# rmmod foo

kobject: 'holders' (ffff88017e7c5770): kobject_cleanup
kobject: 'holders' (ffff88017e7c5770): auto cleanup kobject_del
kobject: 'holders' (ffff88017e7c5770): calling ktype release
kobject: (ffff88017e7c5770): dynamic_kobj_release
kobject: 'holders': free name
kobject: 'foo' (ffffffffa00743d0): kobject_cleanup
kobject: 'foo' (ffffffffa00743d0): does not have a release() function, it is broken and must be fixed.
kobject: 'foo' (ffffffffa00743d0): auto cleanup 'remove' event
kobject: 'foo' (ffffffffa00743d0): kobject_uevent_env
kobject: 'foo' (ffffffffa00743d0): fill_kobj_path: path = '/module/foo'
kobject: 'foo' (ffffffffa00743d0): auto cleanup kobject_del
kobject: 'foo': free name

	[whooops]
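A userspace model of the general rule may help here. It is not the kernel kobject API: obj_create(), obj_put() and module_teardown() are hypothetical. Every child object that is added, like 'notes' above, needs a matching put on teardown. Otherwise it stays allocated, and it keeps the reference it holds on its parent.

```c
#include <stdlib.h>

/* Hypothetical refcounted object, loosely modelled on a kobject. */
struct obj {
	int refcount;
	struct obj *parent;
};

/* Number of objects currently allocated; a leak leaves this non-zero. */
static int live_objects;

static struct obj *obj_create(struct obj *parent)
{
	struct obj *o = calloc(1, sizeof(*o));
	if (!o)
		return NULL;
	o->refcount = 1;          /* reference owned by the creator */
	o->parent = parent;
	if (parent)
		parent->refcount++;   /* a child pins its parent */
	live_objects++;
	return o;
}

/* Drop one reference; free on zero and release the parent reference too. */
static void obj_put(struct obj *o)
{
	while (o && --o->refcount == 0) {
		struct obj *parent = o->parent;
		free(o);
		live_objects--;
		o = parent;
	}
}

/* Teardown of a module object with "holders" and "notes" children.
 * With put_notes == 0 the notes child is never released: the leak. */
static void module_teardown(struct obj *mod, struct obj *holders,
			    struct obj *notes, int put_notes)
{
	obj_put(holders);
	if (put_notes)
		obj_put(notes);
	obj_put(mod);
}
```

If put_notes is 0, two objects are left behind: the notes object and the parent it still pins.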

Signed-off-by: Alexey Dobriyan 
Signed-off-by: Greg Kroah-Hartman

sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq

2008-10-18T17:49:10+00:00

commit f6121f4f8708195e88cbdf8dd8d171b226b3f858 upstream

While working on the new version of the code for SCHED_SPORADIC, I
noticed something strange in the present throttling mechanism,
specifically in the throttling timer handler in sched_rt.c
(do_sched_rt_period_timer()) and in rt_rq_enqueue().

The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only
asks for rescheduling if the runqueue has a sched_entity associated with
it (i.e., rt_rq->rt_se != NULL).
Now, if the runqueue is the root rq (which has rt_se == NULL),
rescheduling does not take place, and it is delayed until some undefined
point in the future.
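A heavily simplified model shows the difference. The struct and function names are hypothetical, and the real code also compares priorities before rescheduling, which is omitted here. The old behaviour requests a reschedule only when a group entity exists. The fixed behaviour requests one whenever tasks are runnable, and enqueues the entity only if there is one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler types. */
struct rt_entity { bool queued; };

struct rt_runqueue {
	struct rt_entity *rt_se;   /* NULL for the root runqueue */
	int nr_running;            /* runnable RT tasks on this runqueue */
	bool need_resched;         /* set when the CPU should reschedule */
};

/* Behaviour described above: resched only when rt_se exists. */
static void unthrottle_old(struct rt_runqueue *rq)
{
	if (rq->rt_se && !rq->rt_se->queued && rq->nr_running) {
		rq->rt_se->queued = true;
		rq->need_resched = true;
	}
}

/* Sketch of the fix: resched whenever tasks are runnable,
 * and enqueue the group entity only if there is one. */
static void unthrottle_fixed(struct rt_runqueue *rq)
{
	if (rq->nr_running) {
		if (rq->rt_se && !rq->rt_se->queued)
			rq->rt_se->queued = true;
		rq->need_resched = true;
	}
}
```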

This implies unpredictable bandwidth usage by RT tasks under throttling.
For instance, with rt_runtime_us/rt_period_us = 950ms/1000ms, an RT
task gets less than 95%. In our tests it varied between 70% and 95%.
With smaller time values, e.g., 95ms/100ms, things are even worse, and
I have seen values as low as 20-25%!!

The tests we performed simply run 'yes' as a SCHED_FIFO task
and check the CPU usage with top, but we can investigate more thoroughly
if you think it is needed.

Things go much better for us with the attached patch. I don't know if
it is the best approach, but it solved the issue for us.

Signed-off-by: Dario Faggioli 
Signed-off-by: Michael Trimarchi 
Acked-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

disable CONFIG_DYNAMIC_FTRACE due to possible memory corruption on module unload

2008-10-15T23:02:33+00:00

While debugging the e1000e corruption bug with Intel, we discovered
today that the dynamic ftrace code in mainline is the likely source of
this bug.

For the stable kernel, we are providing the only viable fix: marking
CONFIG_DYNAMIC_FTRACE as broken (see the patch below).

We will follow up with a backport patch that contains the fixes. But since
the fixes are not one-liners, the safest approach for now is to
disable the code in question.

The bug is caused by the way the current mainline code handles
dynamic ftrace. Turning on dynamic ftrace also turns on CONFIG_FTRACE,
which enables gcc's -pg option; this places a call to mcount in every
function. CONFIG_FTRACE alone causes noticeable overhead.
CONFIG_DYNAMIC_FTRACE reduces this overhead by dynamically converting
the mcount call sites into nops.

The problem arises when we trace functions and modules are unloaded.
The first time a function is called, it calls mcount, which in turn
calls ftrace_record_ip(). This records the call site in a preallocated
hash table. Later, a daemon wakes up, calls kstop_machine and converts
any mcount callers into nops.
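The record-then-convert flow can be modelled in userspace. This is only a sketch: the real code uses a preallocated hash table, while a fixed-size array with a linear search stands in for it here. The names record_ip(), convert_records() and MAX_RECORDS are hypothetical, and setting a flag stands in for writing a nop.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 64  /* hypothetical fixed-size table */

/* Hypothetical record of a call site that has called mcount. */
struct call_record {
	uintptr_t ip;     /* address of the mcount call site */
	int converted;    /* 1 once the site has been patched to a nop */
};

static struct call_record records[MAX_RECORDS];
static size_t nr_records;

/* Called (conceptually) from mcount: remember each call site once. */
static int record_ip(uintptr_t ip)
{
	for (size_t i = 0; i < nr_records; i++)
		if (records[i].ip == ip)
			return 0;              /* already recorded */
	if (nr_records == MAX_RECORDS)
		return -1;                 /* table full */
	records[nr_records].ip = ip;
	records[nr_records].converted = 0;
	nr_records++;
	return 0;
}

/* Run later by the daemon with all other CPUs stopped: patch every
 * recorded site into a nop. Returns the number of sites converted. */
static size_t convert_records(void)
{
	size_t n = 0;
	for (size_t i = 0; i < nr_records; i++) {
		if (!records[i].converted) {
			records[i].converted = 1; /* stands in for writing a nop */
			n++;
		}
	}
	return n;
}
```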

The first version of this code tried to do this without kstop_machine,
using cmpxchg to update the callers as they were called. But I was
informed that this is dangerous on SMP machines if another CPU is
running the same code. The solution was to do this with
kstop_machine.

As a final line of defense, we still used cmpxchg to check that the
code being modified is indeed the code we expect before updating
it.
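The check-before-modify guard corresponds to a compare-and-swap. A minimal sketch using C11 atomics follows, under two assumptions. First, an instruction fits in one 32-bit word. Second, CALL_MCOUNT and NOP_INSN are hypothetical encodings, not real opcodes. The swap happens only if the site still holds the expected call.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings; real instruction bytes are arch-specific. */
#define CALL_MCOUNT 0xE8AAAAAAu
#define NOP_INSN    0x90909090u

/* Replace the instruction only if the site still holds what we expect.
 * Returns false, leaving the site untouched, on a mismatch. */
static bool patch_site(_Atomic uint32_t *site, uint32_t expected,
		       uint32_t replacement)
{
	uint32_t old = expected;
	return atomic_compare_exchange_strong(site, &old, replacement);
}
```

On ordinary RAM, a stale record fails this compare harmlessly. The next paragraphs explain why the same guard cannot be trusted on ioremapped device memory.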

But on 32-bit machines, ioremapped memory and modules share the same
address space. When a module was loaded and its code executed, each
function called this way would be registered.

On module unload, ftrace incorrectly did not zap these functions from
its hash (this was the bug). The cmpxchg could have saved us in most
cases (by luck), but with ioremapped memory it was exactly the wrong
thing to do: the result of cmpxchg on device memory is undefined
(and will likely result in a write).
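The missing step can be sketched as follows: on unload, drop every recorded address that falls inside the module's text range. The flat array and the names purge_range(), recorded_ips and nr_recorded are hypothetical stand-ins for ftrace's hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of recorded call-site addresses. */
static uintptr_t recorded_ips[64];
static size_t nr_recorded;

/* On module unload, drop every record inside the module's text range
 * [start, end). Without this, stale addresses survive and may later
 * point at ioremapped memory. Returns the number of records removed. */
static size_t purge_range(uintptr_t start, uintptr_t end)
{
	size_t kept = 0, removed = 0;
	for (size_t i = 0; i < nr_recorded; i++) {
		if (recorded_ips[i] >= start && recorded_ips[i] < end)
			removed++;
		else
			recorded_ips[kept++] = recorded_ips[i];
	}
	nr_recorded = kept;
	return removed;
}
```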

The pending .28 ftrace tree no longer has this bug. As part of a general push
towards more robust code patching, this is done differently there: we do not
use cmpxchg, and we do a WARN_ON and turn the tracer off if anything deviates
from its expected state. Furthermore, patch sites are identified statically
at build time, so there is no longer any runtime discovery of dynamic code
areas, and no room for code unmaps to leave the hash out of date.
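The warn-and-disable policy can be modelled roughly as below. This is not the .28 code. It assumes the check runs while other CPUs are stopped, and it uses a hypothetical patch_or_disable() with fprintf standing in for WARN_ON. It uses a plain read and compare, with no cmpxchg. On the first mismatch it refuses to write and disables all further patching.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Set once any patch site deviates from its expected contents. */
static bool tracer_disabled;

/* Plain compare-then-write: if the site does not hold the expected
 * instruction, warn and turn the tracer off instead of writing. */
static bool patch_or_disable(uint32_t *site, uint32_t expected,
			     uint32_t replacement)
{
	if (tracer_disabled)
		return false;
	if (*site != expected) {
		fprintf(stderr, "WARNING: unexpected code at %p, disabling tracer\n",
			(void *)site);
		tracer_disabled = true;
		return false;
	}
	*site = replacement;
	return true;
}
```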

We believe the fragility of dynamic patching has been sufficiently
addressed in the development code via the static patching method, but further
suggestions to make it more robust are welcome.

Signed-off-by: Steven Rostedt 
Acked-by: Ingo Molnar 
Acked-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

kgdb: call touch_softlockup_watchdog on resume

2008-10-06T18:50:59+00:00

The softlockup watchdog needs to be touched when resuming from the
kgdb stopped state; otherwise, if the debugger was active for longer
than the softlockup threshold, a printk reports that a CPU is stuck.
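A simplified model of the watchdog shows why the touch matters. The names touch_watchdog(), softlockup_detected() and SOFTLOCKUP_THRESH are hypothetical; the real touch_softlockup_watchdog() takes no arguments and reads its own clock. The watchdog compares the current time with the last touch. A long debugger stop trips it unless the resume path resets the timestamp.

```c
#include <stdbool.h>

/* Hypothetical softlockup threshold, in seconds. */
#define SOFTLOCKUP_THRESH 10UL

/* Time (seconds) at which this CPU last showed it was making progress. */
static unsigned long watchdog_touch_ts;

static void touch_watchdog(unsigned long now)
{
	watchdog_touch_ts = now;
}

/* Periodic check: true means "CPU looks stuck" and a warning is printed. */
static bool softlockup_detected(unsigned long now)
{
	return now - watchdog_touch_ts > SOFTLOCKUP_THRESH;
}
```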

Signed-off-by: Jason Wessel

clockevents: check broadcast tick device not the clock events device

2008-10-04T08:51:07+00:00

Impact: jiffies increment too fast.

Hugh Dickins noted that with NOHZ=n and HIGHRES=n, jiffies get
incremented too fast. The reason is a wrong check in the broadcast
enter/exit code, which keeps the local APIC timer in periodic mode
when the switch happens.

Signed-off-by: Thomas Gleixner

fix error-path NULL deref in alloc_posix_timer()

2008-10-02T22:53:13+00:00

Found by a static checker (http://repo.or.cz/w/smatch.git).
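The message does not describe the bug itself. The sketch below assumes a common shape for error-path NULL dereferences: a failed sub-allocation frees the object, and execution then falls through into code that still uses it. It shows the safe pattern of returning immediately. The names timer_obj, sigq_entry and alloc_timer() are hypothetical, and the fail_sigq parameter only simulates an allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a timer and its signal queue entry. */
struct sigq_entry { int info[4]; };
struct timer_obj { struct sigq_entry *sigq; };

/* Allocate a timer plus its signal entry. On failure, free what was
 * allocated and return NULL at once, so no code after the error path
 * can dereference a NULL pointer. */
static struct timer_obj *alloc_timer(int fail_sigq)
{
	struct timer_obj *tmr = calloc(1, sizeof(*tmr));
	if (!tmr)
		return NULL;
	tmr->sigq = fail_sigq ? NULL : malloc(sizeof(*tmr->sigq));
	if (!tmr->sigq) {
		free(tmr);
		return NULL;       /* bail out; do not fall through */
	}
	memset(tmr->sigq->info, 0, sizeof(tmr->sigq->info));
	return tmr;
}
```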

Signed-off-by: Dan Carpenter 
Acked-by: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

2008-09-30T15:39:28+00:00

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: prevent migration of per CPU hrtimers
  hrtimer: mark migration state
  hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers
  hrtimer: migrate pending list on cpu offline

Acked-by: Paul E. McKenney 
Acked-by: Benjamin Herrenschmidt 
Tested-by: Paul E. McKenney

mm owner: fix race between swapoff and exit

2008-09-29T15:41:47+00:00

There's a race between mm->owner assignment and swapoff, more easily
seen when task slab poisoning is turned on.  The condition occurs when
try_to_unuse() runs in parallel with an exiting task.  A similar race
can occur with callers of get_task_mm(), such as /proc//
or ptrace or page migration.

CPU0                                    CPU1
                                        try_to_unuse
                                        looks at mm = task0->mm
                                        increments mm->mm_users
task 0 exits
mm->owner needs to be updated, but no
new owner is found (mm_users > 1, but
no other task has task->mm = task0->mm)
mm_update_next_owner() leaves
                                        mmput(mm) decrements mm->mm_users
task0 freed
                                        dereferencing mm->owner fails

The fix is to notify the subsystem via the mm_owner_changed() callback,
passing NULL as the new task, if no new owner is found.

Jiri Slaby:
mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
must be set after that, so as not to pass NULL as old owner causing oops.

Daisuke Nishimura:
mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
and its callers need to take account of this situation to avoid oops.

Hugh Dickins:
Lockdep warning and hang below exec_mmap() when testing these patches.
exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
so exec_mmap() now needs to do the same.  And with that repositioning,
there's now no point in mm_need_new_owner() allowing for NULL mm.
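The ordering rules above can be sketched in a simplified model. struct task, struct mm, update_next_owner() and owner_changed_cb() are hypothetical stand-ins, not the kernel's types or functions. If no other task shares the mm, the callback is notified with a NULL new owner. Only after that is mm->owner cleared, so the callback still sees the real old owner. Readers must then tolerate a NULL owner.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for task_struct and mm_struct. */
struct task;
struct mm { struct task *owner; int users; };
struct task { struct mm *mm; int id; };

/* Hypothetical subsystem callback: records the last transition seen. */
static struct task *cb_old, *cb_new;
static int cb_calls;

static void owner_changed_cb(struct mm *mm, struct task *old,
			     struct task *new_owner)
{
	(void)mm;
	cb_old = old;
	cb_new = new_owner;
	cb_calls++;
}

/* On exit of `exiting`, hand mm->owner to another task using the same mm.
 * If none exists, notify with NULL first, then clear the owner. */
static void update_next_owner(struct mm *mm, struct task *exiting,
			      struct task **tasks, size_t ntasks)
{
	struct task *next = NULL;
	for (size_t i = 0; i < ntasks; i++)
		if (tasks[i] != exiting && tasks[i]->mm == mm) {
			next = tasks[i];
			break;
		}
	owner_changed_cb(mm, mm->owner, next);
	mm->owner = next;   /* assigned only after the callback */
}

/* Readers must handle a NULL owner instead of dereferencing it. */
static int owner_id_or_minus_one(const struct mm *mm)
{
	return mm->owner ? mm->owner->id : -1;
}
```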

Reported-by: Hugh Dickins 
Signed-off-by: Balbir Singh 
Signed-off-by: Jiri Slaby 
Signed-off-by: Daisuke Nishimura 
Signed-off-by: Hugh Dickins 
Cc: KAMEZAWA Hiroyuki 
Cc: Paul Menage 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

hrtimer: prevent migration of per CPU hrtimers

2008-09-29T15:09:14+00:00

Impact: per CPU hrtimers can be migrated from a dead CPU

The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.

Explicitly mark the timers as per CPU, and use a more understandable
mode descriptor for the interrupt-safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.
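A simplified model of the migration rule follows. The struct timer fields and migrate_timers() are hypothetical; the real hrtimer encodes per-CPU-ness through its callback mode. When a CPU dies, ordinary timers move to a live CPU. Per-CPU timers are never moved, and an active one triggers a warning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical simplified timer. */
struct timer {
	bool per_cpu;   /* must only ever run on the CPU it was armed on */
	bool active;
	int cpu;
};

/* Move timers off a dead CPU. Per-CPU timers are never migrated; an
 * active one is counted and reported. Returns the number moved. */
static size_t migrate_timers(struct timer *t, size_t n, int dead_cpu,
			     int new_cpu, size_t *warnings)
{
	size_t moved = 0;
	for (size_t i = 0; i < n; i++) {
		if (t[i].cpu != dead_cpu)
			continue;
		if (t[i].per_cpu) {
			if (t[i].active) {
				fprintf(stderr, "WARNING: active per-CPU timer on dead CPU %d\n",
					dead_cpu);
				(*warnings)++;
			}
			continue;
		}
		t[i].cpu = new_cpu;
		moved++;
	}
	return moved;
}
```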

Signed-off-by: Thomas Gleixner

hrtimer: mark migration state

2008-09-29T15:09:14+00:00

Impact: during migration active hrtimers can be seen as inactive

The migration code removes the hrtimers from the queues of the dead
CPU and temporarily sets the state to INACTIVE. The enqueue code sets it
to ACTIVE/PENDING again.

Prevent the wrong state from being seen by using a separate migration
state bit.
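The state-bit idea can be sketched as follows. The bit values and names are illustrative assumptions, not the kernel's definitions. While the timer is between queues, a MIGRATE bit is set rather than dropping to INACTIVE. An observer that checks "state != INACTIVE" therefore still sees the timer as active. The bit is cleared once the timer is enqueued on the new CPU.

```c
#include <stdbool.h>

/* Illustrative state bits; values are assumptions. */
#define STATE_INACTIVE 0x00u
#define STATE_ENQUEUED 0x01u
#define STATE_MIGRATE  0x08u

struct timer { unsigned int state; };

static bool timer_active(const struct timer *t)
{
	return t->state != STATE_INACTIVE;
}

/* Move a timer between queues. Returns what a concurrent observer
 * would have seen (active or not) while the timer was off both queues. */
static bool migrate_one(struct timer *t)
{
	bool seen_active;

	t->state = STATE_MIGRATE;          /* removed from the old queue */
	seen_active = timer_active(t);     /* observer's view mid-move */
	t->state |= STATE_ENQUEUED;        /* enqueued on the new CPU */
	t->state &= ~STATE_MIGRATE;        /* migration finished */
	return seen_active;
}
```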

Signed-off-by: Thomas Gleixner