linux-toradex.git/kernel, branch v3.9.4

audit: Make testing for a valid loginuid explicit.

2013-05-19T18:38:47+00:00

commit 780a7654cee8d61819512385e778e4827db4bfbc upstream.

audit rule additions containing "-F auid!=4294967295" were failing
with EINVAL because of a regression caused by e1760bd.

Apparently some userland audit rule sets want to know if loginuid uid
has been set and are using a test for auid != 4294967295 to determine
that.

In practice that is a horrible way to ask if a value has been set,
because it relies on subtle implementation details and will break
every time the uid implementation in the kernel changes.

So add a clean way to test if the audit loginuid has been set, and
silently convert the old idiom to the cleaner and more comprehensible
new idiom.

RGB notes: In upstream, audit_rule_to_entry has been refactored out.
This is patch is already upstream in functionally the same form in
commit 780a7654cee8d61819512385e778e4827db4bfbc .  The decimal constant
was cast to unsigned to quiet GCC 4.6 32-bit architecture warnings.

Reported-By: Steve Grubb 
Signed-off-by: "Eric W. Biederman" 
Tested-by: Richard Guy Briggs 
Signed-off-by: Eric Paris 
Backported-by: Richard Guy Briggs 
Signed-off-by: Greg Kroah-Hartman

usermodehelper: check subprocess_info->path != NULL

2013-05-19T18:38:45+00:00

commit 264b83c07a84223f0efd0d1db9ccc66d6f88288f upstream.

argv_split(empty_or_all_spaces) happily succeeds, it simply returns
argc == 0 and argv[0] == NULL. Change call_usermodehelper_exec() to
check sub_info->path != NULL to avoid the crash.

This is the minimal fix, todo:

 - perhaps we should change argv_split() to return NULL or change the
   callers.

 - kill or justify ->path[0] check

 - narrow the scope of helper_lock()

Signed-off-by: Oleg Nesterov 
Acked-By: Lucas De Marchi 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

tracing: Fix leaks of filter preds

2013-05-19T18:38:25+00:00

commit 60705c89460fdc7227f2d153b68b3f34814738a4 upstream.

Special preds are created when folding a series of preds that
can be done in serial. These are allocated in an ops field of
the pred structure. But they were never freed, causing memory
leaks.

This was discovered using the kmemleak checker:

unreferenced object 0xffff8800797fd5e0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294690605 (age 104.608s)
  hex dump (first 32 bytes):
    00 00 01 00 03 00 05 00 07 00 09 00 0b 00 0d 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [] kmemleak_alloc+0x73/0x98
    [] kmemleak_alloc_recursive.constprop.42+0x16/0x18
    [] __kmalloc+0xd7/0x125
    [] kcalloc.constprop.24+0x2d/0x2f
    [] fold_pred_tree_cb+0xa9/0xf4
    [] walk_pred_tree+0x47/0xcc
    [] replace_preds.isra.20+0x6f8/0x72f
    [] create_filter+0x4e/0x8b
    [] ftrace_test_event_filter+0x5a/0x155
    [] do_one_initcall+0xa0/0x137
    [] kernel_init_freeable+0x14d/0x1dc
    [] kernel_init+0xe/0xdb
    [] ret_from_fork+0x7c/0xb0
    [] 0xffffffffffffffff

Signed-off-by: Steven Rostedt 
Cc: Tom Zanussi 
Signed-off-by: Greg Kroah-Hartman

tick: Cleanup NOHZ per cpu data on cpu down

2013-05-19T18:38:25+00:00

commit 4b0c0f294f60abcdd20994a8341a95c8ac5eeb96 upstream.

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava 
Cc: Mike Galbraith 
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE

2013-05-19T18:38:24+00:00

commit 42a5cf46cd56f46267d2a9fcf2655f4078cd3042 upstream.

An inactive timer's base can refer to a offline cpu's base.

In the current code, cpu_base's lock is blindly reinitialized each
time a CPU is brought up. If a CPU is brought online during the period
that another thread is trying to modify an inactive timer on that CPU
with holding its timer base lock, then the lock will be reinitialized
under its feet. This leads to following SPIN_BUG().

<0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
<0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
<4> [] (unwind_backtrace+0x0/0x11c) from [] (do_raw_spin_unlock+0x40/0xcc)
<4> [] (do_raw_spin_unlock+0x40/0xcc) from [] (_raw_spin_unlock+0x8/0x30)
<4> [] (_raw_spin_unlock+0x8/0x30) from [] (mod_timer+0x294/0x310)
<4> [] (mod_timer+0x294/0x310) from [] (queue_delayed_work_on+0x104/0x120)
<4> [] (queue_delayed_work_on+0x104/0x120) from [] (sdhci_msm_bus_voting+0x88/0x9c)
<4> [] (sdhci_msm_bus_voting+0x88/0x9c) from [] (sdhci_disable+0x40/0x48)
<4> [] (sdhci_disable+0x40/0x48) from [] (mmc_release_host+0x4c/0xb0)
<4> [] (mmc_release_host+0x4c/0xb0) from [] (mmc_sd_detect+0x90/0xfc)
<4> [] (mmc_sd_detect+0x90/0xfc) from [] (mmc_rescan+0x7c/0x2c4)
<4> [] (mmc_rescan+0x7c/0x2c4) from [] (process_one_work+0x27c/0x484)
<4> [] (process_one_work+0x27c/0x484) from [] (worker_thread+0x210/0x3b0)
<4> [] (worker_thread+0x210/0x3b0) from [] (kthread+0x80/0x8c)
<4> [] (kthread+0x80/0x8c) from [] (kernel_thread_exit+0x0/0x8)

As an example, this particular crash occurred when CPU #3 is executing
mod_timer() on an inactive timer whose base is refered to offlined CPU
#2.  The code locked the timer_base corresponding to CPU #2. Before it
could proceed, CPU #2 came online and reinitialized the spinlock
corresponding to its base. Thus now CPU #3 held a lock which was
reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
corresponding to CPU #2, we hit the above SPIN_BUG().

CPU #0		CPU #3				       CPU #2
------		-------				       -------
.....		 ......				      
		mod_timer()
		 lock_timer_base
		   spin_lock_irqsave(&base->lock)

cpu_up(2)	 .....				        ......
							init_timers_cpu()
....		 .....				    	spin_lock_init(&base->lock)
.....		   spin_unlock_irqrestore(&base->lock)  ......
		   

Allocation of per_cpu timer vector bases is done only once under
"tvec_base_done[]" check. In the current code, spinlock_initialization
of base->lock isn't under this check. When a CPU is up each time the
base lock is reinitialized. Move base spinlock initialization under
the check.

Signed-off-by: Tirupathi Reddy 
Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-05-19T18:38:24+00:00

commit b4f711ee03d28f776fd2324fd0bd999cc428e4d2 upstream.

Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.

While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.

Reported-by: Kay Sievers 
Signed-off-by: John Stultz 
Cc: Feng Tang 
Cc: Jason Gunthorpe 
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

sched: Avoid prev->stime underflow

2013-05-19T18:38:19+00:00

commit 68aa8efcd1ab961e4684ef5af32f72a6ec1911de upstream.

Dave Hansen reported strange utime/stime values on his system:
https://lkml.org/lkml/2013/4/4/435

This happens because prev->stime value is bigger than rtime
value. Root of the problem are non-monotonic rtime values (i.e.
current rtime is smaller than previous rtime) and that should be
debugged and fixed.

But since problem did not manifest itself before commit
62188451f0d63add7ad0cd2a1ae269d600c1663d "cputime: Avoid
multiplication overflow on utime scaling", it should be threated
as regression, which we can easily fixed on cputime_adjust()
function.

For now, let's apply this fix, but further work is needed to fix
root of the problem.

Reported-and-tested-by: Dave Hansen 
Signed-off-by: Stanislaw Gruszka 
Cc: Frederic Weisbecker 
Cc: rostedt@goodmis.org
Cc: Linus Torvalds 
Cc: Dave Hansen 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1367314507-9728-3-git-send-email-sgruszka@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Do not account bogus utime

2013-05-19T18:38:18+00:00

commit 772c808a252594692972773f6ee41c289b8e0b2a upstream.

Due to rounding in scale_stime(), for big numbers, scaled stime
values will grow in chunks. Since rtime grow in jiffies and we
calculate utime like below:

	prev->stime = max(prev->stime, stime);
	prev->utime = max(prev->utime, rtime - prev->stime);

we could erroneously account stime values as utime. To prevent
that only update prev->{u,s}time values when they are smaller
than current rtime.

Signed-off-by: Stanislaw Gruszka 
Cc: Frederic Weisbecker 
Cc: rostedt@goodmis.org
Cc: Linus Torvalds 
Cc: Dave Hansen 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1367314507-9728-2-git-send-email-sgruszka@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Avoid cputime scaling overflow

2013-05-19T18:38:17+00:00

commit 55eaa7c1f511af5fb6ef808b5328804f4d4e5243 upstream.

Here is patch, which adds Linus's cputime scaling algorithm to the
kernel.

This is a follow up (well, fix) to commit
d9a3c9823a2e6a543eb7807fb3d15d8233817ec5 ("sched: Lower chances
of cputime scaling overflow") which commit tried to avoid
multiplication overflow, but did not guarantee that the overflow
would not happen.

Linus crated a different algorithm, which completely avoids the
multiplication overflow by dropping precision when numbers are
big.

It was tested by me and it gives good relative error of
scaled numbers. Testing method is described here:
http://marc.info/?l=linux-kernel&m=136733059505406&w=2

Originally-From: Linus Torvalds 
Signed-off-by: Stanislaw Gruszka 
Cc: Frederic Weisbecker 
Cc: rostedt@goodmis.org
Cc: Dave Hansen 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20130430151441.GC10465@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Lower chances of cputime scaling overflow

2013-05-19T18:38:17+00:00

commit d9a3c9823a2e6a543eb7807fb3d15d8233817ec5 upstream.

Some users have reported that after running a process with
hundreds of threads on intensive CPU-bound loads, the cputime
of the group started to freeze after a few days.

This is due to how we scale the tick-based cputime against
the scheduler precise execution time value.

We add the values of all threads in the group and we multiply
that against the sum of the scheduler exec runtime of the whole
group.

This easily overflows after a few days/weeks of execution.

A proposed solution to solve this was to compute that multiplication
on stime instead of utime:
   62188451f0d63add7ad0cd2a1ae269d600c1663d
   ("cputime: Avoid multiplication overflow on utime scaling")

The rationale behind that was that it's easy for a thread to
spend most of its time in userspace under intensive CPU-bound workload
but it's much harder to do CPU-bound intensive long run in the kernel.

This postulate got defeated when a user recently reported he was still
seeing cputime freezes after the above patch. The workload that
triggers this issue relates to intensive networking workloads where
most of the cputime is consumed in the kernel.

To reduce much more the opportunities for multiplication overflow,
lets reduce the multiplication factors to the remainders of the division
between sched exec runtime and cputime. Assuming the difference between
these shouldn't ever be that large, it could work on many situations.

This gets the same results as in the upstream scaling code except for
a small difference: the upstream code always rounds the results to
the nearest integer not greater to what would be the precise result.
The new code rounds to the nearest integer either greater or not
greater. In practice this difference probably shouldn't matter but
it's worth mentioning.

If this solution appears not to be enough in the end, we'll
need to partly revert back to the behaviour prior to commit
     0cf55e1ec08bb5a22e068309e2d8ba1180ab4239
     ("sched, cputime: Introduce thread_group_times()")

Back then, the scaling was done on exit() time before adding the cputime
of an exiting thread to the signal struct. And then we'll need to
scale one-by-one the live threads cputime in thread_group_cputime(). The
drawback may be a slightly slower code on exit time.

Signed-off-by: Frederic Weisbecker 
Cc: Stanislaw Gruszka 
Cc: Steven Rostedt 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Signed-off-by: Stanislaw Gruszka 
Acked-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman