Age | Commit message (Collapse) | Author |
|
These 2 syncronize_rcu()s make attaching a task to a cgroup
quite slow, and it can't be ignored in some situations.
A real case from Colin Cross: Android uses cgroups heavily to
manage thread priorities, putting threads in a background group
with reduced cpu.shares when they are not visible to the user,
and in a foreground group when they are. Some RPCs from foreground
threads to background threads will temporarily move the background
thread into the foreground group for the duration of the RPC.
This results in many calls to cgroup_attach_task.
In cgroup_attach_task() it's task->cgroups that is protected by RCU,
and put_css_set() calls kfree_rcu() to free it.
If we remove this synchronize_rcu(), there can be threads in RCU-read
sections accessing their old cgroup via current->cgroups with
concurrent rmdir operation, but this is safe.
# time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }
real 0m2.524s
user 0m0.008s
sys 0m0.004s
With this patch:
real 0m0.004s
user 0m0.004s
sys 0m0.000s
tj: These synchronize_rcu()s are utterly confused. synchornize_rcu()
necessarily has to come between two operations to guarantee that
the changes made by the former operation are visible to all rcu
readers before proceeding to the latter operation. Here,
synchornize_rcu() are at the end of attach operations with nothing
beyond it. Its only effect would be delaying completion of
write(2) to sysfs tasks/procs files until all rcu readers see the
change, which doesn't mean anything.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Colin Cross <ccross@google.com>
Conflicts:
kernel/cgroup.c
|
|
Conflicts:
drivers/media/video/tegra_v4l2_camera.c
reverted to current driver supporting ACM rather than CSI2
drivers/media/video/videobuf2-dma-nvmap.c
drivers/video/tegra/host/Makefile
|
|
Trinity discovered that we fail to check all 64 bits of
attr.config passed by user space, resulting to out-of-bounds
access of the perf_swevent_enabled array in
sw_perf_event_destroy().
Introduced in commit b0a873ebb ("perf: Register PMU
implementations").
Bug 1289245
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
(cherry picked from commit 8176cced706b5e5d15887584150764894e94e02f)
Change-Id: Idde0330d7430f2ba1645f4dfed063c5df9bbb44a
Reviewed-on: http://git-master/r/228851
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Kiran Adduri <kadduri@nvidia.com>
Reviewed-by: Bo Yan <byan@nvidia.com>
|
|
Unfortunately when merging 15 patches NVIDIA missed 3 resp. 4 lines
which failed UP case (e.g. on PXA). The offending commit in question is
9dfdd9ac17ac9955b431cb962df3d0492384ba0e
The list of commits the above should have included is as follows:
974271c485a4d8bb801decc616748f90aafb07ec
bd7bdd43dcb81bb08240b9401b36a104f77dc135: just some comments missing
63d95a9150ee3bbd4117fcd609dee40313b454d9
11ebea50dbc1ade5994b2c838a096078d4c02399
4ce62e9e30cacc26885cab133ad1de358dd79f21: 2 lines missing
3270476a6c0ce322354df8679652f060d66526dc: one line missing
6575820221f7a4dd6eadecf7bf83cdd154335eda
f2d5a0ee06c1813f985bb9386f3ccc0d0315720f
403c821d452c03be4ced571ac91339a9d3631b17
6037315269d62bf967286ae2670fdd6b6acedab9
bc2ae0f5bb2f39e6db06a62f9d353e4601a332a1
25511a477657884d2164f338341fa89652610507
3ce63377305b694f53e7dd0c72907591c5344224
628c78e7ea19d5b70d2b6a59030362168cdbe1ad: just some comments missing
8db25e7891a47e03db6f04344a9c92be16e391bb
|
|
For testing purposes it is useful to be able to disable
PM Qos.
Bug 1020898
Bug 917572
Change-Id: I266f5b5730cfe4705197d8b09db7f9eda6766c7c
Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com>
Reviewed-on: http://git-master/r/124667
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Juha Tukkinen <jtukkinen@nvidia.com>
|
|
This change merges two patchsets. The first set,
containing 6 patches, reimplements WQ_HIGHPRI
to use a seperate worker_pool. gcwq->pools[0]
is used for normal priority work and pools[1]
for high priority.
The second patchset contains 9 patches and
reimplements CPU hotplug to keep idle workers.
Updates workqueue CPU hotplug path to use a
disassociated global_cwq, which runs as an
unbound one (WQ_UNBOUND). While this requires
rebinding idle workers, overall hotplug path
is much simpler.
Original patchset:
http://thread.gmane.org/gmane.linux.kernel/1329164
Bug 978010
Change-Id: Ic66ec8848a8d111b5278e63ef6a410846dfd8fcc
Signed-off-by: Mitch Luban <mluban@nvidia.com>
Reviewed-on: http://git-master/r/118387
Reviewed-by: Diwakar Tundlam <dtundlam@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
|
|
After a kthread is created it signals the requester using complete()
and enters TASK_UNINTERRUPTIBLE. However, since complete() wakes up
the requesting thread this can cause a preemption. The preemption will
not remove the task from the runqueue (for that schedule() has to be
invoked directly).
This is a problem if directly after kthread creation you try to do a
kthread_bind(), which will block in HZ steps until the thread is off
the runqueue.
This patch disables preemption during complete(), since we call
schedule() directly afterwards, so it will correctly enter
TASK_UNINTERRUPTIBLE. This speeds up kthread creation/binding during
cpu hotplug significantly.
Change-Id: I856ddd4e01ebdb198ba90f343b4a0c5933fd2b23
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
|
|
migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.
Instead unthrottle rt runqueues before migrating tasks.
Bug 976709
Change-Id: Ie3696702abc560fe8ffa7d2fb5dc5d54d532cc0d
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
(cherry picked from commit 4d18ba5765c206bf9f37634f532d97dabd507a58)
Reviewed-on: http://git-master/r/103417
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Aleksandr Frid <afrid@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
Re-compute time-average nr_running when it is read. This would
prevent reading stalled average value if there were no run-queue
changes for a long time. New average value is returned to the reader,
but not stored to avoid concurrent writes. Light-weight sequential
counter synchronization is used to assure data consistency for
re-computing average.
Change-Id: I8e4ea1b28ea00b3ddaf6ef7cdcd27866f87d360b
Signed-off-by: Alex Frid <afrid@nvidia.com>
(cherry picked from commit 527a759d9b40bf57958eb002edd2bb82014dab99)
Reviewed-on: http://git-master/r/111637
Reviewed-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Tested-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
Compute the time-average number of running tasks per run-queue for a
trailing window of a fixed time period. The detla add/sub to the
average value is weighted by the amount of time per nr_running value
relative to the total measurement period.
Change-Id: I076e24ff4ed65bed3b8dd8d2b279a503318071ff
Signed-off-by: Diwakar Tundlam <dtundlam@nvidia.com>
(cherry picked from commit 3a12d7499cee352e8a46eaf700259ba3c733f0e3)
Reviewed-on: http://git-master/r/111635
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Tested-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
Gcov's internal data structures, on which the kernel depends on, have
changed in GCC 4.6. This patch adds support for GCC 4.6 and should still
work on GCC 4.4 too.
For reference, look at 'struct gcov_fn_info' in GCC's 'gcc/gcov-io.h',
near line 698:
https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.4.3/
https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.6/
Bug 1003822
Change-Id: I527736f944c80b8b345d1685669c0b99eb38fb66
Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Reviewed-on: http://git-master/r/110073
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Juha Tukkinen <jtukkinen@nvidia.com>
Tested-by: Juha Tukkinen <jtukkinen@nvidia.com>
|
|
Simple trace points for measuring hotplug up/down times.
Bug 960310
Change-Id: I74dd3c5cddcc1ded02ad08a7ce38bacf3147ee3e
Reviewed-on: http://git-master/r/99806
Reviewed-by: Andy Park <andyp@nvidia.com>
Tested-by: Andy Park <andyp@nvidia.com>
Reviewed-by: Prajakta Gudadhe <pgudadhe@nvidia.com>
|
|
Bug 940061
Change-Id: Ibae842fdc3af3c92ec7e6125c602417110d8b55e
Signed-off-by: Gaurav Sarode <gsarode@nvidia.com>
Reviewed-on: http://git-master/r/84521
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Aleksandr Frid <afrid@nvidia.com>
Reviewed-by: Diwakar Tundlam <dtundlam@nvidia.com>
|
|
commit 30fb6aa74011dcf595f306ca2727254d708b786e upstream.
Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.
When a ftrace_ops is removed, the counter of the function records,
that the ftrace_ops traces, are decremented. When they reach zero
the functions that they represent are modified to stop calling the
mcount code.
When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But the ENABLE is really a misnomer as it
should just update the records, as records that have been enabled
and now have a count of zero should be disabled.
The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.
To make matters worse, when a ftrace_ops is unregistered and there
is another ftrace_ops registered, neither the DISABLE nor the
ENABLE command are set when calling into the stop_machine() function
and the records will not be updated to match their counter. A command
is passed to that function that will update the mcount code to call
the registered callback directly if it is the only one left. This
means that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it as well as the ftrace_ops
that was just unregistered.
Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:
CONFIG_FUNCTION_PROFILER=y
# CONFIG_FUNCTION_GRAPH is not set
This will force the function profiler to use the function tracer instead
of the function graph tracer.
# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function > current_tracer
# cat set_ftrace_filter
schedule
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 692/68108025 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/0:2-909 [000] .... 531.235574: schedule <-worker_thread
<idle>-0 [001] .N.. 531.235575: schedule <-cpu_idle
kworker/0:2-909 [000] .... 531.235597: schedule <-worker_thread
sshd-2563 [001] .... 531.235647: schedule <-schedule_hrtimeout_range_clock
# echo 1 > function_profile_enabled
# echo 0 > function_porfile_enabled
# cat set_ftrace_filter
schedule
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 159701/118821262 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [002] ...1 604.870655: local_touch_nmi <-cpu_idle
<idle>-0 [002] d..1 604.870655: enter_idle <-cpu_idle
<idle>-0 [002] d..1 604.870656: atomic_notifier_call_chain <-enter_idle
<idle>-0 [002] d..1 604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain
The same problem could have happened with the trace_probe_ops,
but they are modified with the set_frace_filter file which does the
update at closure of the file.
The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.
Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com
Change-Id: Ifdc1c97df0d069226d6818648aade1519106950d
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-on: http://git-master/r/79658
|
|
Change-Id: Ica22a3f92c8ca33a5779a74d3afad775736b1663
Signed-off-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Reviewed-on: http://git-master/r/78450
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
Bug 878165
The next_balance parameter of nohz_idle_balancer should be initialized
to jiffies since jiffies itself is initialized to 300 seconds shy of
overflow. Otherwise, nohz_idle_balancer does not run for the first 5
mins after bootup.
Change-Id: I18334451f394ead8ddad3b94d725635a31e0173b
Signed-off-by: Diwakar Tundlam <dtundlam@nvidia.com>
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-on: http://git-master/r/77300
Reviewed-by: Automatic_Commit_Validation_User
|
|
Observe PM QoS CPU frequency minimum and maximum in addition
to policy settings.
Bug 888312
Change-Id: Ia4f60a1649a9952e02f6847c8add3b2ea5d47524
Reviewed-on: http://git-master/r/72207
Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com>
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-on: http://git-master/r/75884
Reviewed-by: Automatic_Commit_Validation_User
|
|
Add minimum and maximum CPU frequency as PM QoS parameters.
Bug 888312
Change-Id: I18abddded35a044a6ad8365035e31d1a2213a329
Reviewed-on: http://git-master/r/72206
Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com>
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-on: http://git-master/r/75883
Reviewed-by: Automatic_Commit_Validation_User
|
|
Linux 3.1.9
Conflicts:
Makefile
Change-Id: I22227ab33ba7ddaba8e6fe049393c58a83d73648
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit 79cfbdfa87e84992d509e6c1648a18e1d7e68c20 upstream.
The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
functions depend on the value of the 'tasks_frozen' argument passed to them
(which indicates whether tasks have been frozen or not).
(Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
CPU_DEAD, CPU_DEAD_FROZEN).
Thus, it is essential that while the callbacks for those notifications are
running, the state of the system with respect to the tasks being frozen or
not remains unchanged, *throughout that duration*. Hence there is a need for
synchronizing the CPU hotplug code with the freezer subsystem.
Since the freezer is involved only in the Suspend/Hibernate call paths, this
patch hooks the CPU hotplug code to the suspend/hibernate notifiers
PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
notifications will always be run with the state of the system really being
what the notifications indicate, _throughout_ their execution time.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0d19ea866562e46989412a0676412fa0983c9ce7 upstream.
If we mount a hierarchy with a specified name, the name is unique,
and we can use it to mount the hierarchy without specifying its
set of subsystem names. This feature is documented is
Documentation/cgroups/cgroups.txt section 2.3
Here's an example:
# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
# mount -t cgroup -o name=myhier xxx /cgroup2
But it was broken by commit 32a8cf235e2f192eb002755076994525cdbaa35a
(cgroup: make the mount options parsing more accurate)
This fixes the regression.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8a88951b5878dc475dcd841cefc767e36397d14e upstream.
This is the temporary simple fix for 3.2, we need more changes in this
area.
1. do_signal_stop() assumes that the running untraced thread in the
stopped thread group is not possible. This was our goal but it is
not yet achieved: a stopped-but-resumed tracee can clone the running
thread which can initiate another group-stop.
Remove WARN_ON_ONCE(!current->ptrace).
2. A new thread always starts with ->jobctl = 0. If it is auto-attached
and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
in do_jobctl_trap() if another debugger attaches.
Change __ptrace_unlink() to set the artificial SIGSTOP for report.
Alternatively we could change ptrace_init_task() to copy signr from
current, but this means we can copy it for no reason and hide the
possible similar problems.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: I2c60e18b07e8e08c7e3b6cc8288b0e04e18844f7
Reviewed-on: http://git-master/r/74233
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Tested-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit 50b8d257486a45cba7b65ca978986ed216bbcc10 upstream.
Test-case:
int main(void)
{
int pid, status;
pid = fork();
if (!pid) {
for (;;) {
if (!fork())
return 0;
if (waitpid(-1, &status, 0) < 0) {
printf("ERR!! wait: %m\n");
return 0;
}
}
}
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(waitpid(-1, NULL, 0) == pid);
assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
PTRACE_O_TRACEFORK) == 0);
do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);
return 1;
}
It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().
The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.
This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.
I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: I9e83ca1fcff0eee2ea7ef2508de95691a0cdeb0c
Reviewed-on: http://git-master/r/74232
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Tested-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit f9fab10bbd768b0e5254e53a4a8477a94bfc4b96 upstream.
vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: I12ed227da1fb7099978c73b65a756caf51a95553
Reviewed-on: http://git-master/r/74230
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Tested-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream.
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: I04a763aed3c611460ef4888d14a1f5101e8373bc
Reviewed-on: http://git-master/r/74198
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Tested-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit e0197aae59e55c06db172bfbe1a1cdb8c0e1cab3 upstream.
There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.
This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.
$ cat zombie.c
\#include <unistd.h>
int main()
{
if (fork())
pause();
return 0;
}
$
We are hitting this bug pretty regularly on ChromeOS.
This bug is already fixed by Tejun Heo's cgroup patchset which is
targetted for the next merge window:
https://lkml.org/lkml/2011/11/1/356
I've create a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Downstream-Bug-Report: http://crosbug.com/23953
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Olof Johansson <olofj@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: I61bf2d48574d6ce5418b988e9547937c2efdd084
Reviewed-on: http://git-master/r/74185
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Rohan Somvanshi <rsomvanshi@nvidia.com>
Tested-by: Rohan Somvanshi <rsomvanshi@nvidia.com>
|
|
commit 3d3c8f93a237b64580c5c5e138edeb1377e98230 upstream.
binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab usin __getname()
The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.
This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname(). Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e. if the name was
allocated with getname() rather than the naked __getname(). So,
__getname() followed by putname() ends up leaking memory.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change-Id: Ia6c066b0703eeafc61eafdd5addf157ee671bd68
Reviewed-on: http://git-master/r/74175
Reviewed-by: Varun Wadekar <vwadekar@nvidia.com>
Tested-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit 8a88951b5878dc475dcd841cefc767e36397d14e upstream.
This is the temporary simple fix for 3.2, we need more changes in this
area.
1. do_signal_stop() assumes that the running untraced thread in the
stopped thread group is not possible. This was our goal but it is
not yet achieved: a stopped-but-resumed tracee can clone the running
thread which can initiate another group-stop.
Remove WARN_ON_ONCE(!current->ptrace).
2. A new thread always starts with ->jobctl = 0. If it is auto-attached
and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
in do_jobctl_trap() if another debugger attaches.
Change __ptrace_unlink() to set the artificial SIGSTOP for report.
Alternatively we could change ptrace_init_task() to copy signr from
current, but this means we can copy it for no reason and hide the
possible similar problems.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 50b8d257486a45cba7b65ca978986ed216bbcc10 upstream.
Test-case:
int main(void)
{
int pid, status;
pid = fork();
if (!pid) {
for (;;) {
if (!fork())
return 0;
if (waitpid(-1, &status, 0) < 0) {
printf("ERR!! wait: %m\n");
return 0;
}
}
}
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(waitpid(-1, NULL, 0) == pid);
assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
PTRACE_O_TRACEFORK) == 0);
do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);
return 1;
}
It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().
The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.
This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.
I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f9fab10bbd768b0e5254e53a4a8477a94bfc4b96 upstream.
vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream.
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e0197aae59e55c06db172bfbe1a1cdb8c0e1cab3 upstream.
There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.
This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.
$ cat zombie.c
\#include <unistd.h>
int main()
{
if (fork())
pause();
return 0;
}
$
We are hitting this bug pretty regularly on ChromeOS.
This bug is already fixed by Tejun Heo's cgroup patchset which is
targetted for the next merge window:
https://lkml.org/lkml/2011/11/1/356
I've create a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Downstream-Bug-Report: http://crosbug.com/23953
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Olof Johansson <olofj@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3d3c8f93a237b64580c5c5e138edeb1377e98230 upstream.
binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab usin __getname()
The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.
This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname(). Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e. if the name was
allocated with getname() rather than the naked __getname(). So,
__getname() followed by putname() ends up leaking memory.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Change-Id: I99507d7cfdcee064f808856dc2ce99d806fd864f
|
|
commit 3b87487ac5008072f138953b07505a7e3493327f upstream.
This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c.
It results in resume problems for various people. See for example
http://thread.gmane.org/gmane.linux.kernel/1233033
http://thread.gmane.org/gmane.linux.kernel/1233389
http://thread.gmane.org/gmane.linux.kernel/1233159
http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877
and the fedora and ubuntu bug reports
https://bugzilla.redhat.com/show_bug.cgi?id=767248
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569
which got bisected down to the stable version of this commit.
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
CPU0 CPUn
_cpu_up()
__cpu_up()
boostrap()
notify_cpu_starting()
set_cpu_online()
while (!cpu_active())
cpu_relax()
<PREEMPT-out>
smp_call_function(.wait=1)
/* we find cpu_online() is true */
arch_send_call_function_ipi_mask()
/* wait-forever-more */
<PREEMPT-in>
local_irq_enable()
cpu_notify(CPU_ONLINE)
sched_cpu_active()
set_cpu_active()
Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.
On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.
Bug 916986
Original Patch: https://lkml.org/lkml/2011/12/15/255
Change-Id: Ia1c07bdc1b3eb07a7cd4d69756fa7bec509c9400
Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Reviewed-on: http://git-master/r/72130
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Mayuresh Kulkarni <mkulkarni@nvidia.com>
|
|
Bug 894200
Change-Id: Ieb009a13c6ef9bca2388e234eb973d65a4e3a58b
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/71034
Reviewed-by: Rohan Somvanshi <rsomvanshi@nvidia.com>
Tested-by: Rohan Somvanshi <rsomvanshi@nvidia.com>
|
|
- Replace class ID #define with enumeration
- Loop through PM QoS objects during initialization (rather than
initializing them one-by-one)
Change-Id: I185b700b52c752c62e7550fe739adc498fc989ef
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/70603
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Antti Miettinen <amiettinen@nvidia.com>
Reviewed-by: Diwakar Tundlam <dtundlam@nvidia.com>
Reviewed-by: Scott Williams <scwilliams@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
Add a new module that will dump the contents of the ftrace ring buffer.
Data is compressed and can be in ascii or binary form. Data will
automatically dump on kernel panic to console. Data can be dumped by
reading /proc/tracedump. See tracedump.h for details.
Change-Id: I7b7afc3def0b88629dd120d17e43858306a8f357
Signed-off-by: Liang Cheng <licheng@nvidia.com>
Reviewed-on: http://git-master/r/69494
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Dan Willemsen <dwillemsen@nvidia.com>
|
|
commit a33caeb118198286309859f014c0662f3ed54ed4 upstream.
Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"),
lockdep_init_map() will clear all the struct. But it will break
lock_set_class()/lock_set_subclass(). A typical race condition
is like below:
CPU A CPU B
lock_set_subclass(lockA);
lock_set_class(lockA);
lockdep_init_map(lockA);
/* lockA->name is cleared */
memset(lockA);
__lock_acquire(lockA);
/* lockA->class_cache[] is cleared */
register_lock_class(lockA);
look_up_lock_class(lockA);
WARN_ON_ONCE(class->name !=
lock->name);
lock->name = name;
So restore to what we have done before commit f59de89 but annotate
->lock with kmemcheck_mark_initialized() to suppress the kmemcheck
warning reported in commit f59de89.
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c9c024b3f3e07d087974db4c0dc46217fff3a6c0 upstream.
The expiry function compares the timer against current time and does
not expire the timer when the expiry time is >= now. That's wrong. If
the timer is set for now, then it must expire.
Make the condition expiry > now for breaking out the loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Conflicts:
arch/arm/Kconfig
Change-Id: If8aaaf3efcbbf6c9017b38efb6d76ef933f147fa
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
|
|
commit de28f25e8244c7353abed8de0c7792f5f883588c upstream.
If a device is shutdown, then there might be a pending interrupt,
which will be processed after we reenable interrupts, which causes the
original handler to be run. If the old handler is the (broadcast)
periodic handler the shutdown state might hang the kernel completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b1f919664d04a8d0ba29cb76673c7ca3325a2006 upstream.
In order to leave a margin of 12.5% we should >> 3 not >> 5.
Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com>
[jstultz: Modified commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit bbbf7af4bf8fc69bc751818cf30521080fa47dcb upstream.
If cpu A calls jump_label_inc() just after atomic_add_return() is
called by cpu B, atomic_inc_not_zero() will return value greater then
zero and jump_label_inc() will return to a caller before jump_label_update()
finishes its job on cpu B.
Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c1be84309c58b1e7c6d626e28fba41a22b364c3d upstream.
When a better rated broadcast device is installed, then the current
active device is not disabled, which results in two running broadcast
devices.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cb59974742aea24adf6637eb0c4b8e7b48bca6fb upstream.
Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
ever being released.
Ref_count was added to keep track of subsystem users, not for counting
events. Subsystem is created with ref_count = 1, so there is no need to
increment it for every event, we have nr_events for that. Fix this by
touching ref_count only when we actually have a new user -
subsystem_open().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
ftrace_event_call->filter
commit d3d9acf646679c1981032b0985b386d12fccc60c upstream.
ftrace_event_call->filter is sched RCU protected but didn't use
rcu_assign_pointer(). Use it.
TODO: Add proper __rcu annotation to call->filter and all its users.
-v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric.
Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c7c6ec8becaf742b223c7b491f4893014be23a07 upstream.
A forced undef of a config value was used for testing and was
accidently left in during the final commit. This causes x86 to
run slower than needed while running function tracing as well
as causes the function graph selftest to fail when DYNMAIC_FTRACE
is not set. This is because the code in MCOUNT expects the ftrace
code to be processed with the config value set that happened to
be forced not set.
The forced config option was left in by:
commit 6331c28c962561aee59e5a493b7556a4bb585957
ftrace: Fix dynamic selftest failure on some archs
Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian
Reported-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 550acb19269d65f32e9ac4ddb26c2b2070e37f1c upstream.
In irq_wait_for_interrupt(), the should_stop member is verified before
setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
In case kthread_stop sets should_stop and wakes up the process after
should_stop is checked by the irq thread but before the task's state
is changed, the irq thread might never exit:
kthread_stop irq_wait_for_interrupt
------------ ----------------------
...
... while (!kthread_should_stop()) {
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
...
set_current_state(TASK_INTERRUPTIBLE);
...
schedule();
}
Fix this by checking if the thread should stop after modifying the
task's state.
[ tglx: Simplified it a bit ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|