<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel, branch v3.18.13</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>bpf: fix verifier memory corruption</title>
<updated>2015-04-27T20:48:31+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-04-14T22:57:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ee16f4a033f21a9a41d60a8d93f9663ec269913'/>
<id>6ee16f4a033f21a9a41d60a8d93f9663ec269913</id>
<content type='text'>
[ Upstream commit c3de6317d748e23b9e46ba36e10483728d00d144 ]

Due to missing bounds check the DAG pass of the BPF verifier can corrupt
the memory which can cause random crashes during program loading:

[8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
[8.451293] IP: [&lt;ffffffff811de33d&gt;] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329]  [&lt;ffffffff8116cc82&gt;] bpf_check+0x852/0x2000
[8.452329]  [&lt;ffffffff8116b7e4&gt;] bpf_prog_load+0x1e4/0x310
[8.452329]  [&lt;ffffffff811b190f&gt;] ? might_fault+0x5f/0xb0
[8.452329]  [&lt;ffffffff8116c206&gt;] SyS_bpf+0x806/0xa30

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit c3de6317d748e23b9e46ba36e10483728d00d144 ]

Due to missing bounds check the DAG pass of the BPF verifier can corrupt
the memory which can cause random crashes during program loading:

[8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
[8.451293] IP: [&lt;ffffffff811de33d&gt;] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329]  [&lt;ffffffff8116cc82&gt;] bpf_check+0x852/0x2000
[8.452329]  [&lt;ffffffff8116b7e4&gt;] bpf_prog_load+0x1e4/0x310
[8.452329]  [&lt;ffffffff811b190f&gt;] ? might_fault+0x5f/0xb0
[8.452329]  [&lt;ffffffff8116c206&gt;] SyS_bpf+0x806/0xa30

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop</title>
<updated>2015-04-24T21:14:12+00:00</updated>
<author>
<name>Preeti U Murthy</name>
<email>preeti@linux.vnet.ibm.com</email>
</author>
<published>2015-03-18T10:49:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b27b4b79d535672a19e9d0fe1256ad1016cc9e7f'/>
<id>b27b4b79d535672a19e9d0fe1256ad1016cc9e7f</id>
<content type='text'>
[ Upstream commit a127d2bcf1fbc8c8e0b5cf0dab54f7d3ff50ce47 ]

The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in deep idle states. The associated
call graph is :

	cpuidle_idle_call()
	|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
	     |_____tick_broadcast_set_event()
		   |____clockevents_program_event()
			|____bc_set_next()

The hrtimer_{start/cancel} functions call into tracing which uses RCU.
But it is not legal to call into RCU in cpuidle because it is one of the
quiescent states. Hence protect this region with RCU_NONIDLE which informs
RCU that the cpu is momentarily non-idle.

As an aside it is helpful to point out that the clock event device that is
programmed here is not a per-cpu clock device; it is a
pseudo clock device, used by the broadcast framework alone.
The per-cpu clock device programming never goes through bc_set_next().

Signed-off-by: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: linuxppc-dev@ozlabs.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a127d2bcf1fbc8c8e0b5cf0dab54f7d3ff50ce47 ]

The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in deep idle states. The associated
call graph is :

	cpuidle_idle_call()
	|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
	     |_____tick_broadcast_set_event()
		   |____clockevents_program_event()
			|____bc_set_next()

The hrtimer_{start/cancel} functions call into tracing which uses RCU.
But it is not legal to call into RCU in cpuidle because it is one of the
quiescent states. Hence protect this region with RCU_NONIDLE which informs
RCU that the cpu is momentarily non-idle.

As an aside it is helpful to point out that the clock event device that is
programmed here is not a per-cpu clock device; it is a
pseudo clock device, used by the broadcast framework alone.
The per-cpu clock device programming never goes through bc_set_next().

Signed-off-by: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: linuxppc-dev@ozlabs.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"</title>
<updated>2015-04-24T21:14:04+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2015-04-06T23:07:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=862158f2bba6497650cc441e3f97aeda07bf7071'/>
<id>862158f2bba6497650cc441e3f97aeda07bf7071</id>
<content type='text'>
[ Upstream commit f82daee49c09cf6a99c28303d93438a2566e5552 ]

Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved
regions) is reported to make resume from hibernation on Lenovo x230
unreliable, so revert it.

We will revisit the issue the commit in question was supposed to fix
in the future.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111
Reported-by: rhn &lt;kebuac.rhn@porcupinefactory.org&gt;
Cc: 3.17+ &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f82daee49c09cf6a99c28303d93438a2566e5552 ]

Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved
regions) is reported to make resume from hibernation on Lenovo x230
unreliable, so revert it.

We will revisit the issue the commit in question was supposed to fix
in the future.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111
Reported-by: rhn &lt;kebuac.rhn@porcupinefactory.org&gt;
Cc: 3.17+ &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix RLIMIT_RTTIME when PI-boosting to RT</title>
<updated>2015-04-24T21:13:43+00:00</updated>
<author>
<name>Brian Silverman</name>
<email>brian@peloton-tech.com</email>
</author>
<published>2015-02-19T00:23:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=42d921d36be6e0d315d1354921eaf0813544267c'/>
<id>42d921d36be6e0d315d1354921eaf0813544267c</id>
<content type='text'>
[ Upstream commit 746db9443ea57fd9c059f62c4bfbf41cf224fe13 ]

When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.

I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.

Signed-off-by: Brian Silverman &lt;brian@peloton-tech.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: austin@peloton-tech.com
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 746db9443ea57fd9c059f62c4bfbf41cf224fe13 ]

When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.

I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.

Signed-off-by: Brian Silverman &lt;brian@peloton-tech.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: austin@peloton-tech.com
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf: Fix irq_work 'tail' recursion</title>
<updated>2015-04-17T00:11:43+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-02-19T17:03:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f7d84d2b7c3e17b7004a1b4e71c3944bdf5d6cbb'/>
<id>f7d84d2b7c3e17b7004a1b4e71c3944bdf5d6cbb</id>
<content type='text'>
[ Upstream commit d525211f9d1be8b523ec7633f080f2116f5ea536 ]

Vince reported a watchdog lockup like:

	[&lt;ffffffff8115e114&gt;] perf_tp_event+0xc4/0x210
	[&lt;ffffffff810b4f8a&gt;] perf_trace_lock+0x12a/0x160
	[&lt;ffffffff810b7f10&gt;] lock_release+0x130/0x260
	[&lt;ffffffff816c7474&gt;] _raw_spin_unlock_irqrestore+0x24/0x40
	[&lt;ffffffff8107bb4d&gt;] do_send_sig_info+0x5d/0x80
	[&lt;ffffffff811f69df&gt;] send_sigio_to_task+0x12f/0x1a0
	[&lt;ffffffff811f71ce&gt;] send_sigio+0xae/0x100
	[&lt;ffffffff811f72b7&gt;] kill_fasync+0x97/0xf0
	[&lt;ffffffff8115d0b4&gt;] perf_event_wakeup+0xd4/0xf0
	[&lt;ffffffff8115d103&gt;] perf_pending_event+0x33/0x60
	[&lt;ffffffff8114e3fc&gt;] irq_work_run_list+0x4c/0x80
	[&lt;ffffffff8114e448&gt;] irq_work_run+0x18/0x40
	[&lt;ffffffff810196af&gt;] smp_trace_irq_work_interrupt+0x3f/0xc0
	[&lt;ffffffff816c99bd&gt;] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Tested-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit d525211f9d1be8b523ec7633f080f2116f5ea536 ]

Vince reported a watchdog lockup like:

	[&lt;ffffffff8115e114&gt;] perf_tp_event+0xc4/0x210
	[&lt;ffffffff810b4f8a&gt;] perf_trace_lock+0x12a/0x160
	[&lt;ffffffff810b7f10&gt;] lock_release+0x130/0x260
	[&lt;ffffffff816c7474&gt;] _raw_spin_unlock_irqrestore+0x24/0x40
	[&lt;ffffffff8107bb4d&gt;] do_send_sig_info+0x5d/0x80
	[&lt;ffffffff811f69df&gt;] send_sigio_to_task+0x12f/0x1a0
	[&lt;ffffffff811f71ce&gt;] send_sigio+0xae/0x100
	[&lt;ffffffff811f72b7&gt;] kill_fasync+0x97/0xf0
	[&lt;ffffffff8115d0b4&gt;] perf_event_wakeup+0xd4/0xf0
	[&lt;ffffffff8115d103&gt;] perf_pending_event+0x33/0x60
	[&lt;ffffffff8114e3fc&gt;] irq_work_run_list+0x4c/0x80
	[&lt;ffffffff8114e448&gt;] irq_work_run+0x18/0x40
	[&lt;ffffffff810196af&gt;] smp_trace_irq_work_interrupt+0x3f/0xc0
	[&lt;ffffffff816c99bd&gt;] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Tested-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/wait: Provide infrastructure to deal with nested blocking</title>
<updated>2015-04-17T00:11:19+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-09-24T08:18:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=32384effc14fcd45beacae7a586658b7af7c67e3'/>
<id>32384effc14fcd45beacae7a586658b7af7c67e3</id>
<content type='text'>
[ Upstream commit 61ada528dea028331e99e8ceaed87c683ad25de2 ]

There are a few places that call blocking primitives from wait loops,
provide infrastructure to support this without the typical
task_struct::state collision.

We record the wakeup in wait_queue_t::flags which leaves
task_struct::state free to be used by others.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 61ada528dea028331e99e8ceaed87c683ad25de2 ]

There are a few places that call blocking primitives from wait loops,
provide infrastructure to support this without the typical
task_struct::state collision.

We record the wakeup in wait_queue_t::flags which leaves
task_struct::state free to be used by others.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: Fix cpuset sched_relax_domain_level</title>
<updated>2015-03-28T13:42:58+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2015-02-13T03:58:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ea7358ff38e4f1b029fc7b5383f2f9d7b3a2f684'/>
<id>ea7358ff38e4f1b029fc7b5383f2f9d7b3a2f684</id>
<content type='text'>
[ Upstream commit 283cb41f426b723a0255702b761b0fc5d1b53a81 ]

The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.

The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.

This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.

Fixes: fc560a26acce ("cpuset: replace cpuset-&gt;stack_list with cpuset_for_each_descendant_pre()")
Cc: &lt;stable@vger.kernel.org&gt; # 3.9+
Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 283cb41f426b723a0255702b761b0fc5d1b53a81 ]

The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.

The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.

This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.

Fixes: fc560a26acce ("cpuset: replace cpuset-&gt;stack_list with cpuset_for_each_descendant_pre()")
Cc: &lt;stable@vger.kernel.org&gt; # 3.9+
Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: fix a warning when clearing configured masks in old hierarchy</title>
<updated>2015-03-28T13:42:51+00:00</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2015-02-13T03:20:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=79692efa3aadec55cbfbf385077a5415cf314f22'/>
<id>79692efa3aadec55cbfbf385077a5415cf314f22</id>
<content type='text'>
[ Upstream commit 79063bffc81f82689bd90e16da1b49408f3bf095 ]

When we clear cpuset.cpus, cpuset.effective_cpus won't be cleared:

  # mount -t cgroup -o cpuset xxx /mnt
  # mkdir /mnt/tmp
  # echo 0 &gt; /mnt/tmp/cpuset.cpus
  # echo &gt; /mnt/tmp/cpuset.cpus
  # cat cpuset.cpus

  # cat cpuset.effective_cpus
  0-15

And a kernel warning in update_cpumasks_hier() is triggered:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4028 at kernel/cpuset.c:894 update_cpumasks_hier+0x471/0x650()

Cc: &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 79063bffc81f82689bd90e16da1b49408f3bf095 ]

When we clear cpuset.cpus, cpuset.effective_cpus won't be cleared:

  # mount -t cgroup -o cpuset xxx /mnt
  # mkdir /mnt/tmp
  # echo 0 &gt; /mnt/tmp/cpuset.cpus
  # echo &gt; /mnt/tmp/cpuset.cpus
  # cat cpuset.cpus

  # cat cpuset.effective_cpus
  0-15

And a kernel warning in update_cpumasks_hier() is triggered:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4028 at kernel/cpuset.c:894 update_cpumasks_hier+0x471/0x650()

Cc: &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: initialize effective masks when clone_children is enabled</title>
<updated>2015-03-28T13:42:38+00:00</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2015-02-13T03:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dacb6ccdcc8374f9d100c23d50087b8cb1fd0eb3'/>
<id>dacb6ccdcc8374f9d100c23d50087b8cb1fd0eb3</id>
<content type='text'>
[ Upstream commit 790317e1b266c776765a4bdcedefea706ff0fada ]

If clone_children is enabled, effective masks won't be initialized
due to the bug:

  # mount -t cgroup -o cpuset xxx /mnt
  # echo 1 &gt; cgroup.clone_children
  # mkdir /mnt/tmp
  # cat /mnt/tmp/
  # cat cpuset.effective_cpus

  # cat cpuset.cpus
  0-15

And then this cpuset won't constrain the tasks in it.

Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.

Reported-by: Christian Brauner &lt;christianvanbrauner@gmail.com&gt;
Reported-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 790317e1b266c776765a4bdcedefea706ff0fada ]

If clone_children is enabled, effective masks won't be initialized
due to the bug:

  # mount -t cgroup -o cpuset xxx /mnt
  # echo 1 &gt; cgroup.clone_children
  # mkdir /mnt/tmp
  # cat /mnt/tmp/
  # cat cpuset.effective_cpus

  # cat cpuset.cpus
  0-15

And then this cpuset won't constrain the tasks in it.

Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.

Reported-by: Christian Brauner &lt;christianvanbrauner@gmail.com&gt;
Reported-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.17+
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE</title>
<updated>2015-03-28T13:37:48+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-03-05T13:04:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d4bc18f7bbd9aa62b1794fe8f409bd0c1b88a597'/>
<id>d4bc18f7bbd9aa62b1794fe8f409bd0c1b88a597</id>
<content type='text'>
[ Upstream commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 ]

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Cc: Tomeu Vizoso &lt;tomeu.vizoso@gmail.com&gt;
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Tested-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 ]

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Cc: Tomeu Vizoso &lt;tomeu.vizoso@gmail.com&gt;
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Tested-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
