<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/s390, branch v3.2.36</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>s390/kvm: dont announce RRBM support</title>
<updated>2013-01-03T03:32:55+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2012-10-02T14:25:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=511c73bcf8e747cc95925a311dfeb630989db5a6'/>
<id>511c73bcf8e747cc95925a311dfeb630989db5a6</id>
<content type='text'>
commit 87cac8f879a5ecd7109dbe688087e8810b3364eb upstream.

Newer kernels (linux-next with the transparent huge page patches)
use rrbm if the feature is announced via feature bit 66.
RRBM will cause intercepts, so KVM does not handle it right now,
causing an illegal instruction in the guest.
The  easy solution is to disable the feature bit for the guest.

This fixes bugs like:
Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
illegal operation: 0001 [#1] SMP
Modules linked in: virtio_balloon virtio_net ipv6 autofs4
CPU: 0 Not tainted 3.5.4 #1
Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
     R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
     00000000003cc000 0000000000000004 0000000000000000 0000000079800000
     0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
     0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
     000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
 Krnl Code:&gt;0000000000124c2a: b9810025          ogr     %r2,%r5
           0000000000124c2e: 41343000           la      %r3,0(%r4,%r3)
           0000000000124c32: a716fffa           brct    %r1,124c26
           0000000000124c36: b9010022           lngr    %r2,%r2
           0000000000124c3a: e3d0f0800004       lg      %r13,128(%r15)
           0000000000124c40: eb22003f000c       srlg    %r2,%r2,63
[ 2150.713198] Call Trace:
[ 2150.713223] ([&lt;00000000002312c4&gt;] page_referenced_one+0x6c/0x27c)
[ 2150.713749]  [&lt;0000000000233812&gt;] page_referenced+0x32a/0x410
[...]

CC: Alex Graf &lt;agraf@suse.de&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 87cac8f879a5ecd7109dbe688087e8810b3364eb upstream.

Newer kernels (linux-next with the transparent huge page patches)
use rrbm if the feature is announced via feature bit 66.
RRBM will cause intercepts, so KVM does not handle it right now,
causing an illegal instruction in the guest.
The  easy solution is to disable the feature bit for the guest.

This fixes bugs like:
Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
illegal operation: 0001 [#1] SMP
Modules linked in: virtio_balloon virtio_net ipv6 autofs4
CPU: 0 Not tainted 3.5.4 #1
Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
     R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
     00000000003cc000 0000000000000004 0000000000000000 0000000079800000
     0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
     0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
     000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
 Krnl Code:&gt;0000000000124c2a: b9810025          ogr     %r2,%r5
           0000000000124c2e: 41343000           la      %r3,0(%r4,%r3)
           0000000000124c32: a716fffa           brct    %r1,124c26
           0000000000124c36: b9010022           lngr    %r2,%r2
           0000000000124c3a: e3d0f0800004       lg      %r13,128(%r15)
           0000000000124c40: eb22003f000c       srlg    %r2,%r2,63
[ 2150.713198] Call Trace:
[ 2150.713223] ([&lt;00000000002312c4&gt;] page_referenced_one+0x6c/0x27c)
[ 2150.713749]  [&lt;0000000000233812&gt;] page_referenced+0x32a/0x410
[...]

CC: Alex Graf &lt;agraf@suse.de&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/gup: add missing TASK_SIZE check to get_user_pages_fast()</title>
<updated>2012-12-06T11:20:05+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-10-22T13:49:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=63d0412fb38ee2f13e7d68b1f2b136ea91d52bc2'/>
<id>63d0412fb38ee2f13e7d68b1f2b136ea91d52bc2</id>
<content type='text'>
commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream.

When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.

Reviewed-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream.

When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.

Reviewed-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/signal: set correct address space control</title>
<updated>2012-12-06T11:20:04+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2012-11-07T09:44:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b6908c5824b42966d700c10a9c4241b5f82119e1'/>
<id>b6908c5824b42966d700c10a9c4241b5f82119e1</id>
<content type='text'>
commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream.

If user space is running in primary mode it can switch to secondary
or access register mode, this is used e.g. in the clock_gettime code
of the vdso. If a signal is delivered to the user space process while
it has been running in access register mode the signal handler is
executed in access register mode as well which will result in a crash
most of the time.

Set the address space control bits in the PSW to the default for the
execution of the signal handler and make sure that the previous
address space control is restored on signal return. Take care
that user space can not switch to the kernel address space by
modifying the registers in the signal frame.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2:
 - Adjust filename
 - The RI bit is not included in PSW_MASK_USER]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream.

If user space is running in primary mode it can switch to secondary
or access register mode, this is used e.g. in the clock_gettime code
of the vdso. If a signal is delivered to the user space process while
it has been running in access register mode the signal handler is
executed in access register mode as well which will result in a crash
most of the time.

Set the address space control bits in the PSW to the default for the
execution of the signal handler and make sure that the previous
address space control is restored on signal return. Take care
that user space can not switch to the kernel address space by
modifying the registers in the signal frame.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2:
 - Adjust filename
 - The RI bit is not included in PSW_MASK_USER]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390: fix linker script for 31 bit builds</title>
<updated>2012-10-30T23:26:52+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-10-18T09:11:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=adb0f4a1995dc2a166313ac5ddc0a2f4a5c2d4c6'/>
<id>adb0f4a1995dc2a166313ac5ddc0a2f4a5c2d4c6</id>
<content type='text'>
commit c985cb37f1b39c2c8035af741a2a0b79f1fbaca7 upstream.

Because of a change in the s390 arch backend of binutils (commit 23ecd77
"Pick the default arch depending on the target size" in binutils repo)
31 bit builds will fail since the linker would now try to create 64 bit
binary output.
Fix this by setting OUTPUT_ARCH to s390:31-bit instead of s390.
Thanks to Andreas Krebbel for figuring out the issue.

Fixes this build error:

  LD      init/built-in.o
s390x-4.7.2-ld: s390:31-bit architecture of input file
 `arch/s390/kernel/head.o' is incompatible with s390:64-bit output

Cc: Andreas Krebbel &lt;Andreas.Krebbel@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c985cb37f1b39c2c8035af741a2a0b79f1fbaca7 upstream.

Because of a change in the s390 arch backend of binutils (commit 23ecd77
"Pick the default arch depending on the target size" in binutils repo)
31 bit builds will fail since the linker would now try to create 64 bit
binary output.
Fix this by setting OUTPUT_ARCH to s390:31-bit instead of s390.
Thanks to Andreas Krebbel for figuring out the issue.

Fixes this build error:

  LD      init/built-in.o
s390x-4.7.2-ld: s390:31-bit architecture of input file
 `arch/s390/kernel/head.o' is incompatible with s390:64-bit output

Cc: Andreas Krebbel &lt;Andreas.Krebbel@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/compat: fix mmap compat system calls</title>
<updated>2012-08-19T17:15:35+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-08-08T07:32:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b2ef34da747ee998a41e6b081987480777b7c23a'/>
<id>b2ef34da747ee998a41e6b081987480777b7c23a</id>
<content type='text'>
commit e85871218513c54f7dfdb6009043cb638f2fecbe upstream.

The native 31 bit and the compat behaviour for the mmap system calls differ:

In native 31 bit mode the passed in address for the mmap system call will be
unmodified passed to sys_mmap_pgoff().
In compat mode however the passed in address will be modified with
compat_ptr() which masks out the most significant bit.

The result is that in native 31 bit mode each mmap request (with MAP_FIXED)
will fail where the most significat bit is set, while in compat mode it
may succeed.

This odd behaviour was introduced with d3815898 "[S390] mmap: add missing
compat_ptr conversion to both mmap compat syscalls".

To restore a consistent behaviour accross native and compat mode this
patch functionally reverts the above mentioned commit.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e85871218513c54f7dfdb6009043cb638f2fecbe upstream.

The native 31 bit and the compat behaviour for the mmap system calls differ:

In native 31 bit mode the passed in address for the mmap system call will be
unmodified passed to sys_mmap_pgoff().
In compat mode however the passed in address will be modified with
compat_ptr() which masks out the most significant bit.

The result is that in native 31 bit mode each mmap request (with MAP_FIXED)
will fail where the most significat bit is set, while in compat mode it
may succeed.

This odd behaviour was introduced with d3815898 "[S390] mmap: add missing
compat_ptr conversion to both mmap compat syscalls".

To restore a consistent behaviour accross native and compat mode this
patch functionally reverts the above mentioned commit.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/compat: fix compat wrappers for process_vm system calls</title>
<updated>2012-08-19T17:15:35+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-08-07T07:48:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0b3160dde9f41e7bbc125ab8d00a6e367e956854'/>
<id>0b3160dde9f41e7bbc125ab8d00a6e367e956854</id>
<content type='text'>
commit 82aabdb6f1eb61e0034ec23901480f5dd23db7c4 upstream.

The compat wrappers incorrectly called the non compat versions of
the system process_vm system calls.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 82aabdb6f1eb61e0034ec23901480f5dd23db7c4 upstream.

The compat wrappers incorrectly called the non compat versions of
the system process_vm system calls.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/mm: fix fault handling for page table walk case</title>
<updated>2012-08-09T23:24:58+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-07-27T07:45:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c929e9bc8cadd3dd1e914498e3905d3d4cc43f85'/>
<id>c929e9bc8cadd3dd1e914498e3905d3d4cc43f85</id>
<content type='text'>
commit 008c2e8f247f0a8db1e8e26139da12f3a3abcda0 upstream.

Make sure the kernel does not incorrectly create a SIGBUS signal during
user space accesses:

For user space accesses in the switched addressing mode case the kernel
may walk page tables and access user address space via the kernel
mapping. If a page table entry is invalid the function __handle_fault()
gets called in order to emulate a page fault and trigger all the usual
actions like paging in a missing page etc. by calling handle_mm_fault().

If handle_mm_fault() returns with an error fixup handling is necessary.
For the switched addressing mode case all errors need to be mapped to
-EFAULT, so that the calling uaccess function can return -EFAULT to
user space.

Unfortunately the __handle_fault() incorrectly calls do_sigbus() if
VM_FAULT_SIGBUS is set. This however should only happen if a page fault
was triggered by a user space instruction. For kernel mode uaccesses
the correct action is to only return -EFAULT.
So user space may incorrectly see SIGBUS signals because of this bug.

For current machines this would only be possible for the switched
addressing mode case in conjunction with futex operations.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: do_exception() and do_sigbus() parameters differ]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 008c2e8f247f0a8db1e8e26139da12f3a3abcda0 upstream.

Make sure the kernel does not incorrectly create a SIGBUS signal during
user space accesses:

For user space accesses in the switched addressing mode case the kernel
may walk page tables and access user address space via the kernel
mapping. If a page table entry is invalid the function __handle_fault()
gets called in order to emulate a page fault and trigger all the usual
actions like paging in a missing page etc. by calling handle_mm_fault().

If handle_mm_fault() returns with an error fixup handling is necessary.
For the switched addressing mode case all errors need to be mapped to
-EFAULT, so that the calling uaccess function can return -EFAULT to
user space.

Unfortunately the __handle_fault() incorrectly calls do_sigbus() if
VM_FAULT_SIGBUS is set. This however should only happen if a page fault
was triggered by a user space instruction. For kernel mode uaccesses
the correct action is to only return -EFAULT.
So user space may incorrectly see SIGBUS signals because of this bug.

For current machines this would only be possible for the switched
addressing mode case in conjunction with futex operations.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: do_exception() and do_sigbus() parameters differ]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/mm: downgrade page table after fork of a 31 bit process</title>
<updated>2012-08-09T23:24:54+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2012-07-26T06:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f22027d0e8398e377a42a62241137b77a35eda97'/>
<id>f22027d0e8398e377a42a62241137b77a35eda97</id>
<content type='text'>
commit 0f6f281b731d20bfe75c13f85d33f3f05b440222 upstream.

The downgrade of the 4 level page table created by init_new_context is
currently done only in start_thread31. If a 31 bit process forks the
new mm uses a 4 level page table, including the task size of 2&lt;&lt;42
that goes along with it. This is incorrect as now a 31 bit process
can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade
after fork.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0f6f281b731d20bfe75c13f85d33f3f05b440222 upstream.

The downgrade of the 4 level page table created by init_new_context is
currently done only in start_thread31. If a 31 bit process forks the
new mm uses a 4 level page table, including the task size of 2&lt;&lt;42
that goes along with it. This is incorrect as now a 31 bit process
can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade
after fork.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/idle: fix sequence handling vs cpu hotplug</title>
<updated>2012-08-02T13:37:50+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-07-13T13:45:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=88981d86ee7c3d3bcfe75f359363658478667fa5'/>
<id>88981d86ee7c3d3bcfe75f359363658478667fa5</id>
<content type='text'>
commit 0008204ffe85d23382d6fd0f971f3f0fbe70bae2 upstream.

The s390 idle accounting code uses a sequence counter which gets used
when the per cpu idle statistics get updated and read.

One assumption on read access is that only when the sequence counter is
even and did not change while reading all values the result is valid.
On cpu hotplug however the per cpu data structure gets initialized via
a cpu hotplug notifier on CPU_ONLINE.
CPU_ONLINE however is too late, since the onlined cpu is already running
and might access the per cpu data. Worst case is that the data structure
gets initialized while an idle thread is updating its idle statistics.
This will result in an uneven sequence counter after an update.

As a result user space tools like top, which access /proc/stat in order
to get idle stats, will busy loop waiting for the sequence counter to
become even again, which will never happen until the queried cpu will
update its idle statistics again. And even then the sequence counter
will only have an even value for a couple of cpu cycles.

Fix this by moving the initialization of the per cpu idle statistics
to cpu_init(). I prefer that solution in favor of changing the
notifier to CPU_UP_PREPARE, which would be a different solution to
the problem.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0008204ffe85d23382d6fd0f971f3f0fbe70bae2 upstream.

The s390 idle accounting code uses a sequence counter which gets used
when the per cpu idle statistics get updated and read.

One assumption on read access is that only when the sequence counter is
even and did not change while reading all values the result is valid.
On cpu hotplug however the per cpu data structure gets initialized via
a cpu hotplug notifier on CPU_ONLINE.
CPU_ONLINE however is too late, since the onlined cpu is already running
and might access the per cpu data. Worst case is that the data structure
gets initialized while an idle thread is updating its idle statistics.
This will result in an uneven sequence counter after an update.

As a result user space tools like top, which access /proc/stat in order
to get idle stats, will busy loop waiting for the sequence counter to
become even again, which will never happen until the queried cpu will
update its idle statistics again. And even then the sequence counter
will only have an even value for a couple of cpu cycles.

Fix this by moving the initialization of the per cpu idle statistics
to cpu_init(). I prefer that solution in favor of changing the
notifier to CPU_UP_PREPARE, which would be a different solution to
the problem.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390/pfault: fix task state race</title>
<updated>2012-05-30T23:43:28+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2012-05-09T07:37:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=559dd7d5b6fee151313910afbe3f19ceaf18dc9b'/>
<id>559dd7d5b6fee151313910afbe3f19ceaf18dc9b</id>
<content type='text'>
commit d5e50a51ccbda36b379aba9d1131a852eb908dda upstream.

When setting the current task state to TASK_UNINTERRUPTIBLE this can
race with a different cpu. The other cpu could set the task state after
it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which
would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again.

This race was always present in the pfault interrupt code but didn't
cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug
vs missing completion interrupts" which relied on the fact that after
setting the task state to TASK_UNINTERRUPTIBLE the task would really
sleep.
Since this is not necessarily the case the result may be a list corruption
of the pfault_list or, as observed, a use-after-free bug while trying to
access the task_struct of a task which terminated itself already.

To fix this, we need to get a reference of the affected task when receiving
the initial pfault interrupt and add special handling if we receive yet
another initial pfault interrupt when the task is already enqueued in the
pfault list.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Reviewed-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d5e50a51ccbda36b379aba9d1131a852eb908dda upstream.

When setting the current task state to TASK_UNINTERRUPTIBLE this can
race with a different cpu. The other cpu could set the task state after
it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which
would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again.

This race was always present in the pfault interrupt code but didn't
cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug
vs missing completion interrupts" which relied on the fact that after
setting the task state to TASK_UNINTERRUPTIBLE the task would really
sleep.
Since this is not necessarily the case the result may be a list corruption
of the pfault_list or, as observed, a use-after-free bug while trying to
access the task_struct of a task which terminated itself already.

To fix this, we need to get a reference of the affected task when receiving
the initial pfault interrupt and add special handling if we receive yet
another initial pfault interrupt when the task is already enqueued in the
pfault list.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Reviewed-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
