<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/arc/include/asm, branch v3.10.32</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ARC: Workaround spinlock livelock in SMP SystemC simulation</title>
<updated>2013-10-18T14:45:45+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-09-25T11:23:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0c06a0a693a5baaeacdb4c9485d5d6d490ea8a23'/>
<id>0c06a0a693a5baaeacdb4c9485d5d6d490ea8a23</id>
<content type='text'>
commit 6c00350b573c0bd3635436e43e8696951dd6e1b6 upstream.

Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, rety	; if already locked, retry

In single-threaded simulation, SystemC alternates between the 2 cores
with "N" insn each based scheduling. Additionally for insn with global
side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, 2 cores doing a repeated EX on same location, Linux often
got into a livelock e.g. when both cores were fiddling with tasklist
lock (gdbserver / hackbench) for read/write respectively as the
sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
                                         spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10  spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6c00350b573c0bd3635436e43e8696951dd6e1b6 upstream.

Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, rety	; if already locked, retry

In single-threaded simulation, SystemC alternates between the 2 cores
with "N" insn each based scheduling. Additionally for insn with global
side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, 2 cores doing a repeated EX on same location, Linux often
got into a livelock e.g. when both cores were fiddling with tasklist
lock (gdbserver / hackbench) for read/write respectively as the
sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
                                         spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10  spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: Fix 32-bit wrap around in access_ok()</title>
<updated>2013-10-18T14:45:45+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-09-26T13:20:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a683a93b1ce0b86944a51a1b8f787aa684836edb'/>
<id>a683a93b1ce0b86944a51a1b8f787aa684836edb</id>
<content type='text'>
commit 0752adfda15f0eca9859a76da3db1800e129ad43 upstream.

Anton reported

 | LTP tests syscalls/process_vm_readv01 and process_vm_writev01 fail
 | similarly in one testcase test_iov_invalid -&gt; lvec-&gt;iov_base.
 | Testcase expects errno EFAULT and return code -1,
 | but it gets return code 1 and ERRNO is 0 what means success.

Essentially test case was passing a pointer of -1 which access_ok()
was not catching. It was doing [@addr + @sz &lt;= TASK_SIZE] which would
pass for @addr == -1

Fixed that by rewriting as [@addr &lt;= TASK_SIZE - @sz]

Reported-by: Anton Kolesov &lt;Anton.Kolesov@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0752adfda15f0eca9859a76da3db1800e129ad43 upstream.

Anton reported

 | LTP tests syscalls/process_vm_readv01 and process_vm_writev01 fail
 | similarly in one testcase test_iov_invalid -&gt; lvec-&gt;iov_base.
 | Testcase expects errno EFAULT and return code -1,
 | but it gets return code 1 and ERRNO is 0 what means success.

Essentially test case was passing a pointer of -1 which access_ok()
was not catching. It was doing [@addr + @sz &lt;= TASK_SIZE] which would
pass for @addr == -1

Fixed that by rewriting as [@addr &lt;= TASK_SIZE - @sz]

Reported-by: Anton Kolesov &lt;Anton.Kolesov@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: Fix __udelay calculation</title>
<updated>2013-10-18T14:45:45+00:00</updated>
<author>
<name>Mischa Jonker</name>
<email>mjonker@synopsys.com</email>
</author>
<published>2013-08-30T09:56:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8036c31c84117707d4132cd199d997d7ed41427c'/>
<id>8036c31c84117707d4132cd199d997d7ed41427c</id>
<content type='text'>
commit 7efd0da2d17360e1cef91507dbe619db0ee2c691 upstream.

Cast usecs to u64, to ensure that the (usecs * 4295 * HZ)
multiplication is 64 bit.

Initially, the (usecs * 4295 * HZ) part was done as a 32 bit
multiplication, with the result casted to 64 bit. This led to some bits
falling off, causing a "DMA initialization error" in the stmmac Ethernet
driver, due to a premature timeout.

Signed-off-by: Mischa Jonker &lt;mjonker@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7efd0da2d17360e1cef91507dbe619db0ee2c691 upstream.

Cast usecs to u64, to ensure that the (usecs * 4295 * HZ)
multiplication is 64 bit.

Initially, the (usecs * 4295 * HZ) part was done as a 32 bit
multiplication, with the result casted to 64 bit. This led to some bits
falling off, causing a "DMA initialization error" in the stmmac Ethernet
driver, due to a premature timeout.

Signed-off-by: Mischa Jonker &lt;mjonker@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: SMP failed to boot due to missing IVT setup</title>
<updated>2013-10-18T14:45:45+00:00</updated>
<author>
<name>Noam Camus</name>
<email>noamc@ezchip.com</email>
</author>
<published>2013-09-12T07:37:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=98f745546bd27e54fe0bed1e9c900301428de9d5'/>
<id>98f745546bd27e54fe0bed1e9c900301428de9d5</id>
<content type='text'>
commit c3567f8a359b7917dcffa442301f88ed0a75211f upstream.

Commit 05b016ecf5e7a "ARC: Setup Vector Table Base in early boot" moved
the Interrupt vector Table setup out of arc_init_IRQ() which is called
for all CPUs, to entry point of boot cpu only, breaking booting of others.

Fix by adding the same to entry point of non-boot CPUs too.

read_arc_build_cfg_regs() printing IVT Base Register didn't help the
casue since it prints a synthetic value if zero which is totally bogus,
so fix that to print the exact Register.

[vgupta: Remove the now stale comment from header of arc_init_IRQ and
also added the commentary for halt-on-reset]

Cc: Gilad Ben-Yossef &lt;gilad@benyossef.com&gt;
Signed-off-by: Noam Camus &lt;noamc@ezchip.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c3567f8a359b7917dcffa442301f88ed0a75211f upstream.

Commit 05b016ecf5e7a "ARC: Setup Vector Table Base in early boot" moved
the Interrupt vector Table setup out of arc_init_IRQ() which is called
for all CPUs, to entry point of boot cpu only, breaking booting of others.

Fix by adding the same to entry point of non-boot CPUs too.

read_arc_build_cfg_regs() printing IVT Base Register didn't help the
casue since it prints a synthetic value if zero which is totally bogus,
so fix that to print the exact Register.

[vgupta: Remove the now stale comment from header of arc_init_IRQ and
also added the commentary for halt-on-reset]

Cc: Gilad Ben-Yossef &lt;gilad@benyossef.com&gt;
Signed-off-by: Noam Camus &lt;noamc@ezchip.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: gdbserver breakage in Big-Endian configuration #2</title>
<updated>2013-08-29T16:47:29+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>Vineet.Gupta1@synopsys.com</email>
</author>
<published>2013-08-20T08:08:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=16a06df257641d466895454f0320b8ccb1583b21'/>
<id>16a06df257641d466895454f0320b8ccb1583b21</id>
<content type='text'>
[Based on mainline commit 352c1d95e3220d0: "ARC: stop using
pt_regs-&gt;orig_r8"]

Stop using orig_r8 as it could get clobbered by ST in trap_with_param,
and further it is semantically not needed either.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Based on mainline commit 352c1d95e3220d0: "ARC: stop using
pt_regs-&gt;orig_r8"]

Stop using orig_r8 as it could get clobbered by ST in trap_with_param,
and further it is semantically not needed either.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: gdbserver breakage in Big-Endian configuration #1</title>
<updated>2013-08-29T16:47:29+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>Vineet.Gupta1@synopsys.com</email>
</author>
<published>2013-08-20T08:08:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9b2c750d8e81da33cdf6e16db911faa2008652cd'/>
<id>9b2c750d8e81da33cdf6e16db911faa2008652cd</id>
<content type='text'>
[Based on mainline commit 502a0c775c7f0a: "ARC: pt_regs update #5"]

gdbserver needs @stop_pc, served by ptrace, but fetched from pt_regs
differently, based on in_brkpt_traps(), which in turn relies on
additional machine state in pt_regs-&gt;event bitfield.

        unsigned long orig_r8:16, event:16;

For big endian config, this macro was returning false, despite being in
breakpoint Trap exception, causing wrong @stop_pc to be returned to gdb.

Issue #1: In BE, @event above is at offset 2 in word, while a STW insn
          at offset 0 was used to update it. Resort to using ST insn
	  which updates the half-word at right location.

Issue #2: The union involving bitfields causes all the members to be
	  laid out at offset 0. So with fix #1 above, ASM was now
	  updating at offset 2, "C" code was still referencing at
	  offset 0. Fixed by wrapping bitfield in a struct.

Reported-by: Noam Camus &lt;noamc@ezchip.com&gt;
Tested-by: Anton Kolesov &lt;akolesov@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Based on mainline commit 502a0c775c7f0a: "ARC: pt_regs update #5"]

gdbserver needs @stop_pc, served by ptrace, but fetched from pt_regs
differently, based on in_brkpt_traps(), which in turn relies on
additional machine state in pt_regs-&gt;event bitfield.

        unsigned long orig_r8:16, event:16;

For big endian config, this macro was returning false, despite being in
breakpoint Trap exception, causing wrong @stop_pc to be returned to gdb.

Issue #1: In BE, @event above is at offset 2 in word, while a STW insn
          at offset 0 was used to update it. Resort to using ST insn
	  which updates the half-word at right location.

Issue #2: The union involving bitfields causes all the members to be
	  laid out at offset 0. So with fix #1 above, ASM was now
	  updating at offset 2, "C" code was still referencing at
	  offset 0. Fixed by wrapping bitfield in a struct.

Reported-by: Noam Camus &lt;noamc@ezchip.com&gt;
Tested-by: Anton Kolesov &lt;akolesov@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: lazy dcache flush broke gdb in non-aliasing configs</title>
<updated>2013-05-25T08:45:55+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-05-25T08:34:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7bb66f6e6eecdd8e10ed3a63bd28c1e9105adc79'/>
<id>7bb66f6e6eecdd8e10ed3a63bd28c1e9105adc79</id>
<content type='text'>
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.

So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.

So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: Use enough bits for determining page's cache color</title>
<updated>2013-05-23T08:55:09+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-05-19T08:36:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=006dfb3c9c44192f06093d65b3a876fa5ad1319a'/>
<id>006dfb3c9c44192f06093d65b3a876fa5ad1319a</id>
<content type='text'>
The current code uses 2 bits for determining page's dcache color, thus
sorting pages into 4 bins, whereas the aliasing dcache really has 2 bins
(8k page, 64k dcache - 4 way-set-assoc).
This can cause extraneous flushes - e.g. color 0 and 2.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current code uses 2 bits for determining page's dcache color, thus
sorting pages into 4 bins, whereas the aliasing dcache really has 2 bins
(8k page, 64k dcache - 4 way-set-assoc).
This can cause extraneous flushes - e.g. color 0 and 2.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: Brown paper bag bug in macro for checking cache color</title>
<updated>2013-05-23T08:54:52+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-05-22T13:08:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e87974dec5ec25a8a4852d9292db6be659164e6'/>
<id>3e87974dec5ec25a8a4852d9292db6be659164e6</id>
<content type='text'>
The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in definition of macro addr_not_cache_congruent()

The intention was to have the equivalent of following:

	if (a || (1 ? b : 0))

but we ended up with following:

	if (a || 1 ? b : 0)

And because precedence of '||' is more that that of '?', gcc was optimizing
away evaluation of &lt;a&gt;

Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
   for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
   be missed if U/K mappings were congruent.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in definition of macro addr_not_cache_congruent()

The intention was to have the equivalent of following:

	if (a || (1 ? b : 0))

but we ended up with following:

	if (a || 1 ? b : 0)

And because precedence of '||' is more that that of '?', gcc was optimizing
away evaluation of &lt;a&gt;

Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
   for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
   be missed if U/K mappings were congruent.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: copy_(to|from)_user() to honor usermode-access permissions</title>
<updated>2013-05-23T05:03:03+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2013-05-21T09:55:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a950549c675f2c8c504469dec7d780da8a6433dc'/>
<id>a950549c675f2c8c504469dec7d780da8a6433dc</id>
<content type='text'>
This manifested as grep failing psuedo-randomly:

--------------&gt;8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
    inet 127.0.0.1/8 scope host lo
--------------&gt;8---------------------

ARC700 MMU provides fully orthogonal permission bits per page:
Ur, Uw, Ux, Kr, Kw, Kx

The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.

1. Read access to an anon mapped page in libc .bss: write-protected
   zero_page mapped: TLB Entry installed with Ur + K[rwx]

2. grep calls libc:getc() -&gt; buffered read layer calls read(2) with the
   internal read buffer in same .bss page.
   The read() call is on STDIN which has been redirected to a pipe.
   read(2) =&gt; sys_read() =&gt; pipe_read() =&gt; copy_to_user()

3. Since page has Kernel-write permission (despite being user-mode
   write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
   Exception (page-fault for ARC). core-MM is unaware that kernel
   erroneously wrote to the reserved read-only zero-page (BUG #1)

4. Control returns to userspace which now does a write to same .bss page
   Since Linux MM is not aware that page has been modified by kernel, it
   simply reassigns a new writable zero-init page to mapping, loosing the
   prior write by kernel - effectively zero'ing out the libc read buffer
   under the hood - hence grep doesn't see right data (BUG #2)

The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.

The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.

Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.

With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.

Reported-by: Anton Kolesov &lt;akolesov@synopsys.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.9
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This manifested as grep failing psuedo-randomly:

--------------&gt;8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
    inet 127.0.0.1/8 scope host lo
--------------&gt;8---------------------

ARC700 MMU provides fully orthogonal permission bits per page:
Ur, Uw, Ux, Kr, Kw, Kx

The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.

1. Read access to an anon mapped page in libc .bss: write-protected
   zero_page mapped: TLB Entry installed with Ur + K[rwx]

2. grep calls libc:getc() -&gt; buffered read layer calls read(2) with the
   internal read buffer in same .bss page.
   The read() call is on STDIN which has been redirected to a pipe.
   read(2) =&gt; sys_read() =&gt; pipe_read() =&gt; copy_to_user()

3. Since page has Kernel-write permission (despite being user-mode
   write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
   Exception (page-fault for ARC). core-MM is unaware that kernel
   erroneously wrote to the reserved read-only zero-page (BUG #1)

4. Control returns to userspace which now does a write to same .bss page
   Since Linux MM is not aware that page has been modified by kernel, it
   simply reassigns a new writable zero-init page to mapping, loosing the
   prior write by kernel - effectively zero'ing out the libc read buffer
   under the hood - hence grep doesn't see right data (BUG #2)

The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.

The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.

Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.

With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.

Reported-by: Anton Kolesov &lt;akolesov@synopsys.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.9
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
