<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux, branch v3.2.46</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>macvlan: fix passthru mode race between dev removal  and rx path</title>
<updated>2013-05-30T13:35:14+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@resnulli.us</email>
</author>
<published>2013-05-09T04:23:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=24ac9e8939ffcbfed67eea6e82c7eb99c34d0b55'/>
<id>24ac9e8939ffcbfed67eea6e82c7eb99c34d0b55</id>
<content type='text'>
[ Upstream commit 233c7df0821c4190e2d3f4be0f2ca0ab40a5ed8c, note
  that I had to add list_first_or_null_rcu to rculist.h in order
  to accomodate this fix. ]

Currently, if macvlan in passthru mode is created and data are rxed and
you remove this device, following panic happens:

NULL pointer dereference at 0000000000000198
IP: [&lt;ffffffffa0196058&gt;] macvlan_handle_frame+0x153/0x1f7 [macvlan]

I'm using following script to trigger this:
&lt;script&gt;
while [ 1 ]
do
	ip link add link e1 name macvtap0 type macvtap mode passthru
	ip link set e1 up
	ip link set macvtap0 up
	IFINDEX=`ip link |grep macvtap0 | cut -f 1 -d ':'`
	cat /dev/tap$IFINDEX  &gt;/dev/null &amp;
	ip link del dev macvtap0
done
&lt;/script&gt;

I run this script while "ping -f" is running on another machine to send
packets to e1 rx.

Reason of the panic is that list_first_entry() is blindly called in
macvlan_handle_frame() even if the list was empty. vlan is set to
incorrect pointer which leads to the crash.

I'm fixing this by protecting port-&gt;vlans list by rcu and by preventing
from getting incorrect pointer in case the list is empty.

Introduced by: commit eb06acdc85585f2 "macvlan: Introduce 'passthru' mode to takeover the underlying device"

Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 233c7df0821c4190e2d3f4be0f2ca0ab40a5ed8c, note
  that I had to add list_first_or_null_rcu to rculist.h in order
  to accomodate this fix. ]

Currently, if macvlan in passthru mode is created and data are rxed and
you remove this device, following panic happens:

NULL pointer dereference at 0000000000000198
IP: [&lt;ffffffffa0196058&gt;] macvlan_handle_frame+0x153/0x1f7 [macvlan]

I'm using following script to trigger this:
&lt;script&gt;
while [ 1 ]
do
	ip link add link e1 name macvtap0 type macvtap mode passthru
	ip link set e1 up
	ip link set macvtap0 up
	IFINDEX=`ip link |grep macvtap0 | cut -f 1 -d ':'`
	cat /dev/tap$IFINDEX  &gt;/dev/null &amp;
	ip link del dev macvtap0
done
&lt;/script&gt;

I run this script while "ping -f" is running on another machine to send
packets to e1 rx.

Reason of the panic is that list_first_entry() is blindly called in
macvlan_handle_frame() even if the list was empty. vlan is set to
incorrect pointer which leads to the crash.

I'm fixing this by protecting port-&gt;vlans list by rcu and by preventing
from getting incorrect pointer in case the list is empty.

Introduced by: commit eb06acdc85585f2 "macvlan: Introduce 'passthru' mode to takeover the underlying device"

Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>if_cablemodem.h: Add parenthesis around ioctl macros</title>
<updated>2013-05-30T13:35:13+00:00</updated>
<author>
<name>Josh Boyer</name>
<email>jwboyer@redhat.com</email>
</author>
<published>2013-05-08T09:45:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=00c76e26b7e25c4f0790b9d4d54253cbf24dce1b'/>
<id>00c76e26b7e25c4f0790b9d4d54253cbf24dce1b</id>
<content type='text'>
[ Upstream commit 4f924b2aa4d3cb30f07e57d6b608838edcbc0d88 ]

Protect the SIOCGCM* ioctl macros with parenthesis.

Reported-by: Paul Wouters &lt;pwouters@redhat.com&gt;
Signed-off-by: Josh Boyer &lt;jwboyer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4f924b2aa4d3cb30f07e57d6b608838edcbc0d88 ]

Protect the SIOCGCM* ioctl macros with parenthesis.

Reported-by: Paul Wouters &lt;pwouters@redhat.com&gt;
Signed-off-by: Josh Boyer &lt;jwboyer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, efivars: firmware bug workarounds should be in platform code</title>
<updated>2013-05-30T13:35:10+00:00</updated>
<author>
<name>Matt Fleming</name>
<email>matt.fleming@intel.com</email>
</author>
<published>2013-03-25T09:14:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=649aeb46c8ff0aa527e73e8f847f9895c835d8cb'/>
<id>649aeb46c8ff0aa527e73e8f847f9895c835d8cb</id>
<content type='text'>
commit a6e4d5a03e9e3587e88aba687d8f225f4f04c792 upstream.

Let's not burden ia64 with checks in the common efivars code that we're not
writing too much data to the variable store. That kind of thing is an x86
firmware bug, plain and simple.

efi_query_variable_store() provides platforms with a wrapper in which they can
perform checks and workarounds for EFI variable storage bugs.

Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Matthew Garrett &lt;mjg59@srcf.ucam.org&gt;
Signed-off-by: Matt Fleming &lt;matt.fleming@intel.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a6e4d5a03e9e3587e88aba687d8f225f4f04c792 upstream.

Let's not burden ia64 with checks in the common efivars code that we're not
writing too much data to the variable store. That kind of thing is an x86
firmware bug, plain and simple.

efi_query_variable_store() provides platforms with a wrapper in which they can
perform checks and workarounds for EFI variable storage bugs.

Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Matthew Garrett &lt;mjg59@srcf.ucam.org&gt;
Signed-off-by: Matt Fleming &lt;matt.fleming@intel.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: fix false timeouts when using wait_event_timeout()</title>
<updated>2013-05-30T13:35:05+00:00</updated>
<author>
<name>Imre Deak</name>
<email>imre.deak@intel.com</email>
</author>
<published>2013-05-24T22:55:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=13fd0ed7f75373d6189c2c8e3dec2601d87c945f'/>
<id>13fd0ed7f75373d6189c2c8e3dec2601d87c945f</id>
<content type='text'>
commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e upstream.

Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak &lt;imre.deak@intel.com&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: "Paul E.  McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e upstream.

Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak &lt;imre.deak@intel.com&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: "Paul E.  McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>virtio_console: fix uapi header</title>
<updated>2013-05-30T13:35:03+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2013-05-17T01:14:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0cbb6f3def0ceedcbef5af335c756cd427d9e8cc'/>
<id>0cbb6f3def0ceedcbef5af335c756cd427d9e8cc</id>
<content type='text'>
commit 6407d75afd08545f2252bb39806ffd3f10c7faac upstream.

uapi should use __u32 not u32.
Fix a macro in virtio_console.h which uses u32.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6407d75afd08545f2252bb39806ffd3f10c7faac upstream.

uapi should use __u32 not u32.
Fix a macro in virtio_console.h which uses u32.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: don't reset nf_trace in nf_reset()</title>
<updated>2013-05-13T14:02:36+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2013-04-05T18:42:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=08d849ac4e7717d1d9ad4a442432d176df44ae43'/>
<id>08d849ac4e7717d1d9ad4a442432d176df44ae43</id>
<content type='text'>
[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ]

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ]

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: count hw_addr syncs so that unsync works  properly.</title>
<updated>2013-05-13T14:02:35+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vyasevic@redhat.com</email>
</author>
<published>2013-04-02T21:10:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8b2a1ad3630f35394bc23595cfa075d60c81f18'/>
<id>c8b2a1ad3630f35394bc23595cfa075d60c81f18</id>
<content type='text'>
[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ]

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich &lt;vyasevic@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ]

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich &lt;vyasevic@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vm: add vm_iomap_memory() helper function</title>
<updated>2013-05-13T14:02:33+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-04-16T20:45:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4d3c99057cd2c18272dbb730cd31f1cd817d7379'/>
<id>4d3c99057cd2c18272dbb730cd31f1cd817d7379</id>
<content type='text'>
commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e upstream.

Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e upstream.

Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc: sysv shared memory limited to 8TiB</title>
<updated>2013-05-13T14:02:30+00:00</updated>
<author>
<name>Robin Holt</name>
<email>holt@sgi.com</email>
</author>
<published>2013-05-01T02:15:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5a467624b1c06db5a47bf2fe58e41347686af512'/>
<id>5a467624b1c06db5a47bf2fe58e41347686af512</id>
<content type='text'>
commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns-&gt;shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465         int numpages = (size + PAGE_SIZE -1) &gt;&gt; PAGE_SHIFT;
...
 474         if (ns-&gt;shm_tot + numpages &gt; ns-&gt;shm_ctlall)
 475                 return -ENOSPC;

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Reported-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns-&gt;shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465         int numpages = (size + PAGE_SIZE -1) &gt;&gt; PAGE_SHIFT;
...
 474         if (ns-&gt;shm_tot + numpages &gt; ns-&gt;shm_ctlall)
 475                 return -ENOSPC;

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Reported-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: fix race between jbd2_journal_remove_checkpoint and -&gt;j_commit_callback</title>
<updated>2013-05-13T14:02:14+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2013-04-04T02:06:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3464fc47e51e73a67f3797bc91ccca894d52db98'/>
<id>3464fc47e51e73a67f3797bc91ccca894d52db98</id>
<content type='text'>
commit 794446c6946513c684d448205fbd76fa35f38b72 upstream.

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&amp;journal-&gt;j_list_lock);
                                         -&gt;jbd2_journal_remove_checkpoint()
					   -&gt;jbd2_journal_free_transaction();
					     -&gt;kmem_cache_free(transaction)
  -&gt;j_commit_callback(journal, transaction);
    -&gt; USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev-&gt;next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
 [&lt;ffffffff8106fb0d&gt;] warn_slowpath_common+0xad/0xf0
 [&lt;ffffffff8106fc06&gt;] warn_slowpath_fmt+0x46/0x50
 [&lt;ffffffff813637e9&gt;] ? ext4_journal_commit_callback+0x99/0xc0
 [&lt;ffffffff8148cae0&gt;] __list_del_entry+0x1c0/0x250
 [&lt;ffffffff813637bf&gt;] ext4_journal_commit_callback+0x6f/0xc0
 [&lt;ffffffff813ca336&gt;] jbd2_journal_commit_transaction+0x23a6/0x2570
 [&lt;ffffffff8108aa42&gt;] ? try_to_del_timer_sync+0x82/0xa0
 [&lt;ffffffff8108b491&gt;] ? del_timer_sync+0x91/0x1e0
 [&lt;ffffffff813d3ecf&gt;] kjournald2+0x19f/0x6a0
 [&lt;ffffffff810ad630&gt;] ? wake_up_bit+0x40/0x40
 [&lt;ffffffff813d3d30&gt;] ? bit_spin_lock+0x80/0x80
 [&lt;ffffffff810ac6be&gt;] kthread+0x10e/0x120
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff818ff6ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
[bwh: Backported to 3.2: s/jbd2_journal_free_transaction/kfree/]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 794446c6946513c684d448205fbd76fa35f38b72 upstream.

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&amp;journal-&gt;j_list_lock);
                                         -&gt;jbd2_journal_remove_checkpoint()
					   -&gt;jbd2_journal_free_transaction();
					     -&gt;kmem_cache_free(transaction)
  -&gt;j_commit_callback(journal, transaction);
    -&gt; USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev-&gt;next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
 [&lt;ffffffff8106fb0d&gt;] warn_slowpath_common+0xad/0xf0
 [&lt;ffffffff8106fc06&gt;] warn_slowpath_fmt+0x46/0x50
 [&lt;ffffffff813637e9&gt;] ? ext4_journal_commit_callback+0x99/0xc0
 [&lt;ffffffff8148cae0&gt;] __list_del_entry+0x1c0/0x250
 [&lt;ffffffff813637bf&gt;] ext4_journal_commit_callback+0x6f/0xc0
 [&lt;ffffffff813ca336&gt;] jbd2_journal_commit_transaction+0x23a6/0x2570
 [&lt;ffffffff8108aa42&gt;] ? try_to_del_timer_sync+0x82/0xa0
 [&lt;ffffffff8108b491&gt;] ? del_timer_sync+0x91/0x1e0
 [&lt;ffffffff813d3ecf&gt;] kjournald2+0x19f/0x6a0
 [&lt;ffffffff810ad630&gt;] ? wake_up_bit+0x40/0x40
 [&lt;ffffffff813d3d30&gt;] ? bit_spin_lock+0x80/0x80
 [&lt;ffffffff810ac6be&gt;] kthread+0x10e/0x120
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff818ff6ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
[bwh: Backported to 3.2: s/jbd2_journal_free_transaction/kfree/]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
