<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include, branch v3.9.2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/radeon: add new richland pci ids</title>
<updated>2013-05-11T14:18:35+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2013-04-25T18:06:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0496ac0e46866ec684e2eb105033f4b03b7016a0'/>
<id>0496ac0e46866ec684e2eb105033f4b03b7016a0</id>
<content type='text'>
commit 62d1f92e06aef9665d71ca7e986b3047ecf0b3c7 upstream.

Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 62d1f92e06aef9665d71ca7e986b3047ecf0b3c7 upstream.

Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>drm/radeon: add some new SI PCI ids</title>
<updated>2013-05-11T14:18:33+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2013-04-25T17:55:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f8080cb410b0abcbebefe881ce4e522a6e1f2adc'/>
<id>f8080cb410b0abcbebefe881ce4e522a6e1f2adc</id>
<content type='text'>
commit 18932a28419596bc9403770f5d8a108c5433fe59 upstream.

Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 18932a28419596bc9403770f5d8a108c5433fe59 upstream.

Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>drm/prime: keep a reference from the handle to exported dma-buf (v6)</title>
<updated>2013-05-11T14:18:26+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@gmail.com</email>
</author>
<published>2013-04-21T23:54:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c492a74718dde721fe992f77cec7d0f467188449'/>
<id>c492a74718dde721fe992f77cec7d0f467188449</id>
<content type='text'>
commit 219b47339ced80ca580bb6ce7d1636166984afa7 upstream.

Currently we have a problem with this:
1. i915: create gem object
2. i915: export gem object to prime
3. radeon: import gem object
4. close prime fd
5. radeon: unref object
6. i915: unref object

i915 has an imported object reference in its file priv, that isn't
cleaned up properly until fd close. The reference gets added at step 2,
but at step 6 we don't have enough info to clean it up.

The solution is to take a reference on the dma-buf when we export it,
and drop the reference when the gem handle goes away.

So when we export a dma_buf from a gem object, we keep track of it
with the handle, we take a reference to the dma_buf. When we close
the handle (i.e. userspace is finished with the buffer), we drop
the reference to the dma_buf, and it gets collected.

This patch isn't meant to fix any other problem or bikesheds, and it doesn't
fix any races with other scenarios.

v1.1: move export symbol line back up.

v2: okay I had to do a bit more, as the first patch showed a leak
on one of my tests, that I found using the dma-buf debugfs support,
the problem case is exporting a buffer twice with the same handle,
we'd add another export handle for it unnecessarily, however
we now fail if we try to export the same object with a different gem handle,
however I'm not sure if that is a case I want to support, and I've
gotten the code to WARN_ON if we hit something like that.

v2.1: rebase this patch, write better commit msg.
v3: cleanup error handling, track import vs export in linked list,
these two patches were separate previously, but seem to work better
like this.
v4: danvet is correct, this code is no longer useful, since the buffer
better exist, so remove it.
v5: always take a reference to the dma buf object, import or export.
(Imre Deak contributed this originally)
v6: square the circle, remove import vs export tracking now
that there is no difference

Reviewed-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 219b47339ced80ca580bb6ce7d1636166984afa7 upstream.

Currently we have a problem with this:
1. i915: create gem object
2. i915: export gem object to prime
3. radeon: import gem object
4. close prime fd
5. radeon: unref object
6. i915: unref object

i915 has an imported object reference in its file priv, that isn't
cleaned up properly until fd close. The reference gets added at step 2,
but at step 6 we don't have enough info to clean it up.

The solution is to take a reference on the dma-buf when we export it,
and drop the reference when the gem handle goes away.

So when we export a dma_buf from a gem object, we keep track of it
with the handle, we take a reference to the dma_buf. When we close
the handle (i.e. userspace is finished with the buffer), we drop
the reference to the dma_buf, and it gets collected.

This patch isn't meant to fix any other problem or bikesheds, and it doesn't
fix any races with other scenarios.

v1.1: move export symbol line back up.

v2: okay I had to do a bit more, as the first patch showed a leak
on one of my tests, that I found using the dma-buf debugfs support,
the problem case is exporting a buffer twice with the same handle,
we'd add another export handle for it unnecessarily, however
we now fail if we try to export the same object with a different gem handle,
however I'm not sure if that is a case I want to support, and I've
gotten the code to WARN_ON if we hit something like that.

v2.1: rebase this patch, write better commit msg.
v3: cleanup error handling, track import vs export in linked list,
these two patches were separate previously, but seem to work better
like this.
v4: danvet is correct, this code is no longer useful, since the buffer
better exist, so remove it.
v5: always take a reference to the dma buf object, import or export.
(Imre Deak contributed this originally)
v6: square the circle, remove import vs export tracking now
that there is no difference

Reviewed-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix max discard sectors limit</title>
<updated>2013-05-11T14:18:24+00:00</updated>
<author>
<name>James Bottomley</name>
<email>JBottomley@Parallels.com</email>
</author>
<published>2013-04-24T14:52:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e7fc743df596240d2b69dbf0ef3cd3fb7da08e1'/>
<id>7e7fc743df596240d2b69dbf0ef3cd3fb7da08e1</id>
<content type='text'>
commit 871dd9286e25330c8a581e5dacfa8b1dfe1dd641 upstream.

linux-v3.8-rc1 and later support for plug for blkdev_issue_discard with
commit 0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9
(block: add plug for blkdev_issue_discard )

For example,
1) DISCARD rq-1 with size size 4GB
2) DISCARD rq-2 with size size 1GB

If these 2 discard requests get merged, final request size will be 5GB.

In this case, request's __data_len field may overflow as it can store
max 4GB(unsigned int).

This issue was observed while doing mkfs.f2fs on 5GB SD card:
https://lkml.org/lkml/2013/4/1/292

Info: sector size = 512
Info: total sectors = 11370496 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
[  257.789764] blk_update_request: bio idx 0 &gt;= vcnt 0

mkfs process gets stuck in D state and I see the following in the dmesg:

[  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
[  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
[  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656
[  257.789764] blk_update_request: bio idx 0 &gt;= vcnt 0
[  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
[  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
[  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656

This patch fixes this issue.

Reported-by: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Signed-off-by: James Bottomley &lt;JBottomley@Parallels.com&gt;
Signed-off-by: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Tested-by: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 871dd9286e25330c8a581e5dacfa8b1dfe1dd641 upstream.

linux-v3.8-rc1 and later support for plug for blkdev_issue_discard with
commit 0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9
(block: add plug for blkdev_issue_discard )

For example,
1) DISCARD rq-1 with size size 4GB
2) DISCARD rq-2 with size size 1GB

If these 2 discard requests get merged, final request size will be 5GB.

In this case, request's __data_len field may overflow as it can store
max 4GB(unsigned int).

This issue was observed while doing mkfs.f2fs on 5GB SD card:
https://lkml.org/lkml/2013/4/1/292

Info: sector size = 512
Info: total sectors = 11370496 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
[  257.789764] blk_update_request: bio idx 0 &gt;= vcnt 0

mkfs process gets stuck in D state and I see the following in the dmesg:

[  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
[  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
[  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656
[  257.789764] blk_update_request: bio idx 0 &gt;= vcnt 0
[  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
[  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
[  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656

This patch fixes this issue.

Reported-by: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Signed-off-by: James Bottomley &lt;JBottomley@Parallels.com&gt;
Signed-off-by: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Tested-by: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>hugetlbfs: fix mmap failure in unaligned size request</title>
<updated>2013-05-11T14:18:19+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-05-07T23:18:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d65bdce33740070101dd6fb366f1bac003a9a49'/>
<id>7d65bdce33740070101dd6fb366f1bac003a9a49</id>
<content type='text'>
commit af73e4d9506d3b797509f3c030e7dcd554f7d9c4 upstream.

The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.

This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds
alignment code in caller side.  And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881

[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: &lt;iceman_dvd@yahoo.com&gt;
Cc: Steven Truelove &lt;steven.truelove@utoronto.ca&gt;
Cc: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit af73e4d9506d3b797509f3c030e7dcd554f7d9c4 upstream.

The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.

This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds
alignment code in caller side.  And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881

[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: &lt;iceman_dvd@yahoo.com&gt;
Cc: Steven Truelove &lt;steven.truelove@utoronto.ca&gt;
Cc: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: fix race between jbd2_journal_remove_checkpoint and -&gt;j_commit_callback</title>
<updated>2013-05-08T03:33:14+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2013-04-04T02:06:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ec60dced3d110599d5f8e4d0b9d2b0cc60ded9bc'/>
<id>ec60dced3d110599d5f8e4d0b9d2b0cc60ded9bc</id>
<content type='text'>
commit 794446c6946513c684d448205fbd76fa35f38b72 upstream.

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&amp;journal-&gt;j_list_lock);
                                         -&gt;jbd2_journal_remove_checkpoint()
					   -&gt;jbd2_journal_free_transaction();
					     -&gt;kmem_cache_free(transaction)
  -&gt;j_commit_callback(journal, transaction);
    -&gt; USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev-&gt;next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
 [&lt;ffffffff8106fb0d&gt;] warn_slowpath_common+0xad/0xf0
 [&lt;ffffffff8106fc06&gt;] warn_slowpath_fmt+0x46/0x50
 [&lt;ffffffff813637e9&gt;] ? ext4_journal_commit_callback+0x99/0xc0
 [&lt;ffffffff8148cae0&gt;] __list_del_entry+0x1c0/0x250
 [&lt;ffffffff813637bf&gt;] ext4_journal_commit_callback+0x6f/0xc0
 [&lt;ffffffff813ca336&gt;] jbd2_journal_commit_transaction+0x23a6/0x2570
 [&lt;ffffffff8108aa42&gt;] ? try_to_del_timer_sync+0x82/0xa0
 [&lt;ffffffff8108b491&gt;] ? del_timer_sync+0x91/0x1e0
 [&lt;ffffffff813d3ecf&gt;] kjournald2+0x19f/0x6a0
 [&lt;ffffffff810ad630&gt;] ? wake_up_bit+0x40/0x40
 [&lt;ffffffff813d3d30&gt;] ? bit_spin_lock+0x80/0x80
 [&lt;ffffffff810ac6be&gt;] kthread+0x10e/0x120
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff818ff6ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 794446c6946513c684d448205fbd76fa35f38b72 upstream.

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&amp;journal-&gt;j_list_lock);
                                         -&gt;jbd2_journal_remove_checkpoint()
					   -&gt;jbd2_journal_free_transaction();
					     -&gt;kmem_cache_free(transaction)
  -&gt;j_commit_callback(journal, transaction);
    -&gt; USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev-&gt;next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
 [&lt;ffffffff8106fb0d&gt;] warn_slowpath_common+0xad/0xf0
 [&lt;ffffffff8106fc06&gt;] warn_slowpath_fmt+0x46/0x50
 [&lt;ffffffff813637e9&gt;] ? ext4_journal_commit_callback+0x99/0xc0
 [&lt;ffffffff8148cae0&gt;] __list_del_entry+0x1c0/0x250
 [&lt;ffffffff813637bf&gt;] ext4_journal_commit_callback+0x6f/0xc0
 [&lt;ffffffff813ca336&gt;] jbd2_journal_commit_transaction+0x23a6/0x2570
 [&lt;ffffffff8108aa42&gt;] ? try_to_del_timer_sync+0x82/0xa0
 [&lt;ffffffff8108b491&gt;] ? del_timer_sync+0x91/0x1e0
 [&lt;ffffffff813d3ecf&gt;] kjournald2+0x19f/0x6a0
 [&lt;ffffffff810ad630&gt;] ? wake_up_bit+0x40/0x40
 [&lt;ffffffff813d3d30&gt;] ? bit_spin_lock+0x80/0x80
 [&lt;ffffffff810ac6be&gt;] kthread+0x10e/0x120
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff818ff6ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff810ac5b0&gt;] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4/jbd2: don't wait (forever) for stale tid caused by wraparound</title>
<updated>2013-05-08T03:33:14+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2013-04-04T02:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf170962d68d20f75c1db40c3fb3727c479424e3'/>
<id>bf170962d68d20f75c1db40c3fb3727c479424e3</id>
<content type='text'>
commit d76a3a77113db020d9bb1e894822869410450bd9 upstream.

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time.  To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started.  This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reported-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Reported-by: George Barnett &lt;gbarnett@atlassian.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d76a3a77113db020d9bb1e894822869410450bd9 upstream.

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time.  To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started.  This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reported-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Reported-by: George Barnett &lt;gbarnett@atlassian.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ipc: sysv shared memory limited to 8TiB</title>
<updated>2013-05-08T03:33:14+00:00</updated>
<author>
<name>Robin Holt</name>
<email>holt@sgi.com</email>
</author>
<published>2013-05-01T02:15:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c10d1bc5fb9601b375ded6fe4b383284bf86e07f'/>
<id>c10d1bc5fb9601b375ded6fe4b383284bf86e07f</id>
<content type='text'>
commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns-&gt;shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465         int numpages = (size + PAGE_SIZE -1) &gt;&gt; PAGE_SHIFT;
...
 474         if (ns-&gt;shm_tot + numpages &gt; ns-&gt;shm_ctlall)
 475                 return -ENOSPC;

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Reported-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns-&gt;shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465         int numpages = (size + PAGE_SIZE -1) &gt;&gt; PAGE_SHIFT;
...
 474         if (ns-&gt;shm_tot + numpages &gt; ns-&gt;shm_ctlall)
 475                 return -ENOSPC;

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Reported-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>exec: do not abuse -&gt;cred_guard_mutex in threadgroup_lock()</title>
<updated>2013-05-08T03:33:12+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-04-30T22:28:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0565d71131f1e4d6a7d3eaf338dab5431a4fbc81'/>
<id>0565d71131f1e4d6a7d3eaf338dab5431a4fbc81</id>
<content type='text'>
commit e56fb2874015370e3b7f8d85051f6dce26051df9 upstream.

threadgroup_lock() takes signal-&gt;cred_guard_mutex to ensure that
thread_group_leader() is stable.  This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

	do_execve:		cred_guard_mutex -&gt; i_mutex
	cgroup_mount:		i_mutex -&gt; cgroup_mutex
	attach_task_by_pid:	cgroup_mutex -&gt; cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
-&gt;cred_guard_mutex.

Note that de_thread() can't sleep with -&gt;group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e56fb2874015370e3b7f8d85051f6dce26051df9 upstream.

threadgroup_lock() takes signal-&gt;cred_guard_mutex to ensure that
thread_group_leader() is stable.  This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

	do_execve:		cred_guard_mutex -&gt; i_mutex
	cgroup_mount:		i_mutex -&gt; cgroup_mutex
	attach_task_by_pid:	cgroup_mutex -&gt; cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
-&gt;cred_guard_mutex.

Note that de_thread() can't sleep with -&gt;group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: fix broken file xattrs</title>
<updated>2013-05-08T03:33:11+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2013-04-19T06:09:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ae37359690c7d3dad9504041c0c414e004c9a275'/>
<id>ae37359690c7d3dad9504041c0c414e004c9a275</id>
<content type='text'>
commit 712317ad97f41e738e1a19aa0a6392a78a84094e upstream.

We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is object instance of cftype.

For example each cgroup has a tasks file, and each tasks file is
associated with a uniq cfent, but all those files share the same
struct cftype.

Alexey Kodanev reported a crash, which can be reproduced:

  # mount -t cgroup -o xattr /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/test
  # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
  # rmdir /sys/fs/cgroup/test
  # umount /sys/fs/cgroup
  oops!

In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.

tj: Dropped unused local variable @cft from cgroup_diput().

Reported-by: Alexey Kodanev &lt;alexey.kodanev@oracle.com&gt;
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 712317ad97f41e738e1a19aa0a6392a78a84094e upstream.

We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is object instance of cftype.

For example each cgroup has a tasks file, and each tasks file is
associated with a uniq cfent, but all those files share the same
struct cftype.

Alexey Kodanev reported a crash, which can be reproduced:

  # mount -t cgroup -o xattr /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/test
  # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
  # rmdir /sys/fs/cgroup/test
  # umount /sys/fs/cgroup
  oops!

In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.

tj: Dropped unused local variable @cft from cgroup_diput().

Reported-by: Alexey Kodanev &lt;alexey.kodanev@oracle.com&gt;
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
