<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/cgroup.c, branch v2.6.34.15</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()</title>
<updated>2014-02-10T21:11:06+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-11-19T16:13:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6a5af3b8e1b4f0ca35c39c08c2571ed00074c298'/>
<id>6a5af3b8e1b4f0ca35c39c08c2571ed00074c298</id>
<content type='text'>
commit 175431635ec09b1d1bba04979b006b99e8305a83 upstream.

cgroup_create_dir() does weird dancing with dentry refcnt.  On
success, it gets and then puts it achieving nothing.  On failure, it
puts but there isn't no matching get anywhere leading to the following
oops if cgroup_create_file() fails for whatever reason.

  ------------[ cut here ]------------
  kernel BUG at /work/os/work/fs/dcache.c:552!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU 2
  Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
  RIP: 0010:[&lt;ffffffff811d9c0c&gt;]  [&lt;ffffffff811d9c0c&gt;] dput+0x1dc/0x1e0
  RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
  RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
  RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
  R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
  R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
  FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
  Stack:
   ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
   ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
   ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
  Call Trace:
   [&lt;ffffffff811cc889&gt;] done_path_create+0x19/0x50
   [&lt;ffffffff811d1fc9&gt;] sys_mkdirat+0x59/0x80
   [&lt;ffffffff811d2009&gt;] sys_mkdir+0x19/0x20
   [&lt;ffffffff81be1e02&gt;] system_call_fastpath+0x16/0x1b
  Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 &lt;0f&gt; 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
  RIP  [&lt;ffffffff811d9c0c&gt;] dput+0x1dc/0x1e0
   RSP &lt;ffff88001a3ebef8&gt;
  ---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 175431635ec09b1d1bba04979b006b99e8305a83 upstream.

cgroup_create_dir() does weird dancing with dentry refcnt.  On
success, it gets and then puts it achieving nothing.  On failure, it
puts but there isn't no matching get anywhere leading to the following
oops if cgroup_create_file() fails for whatever reason.

  ------------[ cut here ]------------
  kernel BUG at /work/os/work/fs/dcache.c:552!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU 2
  Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
  RIP: 0010:[&lt;ffffffff811d9c0c&gt;]  [&lt;ffffffff811d9c0c&gt;] dput+0x1dc/0x1e0
  RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
  RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
  RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
  R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
  R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
  FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
  Stack:
   ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
   ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
   ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
  Call Trace:
   [&lt;ffffffff811cc889&gt;] done_path_create+0x19/0x50
   [&lt;ffffffff811d1fc9&gt;] sys_mkdirat+0x59/0x80
   [&lt;ffffffff811d2009&gt;] sys_mkdir+0x19/0x20
   [&lt;ffffffff81be1e02&gt;] system_call_fastpath+0x16/0x1b
  Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 &lt;0f&gt; 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
  RIP  [&lt;ffffffff811d9c0c&gt;] dput+0x1dc/0x1e0
   RSP &lt;ffff88001a3ebef8&gt;
  ---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroups: alloc_css_id() increments hierarchy depth</title>
<updated>2010-07-05T18:22:46+00:00</updated>
<author>
<name>Greg Thelen</name>
<email>gthelen@google.com</email>
</author>
<published>2010-06-04T21:15:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=181e8191c6f3ba70136fe83af6dfa5b90789cd42'/>
<id>181e8191c6f3ba70136fe83af6dfa5b90789cd42</id>
<content type='text'>
commit 94b3dd0f7bb393d93e84a173b1df9b8b64c83ac4 upstream.

Child groups should have a greater depth than their parents.  Prior to
this change, the parent would incorrectly report zero memory usage for
child cgroups when use_hierarchy is enabled.

test script:
  mount -t cgroup none /cgroups -o memory
  cd /cgroups
  mkdir cg1

  echo 1 &gt; cg1/memory.use_hierarchy
  mkdir cg1/cg11

  echo $$ &gt; cg1/cg11/tasks
  dd if=/dev/zero of=/tmp/foo bs=1M count=1

  echo
  echo CHILD
  grep cache cg1/cg11/memory.stat

  echo
  echo PARENT
  grep cache cg1/memory.stat

  echo $$ &gt; tasks
  rmdir cg1/cg11 cg1
  cd /
  umount /cgroups

Using fae9c79, a recent patch that changed alloc_css_id() depth computation,
the parent incorrectly reports zero usage:
  root@ubuntu:~# ./test
  1+0 records in
  1+0 records out
  1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s

  CHILD
  cache 1048576
  total_cache 1048576

  PARENT
  cache 0
  total_cache 0

With this patch, the parent correctly includes child usage:
  root@ubuntu:~# ./test
  1+0 records in
  1+0 records out
  1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s

  CHILD
  cache 1052672
  total_cache 1052672

  PARENT
  cache 0
  total_cache 1052672

Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 94b3dd0f7bb393d93e84a173b1df9b8b64c83ac4 upstream.

Child groups should have a greater depth than their parents.  Prior to
this change, the parent would incorrectly report zero memory usage for
child cgroups when use_hierarchy is enabled.

test script:
  mount -t cgroup none /cgroups -o memory
  cd /cgroups
  mkdir cg1

  echo 1 &gt; cg1/memory.use_hierarchy
  mkdir cg1/cg11

  echo $$ &gt; cg1/cg11/tasks
  dd if=/dev/zero of=/tmp/foo bs=1M count=1

  echo
  echo CHILD
  grep cache cg1/cg11/memory.stat

  echo
  echo PARENT
  grep cache cg1/memory.stat

  echo $$ &gt; tasks
  rmdir cg1/cg11 cg1
  cd /
  umount /cgroups

Using fae9c79, a recent patch that changed alloc_css_id() depth computation,
the parent incorrectly reports zero usage:
  root@ubuntu:~# ./test
  1+0 records in
  1+0 records out
  1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s

  CHILD
  cache 1048576
  total_cache 1048576

  PARENT
  cache 0
  total_cache 0

With this patch, the parent correctly includes child usage:
  root@ubuntu:~# ./test
  1+0 records in
  1+0 records out
  1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s

  CHILD
  cache 1052672
  total_cache 1052672

  PARENT
  cache 0
  total_cache 1052672

Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: fix css_is_ancestor() RCU locking</title>
<updated>2010-05-12T00:33:42+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2010-05-11T21:06:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=747388d78a0ae768fd82b55c4ed38aa646a72364'/>
<id>747388d78a0ae768fd82b55c4ed38aa646a72364</id>
<content type='text'>
Some callers (in memcontrol.c) calls css_is_ancestor() without
rcu_read_lock.  Because css_is_ancestor() has to access RCU protected
data, it should be under rcu_read_lock().

This makes css_is_ancestor() itself does safe access to RCU protected
area.  (At least, "root" can have refcnt==0 if it's not an ancestor of
"child".  So, we need rcu_read_lock().)

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some callers (in memcontrol.c) calls css_is_ancestor() without
rcu_read_lock.  Because css_is_ancestor() has to access RCU protected
data, it should be under rcu_read_lock().

This makes css_is_ancestor() itself does safe access to RCU protected
area.  (At least, "root" can have refcnt==0 if it's not an ancestor of
"child".  So, we need rcu_read_lock().)

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: fix css_id() RCU locking for real</title>
<updated>2010-05-12T00:33:42+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2010-05-11T21:06:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f0f15464185a92f9d8791ad231bcd7bf6df54e4'/>
<id>7f0f15464185a92f9d8791ad231bcd7bf6df54e4</id>
<content type='text'>
Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be
called under rcu_read_lock()") modifies memcontol.c for fixing RCU check
message.  But Andrew Morton pointed out that the fix doesn't seems sane
and it was just for hidining lockdep messages.

This is a patch for do proper things.  Checking again, all places,
accessing without rcu_read_lock, that commit fixies was intentional....
all callers of css_id() has reference count on it.  So, it's not necessary
to be under rcu_read_lock().

Considering again, we can use rcu_dereference_check for css_id().  We know
css-&gt;id is valid if css-&gt;refcnt &gt; 0.  (css-&gt;id never changes and freed
after css-&gt;refcnt going to be 0.)

This patch makes use of rcu_dereference_check() in css_id/depth and remove
unnecessary rcu-read-lock added by the commit.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be
called under rcu_read_lock()") modifies memcontol.c for fixing RCU check
message.  But Andrew Morton pointed out that the fix doesn't seems sane
and it was just for hidining lockdep messages.

This is a patch for do proper things.  Checking again, all places,
accessing without rcu_read_lock, that commit fixies was intentional....
all callers of css_id() has reference count on it.  So, it's not necessary
to be under rcu_read_lock().

Considering again, we can use rcu_dereference_check for css_id().  We know
css-&gt;id is valid if css-&gt;refcnt &gt; 0.  (css-&gt;id never changes and freed
after css-&gt;refcnt going to be 0.)

This patch makes use of rcu_dereference_check() in css_id/depth and remove
unnecessary rcu-read-lock added by the commit.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Fix an RCU warning in alloc_css_id()</title>
<updated>2010-05-04T16:25:00+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2010-04-22T09:30:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fae9c791703606636c1220e47f6690660042ce7f'/>
<id>fae9c791703606636c1220e47f6690660042ce7f</id>
<content type='text'>
With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o memory xxx /mnt
  # mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css-&gt;id.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o memory xxx /mnt
  # mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css-&gt;id.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Fix an RCU warning in cgroup_path()</title>
<updated>2010-05-04T16:24:59+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2010-04-22T09:29:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a9686b634acc5cb6b7c601c171ae64af0318a24'/>
<id>9a9686b634acc5cb6b7c601c171ae64af0318a24</id>
<content type='text'>
with CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o debug xxx /mnt
  # cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
with CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o debug xxx /mnt
  # cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroups: remove duplicate include</title>
<updated>2010-03-24T23:31:19+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2010-03-23T20:35:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9d34706f42f9b8c15185423d9af98d37ba21d011'/>
<id>9d34706f42f9b8c15185423d9af98d37ba21d011</id>
<content type='text'>
commit e6a1105b ("cgroups: subsystem module loading interface") and commit
c50cc752 ("sched, cgroups: Fix module export") result in duplicate
including of module.h

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e6a1105b ("cgroups: subsystem module loading interface") and commit
c50cc752 ("sched, cgroups: Fix module export") result in duplicate
including of module.h

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroups: remove events before destroying subsystem state objects</title>
<updated>2010-03-12T23:52:37+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill@shutemov.name</email>
</author>
<published>2010-03-10T23:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a0a4db548edcce067c1201ef25cf2bc29f32dca4'/>
<id>a0a4db548edcce067c1201ef25cf2bc29f32dca4</id>
<content type='text'>
Events should be removed after rmdir of cgroup directory, but before
destroying subsystem state objects.  Let's take reference to cgroup
directory dentry to do that.

Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hioryu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Events should be removed after rmdir of cgroup directory, but before
destroying subsystem state objects.  Let's take reference to cgroup
directory dentry to do that.

Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hioryu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroups: fix race between userspace and kernelspace</title>
<updated>2010-03-12T23:52:37+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill@shutemov.name</email>
</author>
<published>2010-03-10T23:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4ab78683c17d739c2a2077141dcf81a02b7fb57e'/>
<id>4ab78683c17d739c2a2077141dcf81a02b7fb57e</id>
<content type='text'>
Notify userspace about cgroup removing only after rmdir of cgroup
directory to avoid race between userspace and kernelspace.

eventfd are used to notify about two types of event:
 - control file-specific, like crossing memory threshold;
 - cgroup removing.

To understand what really happen, userspace can check if the cgroup still
exists.  To avoid race beetween userspace and kernelspace we have to
notify userspace about cgroup removing only after rmdir of cgroup
directory.

Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Notify userspace about cgroup removing only after rmdir of cgroup
directory to avoid race between userspace and kernelspace.

eventfd are used to notify about two types of event:
 - control file-specific, like crossing memory threshold;
 - cgroup removing.

To understand what really happen, userspace can check if the cgroup still
exists.  To avoid race beetween userspace and kernelspace we have to
notify userspace about cgroup removing only after rmdir of cgroup
directory.

Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: implement eventfd-based generic API for notifications</title>
<updated>2010-03-12T23:52:37+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill@shutemov.name</email>
</author>
<published>2010-03-10T23:22:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0dea116876eefc9c7ca9c5d74fe665481e499fa3'/>
<id>0dea116876eefc9c7ca9c5d74fe665481e499fa3</id>
<content type='text'>
This patchset introduces eventfd-based API for notifications in cgroups
and implements memory notifications on top of it.

It uses statistics in memory controler to track memory usage.

Output of time(1) on building kernel on tmpfs:

Root cgroup before changes:
	make -j2  506.37 user 60.93s system 193% cpu 4:52.77 total
Non-root cgroup before changes:
	make -j2  507.14 user 62.66s system 193% cpu 4:54.74 total
Root cgroup after changes (0 thresholds):
	make -j2  507.13 user 62.20s system 193% cpu 4:53.55 total
Non-root cgroup after changes (0 thresholds):
	make -j2  507.70 user 64.20s system 193% cpu 4:55.70 total
Root cgroup after changes (1 thresholds, never crossed):
	make -j2  506.97 user 62.20s system 193% cpu 4:53.90 total
Non-root cgroup after changes (1 thresholds, never crossed):
	make -j2  507.55 user 64.08s system 193% cpu 4:55.63 total

This patch:

Introduce the write-only file "cgroup.event_control" in every cgroup.

To register new notification handler you need:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
  unregister_event() must be defined for the control file;
- write "&lt;event_fd&gt; &lt;control_fd&gt; &lt;args&gt;" to cgroup.event_control.
  Interpretation of args is defined by control file implementation;

eventfd will be woken up by control file implementation or when the
cgroup is removed.

To unregister notification handler just close eventfd.

If you need notification functionality for a control file you have to
implement callbacks register_event() and unregister_event() in the
struct cftype.

[kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Vladislav Buzov &lt;vbuzov@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Alexander Shishkin &lt;virtuoso@slind.org&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patchset introduces eventfd-based API for notifications in cgroups
and implements memory notifications on top of it.

It uses statistics in memory controler to track memory usage.

Output of time(1) on building kernel on tmpfs:

Root cgroup before changes:
	make -j2  506.37 user 60.93s system 193% cpu 4:52.77 total
Non-root cgroup before changes:
	make -j2  507.14 user 62.66s system 193% cpu 4:54.74 total
Root cgroup after changes (0 thresholds):
	make -j2  507.13 user 62.20s system 193% cpu 4:53.55 total
Non-root cgroup after changes (0 thresholds):
	make -j2  507.70 user 64.20s system 193% cpu 4:55.70 total
Root cgroup after changes (1 thresholds, never crossed):
	make -j2  506.97 user 62.20s system 193% cpu 4:53.90 total
Non-root cgroup after changes (1 thresholds, never crossed):
	make -j2  507.55 user 64.08s system 193% cpu 4:55.63 total

This patch:

Introduce the write-only file "cgroup.event_control" in every cgroup.

To register new notification handler you need:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
  unregister_event() must be defined for the control file;
- write "&lt;event_fd&gt; &lt;control_fd&gt; &lt;args&gt;" to cgroup.event_control.
  Interpretation of args is defined by control file implementation;

eventfd will be woken up by control file implementation or when the
cgroup is removed.

To unregister notification handler just close eventfd.

If you need notification functionality for a control file you have to
implement callbacks register_event() and unregister_event() in the
struct cftype.

[kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Dan Malek &lt;dan@embeddedalley.com&gt;
Cc: Vladislav Buzov &lt;vbuzov@embeddedalley.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Alexander Shishkin &lt;virtuoso@slind.org&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
