<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/security/device_cgroup.c, branch v3.14.57</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>device_cgroup: check if exception removal is allowed</title>
<updated>2014-06-07T17:28:19+00:00</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-05-05T15:18:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7ce60f41fa20ab7b08eef3d972804f7efe7938c3'/>
<id>7ce60f41fa20ab7b08eef3d972804f7efe7938c3</id>
<content type='text'>
commit d2c2b11cfa134f4fbdcc34088824da26a084d8de upstream.

[PATCH v3 1/2] device_cgroup: check if exception removal is allowed

When the device cgroup hierarchy was introduced in
	bd2953ebbb53 - devcg: propagate local changes down the hierarchy

a specific case was overlooked. Consider the hierarchy bellow:

	A	default policy: ALLOW, exceptions will deny access
	 \
	  B	default policy: ALLOW, exceptions will deny access

There's no need to verify when an new exception is added to B because
in this case exceptions will deny access to further devices, which is
always fine. Hierarchy in device cgroup only makes sure B won't have
more access than A.

But when an exception is removed (by writing devices.allow), it isn't
checked if the user is in fact removing an inherited exception from A,
thus giving more access to B.

Example:

	# echo 'a' &gt;A/devices.allow
	# echo 'c 1:3 rw' &gt;A/devices.deny
	# echo $$ &gt;A/B/tasks
	# echo &gt;/dev/null
	-bash: /dev/null: Operation not permitted
	# echo 'c 1:3 w' &gt;A/B/devices.allow
	# echo &gt;/dev/null
	#

This shouldn't be allowed and this patch fixes it by making sure to never allow
exceptions in this case to be removed if the exception is partially or fully
present on the parent.

v3: missing '*' in function description
v2: improved log message and formatting fixes

Cc: cgroups@vger.kernel.org
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d2c2b11cfa134f4fbdcc34088824da26a084d8de upstream.

[PATCH v3 1/2] device_cgroup: check if exception removal is allowed

When the device cgroup hierarchy was introduced in
	bd2953ebbb53 - devcg: propagate local changes down the hierarchy

a specific case was overlooked. Consider the hierarchy bellow:

	A	default policy: ALLOW, exceptions will deny access
	 \
	  B	default policy: ALLOW, exceptions will deny access

There's no need to verify when an new exception is added to B because
in this case exceptions will deny access to further devices, which is
always fine. Hierarchy in device cgroup only makes sure B won't have
more access than A.

But when an exception is removed (by writing devices.allow), it isn't
checked if the user is in fact removing an inherited exception from A,
thus giving more access to B.

Example:

	# echo 'a' &gt;A/devices.allow
	# echo 'c 1:3 rw' &gt;A/devices.deny
	# echo $$ &gt;A/B/tasks
	# echo &gt;/dev/null
	-bash: /dev/null: Operation not permitted
	# echo 'c 1:3 w' &gt;A/B/devices.allow
	# echo &gt;/dev/null
	#

This shouldn't be allowed and this patch fixes it by making sure to never allow
exceptions in this case to be removed if the exception is partially or fully
present on the parent.

v3: missing '*' in function description
v2: improved log message and formatting fixes

Cc: cgroups@vger.kernel.org
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>device_cgroup: rework device access check and exception checking</title>
<updated>2014-06-07T17:28:19+00:00</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-04-21T16:13:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d838324fa8a4a328fa760066beba3a7dcb24832'/>
<id>6d838324fa8a4a328fa760066beba3a7dcb24832</id>
<content type='text'>
commit 79d719749d23234e9b725098aa49133f3ef7299d upstream.

Whenever a device file is opened and checked against current device
cgroup rules, it uses the same function (may_access()) as when a new
exception rule is added by writing devices.{allow,deny}. And in both
cases, the algorithm is the same, doesn't matter the behavior.

First problem is having device access to be considered the same as rule
checking. Consider the following structure:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: allow, exceptions disallow access)

A new exception is added to B by writing devices.deny:

	c 12:34 rw

When checking if that exception is allowed in may_access():

	if (dev_cgroup-&gt;behavior == DEVCG_DEFAULT_ALLOW) {
		if (behavior == DEVCG_DEFAULT_ALLOW) {
			/* the exception will deny access to certain devices */
			return true;

Which is ok, since B is not getting more privileges than A, it doesn't
matter and the rule is accepted

Now, consider it's a device file open check and the process belongs to
cgroup B. The access will be generated as:

	behavior: allow
	exception: c 12:34 rw

The very same chunk of code will allow it, even if there's an explicit
exception telling to do otherwise.

A simple test case:

	# mkdir new_group
	# cd new_group
	# echo $$ &gt;tasks
	# echo "c 1:3 w" &gt;devices.deny
	# echo &gt;/dev/null
	# echo $?
	0

This is a serious bug and was introduced on

	c39a2a3018f8 devcg: prepare may_access() for hierarchy support

To solve this problem, the device file open function was split from the
new exception check.

Second problem is how exceptions are processed by may_access(). The
first part of the said function tries to match fully with an existing
exception:

	list_for_each_entry_rcu(ex, &amp;dev_cgroup-&gt;exceptions, list) {
		if ((refex-&gt;type &amp; DEV_BLOCK) &amp;&amp; !(ex-&gt;type &amp; DEV_BLOCK))
			continue;
		if ((refex-&gt;type &amp; DEV_CHAR) &amp;&amp; !(ex-&gt;type &amp; DEV_CHAR))
			continue;
		if (ex-&gt;major != ~0 &amp;&amp; ex-&gt;major != refex-&gt;major)
			continue;
		if (ex-&gt;minor != ~0 &amp;&amp; ex-&gt;minor != refex-&gt;minor)
			continue;
		if (refex-&gt;access &amp; (~ex-&gt;access))
			continue;
		match = true;
		break;
	}

That means the new exception should be contained into an existing one to
be considered a match:

	New exception		Existing	match?	notes
	b 12:34 rwm		b 12:34 rwm	yes
	b 12:34 r		b *:34 rw	yes
	b 12:34 rw		b 12:34 w	no	extra "r"
	b *:34 rw		b 12:34 rw	no	too broad "*"
	b *:34 rw		b *:34 rwm	yes

Which is fine in some cases. Consider:

	A	(default behavior: deny, exceptions allow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case the full match makes sense, the new exception cannot add
more access than the parent allows

But this doesn't always work, consider:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case, a new exception in B shouldn't match any of the exceptions
in A, after all you can't allow something that was forbidden by A. But
consider this scenario:

	New exception	Existing in A	match?	outcome
	b 12:34 rw	b 12:34 r	no	exception is accepted

Because the new exception has "w" as extra, it doesn't match, so it'll
be added to B's exception list.

The same problem can happen during a file access check. Consider a
cgroup with allow as default behavior:

	Access		Exception	match?
	b 12:34 rw	b 12:34 r	no

In this case, the access didn't match any of the exceptions in the
cgroup, which is required since exceptions will disallow access.

To solve this problem, two new functions were created to match an
exception either fully or partially. In the example above, a partial
check will be performed and it'll produce a match since at least
"b 12:34 r" from "b 12:34 rw" access matches.

Cc: cgroups@vger.kernel.org
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 79d719749d23234e9b725098aa49133f3ef7299d upstream.

Whenever a device file is opened and checked against current device
cgroup rules, it uses the same function (may_access()) as when a new
exception rule is added by writing devices.{allow,deny}. And in both
cases, the algorithm is the same, doesn't matter the behavior.

First problem is having device access to be considered the same as rule
checking. Consider the following structure:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: allow, exceptions disallow access)

A new exception is added to B by writing devices.deny:

	c 12:34 rw

When checking if that exception is allowed in may_access():

	if (dev_cgroup-&gt;behavior == DEVCG_DEFAULT_ALLOW) {
		if (behavior == DEVCG_DEFAULT_ALLOW) {
			/* the exception will deny access to certain devices */
			return true;

Which is ok, since B is not getting more privileges than A, it doesn't
matter and the rule is accepted

Now, consider it's a device file open check and the process belongs to
cgroup B. The access will be generated as:

	behavior: allow
	exception: c 12:34 rw

The very same chunk of code will allow it, even if there's an explicit
exception telling to do otherwise.

A simple test case:

	# mkdir new_group
	# cd new_group
	# echo $$ &gt;tasks
	# echo "c 1:3 w" &gt;devices.deny
	# echo &gt;/dev/null
	# echo $?
	0

This is a serious bug and was introduced on

	c39a2a3018f8 devcg: prepare may_access() for hierarchy support

To solve this problem, the device file open function was split from the
new exception check.

Second problem is how exceptions are processed by may_access(). The
first part of the said function tries to match fully with an existing
exception:

	list_for_each_entry_rcu(ex, &amp;dev_cgroup-&gt;exceptions, list) {
		if ((refex-&gt;type &amp; DEV_BLOCK) &amp;&amp; !(ex-&gt;type &amp; DEV_BLOCK))
			continue;
		if ((refex-&gt;type &amp; DEV_CHAR) &amp;&amp; !(ex-&gt;type &amp; DEV_CHAR))
			continue;
		if (ex-&gt;major != ~0 &amp;&amp; ex-&gt;major != refex-&gt;major)
			continue;
		if (ex-&gt;minor != ~0 &amp;&amp; ex-&gt;minor != refex-&gt;minor)
			continue;
		if (refex-&gt;access &amp; (~ex-&gt;access))
			continue;
		match = true;
		break;
	}

That means the new exception should be contained into an existing one to
be considered a match:

	New exception		Existing	match?	notes
	b 12:34 rwm		b 12:34 rwm	yes
	b 12:34 r		b *:34 rw	yes
	b 12:34 rw		b 12:34 w	no	extra "r"
	b *:34 rw		b 12:34 rw	no	too broad "*"
	b *:34 rw		b *:34 rwm	yes

Which is fine in some cases. Consider:

	A	(default behavior: deny, exceptions allow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case the full match makes sense, the new exception cannot add
more access than the parent allows

But this doesn't always work, consider:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case, a new exception in B shouldn't match any of the exceptions
in A, after all you can't allow something that was forbidden by A. But
consider this scenario:

	New exception	Existing in A	match?	outcome
	b 12:34 rw	b 12:34 r	no	exception is accepted

Because the new exception has "w" as extra, it doesn't match, so it'll
be added to B's exception list.

The same problem can happen during a file access check. Consider a
cgroup with allow as default behavior:

	Access		Exception	match?
	b 12:34 rw	b 12:34 r	no

In this case, the access didn't match any of the exceptions in the
cgroup, which is required since exceptions will disallow access.

To solve this problem, two new functions were created to match an
exception either fully or partially. In the example above, a partial
check will be performed and it'll produce a match since at least
"b 12:34 r" from "b 12:34 rw" access matches.

Cc: cgroups@vger.kernel.org
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: replace cftype-&gt;read_seq_string() with cftype-&gt;seq_show()</title>
<updated>2013-12-05T17:28:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-12-05T17:28:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2da8ca822d49c8b8781800ad155aaa00e7bb5f1a'/>
<id>2da8ca822d49c8b8781800ad155aaa00e7bb5f1a</id>
<content type='text'>
In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs.  This patch
replaces cftype-&gt;read_seq_string() with cftype-&gt;seq_show() which is
not limited to single_open() operation and will map directcly to
kernfs seq_file interface.

The conversions are mechanical.  As -&gt;seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively.  In several occassions, e.f. if
it has seq_string in its name, the function name is updated to fit the
new method better.

This patch does not introduce any behavior changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Neil Horman &lt;nhorman@tuxdriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs.  This patch
replaces cftype-&gt;read_seq_string() with cftype-&gt;seq_show() which is
not limited to single_open() operation and will map directcly to
kernfs seq_file interface.

The conversions are mechanical.  As -&gt;seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively.  In several occassions, e.f. if
it has seq_string in its name, the function name is updated to fit the
new method better.

This patch does not introduce any behavior changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Neil Horman &lt;nhorman@tuxdriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>device_cgroup: remove can_attach</title>
<updated>2013-10-24T10:56:56+00:00</updated>
<author>
<name>Serge Hallyn</name>
<email>serge.hallyn@ubuntu.com</email>
</author>
<published>2013-10-22T23:34:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=73ba353471e0b692f398f3d63018b7f46ccf1d3e'/>
<id>73ba353471e0b692f398f3d63018b7f46ccf1d3e</id>
<content type='text'>
It is really only wanting to duplicate a check which is already done by the
cgroup subsystem.

With this patch, user jdoe still cannot move pid 1 into a devices cgroup
he owns, but now he can move his own other tasks into devices cgroups.

Signed-off-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Aristeu Rozanski &lt;aris@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is really only wanting to duplicate a check which is already done by the
cgroup subsystem.

With this patch, user jdoe still cannot move pid 1 into a devices cgroup
he owns, but now he can move his own other tasks into devices cgroups.

Signed-off-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Aristeu Rozanski &lt;aris@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: make css_for_each_descendant() and friends include the origin css in the iteration</title>
<updated>2013-08-09T00:11:27+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bd8815a6d802fc16a7a106e170593aa05dc17e72'/>
<id>bd8815a6d802fc16a7a106e170593aa05dc17e72</id>
<content type='text'>
Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration.  The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.

While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.

The conversions are mostly straight-forward.  If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration.  If not, if (pos == origin) continue; is added.  Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest.  While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration.  The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.

While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.

The conversions are mostly straight-forward.  If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration.  If not, if (pos == origin) continue; is added.  Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest.  While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup</title>
<updated>2013-08-09T00:11:25+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=492eb21b98f88e411a8bb43d6edcd7d7022add10'/>
<id>492eb21b98f88e411a8bb43d6edcd7d7022add10</id>
<content type='text'>
cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API.  For hierarchy iterators, this is beneficial because

* In most cases, css is the only thing subsystems care about anyway.

* On the planned unified hierarchy, iterations for different
  subsystems will need to skip over different subtrees of the
  hierarchy depending on which subsystems are enabled on each cgroup.
  Passing around css makes it unnecessary to explicitly specify the
  subsystem in question as css is intersection between cgroup and
  subsystem

* For the planned unified hierarchy, css's would need to be created
  and destroyed dynamically independent from cgroup hierarchy.  Having
  cgroup core manage css iteration makes enforcing deref rules a lot
  easier.

Most subsystem conversions are straight-forward.  Noteworthy changes
are

* blkio: cgroup_to_blkcg() is no longer used.  Removed.

* freezer: cgroup_freezer() is no longer used.  Removed.

* devices: cgroup_to_devcgroup() is no longer used.  Removed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API.  For hierarchy iterators, this is beneficial because

* In most cases, css is the only thing subsystems care about anyway.

* On the planned unified hierarchy, iterations for different
  subsystems will need to skip over different subtrees of the
  hierarchy depending on which subsystems are enabled on each cgroup.
  Passing around css makes it unnecessary to explicitly specify the
  subsystem in question as css is intersection between cgroup and
  subsystem

* For the planned unified hierarchy, css's would need to be created
  and destroyed dynamically independent from cgroup hierarchy.  Having
  cgroup core manage css iteration makes enforcing deref rules a lot
  easier.

Most subsystem conversions are straight-forward.  Noteworthy changes
are

* blkio: cgroup_to_blkcg() is no longer used.  Removed.

* freezer: cgroup_freezer() is no longer used.  Removed.

* devices: cgroup_to_devcgroup() is no longer used.  Removed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: pass around cgroup_subsys_state instead of cgroup in file methods</title>
<updated>2013-08-09T00:11:24+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=182446d087906de40e514573a92a97b203695f71'/>
<id>182446d087906de40e514573a92a97b203695f71</id>
<content type='text'>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup.  cftypes for the cgroup core files don't have their subsytem
pointer set.  These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straight forwards but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
  of @cgroup for consistency.  This will make the code look simpler
  too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
  vmpressure while mem_cgroup_from_cont() can be made static.
  Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left.  Removed.

* cpuacct: cgroup_ca() doesn't have any user left.  Removed.

* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
  Removed.

* net_cls: cgrp_cls_state() doesn't have any user left.  Removed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup.  cftypes for the cgroup core files don't have their subsytem
pointer set.  These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straight forwards but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
  of @cgroup for consistency.  This will make the code look simpler
  too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
  vmpressure while mem_cgroup_from_cont() can be made static.
  Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left.  Removed.

* cpuacct: cgroup_ca() doesn't have any user left.  Removed.

* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
  Removed.

* net_cls: cgrp_cls_state() doesn't have any user left.  Removed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods</title>
<updated>2013-08-09T00:11:23+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=eb95419b023abacb415e2a18fea899023ce7624d'/>
<id>eb95419b023abacb415e2a18fea899023ce7624d</id>
<content type='text'>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
  unbound from cgroups and thus css's (cgroup_subsys_state) may be
  created and destroyed dynamically over the lifetime of a cgroup,
  which is different from the current state where all css's are
  allocated and destroyed together with the associated cgroup.  This
  in turn means that cgroup_css() should be synchronized and may
  return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
  hierarchy means that the task and descendant iterators should behave
  differently depending on the specific subsystem the iteration is
  being performed for.

* In majority of the cases, subsystems only care about its part in the
  cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
  often obtain the matching css pointer from the cgroup and don't
  bother with the cgroup pointer itself.  Passing around css fits
  much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup.  The conversions are mostly straight-forward.  A few
noteworthy changes are

* -&gt;css_alloc() now takes css of the parent cgroup rather than the
  pointer to the new cgroup as the css for the new cgroup doesn't
  exist yet.  Knowing the parent css is enough for all the existing
  subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
  dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp-&gt;subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that -&gt;css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too.  Suggested
    by Li Zefan.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
  unbound from cgroups and thus css's (cgroup_subsys_state) may be
  created and destroyed dynamically over the lifetime of a cgroup,
  which is different from the current state where all css's are
  allocated and destroyed together with the associated cgroup.  This
  in turn means that cgroup_css() should be synchronized and may
  return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
  hierarchy means that the task and descendant iterators should behave
  differently depending on the specific subsystem the iteration is
  being performed for.

* In majority of the cases, subsystems only care about its part in the
  cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
  often obtain the matching css pointer from the cgroup and don't
  bother with the cgroup pointer itself.  Passing around css fits
  much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup.  The conversions are mostly straight-forward.  A few
noteworthy changes are

* -&gt;css_alloc() now takes css of the parent cgroup rather than the
  pointer to the new cgroup as the css for the new cgroup doesn't
  exist yet.  Knowing the parent css is enough for all the existing
  subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
  dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp-&gt;subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that -&gt;css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too.  Suggested
    by Li Zefan.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: add css_parent()</title>
<updated>2013-08-09T00:11:23+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6387698699afd72d6304566fb6ccf84bffe07c56'/>
<id>6387698699afd72d6304566fb6ccf84bffe07c56</id>
<content type='text'>
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css.  cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent.  The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup-&gt;parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
  parent_ca().  The only difference between the two was NULL test on
  cgroup-&gt;parent which is now embedded in css_parent() making the
  distinction moot.  Note that eventually a css-&gt;parent field will be
  added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css.  cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent.  The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup-&gt;parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
  parent_ca().  The only difference between the two was NULL test on
  cgroup-&gt;parent which is now embedded in css_parent() making the
  distinction moot.  Note that eventually a css-&gt;parent field will be
  added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: add/update accessors which obtain subsys specific data from css</title>
<updated>2013-08-09T00:11:23+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-09T00:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a7c6d554aa01236ac2a9f851ab0f75704f76dfa2'/>
<id>a7c6d554aa01236ac2a9f851ab0f75704f76dfa2</id>
<content type='text'>
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure.  Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast.  As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.

All accessors explicitly handle NULL input and return NULL in those
cases.  While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.

* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
  accessor.  Added.

* memory, hugetlb and devices already had one but didn't explicitly
  handle NULL input.  Updated.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure.  Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast.  As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.

All accessors explicitly handle NULL input and return NULL in those
cases.  While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.

* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
  accessor.  Added.

* memory, hugetlb and devices already had one but didn't explicitly
  handle NULL input.  Updated.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
