<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/raid, branch v2.6.27.50</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md: delay notification of 'active_idle' to the recovery thread</title>
<updated>2008-07-23T20:09:48+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2008-07-23T20:09:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d8e64406a037a64444175730294e449c9e21f5ec'/>
<id>d8e64406a037a64444175730294e449c9e21f5ec</id>
<content type='text'>
sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Protect access to mddev-&gt;disks list using RCU</title>
<updated>2008-07-21T07:05:25+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2008-07-21T07:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4b80991c6cb9efa607bc4fd6f3ecdf5511c31bb0'/>
<id>4b80991c6cb9efa607bc4fd6f3ecdf5511c31bb0</id>
<content type='text'>
All modifications and most access to the mddev-&gt;disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the called to export_rdev for that last case.

Note also that -&gt;same_set is sometimes used for lists other than
mddev-&gt;list (e.g. candidates).  In these cases rcu is not needed.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All modifications and most access to the mddev-&gt;disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the called to export_rdev for that last case.

Note also that -&gt;same_set is sometimes used for lists other than
mddev-&gt;list (e.g. candidates).  In these cases rcu is not needed.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: only count actual openers as access which prevent a 'stop'</title>
<updated>2008-07-21T07:05:25+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2008-07-21T07:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f2ea68cf42aafdd93393b6b8b20fc3c2b5f4390c'/>
<id>f2ea68cf42aafdd93393b6b8b20fc3c2b5f4390c</id>
<content type='text'>
Open isn't the only thing that increments -&gt;active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device it open.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Open isn't the only thing that increments -&gt;active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device it open.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: linear: Make array_size sector-based and rename it to array_sectors.</title>
<updated>2008-07-21T07:05:25+00:00</updated>
<author>
<name>Andre Noll</name>
<email>maan@systemlinux.org</email>
</author>
<published>2008-07-21T07:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d6e2215052810678bc9782fd980b52706fc71f50'/>
<id>d6e2215052810678bc9782fd980b52706fc71f50</id>
<content type='text'>
Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Make mddev-&gt;array_size sector-based.</title>
<updated>2008-07-21T07:05:22+00:00</updated>
<author>
<name>Andre Noll</name>
<email>maan@systemlinux.org</email>
</author>
<published>2008-07-21T07:05:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f233ea5c9e0d8b95e4283bf6a3436b88f6fd3586'/>
<id>f233ea5c9e0d8b95e4283bf6a3436b88f6fd3586</id>
<content type='text'>
This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Remove some unused macros.</title>
<updated>2008-07-11T12:02:23+00:00</updated>
<author>
<name>Andre Noll</name>
<email>maan@systemlinux.org</email>
</author>
<published>2008-07-11T12:02:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e93a89251d4ed7bd4475db62616ccd03ddfd01a'/>
<id>7e93a89251d4ed7bd4475db62616ccd03ddfd01a</id>
<content type='text'>
Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Turn rdev-&gt;sb_offset into a sector-based quantity.</title>
<updated>2008-07-11T12:02:23+00:00</updated>
<author>
<name>Andre Noll</name>
<email>maan@systemlinux.org</email>
</author>
<published>2008-07-11T12:02:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0f420358e3a2abc028320ace7783e2e38cae77bf'/>
<id>0f420358e3a2abc028320ace7783e2e38cae77bf</id>
<content type='text'>
Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll &lt;maan@systemlinux.org&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: resolve external metadata handling deadlock in md_allow_write</title>
<updated>2008-07-01T00:18:19+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2008-06-28T04:44:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b5470dc5fc18a8ff6517c3bb538d1479e58ecb02'/>
<id>b5470dc5fc18a8ff6517c3bb538d1479e58ecb02</id>
<content type='text'>
md_allow_write() marks the metadata dirty while holding mddev-&gt;lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
md_allow_write() marks the metadata dirty while holding mddev-&gt;lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: replace R5_WantPrexor with R5_WantDrain, add 'prexor' reconstruct_states</title>
<updated>2008-06-27T22:32:06+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2008-06-27T22:32:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d8ee0728b5b30d7a6f62c399a95e953616d31f23'/>
<id>d8ee0728b5b30d7a6f62c399a95e953616d31f23</id>
<content type='text'>
From: Dan Williams &lt;dan.j.williams@intel.com&gt;

Currently ops_run_biodrain and other locations have extra logic to determine
which blocks are processed in the prexor and non-prexor cases.  This can be
eliminated if handle_write_operations5 flags the blocks to be processed in all
cases via R5_Wantdrain.  The presence of the prexor operation is tracked in
sh-&gt;reconstruct_state.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
From: Dan Williams &lt;dan.j.williams@intel.com&gt;

Currently ops_run_biodrain and other locations have extra logic to determine
which blocks are processed in the prexor and non-prexor cases.  This can be
eliminated if handle_write_operations5 flags the blocks to be processed in all
cases via R5_Wantdrain.  The presence of the prexor operation is tracked in
sh-&gt;reconstruct_state.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: replace STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} with 'reconstruct_states'</title>
<updated>2008-06-27T22:32:05+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2008-06-27T22:32:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=600aa10993012ff2dd5617720dac081e4f992017'/>
<id>600aa10993012ff2dd5617720dac081e4f992017</id>
<content type='text'>
From: Dan Williams &lt;dan.j.williams@intel.com&gt;

Track the state of reconstruct operations (recalculating the parity block
usually due to incoming writes, or as part of array expansion)  Reduces the
scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether
a reconstruct operation has been requested via the ops_request field of struct
stripe_head_state.

This is the final step in the removal of ops.{pending,ack,complete,count}, i.e.
the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do
not track the state of the operation.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
From: Dan Williams &lt;dan.j.williams@intel.com&gt;

Track the state of reconstruct operations (recalculating the parity block
usually due to incoming writes, or as part of array expansion)  Reduces the
scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether
a reconstruct operation has been requested via the ops_request field of struct
stripe_head_state.

This is the final step in the removal of ops.{pending,ack,complete,count}, i.e.
the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do
not track the state of the operation.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
