linux-toradex.git/fs/exofs/ore_raid.c, branch v4.2.1

Boaz Harrosh - Fix broken email address

2014-10-19T17:22:32+00:00

I no longer have access to the Panasas email.
So change to an email that can always reach me.

Signed-off-by: Boaz Harrosh

fs/exofs/ore_raid.c: replace count*size kzalloc by kcalloc

2014-08-08T22:57:24+00:00

kcalloc manages count*sizeof overflow.

Signed-off-by: Fabian Frederick 
Acked-by: Boaz Harrosh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

ore: Support for raid 6

2014-05-22T11:48:15+00:00

This simple patch adds support for raid6 to the ORE.
Most operations and calculations where already for the general
case. Only things left:
* call async_gen_syndrome() in the case of raid6
  (NOTE that the raid6 math is the one supported by the Linux Kernel
   see: crypto/async_tx/async_pq.c)
* call _ore_add_parity_unit() twice with only last call generating
  the redundancy pages.

* Fix couple BUGS in old code
  a. In reads when parity==2 it can happen that per_dev->length=0
     but per_dev->offset was set and adjusted by _ore_add_sg_seg().
     Don't let it be overwritten.
  b. The all 'cur_comp > starting_dev' thing to determine if:
       "per_dev->offset is in the current stripe number or the
       next one."
     Was a complete raid5/4 accident. When parity==2 this is not
     at all true usually. All we need to do is increment si->ob_offset
     once we pass by the first parity device.
     (This also greatly simplifies the code, amen)
  c. Calculation of si->dev rotation can overflow when parity==2.

* Then last enable raid6 in ore_verify_layout()

I want to deeply thank Daniel Gryniewicz who found first all the
bugs in the old raid code, and inspired these patches:
	Inspired-by Daniel Gryniewicz 

Signed-off-by: Boaz Harrosh

ore: Remove redundant dev_order(), more cleanups

2014-05-22T11:46:15+00:00

Two cleanups:
* si->cur_comp, si->cur_pg where always calculated after
  the call to ore_calc_stripe_info() with the help of
  _dev_order(...). But these are already calculated by
  ore_calc_stripe_info() and can be just set there.
  (This is left over from the time that si->cur_comp, si->cur_pg
   were only used by raid code, but now the main loop manages
   them anyway even though they are ultimately not used in
   none raid code)

* si->cur_comp - For the very last stripe case, was set inside
  _ore_add_parity_unit(). This is not clear and will be wrong
  for coming raid6 so move this to only caller. Now si->cur_comp
  is only manipulated within _prepare_for_striping(), always next
  to the manipulation of cur_dev.
  Which is much easier to understand and follow.

Signed-off-by: Boaz Harrosh

ore: (trivial) reformat some code

2014-05-22T11:45:21+00:00

rearrange some source lines. Nothing changed.

Signed-off-by: Boaz Harrosh

fs: Mark functions as static in exofs/ore_raid.c

2014-04-03T08:40:55+00:00

Mark functions as static in exofs/ore_raid.c because they are not used
outside this file.

This also eliminates the following warning in exofs/ore_raid.c:
fs/exofs/ore_raid.c:24:14: warning: no previous prototype for _raid_page_alloc [-Wmissing-prototypes]
fs/exofs/ore_raid.c:29:6: warning: no previous prototype for _raid_page_free [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria 
Reviewed-by: Josh Triplett 
Signed-off-by: Boaz Harrosh

block: Add bio_for_each_segment_all()

2013-03-23T21:26:28+00:00

__bio_for_each_segment() iterates bvecs from the specified index
instead of bio->bv_idx.  Currently, the only usage is to walk all the
bvecs after the bio has been advanced by specifying 0 index.

For immutable bvecs, we need to split these apart;
bio_for_each_segment() is going to have a different implementation.
This will also help document the intent of code that's using it -
bio_for_each_segment_all() is only legal to use for code that owns the
bio.

Signed-off-by: Kent Overstreet 
CC: Jens Axboe 
CC: Neil Brown 
CC: Boaz Harrosh

ore: signedness bug in _sp2d_min_pg()

2012-10-03T20:51:51+00:00

This for loop doesn't work correctly when "p" is unsigned.

Signed-off-by: Dan Carpenter

ore: Unlock r4w pages in exact reverse order of locking

2012-07-20T08:49:25+00:00

The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.

I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.

Signed-off-by: Boaz Harrosh

ore: Fix NFS crash by supporting any unaligned RAID IO

2012-07-20T08:45:28+00:00

In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.

The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.

The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.

Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.

This solution is much better for NFS than the previous supposedly
solution because the short write was dealt  with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.

hurray!!

[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree 
Signed-off-by: Boaz Harrosh