summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2011-07-13cfq-iosched: fix a rcu warningShaohua Li
commit 3181faa85bda3dc3f5e630a1846526c9caaa38e3 upstream. I got a rcu warnning at boot. the ioc->ioc_data is rcu_deferenced, but doesn't hold rcu_read_lock. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13cfq-iosched: fix locking around ioc->ioc_data assignmentJens Axboe
commit ab4bd22d3cce6977dc039664cc2d052e3147d662 upstream. Since we are modifying this RCU pointer, we need to hold the lock protecting it around it. This fixes a potential reuse and double free of a cfq io_context structure. The bug has been in CFQ for a long time, it hit very few people but those it did hit seemed to see it a lot. Tracked in RH bugzilla here: https://bugzilla.redhat.com/show_bug.cgi?id=577968 Credit goes to Paul Bolle for figuring out that the issue was around the one-hit ioc->ioc_data cache. Thanks to his hard work the issue is now fixed. Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23block: export blk_{get,put}_queue()Jens Axboe
commit d86e0e83b32bc84600adb0b6ea1fce389b266682 upstream. We need them in SCSI to fix a bug, but currently they are not exported to modules. Export them. Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23block: add proper state guards to __elv_next_requestJames Bottomley
commit 0a58e077eb600d1efd7e54ad9926a75a39d7f8ae upstream. blk_cleanup_queue() calls elevator_exit() and after this, we can't touch the elevator without oopsing. __elv_next_request() must check for this state because in the refcounted queue model, we can still call it after blk_cleanup_queue() has been called. This was reported as causing an oops attributable to scsi. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09block, blk-sysfs: Fix an err return path in blk_register_queue()Liu Yuan
commit ed5302d3c25006a9edc7a7fbea97a30483f89ef7 upstream. We do not call blk_trace_remove_sysfs() in err return path if kobject_add() fails. This path fixes it. Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits insteadMartin K. Petersen
commit e692cb668fdd5a712c6ed2a2d6f2a36ee83997b4 upstream. When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09block: check for proper length of iov entries in blk_rq_map_user_iov()Jens Axboe
commit 9284bcf4e335e5f18a8bc7b26461c33ab60d0689 upstream. Ensure that we pass down properly validated iov segments before calling into the mapping or copy functions. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09block: take care not to overflow when calculating total iov lengthJens Axboe
commit 9f864c80913467312c7b8690e41fb5ebd1b50e92 upstream. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09block: Ensure physical block size is unsigned intMartin K. Petersen
commit 892b6f90db81cccb723d5d92f4fddc2d68b206e1 upstream. Physical block size was declared unsigned int to accomodate the maximum size reported by READ CAPACITY(16). Make sure we use the right type in the related functions. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28bsg: fix incorrect device_status valueFUJITA Tomonori
commit 478971600e47cb83ff2d3c63c5c24f2b04b0d6a1 upstream. bsg incorrectly returns sg's masked_status value for device_status. [jejb: fix up expression logic] Reported-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timerRichard Kennedy
commit a534dbe96e9929c7245924d8252d89048c23d569 upstream. blk_rq_timed_out_timer() relied on blk_add_timer() never returning a timer value of zero, but commit 7838c15b8dd18e78a523513749e5b54bda07b0cb removed the code that bumped this value when it was zero. Therefore when jiffies is near wrap we could get unlucky & not set the timeout value correctly. This patch uses a flag to indicate that the timeout value was set and so handles jiffies wrap correctly, and it keeps all the logic in one function so should be easier to maintain in the future. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34Martin K. Petersen
block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34 The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this. (Upstream commit 9504e0864b58b4a304820dcf3755f1da80d5e72f) The top device misalignment flag would not be set if the added bottom device was already misaligned as opposed to causing a stacking failure. Also massage the reporting so that an error is only returned if adding the bottom device caused the misalignment. I.e. don't return an error if the top is already flagged as misaligned. (Upstream commit fe0b393f2c0a0d23a9bc9ed7dc51a1ee511098bd) lcm() was defined to take integer-sized arguments. The supplied arguments are multiplied, however, causing us to overflow given sufficiently large input. That in turn led to incorrect optimal I/O size reporting in some cases. Switch lcm() over to unsigned long similar to gcd() and move the function from blk-settings.c to lib. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25block: bdev_stack_limits wrapperMartin K. Petersen
commit 17be8c245054b9c7786545af3ba3ca4e54cd4ad9 upstream. DM does not want to know about partition offsets. Add a partition-aware wrapper that DM can use when stacking block devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-03cfq-iosched: limit coop preemptionShaohua Li
CFQ has an optimization for cooperated applications. if several io-context have close requests, they will get boost. But the optimization get abused. Considering thread a, b, which work on one file. a reads sectors s, s+2, s+4, ...; b reads sectors s+1, s+3, s +5, ... Both a and b are sequential read, so they can open idle window. a reads a sector s and goes to idle window and wakeup b. b reads sector s+1, since in current implementation, cfq_should_preempt() thinks a and b are cooperators, b will preempt a. b then reads sector s+1 and goes to idle window and wakeup a. for the same reason, a will preempt b and reads s+2. a and b will continue the circle. The circle will be very long, and a and b will occupy whole disk queue. Other applications will nearly have no chance to run. Fix this limiting coop preempt until a queue is scheduled normally again. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03cfq-iosched: fix bad return value cfq_should_preempt()Jens Axboe
Commit a6151c3a5c8e1ff5a28450bc8d6a99a2a0add0a7 inadvertently reversed a preempt condition check, potentially causing a performance regression. Make the meta check correct again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-24block: silently error unsupported empty barriers tooMark McLoughlin
With 2.6.32-rc5 in a KVM guest using dm and virtio_blk, we see the following errors: end_request: I/O error, dev vda, sector 0 end_request: I/O error, dev vda, sector 0 The errors go away if dm stops submitting empty barriers, by reverting: commit 52b1fd5a27c625c78373e024bf570af3c9d44a79 Author: Mikulas Patocka <mpatocka@redhat.com> dm: send empty barriers to targets in dm_flush We should silently error all barriers, even empty barriers, on devices like virtio_blk which don't support them. See also: https://bugzilla.redhat.com/514901 Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-12blk-settings: fix function parameter kernel-doc notationRandy Dunlap
Fix kernel-doc notation in blk-settings.c::blk_queue_max_discard_sectors(). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-09elv_iosched_store(): fix strstrip() misuseKOSAKI Motohiro
elv_iosched_store() ignore the return value of strstrip(). It makes small inconsistent behavior. This patch fixes it. <before> ==================================== # cd /sys/block/{blockdev}/queue case1: # echo "anticipatory" > scheduler # cat scheduler noop [anticipatory] deadline cfq case2: # echo "anticipatory " > scheduler # cat scheduler noop [anticipatory] deadline cfq case3: # echo " anticipatory" > scheduler bash: echo: write error: Invalid argument <after> ==================================== # cd /sys/block/{blockdev}/queue case1: # echo "anticipatory" > scheduler # cat scheduler noop [anticipatory] deadline cfq case2: # echo "anticipatory " > scheduler # cat scheduler noop [anticipatory] deadline cfq case3: # echo " anticipatory" > scheduler noop [anticipatory] deadline cfq Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-08cfq-iosched: avoid probable slice overrun when idlingCorrado Zoccolo
If the average think time is larger than the remaining time slice for any given queue, don't allow it to idle. A succesful idle also means that we need to dispatch and complete a request, so if we don't even have time left for the idle process, we would overrun the slice in any case. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-07cfq-iosched: apply bool value where we return 0/1Jens Axboe
Saves 16 bytes of text, woohoo. But the more important point is that it makes the code more readable when returning bool for 0/1 cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-07cfq-iosched: fix think time allowed for seekersCorrado Zoccolo
CFQ enables idle only for processes that think less than the allowed idle time. Since idle time is lower for seeky queues, we should use the correct value in the comparison. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: fix the slice residual signJens Axboe
We should subtract the slice residual from the rb tree key, since a negative residual count indicates that the cfqq overran its slice the last time. Hence we want to add the overrun time, to position it a bit further away in the service tree. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: abstract out the 'may this cfqq dispatch' logicJens Axboe
Makes the whole thing easier to read, cfq_dispatch_requests() was a bit messy before. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06block: use proper BLK_RW_ASYNC in blk_queue_start_tag()Jens Axboe
Makes it easier to read than the 0. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06block: Seperate read and write statistics of in_flight requests v2Nikanth Karthikesan
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read and write statistics of in_flight requests. And exported the number of read and write requests in progress seperately through sysfs. But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange output from "iostat -kx 2". Global values for service time and utilization were garbage. For interval values, utilization was always 100%, and service time is higher than normal. So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b The problem was in part_round_stats_single(), I missed the following: if (now == part->stamp) return; - if (part->in_flight) { + if (part_in_flight(part)) { __part_stat_add(cpu, part, time_in_queue, part_in_flight(part) * (now - part->stamp)); __part_stat_add(cpu, part, io_ticks, (now - part->stamp)); With this chunk included, the reported regression gets fixed. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> -- Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05block: get rid of kblock_schedule_delayed_work()Jens Axboe
It was briefly introduced to allow CFQ to to delayed scheduling, but we ended up removing that feature again. So lets kill the function and export, and just switch CFQ back to the normal work schedule since it is now passing in a '0' delay from all call sites. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix possible problem with jiffies wraparoundCorrado Zoccolo
The RR service tree is indexed by a key that is relative to current jiffies. This can cause problems on jiffies wraparound. The patch fixes it using time_before comparison, and changing the add_front path to use a relative number, too. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix issue with rq-rq merging and fifo list orderingJens Axboe
cfq uses rq->start_time as the fifo indicator, but that field may get modified prior to cfq doing it's fifo list adjustment when a request gets merged with another request. This can cause the fifo list to become unordered. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04Revert "Seperate read and write statistics of in_flight requests"Jens Axboe
This reverts commit a9327cac440be4d8333bba975cbbf76045096275. Corrado Zoccolo <czoccolo@gmail.com> reports: "with 2.6.32-rc1 I started getting the following strange output from "iostat -kx 2": Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 10,70 0,00 3,16 15,75 0,00 70,38 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87 sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87 avg-cpu: %user %nice %system %iowait %steal %idle 2,72 0,00 0,74 0,00 0,00 96,53 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 6,68 0,00 0,99 0,00 0,00 92,33 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 4,40 0,00 0,73 1,47 0,00 93,40 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00 Global values for service time and utilization are garbage. For interval values, utilization is always 100%, and service time is higher than normal. I bisected it down to: [a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write statistics of in_flight requests and verified that reverting just that commit indeed solves the issue on 2.6.32-rc1." So until this is debugged, revert the bad commit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04cfq-iosched: don't delay async queue if it hasn't dispatched at allJens Axboe
We cannot delay for the first dispatch of the async queue if it hasn't dispatched at all, since that could present a local user DoS attack vector using an app that just did slow timed sync reads while filling memory. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03block: Topology ioctlsMartin K. Petersen
Not all users of the topology information want to use libblkid. Provide the topology information through bdev ioctls. Also clarify sector size comments for existing BLK ioctls. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: use assigned slice sync value, not defaultJens Axboe
We should use the sysfs modified slice sync value, in case it differs from the default. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'Jens Axboe
Don't think that's necessarily a perfect description of what this option fiddles with, but it's probably better than 'desktop'. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: implement slower async initiate and queue ramp upJens Axboe
This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: delay async IO dispatch, if sync IO was just doneVivek Goyal
o Do not allow more than max_dispatch requests from an async queue, if some sync request has finished recently. This is in the hope that sync activity is still going on in the system and we might receive a sync request soon. Most likely from a sync queue which finished a request and we did not enable idling on it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-02cfq-iosched: add a knob for desktop interactivenessJens Axboe
This is basically identical to what Vivek Goyal posted, but combined into one and labelled 'desktop' instead of 'fairness'. The goal is to continue to improve on the latency side of things as it relates to interactiveness, keeping the questionable bits under this sysfs tunable so it would be easy for throughput-only people to turn off. Apart from adding the interactive sysfs knob, it also adds the behavioural change of allowing slice idling even if the hardware does tagged command queuing. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01Add a tracepoint for block request remappingJun'ichi Nomura
Since 2.6.31 now has request-based device-mapper, it's useful to have a tracepoint for request-remapping as well as bio-remapping. This patch adds a tracepoint for request-remapping, trace_block_rq_remap(). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: allow large discard requestsChristoph Hellwig
Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size for discard requests. We limit the max discard request size in bytes to 32bit as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: use normal I/O path for discard requestsChristoph Hellwig
prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally adding a payload there makes the ownership of the bio backing unclear as it's now allocated by the device driver and not the submitter as usual. It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx> which did the prepare_discard_fn but not the different payload allocation yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfsZdenek Kabelac
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82. Release kobject also in case the request_fn is NULL. Problem was noticed via kmemleak backtrace when some sysfs entries were note properly destroyed during device removal: unreferenced object 0xffff88001aa76640 (size 80): comm "lvcreate", pid 2120, jiffies 4294885144 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e...... 90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S..... backtrace: [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60 [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0 [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120 [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0 [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0 [<ffffffff81197d93>] sysfs_create_group+0x13/0x20 [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20 [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0 [<ffffffff812447e4>] add_disk+0x94/0x160 [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod] [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod] [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod] [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod] [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0 [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: Do not clamp max_hw_sectors for stacking devicesMartin K. Petersen
Stacking devices do not have an inherent max_hw_sector limit. Set the default to INT_MAX so we are bounded only by capabilities of the underlying storage. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: Set max_sectors correctly for stacking devicesMartin K. Petersen
The topology changes unintentionally caused SAFE_MAX_SECTORS to be set for stacking devices. Set the default limit to BLK_DEF_MAX_SECTORS and provide SAFE_MAX_SECTORS in blk_queue_make_request() for legacy hw drivers that depend on the old behavior. Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-19Driver-Core: extend devnode callbacks to provide permissionsKay Sievers
This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev debugfs: Modify default debugfs directory for debugging pktcdvd. debugfs: Modified default dir of debugfs for debugging UHCI. debugfs: Change debugfs directory of IWMC3200 debugfs: Change debuhgfs directory of trace-events-sample.h debugfs: Fix mount directory of debugfs by default in events.txt hpilo: add poll f_op hpilo: add interrupt handler hpilo: staging for interrupt handling driver core: platform_device_add_data(): use kmemdup() Driver core: Add support for compatibility classes uio: add generic driver for PCI 2.3 devices driver-core: move dma-coherent.c from kernel to driver/base mem_class: fix bug mem_class: use minor as index instead of searching the array driver model: constify attribute groups UIO: remove 'default n' from Kconfig Driver core: Add accessor for device platform data Driver core: move dev_get/set_drvdata to drivers/base/dd.c Driver core: add new device to bus's list before probing
2009-09-15driver model: constify attribute groupsDavid Brownell
Let attribute group vectors be declared "const". We'd like to let most attribute metadata live in read-only sections... this is a start. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-14Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...
2009-09-14block: use blkdev_issue_discard in blk_ioctl_discardChristoph Hellwig
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14block: don't assume device has a request list backing in nr_requests storeJens Axboe
Stacked devices do not. For now, just error out with -EINVAL. Later we could make the limit apply on stacked devices too, for throttling reasons. This fixes 5a54cd13353bb3b88887604e2c980aa01e314309 and should go into 2.6.31 stable as well. Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14block: Optimal I/O limit wrapperMartin K. Petersen
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>