| Age | Commit message (Collapse) | Author |
|
commit 93f78da405685a756beeaeae4b5e41fcec39eab3 upstream.
This reverts commit c9e587abfdec2c2aaa55fab83bcb4972e2f84f9b, and the
subsequent commits that fixed it up:
- afa9b649 "fbcon: prevent cursor disappearance after switching to 512
character font"
- d850a2fa "vt/fbcon: fix background color on line feed"
- 7fe3915a "vt/fbcon: update scrl_erase_char after 256/512-glyph font
switch"
by request of Alan Cox. Quoth Alan:
"Unfortunately it's wrong and its been causing breakages because
various apps like ncurses expect our previous (and correct)
behaviour."
Alexander sent out a similar patch.
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Tested-by: Jan Engelhardt <jengelh@medozas.de>
Cc: Alexander V. Lukyanov <lav@netis.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Dual 16950 Serial adapter
commit 39aced68d664291db3324d0fcf0985ab5626aac2 upstream.
The PCI-card identified as "Oxford Semiconductor Ltd EXSYS EX-41092 Dual
16950 Serial adapter" is only usable with other devices (i.e. not the same
card) after doing a "setserial /dev/ttyS<n> baud_base 115200". This
baud_base should be default for this card.
Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 97c44836cdec1ea713a15d84098a1a908157e68f upstream.
This patch makes the ROM reading code return an error to user space if
the size of the ROM read is equal to 0.
The patch also emits a warnings if the contents of the ROM are invalid,
and documents the effects of the "enable" file on ROM reading.
Signed-off-by: Timothy S. Nelson <wayland@wayland.id.au>
Acked-by: Alex Villacis-Lasso <a_villacis@palosanto.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 777c6c5f1f6e757ae49ecca2ed72d6b1f523c007 upstream.
With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.
Interruptible waiters don't do that when aborting due to a signal. And if
an aborting waiter is concurrently woken up through the waitqueue, noone
will ever wake up the next waiter.
This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently. The aborted contender
didn't acquire the lock and therefor never did an unlock followed by
waking up the next waiter.
Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise.
It does so under the waitqueue lock. Racing with a wake up means the
aborting process is either already woken (removed from the queue) and will
wake up the next waiter, or it will remove itself from the queue and the
concurrent wake up will apply to the next waiter after it.
Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.
[akpm@linux-foundation.org: coding-style fixes]
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Mentored-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7f9a50a5b89b87f8e754f59ae9968da28be618a5 upstream.
Impact: fix spurious BUG_ON() triggered under load
module_refcount() isn't reliable outside stop_machine(), as demonstrated
by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
(an inc on one cpu and dec on another while module_refcount() is tallying
can give false results, for example).
Almost noone should be using __module_get, but that's another issue.
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 57064d213d2e44654d4f13c66df135b5e7389a26 upstream.
This patch adds the Intel Tigerpoint LPC Controller DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9b22ea560957de1484e6b3e8538f7eef202e3596 upstream.
The changes to deliver hardware accelerated VLAN packets to packet
sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
The __vlan_hwaccel_rx() function is called directly from the drivers
RX function, for non-NAPI drivers that means its still in RX IRQ
context:
[ 27.779463] ------------[ cut here ]------------
[ 27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
...
[ 27.782520] [<c0264755>] netif_nit_deliver+0x5b/0x75
[ 27.782590] [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
[ 27.782664] [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
[ 27.782738] [<c0155b17>] handle_IRQ_event+0x23/0x51
[ 27.782808] [<c015692e>] handle_edge_irq+0xc2/0x102
[ 27.782878] [<c0105fd5>] do_IRQ+0x4d/0x64
Split hardware accelerated VLAN reception into two parts to fix this:
- __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
device lookup, then calls netif_receive_skb()/netif_rx()
- vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
in softirq context, performs the real reception and delivery to
packet sockets.
Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a229fc61ef0ee3c30fd193beee0eeb87410227f1 upstream.
bsg.h in current form is perfectly suitable for user-mode
consumption. It is needed together with scsi/sg.h for applications
that want to interface with the bsg driver.
Currently the few projects that use it would copy it over into
the projects. But that is not acceptable for projects that need
to provide source and devel packages for distros.
This should also be submitted to stable 2.6.28 and 2.6.27 since bsg had
a stable API since these Kernels and distro users will need the header
for these kernels a swell
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9df04e1f25effde823a600e755b51475d438f56b upstream.
Linus suggested to put limits where the money is, and max_user_watches
already does that w/out the need of max_user_instances. That has the
advantage to mitigate the potential DoS while allowing pretty generous
default behavior.
Allowing top 4% of low memory (per user) to be allocated in epoll watches,
we have:
LOMEM MAX_WATCHES (per user)
512MB ~178000
1GB ~356000
2GB ~712000
A box with 512MB of lomem, will meet some challenge in hitting 180K
watches, socket buffers math teaches us. No more max_user_instances
limits then.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Bron Gondwana <brong@fastmail.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e65f0f8271b1b0452334e5da37fd35413a000de4 upstream.
Add support for Sealevel Systems Model 7803 COMM+8
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e4d866cdea24543ee16ce6d07d80c513e86ba983 upstream.
It supports VX855 and future chips whose IDE controller uses PCI ID 0x0571.
Signed-off-by: Joseph Chan <josephchan@via.com.tw>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b94b898f3107046b5c97c556e23529283ea5eadd upstream.
On Vortex86SX with IDE controller revision 0x11 ultra DMA must be
disabled. This patch was tested by DMP and seems to work.
It is a cleaned up version of their older Kernel patch:
http://www.dmp.com.tw/tech/vortex86sx/patch-2.6.24-DMP.gz
Tested-by: Shawn Lin <shawn@dmp.com.tw>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 856bf4d717feb8c55d4e2f817b71ebb70cfbc67b upstream.
s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.
This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync. This was replaced
by __put_super_and_need_restart.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4f5a99d64c17470a784a6c68064207d82e3e74a5 upstream.
Remove WB_SYNC_HOLD. The primary motiviation is the design of my
anti-starvation code for fsync. It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes. It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.
Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback. In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.
It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise? For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.
Fixing the existing data integrity problem will come next.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 25ff1c316f6a763f1eefe7f8984b2d8c03888432 upstream.
This patch (as1189d) adds some hacks to usb-storage for dealing with
the growing problems involving bad capacity values and last-sector
accesses:
A new flag, US_FL_CAPACITY_OK, is created to indicate that
the device is known to report its capacity correctly. An
unusual_devs entry for Linux's own File-backed Storage Gadget
is added with this flag set, since g_file_storage always
reports the correct capacity and since the capacity need
not be even (it is determined by the size of the backing
file).
An entry in unusual_devs.h which has only the CAPACITY_OK
flag set shouldn't prejudice libusual, since the device will
work perfectly well with either usb-storage or ub. So a
new macro, COMPLIANT_DEV, is added to let libusual know
about these entries.
When a last-sector access fails three times in a row and
neither the FIX_CAPACITY nor the CAPACITY_OK flag is set,
we assume the last-sector bug is present. We replace the
existing status and sense data with values that will cause
the SCSI core to fail the access immediately rather than
retry indefinitely. This should fix the difficulties
people have been having with Nokia phones.
This version of the patch differs from the version accepted into the
mainline only in that it does not trigger a WARN() when an
odd-numbered last-sector access succeeds. In a stable kernel series
we don't want to go around spamming users' logs and consoles for no
good reason.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e8c82c2e23e3527e0c9dc195e432c16784d270fa upstream.
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.
This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.
Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
Assembly of find_get_pages,
before:
.L220:
movq (%rbx), %rax #* ivtmp.1162, tmp82
movq (%rax), %rdi #, prephitmp.1149
.L218:
testb $1, %dil #, prephitmp.1149
jne .L217 #,
testq %rdi, %rdi # prephitmp.1149
je .L203 #,
cmpq $-1, %rdi #, prephitmp.1149
je .L217 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L218 #,
after:
.L212:
movq (%rbx), %rax #* ivtmp.1109, tmp81
movq (%rax), %rdi #, ret
testb $1, %dil #, ret
jne .L211 #,
testq %rdi, %rdi # ret
je .L197 #,
cmpq $-1, %rdi #, ret
je .L211 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L212 #,
(notice the obvious infinite loop in the first example, if page->count remains 0)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18e6959c385f3edf3991fa6662a53dac4eb10d5b upstream.
This assertion is incorrect for lockless pagecache. By definition if we
have an unpinned page that we are trying to take a speculative reference
to, it may become the tail of a compound page at any time (if it is
freed, then reallocated as a compound page).
It was still a valid assertion for the vmscan.c LRU isolation case, but
it doesn't seem incredibly helpful... if somebody wants it, they can
put it back directly where it applies in the vmscan code.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 54566b2c1594c2326a645a3551f9d989f7ba3c5e upstream.
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2b66421995d2e93c9d1a0111acf2581f8529c6e5 upstream.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d4e82042c4cfa87a7d51710b71f568fe80132551 upstream.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ee6a093222549ac0c72cfd296c69fa5e7d6daa34 upstream.
This enables the use of syscall wrappers to do proper sign extension
for 64-bit programs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1a94bc34768e463a93cb3751819709ab0ea80a01 upstream.
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
By selecting HAVE_SYSCALL_WRAPPERS architectures can activate
system call wrappers in order to sign extend system call arguments.
All architectures where the ABI defines that the caller of a function
has to perform sign extension probably need this.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e55380edf68796d75bf41391a781c68ee678587d upstream.
This way it matches the generic system call name convention.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2ed7c03ec17779afb4fcfa3b8c61df61bd4879ba upstream.
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4c696ba7982501d43dea11dbbaabd2aa8a19cc42 upstream.
Move declarations to correct header file.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4ae8978cf92a96257cd8998a49e781be83571d64 upstream.
The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.
For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
currently 'u32', it should be '__s32' . That is Robert's suggestion, and
is consistent with the other declarations of watch descriptors in the
kernel source, in particular, the inotify_event structure in
include/linux/inotify.h:
struct inotify_event {
__s32 wd; /* watch descriptor */
__u32 mask; /* watch mask */
__u32 cookie; /* cookie to synchronize two events */
__u32 len; /* length (including nulls) of name */
char name[0]; /* stub for possible name */
};
The patch makes the changes needed for inotify_rm_watch().
Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Robert Love <rlove@google.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1c5745aa380efb6417b5681104b007c8612fb496 upstream.
Redo:
5b7dba4: sched_clock: prevent scd->clock from moving backwards
which had to be reverted due to s2ram hangs:
ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards"
... this time with resume restoring GTOD later in the sequence
taken into account as well.
The "timekeeping_suspended" flag is not very nice but we cannot call into
GTOD before it has been properly resumed and the scheduler will run very
early in the resume sequence.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d253eee20195b25e298bf162a6e72f14bf4803e5 upstream.
Due to a wrong safety check in af_can.c it was not possible to filter
for SFF frames with a specific CAN identifier without getting the
same selected CAN identifier from a received EFF frame also.
This fix has a minimum (but user visible) impact on the CAN filter
API and therefore the CAN version is set to a new date.
Indeed the 'old' API is still working as-is. But when now setting
CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic
than before - but still the stuff that you expected to get for your
defined filter ...
Thanks to Kurt Van Dijck for pointing at this issue and for the review.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b563cf59c4d67da7d671788a9848416bfa4180ab upstream.
PnP encodes the resource type directly as its struct resource->flags value
which is an unsigned long. Make it so...
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit bf2a9a39639b8b51377905397a5005f444e9a892 upstream.
binfmt_script and binfmt_misc disallow recursion to avoid stack overflow
using sh_bang and misc_bang. It causes problem in some cases:
$ echo '#!/bin/ls' > /tmp/t0
$ echo '#!/tmp/t0' > /tmp/t1
$ echo '#!/tmp/t1' > /tmp/t2
$ chmod +x /tmp/t*
$ /tmp/t2
zsh: exec format error: /tmp/t2
Similar problem with binfmt_misc.
This patch introduces field 'recursion_depth' into struct linux_binprm to
track recursion level in binfmt_misc and binfmt_script. If recursion
level more then BINPRM_MAX_RECURSION it generates -ENOEXEC.
[akpm@linux-foundation.org: make linux_binprm.recursion_depth a uint]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4afe978530702c934dfdb11f54073136818b2119 upstream.
When a checkpointing IO fails, current JBD code doesn't check the error
and continue journaling. This means latest metadata can be lost from both
the journal and filesystem.
This patch leaves the failed metadata blocks in the journal space and
aborts journaling in the case of log_do_checkpoint(). To achieve this, we
need to do:
1. don't remove the failed buffer from the checkpoint list where in
the case of __try_to_free_cp_buf() because it may be released or
overwritten by a later transaction
2. log_do_checkpoint() is the last chance, remove the failed buffer
from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
prevent the journaled contents from being cleaned. For safety,
don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext3 layer so
that ext3 don't clear the needs_recovery flag, otherwise the
journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
__journal_drop_transaction() and journal_abort() (a race issue
between journal_flush() and __log_wait_for_space()
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f2f1fa78a155524b849edf359e42a3001ea652c0 upstream.
There's no point in having too short SG_IO timeouts, since if the
command does end up timing out, we'll end up through the reset sequence
that is several seconds long in order to abort the command that timed
out.
As a result, shorter timeouts than a few seconds simply do not make
sense, as the recovery would be longer than the timeout itself.
Add a BLK_MIN_SG_TIMEOUT to match the existign BLK_DEFAULT_SG_TIMEOUT.
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
trimed down version of commit 05496769e5da83ce22ed97345afd9c7b71d6bd24 upstream.
Some devices such as "cciss/c0d0p9" will cause jbd2 setup and teardown
failures when /proc filenames are created with embedded slashes. This
is a slimmed down version of commit 05496769, with the stack reduction
aspects of the patch omitted to meet the -stable criteria.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ac70a964b0e22a95af3628c344815857a01461b7 upstream.
Some recent Seagate harddrives have firmware bug which causes FLUSH
CACHE to timeout under certain circumstances if NCQ is being used.
This can be worked around by disabling NCQ and fixed by updating the
firmware. Implement ATA_HORKAGE_FIRMWARE_UPDATE and blacklist these
devices.
The wiki page has been updated to contain information on this issue.
http://ata.wiki.kernel.org/index.php/Known_issues
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 58319b802a614f10f1b5238fbde7a4b2e9a60069 upstream.
Now that the PCI core manages the 'name' for each individual
hotplug driver, and all drivers (except rpaphp) have been converted
to use hotplug_slot_name(), there is no need for the PCI hotplug
core to drag around its own copy of name either.
Cc: kristen.c.accardi@intel.com
Cc: matthew@wil.cx
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0ad772ec464d3fcf9d210836b97e654f393606c4 upstream
In preparation for cleaning up the various hotplug drivers
such that they don't have to manage their own 'name' parameters
anymore, we provide the following convenience functions:
pci_slot_name()
hotplug_slot_name()
These helpers will be used by individual hotplug drivers.
Cc: kristen.c.accardi@intel.com
Cc: matthew@wil.cx
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 828f37683e6d3ab5912989df0d04201db7ad798e upstream.
Slot detection drivers can co-exist with hotplug drivers. The names
of the detected/claimed slots may be different depending on module
load order.
For legacy reasons, we need to allow hotplug drivers to override
the slot name if a detection driver is loaded first (and they find
the same slots).
Creating and overriding slot names should be an atomic operation,
otherwise you get a locking nightmare as various drivers race to
call pci_create_slot().
pci_create_slot() is already serialized by grabbing the pci_bus_sem.
We update the API and add a 'hotplug' param, which is:
set if the caller is a hotplug driver
NULL if the caller is a detection driver
pci_create_slot() does not actually use the 'hotplug' parameter in this
patch. A later patch will add the logic that uses it.
Cc: kristen.c.accardi@intel.com
Cc: matthew@wil.cx
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1359f2701b96abd9bb69c1273fb995a093b6409a upstream.
Update pci_hp_register() to take a const char *name parameter.
The motivation for this is to clean up the individual hotplug
drivers so that each one does not have to manage its own name.
The PCI core should be the place where we manage the name.
We update the interface and all callsites first, in a
"no functional change" manner, and clean up the drivers later.
Cc: kristen.c.accardi@intel.com
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6ff2d39b91aec3dcae951afa982059e3dd9b49dc upstream.
2nd part of the fixes needed for
http://bugzilla.kernel.org/show_bug.cgi?id=11796.
When the idr tree is either grown or shrunk, then the update to the number
of layers and the top pointer were not atomic. This race caused crashes.
The attached patch fixes that by replicating the layers counter in each
layer, thus idr_find doesn't need idp->layers anymore.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Clement Calmels <cboulte@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8f7b0ba1c853919b85b54774775f567f30006107 upstream.
Inotify watch removals suck violently.
To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex. That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount. We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.
Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done. Cleanup is just deactivate_super().
However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords. That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.
We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable. OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e. that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().
So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong. We had
to drop ih->mutex before we could grab ->s_umount. So the watch
could've been gone already.
That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer. If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address. That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that. Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.
So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check. If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.
And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7ef9964e6d1b911b78709f144000aacadd0ebc21 upstream.
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface. Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds. To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced. A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:
max_user_instances = Maximum number of devices - per user
max_user_watches = Maximum number of "watched" fds - per user
The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM. As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users. The
default value for "max_user_instances" is set to 128, that should be
enough too.
This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
listed, so that should be ok.
[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 352d026338378b1f13f044e33c1047da6e470056 upstream.
This patch (as1155) fixes a bug in usbcore. When interfaces are
deleted, either because the device was disconnected or because of a
configuration change, the extra attribute files and child endpoint
devices may get left behind. This is because the core removes them
before calling device_del(). But during device_del(), after the
driver is unbound the core will reinstall altsetting 0 and recreate
those extra attributes and children.
The patch prevents this by adding a flag to record when the interface
is in the midst of being unregistered. When the flag is set, the
attribute files and child devices will not be created.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8677142710516d986d932d6f1fba7be8382c1fec upstream
backported by Nikanth Karthikesan <knikanth@suse.de> to the 2.6.27.y tree.
block: fix nr_phys_segments miscalculation bug
This fixes the bug reported by Nikanth Karthikesan <knikanth@suse.de>:
http://lkml.org/lkml/2008/10/2/203
The root cause of the bug is that blk_phys_contig_segment
miscalculates q->max_segment_size.
blk_phys_contig_segment checks:
req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size
But blk_recalc_rq_segments might expect that req->biotail and the
previous bio in the req are supposed be merged into one
segment. blk_recalc_rq_segments might also expect that next_req->bio
and the next bio in the next_req are supposed be merged into one
segment. In such case, we merge two requests that can't be merged
here. Later, blk_rq_map_sg gives more segments than it should.
We need to keep track of segment size in blk_recalc_rq_segments and
use it to see if two requests can be merged. This patch implements it
in the similar way that we used to do for hw merging (virtual
merging).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 467622ef2acb01986eab37ef96c3632b3ea35999 upstream
For "unlock" cycles to 16bit devices in 8bit compatibility mode we need
to use the byte addresses 0xaaa and 0x555. These effectively match
the word address 0x555 and 0x2aa, except the latter has its low bit set.
Most chips don't care about the value of the 'A-1' pin in x8 mode,
but some -- like the ST M29W320D -- do. So we need to be careful to
set it where appropriate.
cfi_send_gen_cmd is only ever passed addresses where the low byte
is 0x00, 0x55 or 0xaa. Of those, only addresses ending 0xaa are
affected by this patch, by masking in the extra low bit when the device
is known to be in compatibility mode.
[dwmw2: Do it only when (cmd_ofs & 0xff) == 0xaa]
v4: Fix stupid typo in cfi_build_cmd_addr that failed to compile
I'm writing this patch way to late at night.
v3: Bring all of the work back into cfi_build_cmd_addr
including calling of map_bankwidth(map) and cfi_interleave(cfi)
So every caller doesn't need to.
v2: Only modified the address if we our device_type is larger than our
bus width.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f8d570a4745835f2238a33b537218a1bb03fc671 and
3b53fbf4314594fa04544b02b2fc6e607912da18 upstream (because once wasn't
good enough...)
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This one fixes http://bugzilla.kernel.org/show_bug.cgi?id=11602.
A more generic fix for drives which cannot autoclose tray will follow.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
[bart: add an extra parentheses for consistency with the rest of kernel code]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
include/linux/stacktrace.h:13: warning:
'struct task_struct' declared inside parameter list
(This might be a hard error on sparc64, which uses this header and has
-Werror)
Reported-by: "Randy.Dunlap" <rdunlap@xenotime.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The previous patch db203d53d474aa068984e409d807628f5841da1b ("mm:
tiny-shmem fix lock ordering: mmap_sem vs i_mutex") to fix the lock
ordering in tiny-shmem breaks shared anonymous and IPC memory on NOMMU
architectures because it was using the expanding truncate to signal ramfs
to allocate a physically contiguous RAM backing the inode (otherwise it is
unusable for "memory mapping" it to userspace).
However do_truncate is what caused the lock ordering error, due to it
taking i_mutex. In this case, we can actually just call ramfs directly to
allocate memory for the mapping, rather than go via truncate.
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Impact: per CPU hrtimers can be migrated from a dead CPU
The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.
Explicitely mark the timers as per CPU and use a more understandable
mode descriptor for the interrupts safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Impact: during migration active hrtimers can be seen as inactive
The migration code removes the hrtimers from the queues of the dead
CPU and sets the state temporary to INACTIVE. The enqueue code sets it
to ACTIVE/PENDING again.
Prevent that the wrong state can be seen by using a separate migration
state bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|