Age | Commit message (Collapse) | Author |
|
commit 22e7478ddbcb670e33fab72d0bbe7c394c3a2c84 upstream.
Prior to commit 0e4f6a791b1e (Fix reiserfs_file_release()), reiserfs
truncates serialized on i_mutex. They mostly still do, with the exception
of reiserfs_file_release. That blocks out other writers via the tailpack
mutex and the inode openers counter adjusted in reiserfs_file_open.
However, NFS will call reiserfs_setattr without having called ->open, so
we end up with a race when nfs is calling ->setattr while another
process is releasing the file. Ultimately, it triggers the
BUG_ON(inode->i_size != new_file_size) check in maybe_indirect_to_direct.
The solution is to pull the lock into reiserfs_setattr to encompass the
truncate_setsize call as well.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6663a4fa6711050036562ddfd2086edf735fae21 upstream.
Commit 59a53afe70fd530040bdc69581f03d880157f15a "powerpc: Don't setup
CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs
that aren't presently running shall have status of disabled, with
enable-method being used to determine whether the CPU can be enabled.
Fix by checking for spin-table, which is currently the only supported
enable-method.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit dd58a092c4202f2bd490adab7285b3ff77f8e467 upstream.
The Vector Crypto category instructions are supported by current POWER8
chips, advertise them to userspace using a specific bit to properly
differentiate with chips of the same architecture level that might not
have them.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 59a53afe70fd530040bdc69581f03d880157f15a upstream.
OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
node.
Unfortunatley Linux doesn't check this property and will put the bad CPU in the
present map. This has caused hangs on booting when we try to unsplit the core.
This patch checks the CPU is avaliable via this status property before putting
it in the present map.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit b69a1da94f3d1589d1942b5d1b384d8cfaac4500 upstream.
Commit cd64d1697cf0 ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.
Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3df48c981d5a9610e02e9270b1bc4274fb536710 upstream.
In commit 330a1eb "Core EBB support for 64-bit book3s" I messed up
clear_task_ebb(). It clears some but not all of the task's Event Based
Branch (EBB) registers when we duplicate a task struct.
That allows a child task to observe the EBBHR & EBBRR of its parent,
which it should not be able to do.
Fix it by clearing EBBHR & EBBRR.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6e0fdf9af216887e0032c19d276889aad41cad00 upstream.
Commit b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling
perf_event_do_pending") added a check for CONFIG_PMAC were a check for
CONFIG_PPC_PMAC was clearly intended.
Fixes: b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5d73320a96fcce80286f1447864c481b5f0b96fa upstream.
commit 8f9c0119d7ba (compat: fs: Generic compat_sys_sendfile
implementation) changed the PowerPC 64bit sendfile call from
sys_sendile64 to sys_sendfile.
Unfortunately this broke sendfile of lengths greater than 2G because
sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
fixes the bug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c4cad90f9e9dcb85afc5e75a02ae3522ed077296 upstream.
We had a mix & match of flags used when creating legacy ports
depending on where we found them in the device-tree. Among others
we were missing UPF_SKIP_TEST for some kind of ISA ports which is
a problem as quite a few UARTs out there don't support the loopback
test (such as a lot of BMCs).
Let's pick the set of flags used by the SoC code and generalize it
which means autoconf, no loopback test, irq maybe shared and fixed
port.
Sending to stable as the lack of UPF_SKIP_TEST is breaking
serial on some machines so I want this back into distros
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 09567e7fd44291bfc08accfdd67ad8f467842332 upstream.
We have a bug in our hugepage handling which exhibits as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector, or the RCU stall detector.
The bug is as follows:
1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
2. Slice code converts the slice psize to 16M.
3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
synchronises the mm->context with the paca->context. So the paca slice
mask is updated to include the 16M slice.
3. Either:
* mmap() fails because there are no huge pages available.
* mmap() succeeds and the mapping is then munmapped.
In both cases the slice psize remains at 16M in both the paca & mm.
4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
5. The slice psize is converted back to 64K. Because of the check on line 539
of slice.c we DO NOT update the paca->context. The paca slice mask is now
out of sync with the mm slice mask.
6. User/kernel accesses 0xa0000000.
7. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
to create an SLB entry and inserts it in the SLB.
18. With the 16M SLB entry in place the hardware does a hash lookup, no entry
is found so a data access exception is generated.
19. The data access handler calls do_page_fault() -> handle_mm_fault().
10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
11. The hardware retries the access, there is still nothing in the hash table
so once again a data access exception is generated.
12. hash_page() calls into __hash_page_thp() and inserts a mapping in the
hash. Although the THP mapping maps 16M the hashing is done using 64K
as the segment page size.
13. hash_page() returns immediately after calling __hash_page_thp(), skipping
over the code at line 1125. Resulting in the mismatch between the
paca->context and mm->context not being detected.
14. The hardware retries the access, the hash it generates using the 16M
SLB entry does NOT match the hash we inserted.
15. We take another data access and go into __hash_page_thp().
16. We see a valid entry in the hpte_slot_array and so we call updatepp()
which succeeds.
17. Goto 14.
We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.
The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.
We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.
Without further rearranging the code, the simplest fix is to pull out
the code that checks paca psize and call it in two places. Firstly for
THP/hugetlb, and secondly for other mappings as before.
Thanks to Dave Jones for trinity, which originally found this bug.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 54f112a3837d4e7532bbedbbbf27c0de277be510 upstream.
In pseries_eeh_get_state(), EEH_STATE_UNAVAILABLE is always
overwritten by EEH_STATE_NOT_SUPPORT because of the missed
"break" there. The patch fixes the issue.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 18dd78c427513fb0f89365138be66e6ee8700d1b upstream.
NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation.
We're still having some problems with data corruption when multiple
clients are appending to a file and those clients are being granted
write delegations on open.
To reproduce:
Client A:
vi /mnt/`hostname -s`
while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
Client B:
vi /mnt/`hostname -s`
while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
What's happening is that in nfs_update_inode() we're recognizing that
the file size has changed and we're setting NFS_INO_INVALID_DATA
accordingly, but then we ignore the cache_validity flags in
nfs_write_pageuptodate() because we have a delegation. As a result,
in nfs_updatepage() we're extending the write to cover the full page
even though we've not read in the data to begin with.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a914722f333b3359d2f4f12919380a334176bb89 upstream.
Otherwise the kernel oopses when remounting with IPv6 server because
net is dereferenced in dev_get_by_name.
Use net ns of current thread so that dev_get_by_name does not operate on
foreign ns. Changing the address is prohibited anyway so this should not
affect anything.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 43b6535e717d2f656f71d9bd16022136b781c934 upstream.
Fix a bug, whereby nfs_update_inode() was declaring the inode to be
up to date despite not having checked all the attributes.
The bug occurs because the temporary variable in which we cache
the validity information is 'sanitised' before reapplying to
nfsi->cache_validity.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 12337901d654415d9f764b5f5ba50052e9700f37 upstream.
Note nobody's ever noticed because the typical client probably never
requests FILES_AVAIL without also requesting something else on the list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6df200f5d5191bdde4d2e408215383890f956781 upstream.
Return the NULL pointer when the allocation fails.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c789102c20bbbdda6831a273e046715be9d6af79 upstream.
If the accept() call fails, we need to put the module reference.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 60e1751cb52cc6d1ae04b6bd3c2b96e770b5823f upstream.
Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
triggers a use-after-free. __fput() invokes f_op->release() before it
invokes cdev_put(). Make sure that the ib_umad_device structure is
freed by the cdev_put() call instead of f_op->release(). This avoids
that changing the port mode from IB into Ethernet and back to IB
followed by restarting opensmd triggers the following kernel oops:
general protection fault: 0000 [#1] PREEMPT SMP
RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170
Call Trace:
[<ffffffff81190f20>] cdev_put+0x20/0x30
[<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
[<ffffffff8118e35e>] ____fput+0xe/0x10
[<ffffffff810723bc>] task_work_run+0xac/0xe0
[<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
[<ffffffff814b8398>] int_signal+0x12/0x17
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8ec0a0e6b58218bdc1db91dd70ebfcd6ad8dd6cd upstream.
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
or if nonseekable_open() fails.
Avoid leaking a kref count, that sm_sem is kept down and also that the
IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if
nonseekable_open() fails.
Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL.
Moving the kref_get() call from the start of ib_umad_*open() to the
end is safe since it is the responsibility of the caller of these
functions to ensure that the cdev pointer remains valid until at least
when these functions return.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
[ nonseekable_open() can't actually fail, but.... - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 024ca90151f5e4296d30f72c13ff9a075e23c9ec upstream.
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 7e6d3e5c70f13874fb06e6b67696ed90ce79bd48 upstream.
This patch addresses an issue where the legacy diagpacket is sent in
from the user, but the driver operates on only the extended
diagpkt. This patch specifically initializes the extended diagpkt
based on the legacy packet.
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 911eccd284d13d78c92ec4f1f1092c03457d732a upstream.
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE.
As of the dual port qib QDR card, this is not necessarily correct.
Change to use the port as specified in the call.
Reported-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 43bc889380c2ad9aa230eccc03a15cc52cf710d4 upstream.
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be
aligned on a 8 bytes multiple, while for i386, such padding is not
added.
Tool pahole could be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_srq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
# +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
# --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
# @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
# __u64 db_addr; /* 8 8 */
# __u32 flags; /* 16 4 */
#
# - /* size: 20, cachelines: 1, members: 3 */
# - /* last cacheline: 20 bytes */
# + /* size: 24, cachelines: 1, members: 3 */
# + /* padding: 4 */
# + /* last cacheline: 24 bytes */
# };
ABI disagreement will make an x86_64 kernel try to read past
the buffer provided by an i386 binary.
When boundary check will be implemented, the x86_64 kernel will
refuse to read past the i386 userspace provided buffer and the
uverb will fail.
Anyway, if the structure lay in memory on a page boundary and
next page is not mapped, ib_copy_from_udata() will fail and the
uverb will fail.
This patch makes create_srq_user() takes care of the input
data size to handle the case where no padding was provided.
This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq
as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a8237b32a3faab155a5dc8f886452147ce73da3e upstream.
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a
8 bytes multiple, while for i386, such padding is not added.
The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_cq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
# +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
# --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
# @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
# __u64 db_addr; /* 8 8 */
# __u32 cqe_size; /* 16 4 */
#
# - /* size: 20, cachelines: 1, members: 3 */
# - /* last cacheline: 20 bytes */
# + /* size: 24, cachelines: 1, members: 3 */
# + /* padding: 4 */
# + /* last cacheline: 24 bytes */
# };
This ABI disagreement will make an x86_64 kernel try to read past the
buffer provided by an i386 binary.
When boundary check will be implemented, a x86_64 kernel will refuse
to read past the i386 userspace provided buffer and the uverb will
fail.
Anyway, if the structure lies in memory on a page boundary and next
page is not mapped, ib_copy_from_udata() will fail when trying to read
the 4 bytes of padding and the uverb will fail.
This patch makes create_cq_user() takes care of the input data size to
handle the case where no padding is provided.
This way, x86_64 kernel will be able to handle struct
mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
with the watchdog
commit a9e0436b303e94ba57d3bd4b1fcbeaa744b7ebeb upstream.
Use the prescaler index, rather than its value, to configure the watchdog.
This will prevent a mismatch with the prescaler used to calculate the cycles.
Signed-off-by: Per Gundberg <per.gundberg@icomera.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Michael Brunner <michael.brunner@kontron.com>
Tested-by: Michael Brunner <michael.brunner@kontron.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 23afeb613ec0e10aecfae7838a14d485db62ac52 upstream.
On some AR934x based systems, where the frequency of
the AHB bus is relatively high, the built-in watchdog
causes a spurious restart when it gets enabled.
The possible cause of these restarts is that the timeout
value written into the TIMER register does not reaches
the hardware in time.
Add an explicit delay into the ath79_wdt_enable function
to avoid the spurious restarts.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 938626d96a3ffb9eb54552bb0d3a4f2b30ffdeb0 upstream.
Implementation of ->set_timeout() is supposed to set 'timeout' field of 'struct
watchdog_device' passed to it. sp805 was rather setting this in a local
variable. Fix it.
Reported-by: Arun Ramamurthy <arun.ramamurthy@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 72abc8f4b4e8574318189886de627a2bfe6cd0da upstream.
I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
shows like this:
[ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
[ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1
[ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
[ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
[ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
[ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
[ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
[ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
[ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
[ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
[ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
[ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
[ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
[ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
[ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
[ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
[ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
[ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
[ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
[ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
[ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
[ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)
After analyzing the code, I found a condition that may cause this failed
in correct operations. Thus, I think this assertion is wrong and should be
removed.
Suppose there are two clean znodes and one dirty znode in TNC. So the
per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode
is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops
on this znode. We clear COW bit and DIRTY bit in write_index() without
@tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the
comments in write_index() shows, if another process hold @tnc_mutex and
dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1).
We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in
free_obsolete_znodes() to keep it right.
If shrink_tnc() performs between decrease and increase, it will release
other 2 clean znodes it holds and found @clean_zn_cnt is less than zero
(1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will
soon correct @clean_zn_cnt and no harm to fs in this case, I think this
assertion could be removed.
2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2
Thread A (commit) Thread B (write or others) Thread C (shrinker)
->write_index
->clear_bit(DIRTY_NODE)
->clear_bit(COW_ZNODE)
@clean_zn_cnt == 2
->mutex_locked(&tnc_mutex)
->dirty_cow_znode
->!ubifs_zn_cow(znode)
->!test_and_set_bit(DIRTY_NODE)
->atomic_dec(&clean_zn_cnt)
->mutex_unlocked(&tnc_mutex)
@clean_zn_cnt == 1
->mutex_locked(&tnc_mutex)
->shrink_tnc
->destroy_tnc_subtree
->atomic_sub(&clean_zn_cnt, 2)
->ubifs_assert <- hit
->mutex_unlocked(&tnc_mutex)
@clean_zn_cnt == -1
->mutex_lock(&tnc_mutex)
->free_obsolete_znodes
->atomic_inc(&clean_zn_cnt)
->mutux_unlock(&tnc_mutex)
@clean_zn_cnt == 0 (correct after shrink)
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 691a7c6f28ac90cccd0dbcf81348ea90b211bdd0 upstream.
There is a race condition in UBIFS:
Thread A (mmap) Thread B (fsync)
->__do_fault ->write_cache_pages
-> ubifs_vm_page_mkwrite
-> budget_space
-> lock_page
-> release/convert_page_budget
-> SetPagePrivate
-> TestSetPageDirty
-> unlock_page
-> lock_page
-> TestClearPageDirty
-> ubifs_writepage
-> do_writepage
-> release_budget
-> ClearPagePrivate
-> unlock_page
-> !(ret & VM_FAULT_LOCKED)
-> lock_page
-> set_page_dirty
-> ubifs_set_page_dirty
-> TestSetPageDirty (set page dirty without budgeting)
-> unlock_page
This leads to situation where we have a diry page but no budget allocated for
this page, so further write-back may fail with -ENOSPC.
In this fix we return from page_mkwrite without performing unlock_page. We
return VM_FAULT_LOCKED instead. After doing this, the race above will not
happen.
Signed-off-by: hujianyang <hujianyang@huawei.com>
Tested-by: Laurence Withers <lwithers@guralp.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ab6c15bc6620ebe220970cc040b29bcb2757f373 upstream.
Previously, the lower limit for the MIPS SC initialization loop was
set incorrectly allowing one extra loop leading to writes
beyond the MSC ioremap'd space. More precisely, the value of the 'imp'
in the last loop increased beyond the msc_irqmap_t boundaries and
as a result of which, the 'n' variable was loaded with an incorrect
value. This value was used later on to calculate the offset in the
MSC01_IC_SUP which led to random crashes like the following one:
CPU 0 Unable to handle kernel paging request at virtual address e75c0200,
epc == 8058dba4, ra == 8058db90
[...]
Call Trace:
[<8058dba4>] init_msc_irqs+0x104/0x154
[<8058b5bc>] arch_init_irq+0xd8/0x154
[<805897b0>] start_kernel+0x220/0x36c
Kernel panic - not syncing: Attempted to kill the idle task!
This patch fixes the problem
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7118/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 91ad11d7cc6f4472ebf177a6252fbf0fd100d798 upstream.
On MIPS calls to _mcount in modules generate 2 instructions to load
the _mcount address (and therefore 2 relocations). The mcount_loc
table should only reference the first of these, so the second is
filtered out by checking the relocation offset and ignoring ones that
immediately follow the previous one seen.
However if a module has an _mcount call at offset 0, the second
relocation would not be filtered out due to old_r_offset == 0
being taken to mean that the current relocation is the first one
seen, and both would end up in the mcount_loc table.
This results in ftrace_make_nop() patching both (adjacent)
instructions to branches over the _mcount call sequence like so:
0xffffffffc08a8000: 04 00 00 10 b 0xffffffffc08a8014
0xffffffffc08a8004: 04 00 00 10 b 0xffffffffc08a8018
0xffffffffc08a8008: 2d 08 e0 03 move at,ra
...
The second branch is in the delay slot of the first, which is
defined to be unpredictable - on the platform on which this bug was
encountered, it triggers a reserved instruction exception.
Fix by initializing old_r_offset to ~0 and using that instead of 0
to determine whether the current relocation is the first seen.
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7098/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit af5ded8ccf21627f9614afc03b356712666ed225 upstream.
In module exit, dfs_parent and it's subtree were removed before
unregistering with pci. When debugfs entry for each device is attempted
to remove in pci_remove() context, they don't exist, as dfs_parent and
its children were already ripped apart.
Modified to first unregister with pci and then remove dfs_parent.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 670a641420a3d9586eebe7429dfeec4e7ed447aa upstream.
Increased timeout for STANDBY IMMEDIATE command to 2 minutes.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d1e714db8129a1d3670e449b87719c78e2c76f9f upstream.
A hardware quirk in P320h/P420m interfere with PCIe transactions on some
AMD chipsets, making P320h/P420m unusable. This workaround is to disable
ERO and NoSnoop bits in the parent and root complex for normal
functioning of these devices
NOTE: This workaround is specific to AMD chipset with a PCIe upstream
device with device id 0x5aXX
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 67ebd8140dc8923c65451fa0f6a8eee003c4dcd3 upstream.
3448a19da479 "vgaarb: use bridges to control VGA routing where possible"
added the "flags & PCI_VGA_STATE_CHANGE_DECODES" condition to an existing
WARN_ON(), but used bitwise AND (&) instead of logical AND (&&), so the
condition is never true. Replace with logical AND.
Found by Coverity (CID 142811).
Fixes: 3448a19da479 "vgaarb: use bridges to control VGA routing where possible"
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Airlie <airlied@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 7c82126a94e69bbbac586f0249e7ef11e681246c upstream.
After a CPU upgrade while keeping the same mainboard, we faced "spurious
interrupt" problems again.
It turned out that the new CPU also featured a new GPU with a different PCI
ID.
Add this PCI ID to the quirk table. Probably all other Intel GPU PCI IDs
are affected, too, but I don't want to add them without a test system.
See f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") for some history.
[bhelgaas: add f67fd55fa96f reference, stable tag]
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit fb4f8f568a9def02240ef9bf7aabd246dc63a081 upstream.
The touchpad on the GIGABYTE U2442 not only stops communicating when we try
to set bit 3 (enable real hardware resolution) of reg_10, but on some BIOS
versions also when we set bit 1 (enable two finger mode auto correct).
I've asked the original reporter of:
https://bugzilla.kernel.org/show_bug.cgi?id=61151
To check that not setting bit 1 does not lead to any adverse effects on his
model / BIOS revision, and it does not, so this commit fixes the touchpad
not working on these versions by simply never setting bit 1 for laptop
models with the no_hw_res quirk.
Reported-and-tested-by: James Lademann <jwlademann@gmail.com>
Tested-by: Philipp Wolfer <ph.wolfer@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit cd9e83e2754465856097f31c7ab933ce74c473f8 upstream.
At least the Dell Vostro 5470 elantech *clickpad* reports right button
clicks when clicked in the right bottom area:
https://bugzilla.redhat.com/show_bug.cgi?id=1103528
This is different from how (elantech) clickpads normally operate, normally
no matter where the user clicks on the pad the pad always reports a left
button event, since there is only 1 hardware button beneath the path.
It looks like Dell has put 2 buttons under the pad, one under each bottom
corner, causing this.
Since this however still clearly is a real clickpad hardware-wise, we still
want to report it as such to userspace, so that things like finger movement
in the bottom area can be properly ignored as it should be on clickpads.
So deal with this weirdness by simply mapping a right click to a left click
on elantech clickpads. As an added advantage this is something which we can
simply do on all elantech clickpads, so no need to add special quirks for
this weird model.
Reported-and-tested-by: Elder Marco <eldermarco@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 81a9c5e72bdf7109a65102ca61d8cbd722cf4021 upstream.
On uniprocessor preemptible kernel, target core deadlocks on unload. The
following events happen:
* iscsit_del_np is called
* it calls send_sig(SIGINT, np->np_thread, 1);
* the scheduler switches to the np_thread
* the np_thread is woken up, it sees that kthread_should_stop() returns
false, so it doesn't terminate
* the np_thread clears signals with flush_signals(current); and goes back
to sleep in iscsit_accept_np
* the scheduler switches back to iscsit_del_np
* iscsit_del_np calls kthread_stop(np->np_thread);
* the np_thread is waiting in iscsit_accept_np and it doesn't respond to
kthread_stop
The deadlock could be resolved if the administrator sends SIGINT signal to
the np_thread with killall -INT iscsi_np
The reproducible deadlock was introduced in commit
db6077fd0b7dd41dc6ff18329cec979379071f87, but the thread-stopping code was
racy even before.
This patch fixes the problem. Using kthread_should_stop to stop the
np_thread is unreliable, so we test np_thread_state instead. If
np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 683497566d48f86e04d026de1ee658dd74fc1077 upstream.
This patch adds a explicit memset to the login response PDU
exception path in iscsit_tx_login_rsp().
This addresses a regression bug introduced in commit baa4d64b
where the initiator would end up not receiving the login
response and associated status class + detail, before closing
the login connection.
Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Tested-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 97c99b47ac58bacb7c09e1f47d5d184434f6b06a upstream.
This patch changes iscsit_check_dataout_hdr() to dump the incoming
Data-Out payload when the received ITT is not associated with a
WRITE, instead of calling iscsit_reject_cmd() for the non WRITE
ITT descriptor.
This addresses a bug where an initiator sending an Data-Out for
an ITT associated with a READ would end up generating a reject
for the READ, eventually resulting in list corruption.
Reported-by: Santosh Kulkarni <santosh.kulkarni@calsoftinc.com>
Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 83ff42fcce070801a3aa1cd6a3269d7426271a8d upstream.
This patch fixes a left-over se_lun->lun_sep pointer OOPs when one
of the /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT/lun/$LUN/alua*
attributes is accessed after the $DEVICE symlink has been removed.
To address this bug, go ahead and clear se_lun->lun_sep memory in
core_dev_unexport(), so that the existing checks for show/store
ALUA attributes in target_core_fabric_configfs.c work as expected.
Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 27e35715df54cbc4f2d044f681802ae30479e7fb upstream.
When the rtmutex fast path is enabled the slow unlock function can
create the following situation:
spin_lock(foo->m->wait_lock);
foo->m->owner = NULL;
rt_mutex_lock(foo->m); <-- fast path
free = atomic_dec_and_test(foo->refcnt);
rt_mutex_unlock(foo->m); <-- fast path
if (free)
kfree(foo);
spin_unlock(foo->m->wait_lock); <--- Use after free.
Plug the race by changing the slow unlock to the following scheme:
while (!rt_mutex_has_waiters(m)) {
/* Clear the waiters bit in m->owner */
clear_rt_mutex_waiters(m);
owner = rt_mutex_owner(m);
spin_unlock(m->wait_lock);
if (cmpxchg(m->owner, owner, 0) == owner)
return;
spin_lock(m->wait_lock);
}
So in case of a new waiter incoming while the owner tries the slow
path unlock we have two situations:
unlock(wait_lock);
lock(wait_lock);
cmpxchg(p, owner, 0) == owner
mark_rt_mutex_waiters(lock);
acquire(lock);
Or:
unlock(wait_lock);
lock(wait_lock);
mark_rt_mutex_waiters(lock);
cmpxchg(p, owner, 0) != owner
enqueue_waiter();
unlock(wait_lock);
lock(wait_lock);
wakeup_next waiter();
unlock(wait_lock);
lock(wait_lock);
acquire(lock);
If the fast path is disabled, then the simple
m->owner = NULL;
unlock(m->wait_lock);
is sufficient as all access to m->owner is serialized via
m->wait_lock;
Also document and clarify the wakeup_next_waiter function as suggested
by Oleg Nesterov.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3d5c9340d1949733eb37616abd15db36aef9a57c upstream.
Even in the case when deadlock detection is not requested by the
caller, we can detect deadlocks. Right now the code stops the lock
chain walk and keeps the waiter enqueued, even on itself. Silly not to
yell when such a scenario is detected and to keep the waiter enqueued.
Return -EDEADLK unconditionally and handle it at the call sites.
The futex calls return -EDEADLK. The non futex ones dequeue the
waiter, throw a warning and put the task into a schedule loop.
Tagged for stable as it makes the code more robust.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brad Mouring <bmouring@ni.com>
Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 82084984383babe728e6e3c9a8e5c46278091315 upstream.
When we walk the lock chain, we drop all locks after each step. So the
lock chain can change under us before we reacquire the locks. That's
harmless in principle as we just follow the wrong lock path. But it
can lead to a false positive in the dead lock detection logic:
T0 holds L0
T0 blocks on L1 held by T1
T1 blocks on L2 held by T2
T2 blocks on L3 held by T3
T4 blocks on L4 held by T4
Now we walk the chain
lock T1 -> lock L2 -> adjust L2 -> unlock T1 ->
lock T2 -> adjust T2 -> drop locks
T2 times out and blocks on L0
Now we continue:
lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all.
Brad tried to work around that in the deadlock detection logic itself,
but the more I looked at it the less I liked it, because it's crystal
ball magic after the fact.
We actually can detect a chain change very simple:
lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
next_lock = T2->pi_blocked_on->lock;
drop locks
T2 times out and blocks on L0
Now we continue:
lock T2 ->
if (next_lock != T2->pi_blocked_on->lock)
return;
So if we detect that T2 is now blocked on a different lock we stop the
chain walk. That's also correct in the following scenario:
lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
next_lock = T2->pi_blocked_on->lock;
drop locks
T3 times out and drops L3
T2 acquires L3 and blocks on L4 now
Now we continue:
lock T2 ->
if (next_lock != T2->pi_blocked_on->lock)
return;
We don't have to follow up the chain at that point, because T2
propagated our priority up to T4 already.
[ Folded a cleanup patch from peterz ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Brad Mouring <bmouring@ni.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.de
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 397335f004f41e5fcf7a795e94eb3ab83411a17c upstream.
The current deadlock detection logic does not work reliably due to the
following early exit path:
/*
* Drop out, when the task has no waiters. Note,
* top_waiter can be NULL, when we are in the deboosting
* mode!
*/
if (top_waiter && (!task_has_pi_waiters(task) ||
top_waiter != task_top_pi_waiter(task)))
goto out_unlock_pi;
So this not only exits when the task has no waiters, it also exits
unconditionally when the current waiter is not the top priority waiter
of the task.
So in a nested locking scenario, it might abort the lock chain walk
and therefor miss a potential deadlock.
Simple fix: Continue the chain walk, when deadlock detection is
enabled.
We also avoid the whole enqueue, if we detect the deadlock right away
(A-A). It's an optimization, but also prevents that another waiter who
comes in after the detection and before the task has undone the damage
observes the situation and detects the deadlock and returns
-EDEADLOCK, which is wrong as the other task is not in a deadlock
situation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
[ Upstream commit 945b2b2d259d1a4364a2799e80e8ff32f8c6ee6f ]
Quoting Samu Kallio:
Basically what's happening is, during netns cleanup,
nf_nat_net_exit gets called before ipv4_net_exit. As I understand
it, nf_nat_net_exit is supposed to kill any conntrack entries which
have NAT context (through nf_ct_iterate_cleanup), but for some
reason this doesn't happen (perhaps something else is still holding
refs to those entries?).
When ipv4_net_exit is called, conntrack entries (including those
with NAT context) are cleaned up, but the
nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The
bug happens when attempting to free a conntrack entry whose NAT hash
'prev' field points to a slot in the freed hash table (head for that
bin).
We ignore conntracks with null nat bindings. But this is wrong,
as these are in bysource hash table as well.
Restore nat-cleaning for the netns-is-being-removed case.
bug:
https://bugzilla.kernel.org/show_bug.cgi?id=65191
Cc: <stable@vger.kernel.org> # 3.15.x
Cc: <stable@vger.kernel.org> # 3.14.x
Cc: <stable@vger.kernel.org> # 3.12.x
Cc: <stable@vger.kernel.org> # 3.10.x
Fixes: c2d421e1718 ('netfilter: nf_nat: fix race when unloading protocol modules')
Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
[ Upstream commit 9802d21e7a0b0d2167ef745edc1f4ea7a0fc6ea3 ]
The tot_stats estimator is started only when CONFIG_SYSCTL
is defined. But it is stopped without checking CONFIG_SYSCTL.
Fix the crash by moving ip_vs_stop_estimator into
ip_vs_control_net_cleanup_sysctl.
The change is needed after commit 14e405461e664b
("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.
Cc: <stable@vger.kernel.org> # 3.15.x
Cc: <stable@vger.kernel.org> # 3.14.x
Cc: <stable@vger.kernel.org> # 3.12.x
Cc: <stable@vger.kernel.org> # 3.10.x
Cc: <stable@vger.kernel.org> # 3.2.x
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Sgned-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
This reverts commit 0e2e24e5dc6eb6f0698e9dc97e652f132b885624, which
was applied twice mistakenly. The first one is
bee3f7b8188d4b2a5dfaeb2eb4a68d99f67daecf.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
|
|
|