linux-toradex.git/Documentation, branch v3.12.33

Documentation: lzo: document part of the encoding

2014-10-31T14:11:17+00:00

commit d98a0526434d27e261f622cf9d2e0028b5ff1a00 upstream.

Add a complete description of the LZO format as processed by the
decompressor. I have not found a public specification of this format
hence this analysis, which will be used to better understand the code.

Cc: Willem Pinckaers 
Cc: "Don A. Bailey" 
Signed-off-by: Willy Tarreau 
Signed-off-by: Jiri Slaby

kvm: fix potentially corrupt mmio cache

2014-10-31T14:11:08+00:00

commit ee3d1570b58677885b4552bce8217fda7b226a68 upstream.

vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
                       |
...                                         |
                                  |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Signed-off-by: David Matlack 
Reviewed-by: Xiao Guangrong 
Reviewed-by: David Matlack 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Jiri Slaby

ALSA: virtuoso: add Xonar Essence STX II support

2014-09-03T19:31:16+00:00

commit f42bb22243d2ae264d721b055f836059fe35321f upstream.

Just add the PCI ID for the STX II.  It appears to work the same as the
STX, except for the addition of the not-yet-supported daughterboard.

Tested-by: Mario 
Tested-by: corubba 
Signed-off-by: Clemens Ladisch 
Signed-off-by: Takashi Iwai 
Signed-off-by: Jiri Slaby

DMA-API: provide a helper to set both DMA and coherent DMA masks

2014-08-26T12:12:07+00:00

commit 4aa806b771d16b810771d86ce23c4c3160888db3 upstream.

Provide a helper to set both the DMA and coherent DMA masks to the
same value - this avoids duplicated code in a number of drivers,
sometimes with buggy error handling, and also allows us identify
which drivers do things differently.

Signed-off-by: Russell King 
Signed-off-by: Jiri Slaby

x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

2014-08-19T12:23:37+00:00

commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst 
Signed-off-by: H. Peter Anvin 
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk 
Cc: Borislav Petkov 
Cc: Andrew Lutomriski 
Cc: Linus Torvalds 
Cc: Dirk Hohndel 
Cc: Arjan van de Ven 
Cc: comex 
Cc: Alexander van Heukelum 
Cc: Boris Ostrovsky 
Cc:  # consider after upstream merge
Signed-off-by: Jiri Slaby

mm, pcp: allow restoring percpu_pagelist_fraction default

2014-07-18T13:51:01+00:00

commit 7cd2b0a34ab8e4db971920eef8982f985441adfb upstream.

Oleg reports a division by zero error on zero-length write() to the
percpu_pagelist_fraction sysctl:

    divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
    RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
    RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
    RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
    RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
    R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
    FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
    Call Trace:
      proc_sys_call_handler+0xb3/0xc0
      proc_sys_write+0x14/0x20
      vfs_write+0xba/0x1e0
      SyS_write+0x46/0xb0
      tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity.  If a value in the range [1, 7]
is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.

Signed-off-by: David Rientjes 
Reported-by: Oleg Drokin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

Documentation/SubmittingPatches: describe the Fixes: tag

2014-07-17T11:43:20+00:00

commit 8401aa1f59975c03eeebd3ac6d264cbdfe9af5de upstream.

Update the SubmittingPatches process to include howto about the new
'Fixes:' tag to be used when a patch fixes an issue in a previous commit
(found by git-bisect for example).

Signed-off-by: Jacob Keller 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
Cc: Randy Dunlap 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO)

2014-07-02T10:06:11+00:00

commit 3ba08129e38437561df44c36b7ea9081185d5333 upstream.

Currently memory error handler handles action optional errors in the
deferred manner by default.  And if a recovery aware application wants
to handle it immediately, it can do it by setting PF_MCE_EARLY flag.
However, such signal can be sent only to the main thread, so it's
problematic if the application wants to have a dedicated thread to
handler such signals.

So this patch adds dedicated thread support to memory error handler.  We
have PF_MCE_EARLY flags for each thread separately, so with this patch
AO signal is sent to the thread with PF_MCE_EARLY flag set, not the main
thread.  If you want to implement a dedicated thread, you call prctl()
to set PF_MCE_EARLY on the thread.

Memory error handler collects processes to be killed, so this patch lets
it check PF_MCE_EARLY flag on each thread in the collecting routines.

No behavioral change for all non-early kill cases.

Tony said:

: The old behavior was crazy - someone with a multithreaded process might
: well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
: that thread would see the SIGBUS with si_code = BUS_MCEERR_A0 - even if
: that thread wasn't the main thread for the process.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi 
Reviewed-by: Tony Luck 
Cc: Kamil Iskra 
Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: Chen Gong 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

ACPI / memhotplug: add parameter to disable memory hotplug

2014-06-27T08:25:16+00:00

commit 00159a2013269bc0a617de885e4b921349192bd0 upstream.

When booting a kexec/kdump kernel on a system that has specific memory
hotplug regions the boot will fail with warnings like:

 swapper/0: page allocation failure: order:9, mode:0x84d0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
  0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
  ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
  ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
 Call Trace:
  [] dump_stack+0x19/0x1b
  [] warn_alloc_failed+0xf0/0x160
  [] ?  __alloc_pages_direct_compact+0xac/0x196
  [] __alloc_pages_nodemask+0x7ff/0xa00
  [] vmemmap_alloc_block+0x62/0xba
  [] vmemmap_alloc_block_buf+0x15/0x3b
  [] vmemmap_populate+0xb4/0x21b
  [] sparse_mem_map_populate+0x27/0x35
  [] sparse_add_one_section+0x7a/0x185
  [] __add_pages+0xaf/0x240
  [] arch_add_memory+0x59/0xd0
  [] add_memory+0xb9/0x1b0
  [] acpi_memory_device_add+0x18d/0x26d
  [] acpi_bus_device_attach+0x7d/0xcd
  [] acpi_ns_walk_namespace+0xc8/0x17f
  [] ? acpi_bus_type_and_status+0x90/0x90
  [] ? acpi_bus_type_and_status+0x90/0x90
  [] acpi_walk_namespace+0x95/0xc5
  [] acpi_bus_scan+0x8b/0x9d
  [] acpi_scan_init+0x63/0x160
  [] acpi_init+0x25d/0x2a6
  [] ? acpi_sleep_proc_init+0x2a/0x2a
  [] do_one_initcall+0xe2/0x190
  [] kernel_init_freeable+0x17c/0x207
  [] ? do_early_param+0x88/0x88
  [] ? rest_init+0x80/0x80
  [] kernel_init+0xe/0x180
  [] ret_from_fork+0x7c/0xb0
  [] ? rest_init+0x80/0x80
 Mem-Info:
 Node 0 DMA per-cpu:
 CPU    0: hi:    0, btch:   1 usd:   0
 Node 0 DMA32 per-cpu:
 CPU    0: hi:   42, btch:   7 usd:   0
 active_anon:0 inactive_anon:0 isolated_anon:0
  active_file:0 inactive_file:0 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:872 slab_reclaimable:13 slab_unreclaimable:1880
  mapped:0 shmem:0 pagetables:0 bounce:0
  free_cma:0

because the system has run out of memory at boot time.  This occurs
because of the following sequence in the boot:

Main kernel boots and sets E820 map.  The second kernel is booted with a
map generated by the kdump service using memmap= and memmap=exactmap.
These parameters are added to the kernel parameters of the kexec/kdump
kernel.   The kexec/kdump kernel has limited memory resources so as not
to severely impact the main kernel.

The system then panics and the kdump/kexec kernel boots (which is a
completely new kernel boot).  During this boot ACPI is initialized and the
kernel (as can be seen above) traverses the ACPI namespace and finds an
entry for a memory device to be hotadded.

ie)

  [] __add_pages+0xaf/0x240
  [] arch_add_memory+0x59/0xd0
  [] add_memory+0xb9/0x1b0
  [] acpi_memory_device_add+0x18d/0x26d
  [] acpi_bus_device_attach+0x7d/0xcd
  [] acpi_ns_walk_namespace+0xc8/0x17f
  [] ? acpi_bus_type_and_status+0x90/0x90
  [] ? acpi_bus_type_and_status+0x90/0x90
  [] acpi_walk_namespace+0x95/0xc5
  [] acpi_bus_scan+0x8b/0x9d
  [] acpi_scan_init+0x63/0x160
  [] acpi_init+0x25d/0x2a6

At this point the kernel adds page table information and the the kexec/kdump
kernel runs out of memory.

This can also be reproduced by using the memmap=exactmap and mem=X
parameters on the main kernel and booting.

This patchset resolves the problem by adding a kernel parameter,
acpi_no_memhotplug, to disable ACPI memory hotplug.

Signed-off-by: Prarit Bhargava 
Acked-by: Vivek Goyal 
Acked-by: Toshi Kani 
Acked-by: David Rientjes 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Jiri Slaby

ima: audit log files opened with O_DIRECT flag

2014-06-20T15:34:22+00:00

commit f9b2a735bdddf836214b5dca74f6ca7712e5a08c upstream.

Files are measured or appraised based on the IMA policy.  When a
file, in policy, is opened with the O_DIRECT flag, a deadlock
occurs.

The first attempt at resolving this lockdep temporarily removed the
O_DIRECT flag and restored it, after calculating the hash.  The
second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
time.  The third attempt, by Dmitry Kasatkin, resolves the i_mutex
locking issue, by re-introducing the IMA mutex, but uncovered
another problem.  Reading a file with O_DIRECT flag set, writes
directly to userspace pages.  A second patch allocates a user-space
like memory.  This works for all IMA hooks, except ima_file_free(),
which is called on __fput() to recalculate the file hash.

Until this last issue is addressed, do not 'collect' the
measurement for measuring, appraising, or auditing files opened
with the O_DIRECT flag set.  Based on policy, permit or deny file
access.  This patch defines a new IMA policy rule option named
'permit_directio'.  Policy rules could be defined, based on LSM
or other criteria, to permit specific applications to open files
with the O_DIRECT flag set.

Changelog v1:
- permit or deny file access based IMA policy rules

Signed-off-by: Mimi Zohar 
Acked-by: Dmitry Kasatkin 
Signed-off-by: Jiri Slaby