linux-toradex.git/fs/proc, branch v3.0.18

proc: clear_refs: do not clear reserved pages

2012-01-26T01:25:05+00:00

commit 85e72aa5384b1a614563ad63257ded0e91d1a620 upstream.

/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.

On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts.  Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in.  This results in a system deadlock
on the next exception.

To see this problem in action, just run:

	$ echo 1 > /proc/self/clear_refs

on an ARM platform (as any user) and watch your system hang.  I think this
has been the case since 2.6.37

This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM.  Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.

Signed-off-by: Will Deacon 
Reported-by: Moussa Ba 
Acked-by: Hugh Dickins 
Cc: David Rientjes 
Cc: Russell King 
Acked-by: Nicolas Pitre 
Cc: Matt Mackall 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

proc: clean up and fix /proc//mem handling

2012-01-26T01:24:55+00:00

commit e268337dfe26dfc7efd422a804dbb27977a3cccc upstream.

Jüri Aedla reported that the /proc//mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.

Reported-by: Jüri Aedla 
Cc: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fix cputime overflow in uptime_proc_show

2012-01-26T01:24:53+00:00

commit c3e0ef9a298e028a82ada28101ccd5cf64d209ee upstream.

For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.

Cc: Michael Abbott 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

fs/proc/meminfo.c: fix compilation error

2011-12-21T20:57:35+00:00

commit b53fc7c2974a50913f49e1d800fe904a28c338e3 upstream.

Fix the error message "directives may not be used inside a macro argument"
which appears when the kernel is compiled for the cris architecture.

Signed-off-by: Claudio Scordino 
Cc: Andrea Arcangeli 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs: show O_CLOEXE bit properly in /proc//fdinfo/ files

2011-11-11T17:36:30+00:00

commit 1117f72ea0217ba0cc19f05adbbd8b9a397f5ab7 upstream.

The CLOEXE bit is magical, and for performance (and semantic) reasons we
don't actually maintain it in the file descriptor itself, but in a
separate bit array.  Which means that when we show f_flags, the CLOEXE
status is shown incorrectly: we show the status not as it is now, but as
it was when the file was opened.

Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
array.

Uli needs this in order to re-implement the pfiles program:

  "For normal file descriptors (not sockets) this was the last piece of
   information which wasn't available.  This is all part of my 'give
   Solaris users no reason to not switch' effort.  I intend to offer the
   code to the util-linux-ng maintainers."

Requested-by: Ulrich Drepper 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

/proc/self/numa_maps: restore "huge" tag for hugetlb vmas

2011-11-11T17:36:15+00:00

commit fc360bd9cdcf875639a77f07fafec26699c546f3 upstream.

The display of the "huge" tag was accidentally removed in 29ea2f698 ("mm:
use walk_page_range() instead of custom page table walking code").

Reported-by: Stephen Hemminger 
Tested-by: Stephen Hemminger 
Reviewed-by: Stephen Wilson 
Cc: KOSAKI Motohiro 
Cc: Hugh Dickins 
Acked-by: David Rientjes 
Cc: Lee Schermerhorn 
Cc: Alexey Dobriyan 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

teach /proc/$pid/numa_maps about transparent hugepages

2011-10-03T18:40:38+00:00

commit 32ef43848f283e0ef945d3c67e851c143fea3970 upstream.

This is modeled after the smaps code.

It detects transparent hugepages and then does a single gather_stats()
for the page as a whole.  This has two benifits:
 1. It is more efficient since it does many pages in a single shot.
 2. It does not have to break down the huge page.

Signed-off-by: Dave Hansen 
Acked-by: Hugh Dickins 
Acked-by: David Rientjes 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

break out numa_maps gather_pte_stats() checks

2011-10-03T18:40:38+00:00

commit 3200a8aaab0c9ccdc0f59b0dac2d4a47029137fa upstream.

gather_pte_stats() does a number of checks on a target page
to see whether it should even be considered for statistics.
This breaks that code out in to a separate function so that
we can use it in the transparent hugepage case in the next
patch.

Signed-off-by: Dave Hansen 
Acked-by: Hugh Dickins 
Reviewed-by: Christoph Lameter 
Acked-by: David Rientjes 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

make /proc/$pid/numa_maps gather_stats() take variable page size

2011-10-03T18:40:38+00:00

commit eb4866d0066ffd5446751c102d64feb3318d8bd1 upstream.

We need to teach the numa_maps code about transparent huge pages.  The
first step is to teach gather_stats() that the pte it is dealing with
might represent more than one page.

Note that will we use this in a moment for transparent huge pages since
they have use a single pmd_t which _acts_ as a "surrogate" for a bunch
of smaller pte_t's.

I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes
for hugetlbfs pages and PAGE_SIZE for normal pages.  That means that to
figure out how many _bytes_ "dirty=1" means, you must first know the
hugetlbfs page size.  That's easier said than done especially if you
don't have visibility in to the mount.

But, that's probably a discussion for another day especially since it
would change behavior to fix it.  But, just in case anyone wonders why
this patch only passes a '1' in the hugetlb case...

Signed-off-by: Dave Hansen 
Acked-by: Hugh Dickins 
Acked-by: David Rientjes 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

proc: fix a race in do_io_accounting()

2011-08-05T04:58:40+00:00

commit 293eb1e7772b25a93647c798c7b89bf26c2da2e0 upstream.

If an inode's mode permits opening /proc/PID/io and the resulting file
descriptor is kept across execve() of a setuid or similar binary, the
ptrace_may_access() check tries to prevent using this fd against the
task with escalated privileges.

Unfortunately, there is a race in the check against execve().  If
execve() is processed after the ptrace check, but before the actual io
information gathering, io statistics will be gathered from the
privileged process.  At least in theory this might lead to gathering
sensible information (like ssh/ftp password length) that wouldn't be
available otherwise.

Holding task->signal->cred_guard_mutex while gathering the io
information should protect against the race.

The order of locking is similar to the one inside of ptrace_attach():
first goes cred_guard_mutex, then lock_task_sighand().

Signed-off-by: Vasiliy Kulikov 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman