linux-toradex.git/fs/proc, branch v3.4.108

pagemap: do not leak physical addresses to non-privileged userspace

2015-04-14T09:34:02+00:00

commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

As pointed by recent post[1] on exploiting DRAM physical imperfection,
/proc/PID/pagemap exposes sensitive information which can be used to do
attacks.

This disallows anybody without CAP_SYS_ADMIN to read the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do anything more finegrained, but for now
  this is the simple model.   - Linus ]

Signed-off-by: Kirill A. Shutemov 
Acked-by: Konstantin Khlebnikov 
Acked-by: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Andrew Morton 
Cc: Mark Seaborn 
Signed-off-by: Linus Torvalds 
Signed-off-by: Zefan Li 
[mancha: Backported to 3.10]
Signed-off-by: mancha security

genirq: Prevent proc race against freeing of irq descriptors

2015-04-14T09:33:46+00:00

commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner 
[lizf: Backported to 3.4:
 - define kstat_irqs() for CONFIG_GENERIC_HARDIRQS
 - add ifdef/endif CONFIG_SPARSE_IRQ]
Signed-off-by: Zefan Li

proc: pid/status: show all supplementary groups

2014-04-14T13:44:15+00:00

commit 8d238027b87e654be552eabdf492042a34c5c300 upstream.

We display a list of supplementary group for each process in
/proc//status.  However, we show only the first 32 groups, not all of
them.

Although this is rare, but sometimes processes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc//status.

Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer.  There is no
apparent reason to limit to this value.

This patch removes the 32 groups printing limit.

The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX,
which is currently set to 65536.  And this is the maximum count of groups
we may possibly print.

Signed-off-by: Artem Bityutskiy 
Acked-by: Serge E. Hallyn 
Acked-by: Kees Cook 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
Cc: Qiang Huang 
Cc: Li Zefan 
Cc: Jianguo Wu 
Signed-off-by: Greg Kroah-Hartman

fs/proc/base.c: fix GPF in /proc/$PID/map_files

2014-03-24T04:37:06+00:00

commit 70335abb2689c8cd5df91bf2d95a65649addf50b upstream.

The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev 
Signed-off-by: Aleksandr Terekhov 
Reported-by: 
Acked-by: Pavel Emelyanov 
Acked-by: Cyrill Gorcunov 
Reviewed-by: "Eric W. Biederman" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs,proc: guarantee unique inodes in /proc

2013-11-29T18:50:30+00:00

commit 51f0885e5415b4cc6535e9cdcc5145bfbc134353 upstream.

Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc//net/ for example.

This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems.  We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.

In short: don't do this.  Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way.  But we're kind of stuck with it.

This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number.  That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.

That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable.  We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.

Reported-and-tested-by: Dave Jones 
Analyzed-by: Al Viro 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2:
 - Adjust context
 - Never drop the pde reference in proc_get_inode(), as callers only
   expect this when we return an existing inode, and we never do that now]
Signed-off-by: Ben Hutchings 
Cc: Rui Xiang 
Signed-off-by: Greg Kroah-Hartman

fs/proc/task_mmu.c: fix buffer overflow in add_page_map()

2013-08-20T15:26:27+00:00

commit 8c8296223f3abb142be8fc31711b18a704c0e7d8 upstream.

Recently we met quite a lot of random kernel panic issues after enabling
CONFIG_PROC_PAGE_MONITOR.  After debuggind we found this has something
to do with following bug in pagemap:

In struct pagemapread:

  struct pagemapread {
      int pos, len;
      pagemap_entry_t *buffer;
      bool v2;
  };

pos is number of PM_ENTRY_BYTES in buffer, but len is the size of
buffer, it is a mistake to compare pos and len in add_page_map() for
checking buffer is full or not, and this can lead to buffer overflow and
random kernel panic issue.

Correct len to be total number of PM_ENTRY_BYTES in buffer.

[akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
Signed-off-by: Yonghua Zheng 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

nohz: Fix idle ticks in cpu summary line of /proc/stat

2012-10-28T17:14:12+00:00

commit 7386cdbf2f57ea8cff3c9fde93f206e58b9fe13f upstream.

Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.

Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:

%Cpu(s): 48.7 us,  1.3 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi,  0.0 si,
%0.0 st

The problem is that an offline cpu still has ts->idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.

[Srivatsa: Rebased to current mainline]

Reported-by: Martin Schwidefsky 
Signed-off-by: Michal Hocko 
Reviewed-by: Srivatsa S. Bhat 
Signed-off-by: Srivatsa S. Bhat 
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

kpageflags: fix wrong KPF_THP on non-huge compound pages

2012-10-12T20:38:50+00:00

commit 7a71932d5676b7410ab64d149bad8bde6b0d8632 upstream.

KPF_THP can be set on non-huge compound pages (like slab pages or pages
allocated by drivers with __GFP_COMP) because PageTransCompound only
checks PG_head and PG_tail.  Obviously this is a bug and breaks user space
applications which look for thp via /proc/kpageflags.

This patch rules out setting KPF_THP wrongly by additionally checking
PageLRU on the head pages.

Signed-off-by: Naoya Horiguchi 
Acked-by: KOSAKI Motohiro 
Acked-by: David Rientjes 
Reviewed-by: Fengguang Wu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs/proc: fix potential unregister_sysctl_table hang

2012-10-02T17:29:54+00:00

commit 6bf6104573482570f7103d3e5ddf9574db43a363 upstream.

The unregister_sysctl_table() function hangs if all references to its
ctl_table_header structure are not dropped.

This can happen sometimes because of a leak in proc_sys_lookup():
proc_sys_lookup() gets a reference to the table via lookup_entry(), but
it does not release it when a subsequent call to sysctl_follow_link()
fails.

This patch fixes this leak by making sure the reference is always
dropped on return.

See also commit 076c3eed2c31 ("sysctl: Rewrite proc_sys_lookup
introducing find_entry and lookup_entry") which reorganized this code in
3.4.

Tested in Linux 3.4.4.

Signed-off-by: Francesco Ruggeri 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs: Fix /proc//fdinfo/ file handling

2012-06-09T15:36:18+00:00

commit 0640113be25d283e0ff77a9f041e1242182387f0 upstream.

Cyrill Gorcunov reports that I broke the fdinfo files with commit
30a08bf2d31d ("proc: move fd symlink i_mode calculations into
tid_fd_revalidate()"), and he's quite right.

The tid_fd_revalidate() function is not just used for the /fd
symlinks, it's also used for the /fdinfo/ files, and the
permission model for those are different.

So do the dynamic symlink permission handling just for symlinks, making
the fdinfo files once more appear as the proper regular files they are.

Of course, Al Viro argued (probably correctly) that we shouldn't do the
symlink permission games at all, and make the symlinks always just be
the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
since somebody noticed that the permissions had changed (which was the
reason for that original commit 30a08bf2d31d in the first place), people
do apparently use this feature.

[ Basically, you can use the symlink permission data as a cheap "fdinfo"
  replacement, since you see whether the file is open for reading and/or
  writing by just looking at st_mode of the symlink.  So the feature
  does make sense, even if the pain it has caused means we probably
  shouldn't have done it to begin with. ]

Reported-and-tested-by: Cyrill Gorcunov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman