linux-toradex.git/kernel/auditsc.c, branch v6.5-rc4

capability: just use a 'u64' instead of a 'u32[2]' array

2023-03-01T18:01:22+00:00

Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words.  It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more.  And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation.  It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently.  By just using a saner data structure, it went from

	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;

to just being

	return a.val == b.val;

instead.  Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik 
Cc: Casey Schaufler 
Cc: Serge Hallyn 
Cc: Al Viro 
Cc: Paul Moore 
Signed-off-by: Linus Torvalds

Merge tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

2023-02-20T20:38:27+00:00

Pull fsnotify updates from Jan Kara:
 "Support for auditing decisions regarding fanotify permission events"

* tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify,audit: Allow audit to use the full permission event response
  fanotify: define struct members to hold response decision context
  fanotify: Ensure consistent variable type for response

fanotify,audit: Allow audit to use the full permission event response

2023-02-07T11:53:53+00:00

This patch passes the full response so that the audit function can use all
of it. The audit function was updated to log the additional information in
the AUDIT_FANOTIFY record.

Currently the only type of fanotify info that is defined is an audit
rule number, but convert it to hex encoding to future-proof the field.
Hex encoding suggested by Paul Moore .

The {subj,obj}_trust values are {0,1,2}, corresponding to no, yes, unknown.

Sample records:
  type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5
  type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2

Suggested-by: Steve Grubb 
Link: https://lore.kernel.org/r/3075502.aeNJFYEL58@x2
Tested-by: Steve Grubb 
Acked-by: Steve Grubb 
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Jan Kara 
Message-Id:

fanotify: Ensure consistent variable type for response

2023-02-07T11:53:32+00:00

The user space API for the response variable is __u32. This patch makes
sure that the whole path through the kernel uses u32 so that there is
no sign extension or truncation of the user space response.

Suggested-by: Steve Grubb 
Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2
Signed-off-by: Richard Guy Briggs 
Acked-by: Paul Moore 
Tested-by: Steve Grubb 
Acked-by: Steve Grubb 
Signed-off-by: Jan Kara 
Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>

fs: port xattr to mnt_idmap

2023-01-19T08:24:28+00:00

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Christian Brauner (Microsoft)

audit: unify audit_filter_{uring(), inode_name(), syscall()}

2022-10-17T18:24:42+00:00

audit_filter_uring(), audit_filter_inode_name() are substantially
similar to audit_filter_syscall(). Move the core logic to
__audit_filter_op() which can be parametrized for all three.

On a Skylakex system, getpid() latency (all results aggregated
across 12 boot cycles):

         Min     Mean    Median   Max      pstdev
         (ns)    (ns)    (ns)     (ns)

 -    196.63   207.86  206.60  230.98      (+- 3.92%)
 +    183.73   196.95  192.31  232.49	   (+- 6.04%)

Performance counter stats for 'bin/getpid' (3 runs) go from:
    cycles               805.58  (  +-  4.11% )
    instructions        1654.11  (  +-   .05% )
    IPC                    2.06  (  +-  3.39% )
    branches             430.02  (  +-   .05% )
    branch-misses          1.55  (  +-  7.09% )
    L1-dcache-loads      440.01  (  +-   .09% )
    L1-dcache-load-misses  9.05  (  +- 74.03% )
to:
    cycles		 765.37  (  +-  6.66% )
    instructions        1677.07  (  +-  0.04% )
    IPC		           2.20  (  +-  5.90% )
    branches	         431.10  (  +-  0.04% )
    branch-misses	   1.60  (  +- 11.25% )
    L1-dcache-loads	 521.04  (  +-  0.05% )
    L1-dcache-load-misses  6.92  (  +- 77.60% )

(Both aggregated over 12 boot cycles.)

The increased L1-dcache-loads are due to some intermediate values now
coming from the stack.

The improvement in cycles is due to a slightly denser loop (the list
parameter in the list_for_each_entry_rcu() exit check now comes from
a register rather than a constant as before.)

Signed-off-by: Ankur Arora 
Signed-off-by: Paul Moore

audit: cache ctx->major in audit_filter_syscall()

2022-10-17T18:22:08+00:00

ctx->major contains the current syscall number. This is, of course, a
constant for the duration of the syscall. Unfortunately, GCC's alias
analysis cannot prove that it is not modified via a pointer in the
audit_filter_syscall() loop, and so always loads it from memory.

In and of itself the load isn't very expensive (ops dependent on the
ctx->major load are only used to determine the direction of control flow
and have short dependence chains and, in any case the related branches
get predicted perfectly in the fastpath) but still cache ctx->major
in a local for two reasons:

* ctx->major is in the first cacheline of struct audit_context and has
  similar alignment as audit_entry::list audit_entry. For cases
  with a lot of audit rules, doing this reduces one source of contention
  from a potentially busy cache-set.

* audit_in_mask() (called in the hot loop in audit_filter_syscall())
  does cast manipulation and error checking on ctx->major:

     audit_in_mask(const struct audit_krule *rule, unsigned long val):
             if (val > 0xffffffff)
                     return false;

             word = AUDIT_WORD(val);
             if (word >= AUDIT_BITMASK_SIZE)
                     return false;

             bit = AUDIT_BIT(val);

             return rule->mask[word] & bit;

  The clauses related to the rule need to be evaluated in the loop, but
  the rest is unnecessarily re-evaluated for every loop iteration.
  (Note, however, that most of these are cheap ALU ops and the branches
   are perfectly predicted. However, see discussion on cycles
   improvement below for more on why it is still worth hoisting.)

On a Skylakex system change in getpid() latency (aggregated over
12 boot cycles):

             Min     Mean  Median     Max       pstdev
            (ns)     (ns)    (ns)    (ns)

 -        201.30   216.14  216.22  228.46      (+- 1.45%)
 +        196.63   207.86  206.60  230.98      (+- 3.92%)

Performance counter stats for 'bin/getpid' (3 runs) go from:
    cycles               836.89  (  +-   .80% )
    instructions        2000.19  (  +-   .03% )
    IPC                    2.39  (  +-   .83% )
    branches             430.14  (  +-   .03% )
    branch-misses          1.48  (  +-  3.37% )
    L1-dcache-loads      471.11  (  +-   .05% )
    L1-dcache-load-misses  7.62  (  +- 46.98% )

 to:
    cycles               805.58  (  +-  4.11% )
    instructions        1654.11  (  +-   .05% )
    IPC                    2.06  (  +-  3.39% )
    branches             430.02  (  +-   .05% )
    branch-misses          1.55  (  +-  7.09% )
    L1-dcache-loads      440.01  (  +-   .09% )
    L1-dcache-load-misses  9.05  (  +- 74.03% )

(Both aggregated over 12 boot cycles.)

instructions: we reduce around 8 instructions/iteration because some of
the computation is now hoisted out of the loop (branch count does not
change because GCC, for reasons unclear, only hoists the computations
while keeping the basic-blocks.)

cycles: improve by about 5% (in aggregate and looking at individual run
numbers.) This is likely because we now waste fewer pipeline resources
on unnecessary instructions which allows the control flow to
speculatively execute further ahead shortening the execution of the loop
a little. The final gating factor on the performance of this loop
remains the long dependence chain due to the linked-list load.

Signed-off-by: Ankur Arora 
Signed-off-by: Paul Moore

Merge tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

2022-10-04T18:05:43+00:00

Pull audit updates from Paul Moore:
 "Six audit patches for v6.1, most are pretty trivial, but a quick list
  of the highlights are below:

   - Only free the audit proctitle information on task exit. This allows
     us to cache the information and improve performance slightly.

   - Use the time_after() macro to do time comparisons instead of doing
     it directly and potentially causing ourselves problems when the
     timer wraps.

   - Convert an audit_context state comparison from a relative enum
     comparison, e.g. (x < y), to a not-equal comparison to ensure that
     we are not caught out at some unknown point in the future by an
     enum shuffle.

   - A handful of small cleanups such as tidying up comments and
     removing unused declarations"

* tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: remove selinux_audit_rule_update() declaration
  audit: use time_after to compare time
  audit: free audit_proctitle only on task exit
  audit: explicitly check audit_context->context enum value
  audit: audit_context pid unused, context enum comment fix
  audit: fix repeated words in comments

audit: free audit_proctitle only on task exit

2022-08-26T21:18:54+00:00

Since audit_proctitle is generated at syscall exit time, its value is
used immediately and cached for the next syscall.  Since this is the
case, then only clear it at task exit time.  Otherwise, there is no
point in caching the value OR bearing the overhead of regenerating it.

Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Paul Moore

audit: explicitly check audit_context->context enum value

2022-08-26T21:17:11+00:00

Be explicit in checking the struct audit_context "context" member enum
value rather than assuming the order of context enum values.

Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Paul Moore