summaryrefslogtreecommitdiff
path: root/security/smack
AgeCommit message (Collapse)Author
2025-12-05Merge tag 'pull-persistency' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
2025-12-03Merge tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack updates from Casey Schaufler: - fix several cases where labels were treated inconsistently when imported from user space - clean up the assignment of extended attributes - documentation improvements * tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next: Smack: function parameter 'gfp' not described smack: fix kernel-doc warnings for smk_import_valid_label() smack: fix bug: setting task label silently ignores input garbage smack: fix bug: unprivileged task can create labels smack: fix bug: invalid label of unix socket file smack: always "instantiate" inode in smack_inode_init_security() smack: deduplicate xattr setting in smack_inode_init_security() smack: fix bug: SMACK64TRANSMUTE set on non-directory smack: deduplicate "does access rule request transmutation"
2025-11-16convert smackfsAl Viro
Entirely static tree populated by simple_fill_super(). Can use kill_anon_super() as-is. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-11Smack: function parameter 'gfp' not describedCasey Schaufler
Add a descrition of the gfp parameter to smk_import_allocated_label(). Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511061746.dPegBnNf-lkp@intel.com/
2025-10-22smack: move initcalls to the LSM frameworkPaul Moore
As the LSM framework only supports one LSM initcall callback for each initcall type, the init_smk_fs() and smack_nf_ip_init() functions were wrapped with a new function, smack_initcall() that is registered with the LSM framework. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johhansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22lsm: replace the name field with a pointer to the lsm_id structPaul Moore
Reduce the duplication between the lsm_id struct and the DEFINE_LSM() definition by linking the lsm_id struct directly into the individual LSM's DEFINE_LSM() instance. Linking the lsm_id into the LSM definition also allows us to simplify the security_add_hooks() function by removing the code which populates the lsm_idlist[] array and moving it into the normal LSM startup code where the LSM list is parsed and the individual LSMs are enabled, making for a cleaner implementation with less overhead at boot. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-03Merge tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull d_name audit update from Al Viro: "Simplifying ->d_name audits, easy part. Turn dentry->d_name into an anon union of const struct qsrt (d_name itself) and a writable alias (__d_name). With constification of some struct qstr * arguments of functions that get &dentry->d_name passed to them, that ends up with all modifications provably done only in fs/dcache.c (and a fairly small part of it). Any new places doing modifications will be easy to find - grep for __d_name will suffice" * tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make it easier to catch those who try to modify ->d_name generic_ci_validate_strict_name(): constify name argument afs_dir_search: constify qstr argument afs_edit_dir_{add,remove}(): constify qstr argument exfat_find(): constify qstr argument security_dentry_init_security(): constify qstr argument
2025-09-15security_dentry_init_security(): constify qstr argumentAl Viro
Nothing outside of fs/dcache.c has any business modifying dentry names; passing &dentry->d_name as an argument should have that argument declared as a const pointer. Acked-by: Casey Schaufler <casey@schaufler-ca.com> # smack part Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-30audit: add record for multiple object contextsCasey Schaufler
Create a new audit record AUDIT_MAC_OBJ_CONTEXTS. An example of the MAC_OBJ_CONTEXTS record is: type=MAC_OBJ_CONTEXTS msg=audit(1601152467.009:1050): obj_selinux=unconfined_u:object_r:user_home_t:s0 When an audit event includes a AUDIT_MAC_OBJ_CONTEXTS record the "obj=" field in other records in the event will be "obj=?". An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on an object security context. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30audit: add record for multiple task security contextsCasey Schaufler
Replace the single skb pointer in an audit_buffer with a list of skb pointers. Add the audit_stamp information to the audit_buffer as there's no guarantee that there will be an audit_context containing the stamp associated with the event. At audit_log_end() time create auxiliary records as have been added to the list. Functions are created to manage the skb list in the audit_buffer. Create a new audit record AUDIT_MAC_TASK_CONTEXTS. An example of the MAC_TASK_CONTEXTS record is: type=MAC_TASK_CONTEXTS msg=audit(1600880931.832:113) subj_apparmor=unconfined subj_smack=_ When an audit event includes a AUDIT_MAC_TASK_CONTEXTS record the "subj=" field in other records in the event will be "subj=?". An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on a subject security context. Refactor audit_log_task_context(), creating a new audit_log_subj_ctx(). This is used in netlabel auditing to provide multiple subject security contexts as necessary. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-30smack: fix kernel-doc warnings for smk_import_valid_label()Konstantin Andreev
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506251712.x5SJiNlh-lkp@intel.com/ Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-24smack: fix bug: setting task label silently ignores input garbageKonstantin Andreev
This command: # echo foo/bar >/proc/$$/attr/smack/current gives the task a label 'foo' w/o indication that label does not match input. Setting the label with lsm_set_self_attr() syscall behaves identically. This occures because: 1) smk_parse_smack() is used to convert input to a label 2) smk_parse_smack() takes only that part from the beginning of the input that looks like a label. 3) `/' is prohibited in labels, so only "foo" is taken. (2) is by design, because smk_parse_smack() is used for parsing strings which are more than just a label. Silent failure is not a good thing, and there are two indicators that this was not done intentionally: (size >= SMK_LONGLABEL) ~> invalid clause at the beginning of the do_setattr() and the "Returns the length of the smack label" claim in the do_setattr() description. So I fixed this by adding one tiny check: the taken label length == input length. Since input length is now strictly controlled, I changed the two ways of setting label smack_setselfattr(): lsm_set_self_attr() syscall smack_setprocattr(): > /proc/.../current to accommodate the divergence in what they understand by "input length": smack_setselfattr counts mandatory \0 into input length, smack_setprocattr does not. smack_setprocattr allows various trailers after label Related changes: * fixed description for smk_parse_smack * allow unprivileged tasks validate label syntax. * extract smk_parse_label_len() from smk_parse_smack() so parsing may be done w/o string allocation. * extract smk_import_valid_label() from smk_import_entry() to avoid repeated parsing. * smk_parse_smack(): scan null-terminated strings for no more than SMK_LONGLABEL(256) characters * smack_setselfattr(): require struct lsm_ctx . flags == 0 to reserve them for future. Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel") Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-24smack: fix bug: unprivileged task can create labelsKonstantin Andreev
If an unprivileged task is allowed to relabel itself (/smack/relabel-self is not empty), it can freely create new labels by writing their names into own /proc/PID/attr/smack/current This occurs because do_setattr() imports the provided label in advance, before checking "relabel-self" list. This change ensures that the "relabel-self" list is checked before importing the label. Fixes: 38416e53936e ("Smack: limited capability for changing process label") Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22smack: fix bug: invalid label of unix socket fileKonstantin Andreev
According to [1], the label of a UNIX domain socket (UDS) file (i.e., the filesystem object representing the socket) is not supposed to participate in Smack security. To achieve this, [1] labels UDS files with "*" in smack_d_instantiate(). Before [2], smack_d_instantiate() was responsible for initializing Smack security for all inodes, except ones under /proc [2] imposed the sole responsibility for initializing inode security for newly created filesystem objects on smack_inode_init_security(). However, smack_inode_init_security() lacks some logic present in smack_d_instantiate(). In particular, it does not label UDS files with "*". This patch adds the missing labeling of UDS files with "*" to smack_inode_init_security(). Labeling UDS files with "*" in smack_d_instantiate() still works for stale UDS files that already exist on disk. Stale UDS files are useless, but I keep labeling them for consistency and maybe to make easier for user to delete them. Compared to [1], this version introduces the following improvements: * UDS file label is held inside inode only and not saved to xattrs. * relabeling UDS files (setxattr, removexattr, etc.) is blocked. [1] 2010-11-24 Casey Schaufler commit b4e0d5f0791b ("Smack: UDS revision") [2] 2023-11-16 roberto.sassu Fixes: e63d86b8b764 ("smack: Initialize the in-memory inode in smack_inode_init_security()") Link: https://lore.kernel.org/linux-security-module/20231116090125.187209-5-roberto.sassu@huaweicloud.com/ Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22smack: always "instantiate" inode in smack_inode_init_security()Konstantin Andreev
If memory allocation for the SMACK64TRANSMUTE xattr value fails in smack_inode_init_security(), the SMK_INODE_INSTANT flag is not set in (struct inode_smack *issp)->smk_flags, leaving the inode as not "instantiated". It does not matter if fs frees the inode after failed smack_inode_init_security() call, but there is no guarantee for this. To be safe, mark the inode as "instantiated", even if allocation of xattr values fails. Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22smack: deduplicate xattr setting in smack_inode_init_security()Konstantin Andreev
Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22smack: fix bug: SMACK64TRANSMUTE set on non-directoryKonstantin Andreev
When a new file system object is created and the conditions for label transmutation are met, the SMACK64TRANSMUTE extended attribute is set on the object regardless of its type: file, pipe, socket, symlink, or directory. However, SMACK64TRANSMUTE may only be set on directories. This bug is a combined effect of the commits [1] and [2] which both transfer functionality from smack_d_instantiate() to smack_inode_init_security(), but only in part. Commit [1] set blank SMACK64TRANSMUTE on improper object types. Commit [2] set "TRUE" SMACK64TRANSMUTE on improper object types. [1] 2023-06-10, Fixes: baed456a6a2f ("smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()") Link: https://lore.kernel.org/linux-security-module/20230610075738.3273764-3-roberto.sassu@huaweicloud.com/ [2] 2023-11-16, Fixes: e63d86b8b764 ("smack: Initialize the in-memory inode in smack_inode_init_security()") Link: https://lore.kernel.org/linux-security-module/20231116090125.187209-5-roberto.sassu@huaweicloud.com/ Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22smack: deduplicate "does access rule request transmutation"Konstantin Andreev
Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-05-28Merge tag 'net-next-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-05-19security/smack/smackfs: small kernel-doc fixesRandy Dunlap
Add function short descriptions to the kernel-doc where missing. Correct a verb and add ending periods to sentences. smackfs.c:1080: warning: missing initial short description on line: * smk_net4addr_insert smackfs.c:1343: warning: missing initial short description on line: * smk_net6addr_insert Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: linux-security-module@vger.kernel.org Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com>
2025-04-11net: Retire DCCP socket.Kuniyuki Iwashima
DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"), which noted that the last maintainer had been inactive for five years. In recent years, it has become a playground for syzbot, and most changes to DCCP have been odd bug fixes triggered by syzbot. Apart from that, the only changes have been driven by treewide or networking API updates or adjustments related to TCP. Thus, in 2023, we announced we would remove DCCP in 2025 via commit b144fcaf46d4 ("dccp: Print deprecation notice."). Since then, only one individual has contacted the netdev mailing list. [0] There is ongoing research for Multipath DCCP. The repository is hosted on GitHub [1], and development is not taking place through the upstream community. While the repository is published under the GPLv2 license, the scheduling part remains proprietary, with a LICENSE file [2] stating: "This is not Open Source software." The researcher mentioned a plan to address the licensing issue, upstream the patches, and step up as a maintainer, but there has been no further communication since then. Maintaining DCCP for a decade without any real users has become a burden. Therefore, it's time to remove it. Removing DCCP will also provide significant benefits to TCP. It allows us to freely reorganize the layout of struct inet_connection_sock, which is currently shared with DCCP, and optimize it to reduce the number of cachelines accessed in the TCP fast path. Note that we keep DCCP netfilter modules as requested. [3] Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0] Link: https://github.com/telekom/mp-dccp #[1] Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2] Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-16smack: recognize ipv4 CIPSO w/o categoriesKonstantin Andreev
If SMACK label has CIPSO representation w/o categories, e.g.: | # cat /smack/cipso2 | foo 10 | @ 250/2 | ... then SMACK does not recognize such CIPSO in input ipv4 packets and substitues '*' label instead. Audit records may look like | lsm=SMACK fn=smack_socket_sock_rcv_skb action=denied | subject="*" object="_" requested=w pid=0 comm="swapper/1" ... This happens in two steps: 1) security/smack/smackfs.c`smk_set_cipso does not clear NETLBL_SECATTR_MLS_CAT from (struct smack_known *)skp->smk_netlabel.flags on assigning CIPSO w/o categories: | rcu_assign_pointer(skp->smk_netlabel.attr.mls.cat, ncats.attr.mls.cat); | skp->smk_netlabel.attr.mls.lvl = ncats.attr.mls.lvl; 2) security/smack/smack_lsm.c`smack_from_secattr can not match skp->smk_netlabel with input packet's struct netlbl_lsm_secattr *sap because sap->flags have not NETLBL_SECATTR_MLS_CAT (what is correct) but skp->smk_netlabel.flags have (what is incorrect): | if ((sap->flags & NETLBL_SECATTR_MLS_CAT) == 0) { | if ((skp->smk_netlabel.flags & | NETLBL_SECATTR_MLS_CAT) == 0) | found = 1; | break; | } This commit sets/clears NETLBL_SECATTR_MLS_CAT in skp->smk_netlabel.flags according to the presense of CIPSO categories. The update of smk_netlabel is not atomic, so input packets processing still may be incorrect during short time while update proceeds. Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-02-16smack: Revert "smackfs: Added check catlen"Konstantin Andreev
This reverts commit ccfd889acb06eab10b98deb4b5eef0ec74157ea0 The indicated commit * does not describe the problem that change tries to solve * has programming issues * introduces a bug: forever clears NETLBL_SECATTR_MLS_CAT in (struct smack_known *)skp->smk_netlabel.flags Reverting the commit to reapproach original problem Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-02-13smack: remove /smack/logging if audit is not configuredKonstantin Andreev
If CONFIG_AUDIT is not set then SMACK does not generate audit messages, however, keeps audit control file, /smack/logging, while there is no entity to control. This change removes audit control file /smack/logging when audit is not configured in the kernel Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-02-13smack: ipv4/ipv6: tcp/dccp/sctp: fix incorrect child socket labelKonstantin Andreev
Since inception [1], SMACK initializes ipv* child socket security for connection-oriented communications (tcp/sctp/dccp) during accept() syscall, in the security_sock_graft() hook: | void smack_sock_graft(struct sock *sk, ...) | { | // only ipv4 and ipv6 are eligible here | // ... | ssp = sk->sk_security; // socket security | ssp->smk_in = skp; // process label: smk_of_current() | ssp->smk_out = skp; // process label: smk_of_current() | } This approach is incorrect for two reasons: A) initialization occurs too late for child socket security: The child socket is created by the kernel once the handshake completes (e.g., for tcp: after receiving ack for syn+ack). Data can legitimately start arriving to the child socket immediately, long before the application calls accept() on the socket. Those data are (currently — were) processed by SMACK using incorrect child socket security attributes. B) Incoming connection requests are handled using the listening socket's security, hence, the child socket must inherit the listening socket's security attributes. smack_sock_graft() initilizes the child socket's security with a process label, as is done for a new socket() But ... the process label is not necessarily the same as the listening socket label. A privileged application may legitimately set other in/out labels for a listening socket. When this happens, SMACK processes incoming packets using incorrect socket security attributes. In [2] Michael Lontke noticed (A) and fixed it in [3] by adding socket initialization into security_sk_clone_security() hook like | void smack_sk_clone_security(struct sock *oldsk, struct sock *newsk) | { | *(struct socket_smack *)newsk->sk_security = | *(struct socket_smack *)oldsk->sk_security; | } This initializes the child socket security with the parent (listening) socket security at the appropriate time. I was forced to revisit this old story because smack_sock_graft() was left in place by [3] and continues overwriting the child socket's labels with the process label, and there might be a reason for this, so I undertook a study. If the process label differs from the listening socket's labels, the following occurs for ipv4: assigning the smk_out is not accompanied by netlbl_sock_setattr, so the outgoing packet's cipso label does not change. So, the only effect of this assignment for interhost communications is a divergence between the program-visible “out” socket label and the cipso network label. For intrahost communications this label, however, becomes visible via secmark netfilter marking, and is checked for access rights by the client, receiving side. Assigning the smk_in affects both interhost and intrahost communications: the server begins to check access rights against an wrong label. Access check against wrong label (smk_in or smk_out), unsurprisingly fails, breaking the connection. The above affects protocols that calls security_sock_graft() during accept(), namely: {tcp,dccp,sctp}/{ipv4,ipv6} One extra security_sock_graft() caller, crypto/af_alg.c`af_alg_accept is not affected, because smack_sock_graft() does nothing for PF_ALG. To reproduce, assign non-default in/out labels to a listening socket, setup rules between these labels and client label, attempt to connect and send some data. Ipv6 specific: ipv6 packets do not convey SMACK labels. To reproduce the issue in interhost communications set opposite labels in /smack/ipv6host on both hosts. Ipv6 intrahost communications do not require tricking, because SMACK labels are conveyed via secmark netfilter marking. So, currently smack_sock_graft() is not useful, but harmful, therefore, I have removed it. This fixes the issue for {tcp,dccp}/{ipv4,ipv6}, but not sctp/{ipv4,ipv6}. Although this change is necessary for sctp+smack to function correctly, it is not sufficient because: sctp/ipv4 does not call security_sk_clone() and sctp/ipv6 ignores SMACK completely. These are separate issues, belong to other subsystem, and should be addressed separately. [1] 2008-02-04, Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel") [2] Michael Lontke, 2022-08-31, SMACK LSM checks wrong object label during ingress network traffic Link: https://lore.kernel.org/linux-security-module/6324997ce4fc092c5020a4add075257f9c5f6442.camel@elektrobit.com/ [3] 2022-08-31, michael.lontke, commit 4ca165fc6c49 ("SMACK: Add sk_clone_security LSM hook") Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-02-12smack: dont compile ipv6 code unless ipv6 is configuredKonstantin Andreev
I want to be sure that ipv6-specific code is not compiled in kernel binaries if ipv6 is not configured. [1] was getting rid of "unused variable" warning, but, with that, it also mandated compilation of a handful ipv6- specific functions in ipv4-only kernel configurations: smk_ipv6_localhost, smack_ipv6host_label, smk_ipv6_check. Their compiled bodies are likely to be removed by compiler from the resulting binary, but, to be on the safe side, I remove them from the compiler view. [1] Fixes: 00720f0e7f28 ("smack: avoid unused 'sip' variable warning") Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-02-11Smack: fix typos and spelling errorsCasey Schaufler
Fix typos and spelling errors in security/smack module comments that were identified using the codespell tool. No functional changes - documentation only. Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-01-21Merge tag 'lsm-pr-20250121' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Improved handling of LSM "secctx" strings through lsm_context struct The LSM secctx string interface is from an older time when only one LSM was supported, migrate over to the lsm_context struct to better support the different LSMs we now have and make it easier to support new LSMs in the future. These changes explain the Rust, VFS, and networking changes in the diffstat. - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are enabled Small tweak to be a bit smarter about when we build the LSM's common audit helpers. - Check for absurdly large policies from userspace in SafeSetID SafeSetID policies rules are fairly small, basically just "UID:UID", it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which helps quiet a number of syzbot related issues. While work is being done to address the syzbot issues through other mechanisms, this is a trivial and relatively safe fix that we can do now. - Various minor improvements and cleanups A collection of improvements to the kernel selftests, constification of some function parameters, removing redundant assignments, and local variable renames to improve readability. * tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: initialize local array before use to quiet static analysis safesetid: check size of policy writes net: corrections for security_secid_to_secctx returns lsm: rename variable to avoid shadowing lsm: constify function parameters security: remove redundant assignment to return variable lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test binder: initialize lsm_context structure rust: replace lsm context+len with lsm_context lsm: secctx provider check on release lsm: lsm_context in security_dentry_init_security lsm: use lsm_context in security_inode_getsecctx lsm: replace context+len with lsm_context lsm: ensure the correct LSM context releaser
2024-12-06smack: deduplicate access to string conversionKonstantin Andreev
Signed-off-by: Konstantin Andreev <andreev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2024-12-04lsm: use lsm_context in security_inode_getsecctxCasey Schaufler
Change the security_inode_getsecctx() interface to fill a lsm_context structure instead of data and length pointers. This provides the information about which LSM created the context so that security_release_secctx() can use the correct hook. Cc: linux-nfs@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04lsm: replace context+len with lsm_contextCasey Schaufler
Replace the (secctx,seclen) pointer pair with a single lsm_context pointer to allow return of the LSM identifier along with the context and context length. This allows security_release_secctx() to know how to release the context. Callers have been modified to use or save the returned data from the new structure. security_secid_to_secctx() and security_lsmproc_to_secctx() will now return the length value on success instead of 0. Cc: netdev@vger.kernel.org Cc: audit@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: Todd Kjos <tkjos@google.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak, kdoc fix, signedness fix from Dan Carpenter] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: remove lsm_prop scaffoldingCasey Schaufler
Remove the scaffold member from the lsm_prop. Remove the remaining places it is being set. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11netlabel,smack: use lsm_prop for audit dataCasey Schaufler
Replace the secid in the netlbl_audit structure with an lsm_prop. Remove scaffolding that was required when the value was a secid. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: fix the subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: create new security_cred_getlsmprop LSM hookCasey Schaufler
Create a new LSM hook security_cred_getlsmprop() which, like security_cred_getsecid(), fetches LSM specific attributes from the cred structure. The associated data elements in the audit sub-system are changed from a secid to a lsm_prop to accommodate multiple possible LSM audit users. Cc: linux-integrity@vger.kernel.org Cc: audit@vger.kernel.org Cc: selinux@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: use lsm_prop in security_inode_getsecidCasey Schaufler
Change the security_inode_getsecid() interface to fill in a lsm_prop structure instead of a u32 secid. This allows for its callers to gather data from all registered LSMs. Data is provided for IMA and audit. Change the name to security_inode_getlsmprop(). Cc: linux-integrity@vger.kernel.org Cc: selinux@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: use lsm_prop in security_current_getsecidCasey Schaufler
Change the security_current_getsecid_subj() and security_task_getsecid_obj() interfaces to fill in a lsm_prop structure instead of a u32 secid. Audit interfaces will need to collect all possible security data for possible reporting. Cc: linux-integrity@vger.kernel.org Cc: audit@vger.kernel.org Cc: selinux@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: use lsm_prop in security_ipc_getsecidCasey Schaufler
There may be more than one LSM that provides IPC data for auditing. Change security_ipc_getsecid() to fill in a lsm_prop structure instead of the u32 secid. Change the name to security_ipc_getlsmprop() to reflect the change. Cc: audit@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: add lsmprop_to_secctx hookCasey Schaufler
Add a new hook security_lsmprop_to_secctx() and its LSM specific implementations. The LSM specific code will use the lsm_prop element allocated for that module. This allows for the possibility that more than one module may be called upon to translate a secid to a string, as can occur in the audit code. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-11lsm: use lsm_prop in security_audit_rule_matchCasey Schaufler
Change the secid parameter of security_audit_rule_match to a lsm_prop structure pointer. Pass the entry from the lsm_prop structure for the approprite slot to the LSM hook. Change the users of security_audit_rule_match to use the lsm_prop instead of a u32. The scaffolding function lsmprop_init() fills the structure with the value of the old secid, ensuring that it is available to the appropriate module hook. The sources of the secid, security_task_getsecid() and security_inode_getsecid(), will be converted to use the lsm_prop structure later in the series. At that point the use of lsmprop_init() is dropped. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-09-24Merge tag 'lsm-pr-20240923' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM fixes from Paul Moore: - Add a missing security_mmap_file() check to the remap_file_pages() syscall - Properly reference the SELinux and Smack LSM blobs in the security_watch_key() LSM hook - Fix a random IPE selftest crash caused by a missing list terminator in the test * tag 'lsm-pr-20240923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: ipe: Add missing terminator to list of unit tests selinux,smack: properly reference the LSM blob in security_watch_key() mm: call the security_mmap_file() LSM hook in remap_file_pages()
2024-09-19selinux,smack: properly reference the LSM blob in security_watch_key()Paul Moore
Unfortunately when we migrated the lifecycle management of the key LSM blob to the LSM framework we forgot to convert the security_watch_key() callbacks for SELinux and Smack. This patch corrects this by making use of the selinux_key() and smack_key() helper functions respectively. This patch also removes some input checking in the Smack callback as it is no longer needed. Fixes: 5f8d28f6d7d5 ("lsm: infrastructure management of the key security blob") Reported-by: syzbot+044fdf24e96093584232@syzkaller.appspotmail.com Tested-by: syzbot+044fdf24e96093584232@syzkaller.appspotmail.com Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-09-19Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack updates from Casey Schaufler: "Two patches: one is a simple indentation correction, the other corrects a potentially rcu unsafe pointer assignment" * tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next: smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso security: smack: Fix indentation in smack_netfilter.c
2024-09-16Merge tag 'lsm-pr-20240911' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Move the LSM framework to static calls This transitions the vast majority of the LSM callbacks into static calls. Those callbacks which haven't been converted were left as-is due to the general ugliness of the changes required to support the static call conversion; we can revisit those callbacks at a future date. - Add the Integrity Policy Enforcement (IPE) LSM This adds a new LSM, Integrity Policy Enforcement (IPE). There is plenty of documentation about IPE in this patches, so I'll refrain from going into too much detail here, but the basic motivation behind IPE is to provide a mechanism such that administrators can restrict execution to only those binaries which come from integrity protected storage, e.g. a dm-verity protected filesystem. You will notice that IPE requires additional LSM hooks in the initramfs, dm-verity, and fs-verity code, with the associated patches carrying ACK/review tags from the associated maintainers. We couldn't find an obvious maintainer for the initramfs code, but the IPE patchset has been widely posted over several years. Both Deven Bowers and Fan Wu have contributed to IPE's development over the past several years, with Fan Wu agreeing to serve as the IPE maintainer moving forward. Once IPE is accepted into your tree, I'll start working with Fan to ensure he has the necessary accounts, keys, etc. so that he can start submitting IPE pull requests to you directly during the next merge window. - Move the lifecycle management of the LSM blobs to the LSM framework Management of the LSM blobs (the LSM state buffers attached to various kernel structs, typically via a void pointer named "security" or similar) has been mixed, some blobs were allocated/managed by individual LSMs, others were managed by the LSM framework itself. Starting with this pull we move management of all the LSM blobs, minus the XFRM blob, into the framework itself, improving consistency across LSMs, and reducing the amount of duplicated code across LSMs. Due to some additional work required to migrate the XFRM blob, it has been left as a todo item for a later date; from a practical standpoint this omission should have little impact as only SELinux provides a XFRM LSM implementation. - Fix problems with the LSM's handling of F_SETOWN The LSM hook for the fcntl(F_SETOWN) operation had a couple of problems: it was racy with itself, and it was disconnected from the associated DAC related logic in such a way that the LSM state could be updated in cases where the DAC state would not. We fix both of these problems by moving the security_file_set_fowner() hook into the same section of code where the DAC attributes are updated. Not only does this resolve the DAC/LSM synchronization issue, but as that code block is protected by a lock, it also resolve the race condition. - Fix potential problems with the security_inode_free() LSM hook Due to use of RCU to protect inodes and the placement of the LSM hook associated with freeing the inode, there is a bit of a challenge when it comes to managing any LSM state associated with an inode. The VFS folks are not open to relocating the LSM hook so we have to get creative when it comes to releasing an inode's LSM state. Traditionally we have used a single LSM callback within the hook that is triggered when the inode is "marked for death", but not actually released due to RCU. Unfortunately, this causes problems for LSMs which want to take an action when the inode's associated LSM state is actually released; so we add an additional LSM callback, inode_free_security_rcu(), that is called when the inode's LSM state is released in the RCU free callback. - Refactor two LSM hooks to better fit the LSM return value patterns The vast majority of the LSM hooks follow the "return 0 on success, negative values on failure" pattern, however, there are a small handful that have unique return value behaviors which has caused confusion in the past and makes it difficult for the BPF verifier to properly vet BPF LSM programs. This includes patches to convert two of these"special" LSM hooks to the common 0/-ERRNO pattern. - Various cleanups and improvements A handful of patches to remove redundant code, better leverage the IS_ERR_OR_NULL() helper, add missing "static" markings, and do some minor style fixups. * tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits) security: Update file_set_fowner documentation fs: Fix file_set_fowner LSM hook inconsistencies lsm: Use IS_ERR_OR_NULL() helper function lsm: remove LSM_COUNT and LSM_CONFIG_COUNT ipe: Remove duplicated include in ipe.c lsm: replace indirect LSM hook calls with static calls lsm: count the LSMs enabled at compile time kernel: Add helper macros for loop unrolling init/main.c: Initialize early LSMs after arch code, static keys and calls. MAINTAINERS: add IPE entry with Fan Wu as maintainer documentation: add IPE documentation ipe: kunit test for parser scripts: add boot policy generation program ipe: enable support for fs-verity as a trust provider fsverity: expose verified fsverity built-in signatures to LSMs lsm: add security_inode_setintegrity() hook ipe: add support for dm-verity as a trust provider dm-verity: expose root hash digest and signature data to LSMs block,lsm: add LSM blob and new LSM hooks for block devices ipe: add permissive toggle ...
2024-09-16Merge tag 'vfs-6.12.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs file updates from Christian Brauner: "This is the work to cleanup and shrink struct file significantly. Right now, (focusing on x86) struct file is 232 bytes. After this series struct file will be 184 bytes aka 3 cacheline and a spare 8 bytes for future extensions at the end of the struct. With struct file being as ubiquitous as it is this should make a difference for file heavy workloads and allow further optimizations in the future. - struct fown_struct was embedded into struct file letting it take up 32 bytes in total when really it shouldn't even be embedded in struct file in the first place. Instead, actual users of struct fown_struct now allocate the struct on demand. This frees up 24 bytes. - Move struct file_ra_state into the union containg the cleanup hooks and move f_iocb_flags out of the union. This closes a 4 byte hole we created earlier and brings struct file to 192 bytes. Which means struct file is 3 cachelines and we managed to shrink it by 40 bytes. - Reorder struct file so that nothing crosses a cacheline. I suspect that in the future we will end up reordering some members to mitigate false sharing issues or just because someone does actually provide really good perf data. - Shrinking struct file to 192 bytes is only part of the work. Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer must be located outside of the object because the cache doesn't know what part of the memory can safely be overwritten as it may be needed to prevent object recycling. That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a new cacheline. So this also contains work to add a new kmem_cache_create_rcu() function that allows the caller to specify an offset where the freelist pointer is supposed to be placed. Thus avoiding the implicit addition of a fourth cacheline. - And finally this removes the f_version member in struct file. The f_version member isn't particularly well-defined. It is mainly used as a cookie to detect concurrent seeks when iterating directories. But it is also abused by some subsystems for completely unrelated things. It is mostly a directory and filesystem specific thing that doesn't really need to live in struct file and with its wonky semantics it really lacks a specific function. For pipes, f_version is (ab)used to defer poll notifications until a write has happened. And struct pipe_inode_info is used by multiple struct files in their ->private_data so there's no chance of pushing that down into file->private_data without introducing another pointer indirection. But pipes don't rely on f_pos_lock so this adds a union into struct file encompassing f_pos_lock and a pipe specific f_pipe member that pipes can use. This union of course can be extended to other file types and is similar to what we do in struct inode already" * tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits) fs: remove f_version pipe: use f_pipe fs: add f_pipe ubifs: store cookie in private data ufs: store cookie in private data udf: store cookie in private data proc: store cookie in private data ocfs2: store cookie in private data input: remove f_version abuse ext4: store cookie in private data ext2: store cookie in private data affs: store cookie in private data fs: add generic_llseek_cookie() fs: use must_set_pos() fs: add must_set_pos() fs: add vfs_setpos_cookie() s390: remove unused f_version ceph: remove unused f_version adi: remove unused f_version mm: Removed @freeptr_offset to prevent doc warning ...
2024-09-03smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipsoJiawei Ye
In the `smk_set_cipso` function, the `skp->smk_netlabel.attr.mls.cat` field is directly assigned to a new value without using the appropriate RCU pointer assignment functions. According to RCU usage rules, this is illegal and can lead to unpredictable behavior, including data inconsistencies and impossible-to-diagnose memory corruption issues. This possible bug was identified using a static analysis tool developed by myself, specifically designed to detect RCU-related issues. To address this, the assignment is now done using rcu_assign_pointer(), which ensures that the pointer assignment is done safely, with the necessary memory barriers and synchronization. This change prevents potential RCU dereference issues by ensuring that the `cat` field is safely updated while still adhering to RCU's requirements. Fixes: 0817534ff9ea ("smackfs: Fix use-after-free in netlbl_catmap_walk()") Signed-off-by: Jiawei Ye <jiawei.ye@foxmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2024-08-28selinux,smack: don't bypass permissions check in inode_setsecctx hookScott Mayhew
Marek Gresko reports that the root user on an NFS client is able to change the security labels on files on an NFS filesystem that is exported with root squashing enabled. The end of the kerneldoc comment for __vfs_setxattr_noperm() states: * This function requires the caller to lock the inode's i_mutex before it * is executed. It also assumes that the caller will make the appropriate * permission checks. nfsd_setattr() does do permissions checking via fh_verify() and nfsd_permission(), but those don't do all the same permissions checks that are done by security_inode_setxattr() and its related LSM hooks do. Since nfsd_setattr() is the only consumer of security_inode_setsecctx(), simplest solution appears to be to replace the call to __vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This fixes the above issue and has the added benefit of causing nfsd to recall conflicting delegations on a file when a client tries to change its security label. Cc: stable@kernel.org Reported-by: Marek Gresko <marek.gresko@protonmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-28file: reclaim 24 bytes from f_ownerChristian Brauner
We do embedd struct fown_struct into struct file letting it take up 32 bytes in total. We could tweak struct fown_struct to be more compact but really it shouldn't even be embedded in struct file in the first place. Instead, actual users of struct fown_struct should allocate the struct on demand. This frees up 24 bytes in struct file. That will have some potentially user-visible changes for the ownership fcntl()s. Some of them can now fail due to allocation failures. Practically, that probably will almost never happen as the allocations are small and they only happen once per file. The fown_struct is used during kill_fasync() which is used by e.g., pipes to generate a SIGIO signal. Sending of such signals is conditional on userspace having set an owner for the file using one of the F_OWNER fcntl()s. Such users will be unaffected if struct fown_struct is allocated during the fcntl() call. There are a few subsystems that call __f_setown() expecting file->f_owner to be allocated: (1) tun devices file->f_op->fasync::tun_chr_fasync() -> __f_setown() There are no callers of tun_chr_fasync(). (2) tty devices file->f_op->fasync::tty_fasync() -> __tty_fasync() -> __f_setown() tty_fasync() has no additional callers but __tty_fasync() has. Note that __tty_fasync() only calls __f_setown() if the @on argument is true. It's called from: file->f_op->release::tty_release() -> tty_release() -> __tty_fasync() -> __f_setown() tty_release() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of tty_release() are safe as well. file->f_op->release::tty_open() -> tty_release() -> __tty_fasync() -> __f_setown() __tty_hangup() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of __tty_hangup() are safe as well. From the callchains it's obvious that (1) and (2) end up getting called via file->f_op->fasync(). That can happen either through the F_SETFL fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If FASYNC is requested and the file isn't already FASYNC then file->f_op->fasync() is called with @on true which ends up causing both (1) and (2) to call __f_setown(). (1) and (2) are the only subsystems that call __f_setown() from the file->f_op->fasync() handler. So both (1) and (2) have been updated to allocate a struct fown_struct prior to calling fasync_helper() to register with the fasync infrastructure. That's safe as they both call fasync_helper() which also does allocations if @on is true. The other interesting case are file leases: (3) file leases lease_manager_ops->lm_setup::lease_setup() -> __f_setown() Which in turn is called from: generic_add_lease() -> lease_manager_ops->lm_setup::lease_setup() -> __f_setown() So here again we can simply make generic_add_lease() allocate struct fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() which happens under a spinlock. With that the two remaining subsystems that call __f_setown() are: (4) dnotify (5) sockets Both have their own custom ioctls to set struct fown_struct and both have been converted to allocate a struct fown_struct on demand from their respective ioctls. Interactions with O_PATH are fine as well e.g., when opening a /dev/tty as O_PATH then no file->f_op->open() happens thus no file->f_owner is allocated. That's fine as no file operation will be set for those and the device has never been opened. fcntl()s called on such things will just allocate a ->f_owner on demand. Although I have zero idea why'd you care about f_owner on an O_PATH fd. Link: https://lore.kernel.org/r/20240813-work-f_owner-v2-1-4e9343a79f9f@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-22security: smack: Fix indentation in smack_netfilter.cGiSeong Ji
Aligned parameters in the function declaration of smack_ip_output to adhere to the Linux kernel coding style guidelines. The parameters of the smack_ip_output function were previously misaligned, with the second and third parameters not aligned under the first parameter. This change corrects the indentation, improving code readability and maintaining consistency with the rest of the codebase. Signed-off-by: GiSeong Ji <jiggyjiggy0323@gmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2024-07-31lsm: Refactor return value of LSM hook inode_copy_up_xattrXu Kuohai
To be consistent with most LSM hooks, convert the return value of hook inode_copy_up_xattr to 0 or a negative error code. Before: - Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. After: - Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED* when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-07-29lsm: infrastructure management of the key security blobCasey Schaufler
Move management of the key->security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. There are no existing modules that require a key_free hook, so the call to it and the definition for it have been removed. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>