summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2022-06-09signal: Deliver SIGTRAP on perf event asynchronously if blockedMarco Elver
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ] With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case: <set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers> When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task. This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later. To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ] Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/PCI: Fix ALi M1487 (IBC) PIRQ router link value interpretationMaciej W. Rozycki
[ Upstream commit 4969e223b109754c2340a26bba9b1cf44f0cba9b ] Fix an issue with commit 1ce849c75534 ("x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router") and correct ALi M1487 (IBC) PIRQ router link value (`pirq' cookie) interpretation according to findings in the BIOS. Credit to Nikolai Zhubr for the detective work as to the bit layout. Fixes: 1ce849c75534 ("x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203310013270.44113@angie.orcam.me.uk Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/delay: Fix the wrong asm constraint in delay_loop()Ammar Faizi
[ Upstream commit b86eb74098a92afd789da02699b4b0dd3f73b889 ] The asm constraint does not reflect the fact that the asm statement can modify the value of the local variable loops. Which it does. Specifying the wrong constraint may lead to undefined behavior, it may clobber random stuff (e.g. local variable, important temporary value in regs, etc.). This is especially dangerous when the compiler decides to inline the function and since it doesn't know that the value gets modified, it might decide to use it from a register directly without reloading it. Change the constraint to "+a" to denote that the first argument is an input and an output argument. [ bp: Fix typo, massage commit message. ] Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function") Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220329104705.65256-2-ammarfaizi2@gnuweeb.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/iommu: Add missing of_node_put in iommu_init_early_dartPeng Wu
[ Upstream commit 57b742a5b8945118022973e6416b71351df512fb ] The device_node pointer is returned by of_find_compatible_node with refcount incremented. We should use of_node_put() to avoid the refcount leak. Signed-off-by: Peng Wu <wupeng58@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220425081245.21705-1-wupeng58@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/powernv: fix missing of_node_put in uv_init()Lv Ruyi
[ Upstream commit 3ffa9fd471f57f365bc54fc87824c530422f64a5 ] of_find_compatible_node() returns node pointer with refcount incremented, use of_node_put() on it when done. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220407090043.2491854-1-lv.ruyi@zte.com.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/xics: fix refcount leak in icp_opal_init()Lv Ruyi
[ Upstream commit 5dd9e27ea4a39f7edd4bf81e9e70208e7ac0b7c9 ] The of_find_compatible_node() function returns a node pointer with refcount incremented, use of_node_put() on it when done. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220402013419.2410298-1-lv.ruyi@zte.com.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attrHaren Myneni
[ Upstream commit c127d130f6d59fa81701f6b04023cf7cd1972fb3 ] In init_winctx_regs(), __pa() is called on winctx->rx_fifo and this function is called to initialize registers for receive and fault windows. But the real address is passed in winctx->rx_fifo for receive windows and the virtual address for fault windows which causes errors with DEBUG_VIRTUAL enabled. Fixes this issue by assigning only real address to rx_fifo in vas_rx_win_attr struct for both receive and fault windows. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/338e958c7ab8f3b266fa794a1f80f99b9671829e.camel@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09alpha: fix alloc_zeroed_user_highpage_movable()Matthew Wilcox (Oracle)
[ Upstream commit f9c668d281aa20e38c9bda3b7b0adeb8891aa15e ] Due to a typo, the final argument to alloc_page_vma() didn't refer to a real variable. This only affected CONFIG_NUMA, which was marked BROKEN in 2006 and removed from alpha in 2021. Found due to a refactoring patch. Link: https://lkml.kernel.org/r/20220504182857.4013401-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES settingNicholas Piggin
[ Upstream commit 2852ebfa10afdcefff35ec72c8da97141df9845c ] The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to be sent to L2 and missed by the L0. Clearing LPES when it may be set, as typically happens with XIVE enabled could cause a performance issue despite having no native XIVE support in the guest, because it will cause mediated interrupts for the L2 to be taken in HV mode, which then have to be injected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/rtas: Keep MSR[RI] set when calling RTASLaurent Dufour
[ Upstream commit b6b1c3ce06ca438eb24e0f45bf0e63ecad0369f5 ] RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big endian mode (MSR[SF,LE] unset). The change in MSR is done in enter_rtas() in a relatively complex way, since the MSR value could be hardcoded. Furthermore, a panic has been reported when hitting the watchdog interrupt while running in RTAS, this leads to the following stack trace: watchdog: CPU 24 Hard LOCKUP watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago) ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Oops: Unrecoverable System Reset, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 3ddec07f638c34a2 ]--- This happens because MSR[RI] is unset when entering RTAS but there is no valid reason to not set it here. RTAS is expected to be called with MSR[RI] as specified in PAPR+ section "7.2.1 Machine State": R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect its own critical regions from recursion by setting the MSR[RI] bit to 0 when in the critical regions. Fixing this by reviewing the way MSR is compute before calling RTAS. Now a hardcoded value meaning real mode, 32 bits big endian mode and Recoverable Interrupt is loaded. In the case MSR[S] is set, it will remain set while entering RTAS as only urfid can unset it (thanks Fabiano). In addition a check is added in do_enter_rtas() to detect calls made with MSR[RI] unset, as we are forcing it on later. This patch has been tested on the following machines: Power KVM Guest P8 S822L (host Ubuntu kernel 5.11.0-49-generic) PowerVM LPAR P8 9119-MME (FW860.A1) p9 9008-22L (FW950.00) P10 9080-HEX (FW1010.00) Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: hisi: Add missing of_node_put after of_find_compatible_nodePeng Wu
[ Upstream commit 9bc72e47d4630d58a840a66a869c56b29554cfe4 ] of_find_compatible_node will increment the refcount of the returned device_node. Calling of_node_put() to avoid the refcount leak Signed-off-by: Peng Wu <wupeng58@huawei.com> Signed-off-by: Wei Xu <xuwei5@hisilicon.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: dts: exynos: add atmel,24c128 fallback to Samsung EEPROMKrzysztof Kozlowski
[ Upstream commit f038e8186fbc5723d7d38c6fa1d342945107347e ] The Samsung s524ad0xd1 EEPROM should use atmel,24c128 fallback, according to the AT24 EEPROM bindings. Reported-by: Rob Herring <robh@kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220426183443.243113-1-krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: versatile: Add missing of_node_put in dcscb_initPeng Wu
[ Upstream commit 23b44f9c649bbef10b45fa33080cd8b4166800ae ] The device_node pointer is returned by of_find_compatible_node with refcount incremented. We should use of_node_put() to avoid the refcount leak. Signed-off-by: Peng Wu <wupeng58@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220428230356.69418-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09powerpc/fadump: Fix fadump to work with a different endian capture kernelHari Bathini
[ Upstream commit b74196af372f7cb4902179009265fe63ac81824f ] Dump capture would fail if capture kernel is not of the endianess as the production kernel, because the in-memory data structure (struct opal_fadump_mem_struct) shared across production kernel and capture kernel assumes the same endianess for both the kernels, which doesn't have to be true always. Fix it by having a well-defined endianess for struct opal_fadump_mem_struct. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161902744901.86147.14719228311655123526.stgit@hbathini Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: OMAP1: clock: Fix UART rate reporting algorithmJanusz Krzysztofik
[ Upstream commit 338d5d476cde853dfd97378d20496baabc2ce3c0 ] Since its introduction to the mainline kernel, omap1_uart_recalc() helper makes incorrect use of clk->enable_bit as a ready to use bitmap mask while it only provides the bit number. Fix it. Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09arm64: dts: qcom: sdm845-xiaomi-beryllium: fix typo in panel's vddio-supply ↵Joel Selvaraj
property [ Upstream commit 1f1c494082a1f10d03ce4ee1485ee96d212e22ff ] vddio is misspelled with a "0" instead of "o". Fix it. Signed-off-by: Joel Selvaraj <jo@jsfamily.in> Reviewed-by: Caleb Connolly <caleb@connolly.tech> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/BY5PR02MB7009901651E6A8D5ACB0425ED91F9@BY5PR02MB7009.namprd02.prod.outlook.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09arm64: dts: qcom: msm8994: Fix BLSP[12]_DMA channels countKonrad Dybcio
[ Upstream commit 1ae438d26b620979ed004d559c304d31c42173ae ] MSM8994 actually features 24 DMA channels for each BLSP, fix it! Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220319174645.340379-14-konrad.dybcio@somainline.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09arm64: dts: qcom: msm8994: Fix the cont_splash_mem addressKonrad Dybcio
[ Upstream commit 049c46f31a726bf8d202ff1681661513447fac84 ] The default memory map places cont_splash_mem at 3401000, which was overlooked.. Fix it! Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220319174645.340379-9-konrad.dybcio@somainline.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: dts: s5pv210: align DMA channels with dtschemaKrzysztof Kozlowski
[ Upstream commit 9e916fb9bc3d16066286f19fc9c51d26a6aec6bd ] dtschema expects DMA channels in specific order (tx, rx and tx-sec). The order actually should not matter because dma-names is used however let's make it aligned with dtschema to suppress warnings like: i2s@eee30000: dma-names: ['rx', 'tx', 'tx-sec'] is not valid under any of the given schemas Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Co-developed-by: Jonathan Bakker <xc-racer2@live.ca> Signed-off-by: Jonathan Bakker <xc-racer2@live.ca> Link: https://lore.kernel.org/r/CY4PR04MB056779A9C50DC95987C5272ACB1C9@CY4PR04MB0567.namprd04.prod.outlook.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: dts: socfpga: align interrupt controller node name with dtschemaKrzysztof Kozlowski
[ Upstream commit c9bdd50d2019f78bf4c1f6a79254c27771901023 ] Fixes dtbs_check warnings like: $nodename:0: 'intc@fffed000' does not match '^interrupt-controller(@[0-9a-f,]+)*$' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20220317115705.450427-2-krzysztof.kozlowski@canonical.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ARM: dts: ox820: align interrupt controller node name with dtschemaKrzysztof Kozlowski
[ Upstream commit fbcd5ad7a419ad40644a0bb8b4152bc660172d8a ] Fixes dtbs_check warnings like: gic@1000: $nodename:0: 'gic@1000' does not match '^interrupt-controller(@[0-9a-f,]+)*$' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20220317115705.450427-1-krzysztof.kozlowski@canonical.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09m68k: atari: Make Atari ROM port I/O write macros return voidGeert Uytterhoeven
[ Upstream commit 30b5e6ef4a32ea4985b99200e06d6660a69f9246 ] The macros implementing Atari ROM port I/O writes do not cast away their output, unlike similar implementations for other I/O buses. When they are combined using conditional expressions in the definitions of outb() and friends, this triggers sparse warnings like: drivers/net/appletalk/cops.c:382:17: error: incompatible types in conditional expression (different base types): drivers/net/appletalk/cops.c:382:17: unsigned char drivers/net/appletalk/cops.c:382:17: void Fix this by adding casts to "void". Reported-by: kernel test robot <lkp@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/c15bedc83d90a14fffcd5b1b6bfb32b8a80282c5.1653057096.git.geert@linux-m68k.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/microcode: Add explicit CPU vendor dependencyBorislav Petkov
[ Upstream commit 9c55d99e099bd7aa6b91fce8718505c35d5dfc65 ] Add an explicit dependency to the respective CPU vendor so that the respective microcode support for it gets built only when that support is enabled. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/8ead0da9-9545-b10d-e3db-7df1a1f219e4@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09openrisc: start CPU timer early in bootJason A. Donenfeld
[ Upstream commit 516dd4aacd67a0f27da94f3fe63fe0f4dbab6e2b ] In order to measure the boot process, the timer should be switched on as early in boot as possible. As well, the commit defines the get_cycles macro, like the previous patches in this series, so that generic code is aware that it's implemented by the platform, as is done on other archs. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Acked-by: Stafford Horne <shorne@gmail.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09perf/amd/ibs: Cascade pmu init functions' return valueRavi Bangoria
[ Upstream commit 39b2ca75eec8a33e2ffdb8aa0c4840ec3e3b472c ] IBS pmu initialization code ignores return value provided by callee functions. Fix it. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220509044914.1473-2-ravi.bangoria@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09s390/preempt: disable __preempt_count_add() optimization for ↵Heiko Carstens
PROFILE_ALL_BRANCHES [ Upstream commit 63678eecec57fc51b778be3da35a397931287170 ] gcc 12 does not (always) optimize away code that should only be generated if parameters are constant and within in a certain range. This depends on various obscure kernel config options, however in particular PROFILE_ALL_BRANCHES can trigger this compile error: In function ‘__atomic_add_const’, inlined from ‘__preempt_count_add.part.0’ at ./arch/s390/include/asm/preempt.h:50:3: ./arch/s390/include/asm/atomic_ops.h:80:9: error: impossible constraint in ‘asm’ 80 | asm volatile( \ | ^~~ Workaround this by simply disabling the optimization for PROFILE_ALL_BRANCHES, since the kernel will be so slow, that this optimization won't matter at all. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscallAlexandru Elisei
[ Upstream commit 3fed9e551417b84038b15117732ea4505eee386b ] If a compat process tries to execute an unknown system call above the __ARM_NR_COMPAT_END number, the kernel sends a SIGILL signal to the offending process. Information about the error is printed to dmesg in compat_arm_syscall() -> arm64_notify_die() -> arm64_force_sig_fault() -> arm64_show_signal(). arm64_show_signal() interprets a non-zero value for current->thread.fault_code as an exception syndrome and displays the message associated with the ESR_ELx.EC field (bits 31:26). current->thread.fault_code is set in compat_arm_syscall() -> arm64_notify_die() with the bad syscall number instead of a valid ESR_ELx value. This means that the ESR_ELx.EC field has the value that the user set for the syscall number and the kernel can end up printing bogus exception messages*. For example, for the syscall number 0x68000000, which evaluates to ESR_ELx.EC value of 0x1A (ESR_ELx_EC_FPAC) the kernel prints this error: [ 18.349161] syscall[300]: unhandled exception: ERET/ERETAA/ERETAB, ESR 0x68000000, Oops - bad compat syscall(2) in syscall[10000+50000] [ 18.350639] CPU: 2 PID: 300 Comm: syscall Not tainted 5.18.0-rc1 #79 [ 18.351249] Hardware name: Pine64 RockPro64 v2.0 (DT) [..] which is misleading, as the bad compat syscall has nothing to do with pointer authentication. Stop arm64_show_signal() from printing exception syndrome information by having compat_arm_syscall() set the ESR_ELx value to 0, as it has no meaning for an invalid system call number. The example above now becomes: [ 19.935275] syscall[301]: unhandled exception: Oops - bad compat syscall(2) in syscall[10000+50000] [ 19.936124] CPU: 1 PID: 301 Comm: syscall Not tainted 5.18.0-rc1-00005-g7e08006d4102 #80 [ 19.936894] Hardware name: Pine64 RockPro64 v2.0 (DT) [..] which although shows less information because the syscall number, wrongfully advertised as the ESR value, is missing, it is better than showing plainly wrong information. The syscall number can be easily obtained with strace. *A 32-bit value above or equal to 0x8000_0000 is interpreted as a negative integer in compat_arm_syscal() and the condition scno < __ARM_NR_COMPAT_END evaluates to true; the syscall will exit to userspace in this case with the ENOSYS error code instead of arm64_notify_die() being called. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-3-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ACPICA: Avoid cache flush inside virtual machinesKirill A. Shutemov
[ Upstream commit e2efb6359e620521d1e13f69b2257de8ceaa9475 ] While running inside virtual machine, the kernel can bypass cache flushing. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss. Before entering sleep states, the ACPI code flushes caches to prevent data loss using the WBINVD instruction. This mechanism is required on bare metal. But, any use WBINVD inside of a guest is worthless. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss, so most hypervisors simply ignore it. Despite this, the ACPI code calls WBINVD unconditionally anyway. It's useless, but also normally harmless. In TDX guests, though, WBINVD stops being harmless; it triggers a virtualization exception (#VE). If the ACPI cache-flushing WBINVD were left in place, TDX guests would need handling to recover from the exception. Avoid using WBINVD whenever running under a hypervisor. This both removes the useless WBINVDs and saves TDX from implementing WBINVD handling. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-30-kirill.shutemov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/platform/uv: Update TSC sync state for UV5Mike Travis
[ Upstream commit bb3ab81bdbd53f88f26ffabc9fb15bd8466486ec ] The UV5 platform synchronizes the TSCs among all chassis, and will not proceed to OS boot without achieving synchronization. Previous UV platforms provided a register indicating successful synchronization. This is no longer available on UV5. On this platform TSC_ADJUST should not be reset by the kernel. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220406195149.228164-3-steve.wahl@hpe.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ptrace: Reimplement PTRACE_KILL by always sending SIGKILLEric W. Biederman
commit 6a2d90ba027adba528509ffa27097cffd3879257 upstream. The current implementation of PTRACE_KILL is buggy and has been for many years as it assumes it's target has stopped in ptrace_stop. At a quick skim it looks like this assumption has existed since ptrace support was added in linux v1.0. While PTRACE_KILL has been deprecated we can not remove it as a quick search with google code search reveals many existing programs calling it. When the ptracee is not stopped at ptrace_stop some fields would be set that are ignored except in ptrace_stop. Making the userspace visible behavior of PTRACE_KILL a noop in those case. As the usual rules are not obeyed it is not clear what the consequences are of calling PTRACE_KILL on a running process. Presumably userspace does not do this as it achieves nothing. Replace the implementation of PTRACE_KILL with a simple send_sig_info(SIGKILL) followed by a return 0. This changes the observable user space behavior only in that PTRACE_KILL on a process not stopped in ptrace_stop will also kill it. As that has always been the intent of the code this seems like a reasonable change. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEPEric W. Biederman
commit 4a3d2717d140401df7501a95e454180831a0c5af upstream. xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in user_enable_single_step and user_disable_single_step without locking could potentiallly cause problems. So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP that xtensa already had defined but unused. Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users. Cc: stable@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEPEric W. Biederman
commit c200e4bb44e80b343c09841e7caaaca0aac5e5fa upstream. User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate single stepping is a little confusing and worse changing tsk->ptrace without locking could potentionally cause problems. So use a thread info flag with a better name instead of flag in tsk->ptrace. Remove the definition PT_DTRACE as uml is the last user. Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09x86/sgx: Set active memcg prior to shmem allocationKristen Carlson Accardi
commit 0c9782e204d3cc5625b9e8bf4e8625d38dfe0139 upstream. When the system runs out of enclave memory, SGX can reclaim EPC pages by swapping to normal RAM. These backing pages are allocated via a per-enclave shared memory area. Since SGX allows unlimited over commit on EPC memory, the reclaimer thread can allocate a large number of backing RAM pages in response to EPC memory pressure. When the shared memory backing RAM allocation occurs during the reclaimer thread context, the shared memory is charged to the root memory control group, and the shmem usage of the enclave is not properly accounted for, making cgroups ineffective at limiting the amount of RAM an enclave can consume. For example, when using a cgroup to launch a set of test enclaves, the kernel does not properly account for 50% - 75% of shmem page allocations on average. In the worst case, when nearly all allocations occur during the reclaimer thread, the kernel accounts less than a percent of the amount of shmem used by the enclave's cgroup to the correct cgroup. SGX stores a list of mm_structs that are associated with an enclave. Pick one of them during reclaim and charge that mm's memcg with the shmem allocation. The one that gets picked is arbitrary, but this list almost always only has one mm. The cases where there is more than one mm with different memcg's are not worth considering. Create a new function - sgx_encl_alloc_backing(). This function is used whenever a new backing storage page needs to be allocated. Previously the same function was used for page allocation as well as retrieving a previously allocated page. Prior to backing page allocation, if there is a mm_struct associated with the enclave that is requesting the allocation, it is set as the active memory control group. [ dhansen: - fix merge conflict with ELDU fixes - check against actual ksgxd_tsk, not ->mm ] Cc: stable@vger.kernel.org Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lkml.kernel.org/r/20220520174248.4918-1-kristen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09x86/kexec: fix memory leak of elf header bufferBaoquan He
commit b3e34a47f98974d0844444c5121aaff123004e57 upstream. This is reported by kmemleak detector: unreferenced object 0xffffc900002a9000 (size 4096): comm "kexec", pid 14950, jiffies 4295110793 (age 373.951s) hex dump (first 32 bytes): 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............ 04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 ..>............. backtrace: [<0000000016a8ef9f>] __vmalloc_node_range+0x101/0x170 [<000000002b66b6c0>] __vmalloc_node+0xb4/0x160 [<00000000ad40107d>] crash_prepare_elf64_headers+0x8e/0xcd0 [<0000000019afff23>] crash_load_segments+0x260/0x470 [<0000000019ebe95c>] bzImage64_load+0x814/0xad0 [<0000000093e16b05>] arch_kexec_kernel_image_load+0x1be/0x2a0 [<000000009ef2fc88>] kimage_file_alloc_init+0x2ec/0x5a0 [<0000000038f5a97a>] __do_sys_kexec_file_load+0x28d/0x530 [<0000000087c19992>] do_syscall_64+0x3b/0x90 [<0000000066e063a4>] entry_SYSCALL_64_after_hwframe+0x44/0xae In crash_prepare_elf64_headers(), a buffer is allocated via vmalloc() to store elf headers. While it's not freed back to system correctly when kdump kernel is reloaded or unloaded. Then memory leak is caused. Fix it by introducing x86 specific function arch_kimage_file_post_load_cleanup(), and freeing the buffer there. And also remove the incorrect elf header buffer freeing code. Before calling arch specific kexec_file loading function, the image instance has been initialized. So 'image->elf_headers' must be NULL. It doesn't make sense to free the elf header buffer in the place. Three different people have reported three bugs about the memory leak on x86_64 inside Redhat. Link: https://lkml.kernel.org/r/20220223113225.63106-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09perf/x86/intel: Fix event constraints for ICLKan Liang
commit 86dca369075b3e310c3c0adb0f81e513c562b5e4 upstream. According to the latest event list, the event encoding 0x55 INST_DECODED.DECODERS and 0x56 UOPS_DECODED.DEC0 are only available on the first 4 counters. Add them into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220525133952.1660658-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09x86/MCE/AMD: Fix memory leak when threshold_create_bank() failsAmmar Faizi
commit e5f28623ceb103e13fc3d7bd45edf9818b227fd0 upstream. In mce_threshold_create_device(), if threshold_create_bank() fails, the previously allocated threshold banks array @bp will be leaked because the call to mce_threshold_remove_device() will not free it. This happens because mce_threshold_remove_device() fetches the pointer through the threshold_banks per-CPU variable but bp is written there only after the bank creation is successful, and not before, when threshold_create_bank() fails. Add a helper which unwinds all the bank creation work previously done and pass into it the previously allocated threshold banks array for freeing. [ bp: Massage. ] Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path") Co-developed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220329104705.65256-3-ammarfaizi2@gnuweeb.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09riscv: Move alternative length validation into subsectionNathan Chancellor
commit 61114e734ccb804bc12561ab4020745e02c468c2 upstream. After commit 49b290e430d3 ("riscv: prevent compressed instructions in alternatives"), builds with LLVM's integrated assembler fail: In file included from arch/riscv/mm/init.c:10: In file included from ./include/linux/mm.h:29: In file included from ./include/linux/pgtable.h:6: In file included from ./arch/riscv/include/asm/pgtable.h:108: ./arch/riscv/include/asm/tlbflush.h:23:2: error: expected assembly-time absolute expression ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory")); ^ ./arch/riscv/include/asm/errata_list.h:33:5: note: expanded from macro 'ALT_FLUSH_TLB_PAGE' asm(ALTERNATIVE("sfence.vma %0", "sfence.vma", SIFIVE_VENDOR_ID, \ ^ ./arch/riscv/include/asm/alternative-macros.h:187:2: note: expanded from macro 'ALTERNATIVE' _ALTERNATIVE_CFG(old_content, new_content, vendor_id, errata_id, CONFIG_k) ^ ./arch/riscv/include/asm/alternative-macros.h:113:2: note: expanded from macro '_ALTERNATIVE_CFG' __ALTERNATIVE_CFG(old_c, new_c, vendor_id, errata_id, IS_ENABLED(CONFIG_k)) ^ ./arch/riscv/include/asm/alternative-macros.h:110:2: note: expanded from macro '__ALTERNATIVE_CFG' ALT_NEW_CONTENT(vendor_id, errata_id, enable, new_c) ^ ./arch/riscv/include/asm/alternative-macros.h:99:3: note: expanded from macro 'ALT_NEW_CONTENT' ".org . - (889b - 888b) + (887b - 886b)\n" \ ^ <inline asm>:26:6: note: instantiated into assembly here .org . - (889b - 888b) + (887b - 886b) ^ This error happens because LLVM's integrated assembler has a one-pass design, which means it cannot figure out the instruction lengths when the .org directive is outside of the subsection that contains the instructions, which was changed by the .option directives added by the above change. Move the .org directives before the .previous directive so that these directives are always within the same subsection, which resolves the failures and does not introduce any new issues with GNU as. This was done for arm64 in commit 966a0acce2fc ("arm64/alternatives: move length validation inside the subsection") and commit 22315a2296f4 ("arm64: alternatives: Move length validation in alternative_{insn, endif}"). While there is no error from the assembly versions of the macro, they appear to have the same problem so just make the same change there as well so that there are no problems in the future. Link: https://github.com/ClangBuiltLinux/linux/issues/1640 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220516214520.3252074-1-nathan@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09riscv: Wire up memfd_secret in UAPI headerTobias Klauser
commit 02d88b40cb2e9614e0117c3385afdce878f0d377 upstream. Move the __ARCH_WANT_MEMFD_SECRET define added in commit 7bb7f2ac24a0 ("arch, mm: wire up memfd_secret system call where relevant") to <uapi/asm/unistd.h> so __NR_memfd_secret is defined when including <unistd.h> in userspace. This allows the memfd_secret selftest to pass on riscv. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Link: https://lore.kernel.org/r/20220505081815.22808-1-tklauser@distanz.ch Fixes: 7bb7f2ac24a0 ("arch, mm: wire up memfd_secret system call where relevant") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09riscv: Fix irq_work when SMP is disabledSamuel Holland
commit 2273272823db6f67d57761df8116ae32e7f05bed upstream. irq_work is triggered via an IPI, but the IPI infrastructure is not included in uniprocessor kernels. As a result, irq_work never runs. Fall back to the tick-based irq_work implementation on uniprocessor configurations. Fixes: 298447928bb1 ("riscv: Support irq_work via self IPIs") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220430030025.58405-1-samuel@sholland.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09riscv: Initialize thread pointer before calling C functionsAlexandre Ghiti
commit 35d33c76d68dfacc330a8eb477b51cc647c5a847 upstream. Because of the stack canary feature that reads from the current task structure the stack canary value, the thread pointer register "tp" must be set before calling any C function from head.S: by chance, setup_vm and all the functions that it calls does not seem to be part of the functions where the canary check is done, but in the following commits, some functions will. Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09RISC-V: Mark IORESOURCE_EXCLUSIVE for reserved mem instead of IORESOURCE_BUSYXianting Tian
commit e61bf5c071148c80d091f8e7220b3b9130780ae3 upstream. Commit 00ab027a3b82 ("RISC-V: Add kernel image sections to the resource tree") marked IORESOURCE_BUSY for reserved memory, which caused resource map failed in subsequent operations of related driver, so remove the IORESOURCE_BUSY flag. In order to prohibit userland mapping reserved memory, mark IORESOURCE_EXCLUSIVE for it. The code to reproduce the issue, dts: mem0: memory@a0000000 { reg = <0x0 0xa0000000 0 0x1000000>; no-map; }; &test { status = "okay"; memory-region = <&mem0>; }; code: np = of_parse_phandle(pdev->dev.of_node, "memory-region", 0); ret = of_address_to_resource(np, 0, &r); base = devm_ioremap_resource(&pdev->dev, &r); // base = -EBUSY Fixes: 00ab027a3b82 ("RISC-V: Add kernel image sections to the resource tree") Reported-by: Huaming Jiang <jianghuaming.jhm@alibaba-inc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220518013428.1338983-1-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09parisc/stifb: Implement fb_is_primary_device()Helge Deller
commit cf936af790a3ef5f41ff687ec91bfbffee141278 upstream. Implement fb_is_primary_device() function, so that fbcon detects if this framebuffer belongs to the default graphics card which was used to start the system. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/sgx: Ensure no data in PCMD page after truncateReinette Chatre
commit e3a3bbe3e99de73043a1d32d36cf4d211dc58c7e upstream. A PCMD (Paging Crypto MetaData) page contains the PCMD structures of enclave pages that have been encrypted and moved to the shmem backing store. When all enclave pages sharing a PCMD page are loaded in the enclave, there is no need for the PCMD page and it can be truncated from the backing store. A few issues appeared around the truncation of PCMD pages. The known issues have been addressed but the PCMD handling code could be made more robust by loudly complaining if any new issue appears in this area. Add a check that will complain with a warning if the PCMD page is not actually empty after it has been truncated. There should never be data in the PCMD page at this point since it is was just checked to be empty and truncated with enclave mutex held and is updated with the enclave mutex held. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/6495120fed43fafc1496d09dd23df922b9a32709.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/sgx: Fix race between reclaimer and page fault handlerReinette Chatre
commit af117837ceb9a78e995804ade4726ad2c2c8981f upstream. Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B. Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in. sgx_reclaim_pages() { ... /* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock); /* * Fault ENCLAVE_B */ sgx_vma_fault() { mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page(); mutex_unlock(&encl->lock); ... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...) /* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); } What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP. If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered. Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed. Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Reported-by: Haitao Huang <haitao.huang@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/sgx: Obtain backing storage page with enclave mutex heldReinette Chatre
commit 0e4e729a830c1e7f31d3b3fbf8feb355a402b117 upstream. Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. The SGX backing storage is accessed on two paths: when there are insufficient free pages in the EPC the reclaimer works to move enclave pages to the backing storage and as enclaves access pages that have been moved to the backing storage they are retrieved from there as part of page fault handling. An oversubscribed SGX system will often run the reclaimer and page fault handler concurrently and needs to ensure that the backing store is accessed safely between the reclaimer and the page fault handler. This is not the case because the reclaimer accesses the backing store without the enclave mutex while the page fault handler accesses the backing store with the enclave mutex. Consider the scenario where a page is faulted while a page sharing a PCMD page with the faulted page is being reclaimed. The consequence is a race between the reclaimer and page fault handler, the reclaimer attempting to access a PCMD at the same time it is truncated by the page fault handler. This could result in lost PCMD data. Data may still be lost if the reclaimer wins the race, this is addressed in the following patch. The reclaimer accesses pages from the backing storage without holding the enclave mutex and runs the risk of concurrently accessing the backing storage with the page fault handler that does access the backing storage with the enclave mutex held. In the scenario below a PCMD page is truncated from the backing store after all its pages have been loaded in to the enclave at the same time the PCMD page is loaded from the backing store when one of its pages are reclaimed: sgx_reclaim_pages() { sgx_vma_fault() { ... mutex_lock(&encl->lock); ... __sgx_encl_eldu() { ... if (pcmd_page_empty) { /* * EPC page being reclaimed /* * shares a PCMD page with an * PCMD page truncated * enclave page that is being * while requested from * faulted in. * reclaimer. */ */ sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() } mutex_unlock(&encl->lock); } } In this scenario there is a race between the reclaimer and the page fault handler when the reclaimer attempts to get access to the same PCMD page that is being truncated. This could result in the reclaimer writing to the PCMD page that is then truncated, causing the PCMD data to be lost, or in a new PCMD page being allocated. The lost PCMD data may still occur after protecting the backing store access with the mutex - this is fixed in the next patch. By ensuring the backing store is accessed with the mutex held the enclave page state can be made accurate with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page is in the process of being reclaimed. Consistently protect the reclaimer's backing store access with the enclave's mutex to ensure that it can safely run concurrently with the page fault handler. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Haitao Huang <haitao.huang@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/sgx: Mark PCMD page as dirty when modifying contentsReinette Chatre
commit 2154e1c11b7080aa19f47160bd26b6f39bbd7824 upstream. Recent commit 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") expanded __sgx_encl_eldu() to clear an enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the backing store after the enclave page is restored to the enclave. Since the PCMD page in the backing store is modified the page should be marked as dirty to ensure the modified data is retained. Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/sgx: Disconnect backing page references from dirty statusReinette Chatre
commit 6bd429643cc265e94a9d19839c771bcc5d008fa8 upstream. SGX uses shmem backing storage to store encrypted enclave pages and their crypto metadata when enclave pages are moved out of enclave memory. Two shmem backing storage pages are associated with each enclave page - one backing page to contain the encrypted enclave page data and one backing page (shared by a few enclave pages) to contain the crypto metadata used by the processor to verify the enclave page when it is loaded back into the enclave. sgx_encl_put_backing() is used to release references to the backing storage and, optionally, mark both backing store pages as dirty. Managing references and dirty status together in this way results in both backing store pages marked as dirty, even if only one of the backing store pages are changed. Additionally, waiting until the page reference is dropped to set the page dirty risks a race with the page fault handler that may load outdated data into the enclave when a page is faulted right after it is reclaimed. Consider what happens if the reclaimer writes a page to the backing store and the page is immediately faulted back, before the reclaimer is able to set the dirty bit of the page: sgx_reclaim_pages() { sgx_vma_fault() { ... sgx_encl_get_backing(); ... ... sgx_reclaimer_write() { mutex_lock(&encl->lock); /* Write data to backing store */ mutex_unlock(&encl->lock); } mutex_lock(&encl->lock); __sgx_encl_eldu() { ... /* * Enclave backing store * page not released * nor marked dirty - * contents may not be * up to date. */ sgx_encl_get_backing(); ... /* * Enclave data restored * from backing store * and PCMD pages that * are not up to date. * ENCLS[ELDU] faults * because of MAC or PCMD * checking failure. */ sgx_encl_put_backing(); } ... /* set page dirty */ sgx_encl_put_backing(); ... mutex_unlock(&encl->lock); } } Remove the option to sgx_encl_put_backing() to set the backing pages as dirty and set the needed pages as dirty right after receiving important data while enclave mutex is held. This ensures that the page fault handler can get up to date data from a page and prepares the code for a following change where only one of the backing pages need to be marked as dirty. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/ Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06ARM: dts: s5pv210: Correct interrupt name for bluetooth in AriesJonathan Bakker
commit 3f5e3d3a8b895c8a11da8b0063ba2022dd9e2045 upstream. Correct the name of the bluetooth interrupt from host-wake to host-wakeup. Fixes: 1c65b6184441b ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node") Cc: <stable@vger.kernel.org> Signed-off-by: Jonathan Bakker <xc-racer2@live.ca> Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB0567.namprd04.prod.outlook.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leakAshish Kalra
commit d22d2474e3953996f03528b84b7f52cc26a39403 upstream. For some sev ioctl interfaces, the length parameter that is passed maybe less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the allocated buffer, these sev ioctl interface may return uninitialized kernel slab memory. Reported-by: Andy Nguyen <theflow@google.com> Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Peter Gonda <pgonda@google.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file") Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command") Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command") Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Peter Gonda <pgonda@google.com> Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2Sean Christopherson
commit 45846661d10422ce9e22da21f8277540b29eca22 upstream. Remove WARNs that sanity check that KVM never lets a triple fault for L2 escape and incorrectly end up in L1. In normal operation, the sanity check is perfectly valid, but it incorrectly assumes that it's impossible for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through KVM_RUN (which guarantees kvm_check_nested_state() will see and handle the triple fault). The WARN can currently be triggered if userspace injects a machine check while L2 is active and CR4.MCE=0. And a future fix to allow save/restore of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't lost on migration, will make it trivially easy for userspace to trigger the WARN. Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is tempting, but wrong, especially if/when the request is saved/restored, e.g. if userspace restores events (including a triple fault) and then restores nested state (which may forcibly leave guest mode). Ignoring the fact that KVM doesn't currently provide the necessary APIs, it's userspace's responsibility to manage pending events during save/restore. ------------[ cut here ]------------ WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Modules linked in: kvm_intel kvm irqbypass CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Call Trace: <TASK> vmx_leave_nested+0x30/0x40 [kvm_intel] vmx_set_nested_state+0xca/0x3e0 [kvm_intel] kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm] kvm_vcpu_ioctl+0x4b9/0x660 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]--- Fixes: cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1") Cc: stable@vger.kernel.org Cc: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220407002315.78092-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>