commit 57d3dcea476e924eb6eeeb7a585692edf5c8bd23 Author: Greg Kroah-Hartman Date: Tue May 22 18:56:31 2018 +0200 Linux 4.16.11 commit 75e3417f898fe1d2451e7ceffe93db5c66772b0a Author: Alexei Starovoitov Date: Tue May 15 09:27:05 2018 -0700 bpf: Prevent memory disambiguation attack commit af86ca4e3088fe5eacf2f7e58c01fa68ca067672 upstream Detect code patterns where malicious 'speculative store bypass' can be used and sanitize such patterns. 39: (bf) r3 = r10 40: (07) r3 += -216 41: (79) r8 = *(u64 *)(r7 +0) // slow read 42: (7a) *(u64 *)(r10 -72) = 0 // verifier inserts this instruction 43: (7b) *(u64 *)(r8 +0) = r3 // this store becomes slow due to r8 44: (79) r1 = *(u64 *)(r6 +0) // cpu speculatively executes this load 45: (71) r2 = *(u8 *)(r1 +0) // speculatively arbitrary 'load byte' // is now sanitized Above code after x86 JIT becomes: e5: mov %rbp,%rdx e8: add $0xffffffffffffff28,%rdx ef: mov 0x0(%r13),%r14 f3: movq $0x0,-0x48(%rbp) fb: mov %rdx,0x0(%r14) ff: mov 0x0(%rbx),%rdi 103: movzbq 0x0(%rdi),%rsi Signed-off-by: Alexei Starovoitov Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 677af592349708498671b0d9290912acb2f203e4 Author: Konrad Rzeszutek Wilk Date: Wed May 16 23:18:09 2018 -0400 x86/bugs: Rename SSBD_NO to SSB_NO commit 240da953fcc6a9008c92fae5b1f727ee5ed167ab upstream The "336996 Speculative Execution Side Channel Mitigations" from May defines this as SSB_NO, hence lets sync-up. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 7445962ff2d652bb957722ca1a08a92d09f3e5d7 Author: Tom Lendacky Date: Thu May 10 22:06:39 2018 +0200 KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD commit bc226f07dcd3c9ef0b7f6236fe356ea4a9cb4769 upstream Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 95271aeb93d4681c65e2f94969b23ef6070367a6 Author: Thomas Gleixner Date: Thu May 10 20:42:48 2018 +0200 x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG commit 47c61b3955cf712cadfc25635bf9bc174af030ea upstream Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl argument to check whether the state must be modified on the host. The update reuses speculative_store_bypass_update() so the ZEN-specific sibling coordination can be reused. Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit bd4b410bc5ea560107126a3df18e9233baaec9f3 Author: Thomas Gleixner Date: Sat May 12 20:10:00 2018 +0200 x86/bugs: Rework spec_ctrl base and mask logic commit be6fcb5478e95bb1c91f489121238deb3abca46a upstream x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value which are not to be modified. However the implementation is not really used and the bitmask was inverted to make a check easier, which was removed in "x86/bugs: Remove x86_spec_ctrl_set()" Aside of that it is missing the STIBP bit if it is supported by the platform, so if the mask would be used in x86_virt_spec_ctrl() then it would prevent a guest from setting STIBP. Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to sanitize the value which is supplied by the guest. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman commit dde9807143e7580c3832858f62001742011da264 Author: Thomas Gleixner Date: Sat May 12 20:53:14 2018 +0200 x86/bugs: Remove x86_spec_ctrl_set() commit 4b59bdb569453a60b752b274ca61f009e37f4dae upstream x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there provide no real value as both call sites can just write x86_spec_ctrl_base to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra masking or checking. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 6f350863c98e0f95634178ea81bc8ff08db14cab Author: Thomas Gleixner Date: Sat May 12 20:49:16 2018 +0200 x86/bugs: Expose x86_spec_ctrl_base directly commit fa8ac4988249c38476f6ad678a4848a736373403 upstream x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR. x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to prevent modification to that variable. Though the variable is read only after init and globaly visible already. Remove the function and export the variable instead. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit aaf6e76e6a30481458ce7d87ec6d35b39903fed6 Author: Borislav Petkov Date: Sat May 12 00:14:51 2018 +0200 x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} commit cc69b34989210f067b2c51d5539b5f96ebcc3a01 upstream Function bodies are very similar and are going to grow more almost identical code. Add a bool arg to determine whether SPEC_CTRL is being set for the guest or restored to the host. No functional changes. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 6a4872a0cb222aa6e246aac5fe8a9ba397fde560 Author: Thomas Gleixner Date: Thu May 10 20:31:44 2018 +0200 x86/speculation: Rework speculative_store_bypass_update() commit 0270be3e34efb05a88bc4c422572ece038ef3608 upstream The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 10e436d078ada2ed9f429035d35e70a03b2d9891 Author: Tom Lendacky Date: Thu May 17 17:09:18 2018 +0200 x86/speculation: Add virtualized speculative store bypass disable support commit 11fb0683493b2da112cd64c9dada221b52463bf7 upstream Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest. Add the new CPUID feature, the new MSR and update the existing SSBD support to use this MSR when present. Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman commit 8bddd4295a6c0f5ad2955f96026e6d74e1a8d0d9 Author: Thomas Gleixner Date: Wed May 9 23:01:01 2018 +0200 x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 2658f4c3abe23fbb3a6cfafcfe0139f5b0f16f3c Author: Thomas Gleixner Date: Wed May 9 21:53:09 2018 +0200 x86/speculation: Handle HT correctly on AMD commit 1f50ddb4f4189243c05926b842dc1a0332195f31 upstream The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen: CPU0 CPU1 disable SSB disable SSB enable SSB <- Enables it for the Core, i.e. for CPU0 as well So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per thread SPEC_CTRL MSR. It works like this: CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL i.e. if one of the threads enables a mitigation then this affects both and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 6e03d4bed37848762a34f14ca7059435addbd42c Author: Thomas Gleixner Date: Thu May 10 16:26:00 2018 +0200 x86/cpufeatures: Add FEATURE_ZEN commit d1035d971829dcf80e8686ccde26f94b0a069472 upstream Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 0f106929848115b57fc2029b036499313df1491e Author: Thomas Gleixner Date: Thu May 10 20:21:36 2018 +0200 x86/cpufeatures: Disentangle SSBD enumeration commit 52817587e706686fcdb27f14c1b000c92f266c96 upstream The SSBD enumeration is similarly to the other bits magically shared between Intel and AMD though the mechanisms are different. Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific features or family dependent setup. Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL and fix up the usage sites. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 44e405201570aed0cdc3f1eeb018fa21d72d69a5 Author: Thomas Gleixner Date: Thu May 10 19:13:18 2018 +0200 x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS commit 7eb8956a7fec3c1f0abc2a5517dada99ccc8a961 upstream The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing and in case an AMD CPU has IBRS not supported because the underlying problem has been fixed but has another bit valid in the SPEC_CTRL MSR, the thing falls apart. Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD. While at it replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit df35c3e66e6da210fed4a011722644cf1de590dd Author: Borislav Petkov Date: Wed May 2 18:15:14 2018 +0200 x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP commit e7c587da125291db39ddf1f49b18e5970adbac17 upstream Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Jörg Otte Cc: Linus Torvalds Cc: "Kirill A. Shutemov" Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic Signed-off-by: Greg Kroah-Hartman commit 7eed4a87774784b39613d67a3be356421b834a61 Author: Thomas Gleixner Date: Fri May 11 15:21:01 2018 +0200 KVM: SVM: Move spec control call after restore of GS commit 15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765 upstream svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS based, but host GS is not yet restored and the access causes a triple fault. Move the call after the host GS restore. Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 6384a9130622ac42ccde466e499c5bfbe3e38f89 Author: Jim Mattson Date: Sun May 13 17:33:57 2018 -0400 x86/cpu: Make alternative_msr_write work for 32-bit code commit 5f2b745f5e1304f438f9b2cd03ebc8120b6e0d3b upstream Cast val and (val >> 32) to (u32), so that they fit in a general-purpose register in both 32-bit and 64-bit code. [ tglx: Made it u32 instead of uintptr_t ] Fixes: c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") Signed-off-by: Jim Mattson Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Acked-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit c3a8aab35b7f162aeecfe4d2de0bc24a692c3b24 Author: Konrad Rzeszutek Wilk Date: Fri May 11 16:50:35 2018 -0400 x86/bugs: Fix the parameters alignment and missing void commit ffed645e3be0e32f8e9ab068d257aee8d0fe8eec upstream Fixes: 7bb4d366c ("x86/bugs: Make cpu_show_common() static") Fixes: 24f7fc83b ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 08140850070bd14cf4fc3fc3f3c60eb55f6b9a3c Author: Jiri Kosina Date: Thu May 10 22:47:32 2018 +0200 x86/bugs: Make cpu_show_common() static commit 7bb4d366cba992904bffa4820d24e70a3de93e76 upstream cpu_show_common() is not used outside of arch/x86/kernel/cpu/bugs.c, so make it static. Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit bb82d388b5e860aed39d6bd4202ed3befe1a58d0 Author: Jiri Kosina Date: Thu May 10 22:47:18 2018 +0200 x86/bugs: Fix __ssb_select_mitigation() return type commit d66d8ff3d21667b41eddbe86b35ab411e40d8c5f upstream __ssb_select_mitigation() returns one of the members of enum ssb_mitigation, not ssb_mitigation_cmd; fix the prototype to reflect that. Fixes: 24f7fc83b9204 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation") Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit b5e5979dba3936bce29bb5e733c5bfeddd5d2c88 Author: Borislav Petkov Date: Tue May 8 15:43:45 2018 +0200 Documentation/spec_ctrl: Do some minor cleanups commit dd0792699c4058e63c0715d9a7c2d40226fcdddc upstream Fix some typos, improve formulations, end sentences with a fullstop. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit feb2788b090bac2201af4517b7c341cb3f6bd53a Author: Konrad Rzeszutek Wilk Date: Wed May 9 21:41:38 2018 +0200 proc: Use underscores for SSBD in 'status' commit e96f46ee8587607a828f783daa6eb5b44d25004d upstream The style for the 'status' file is CamelCase or this. _. Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 96867367cf81cc2cc5f8ed5d43102665be563f77 Author: Konrad Rzeszutek Wilk Date: Wed May 9 21:41:38 2018 +0200 x86/bugs: Rename _RDS to _SSBD commit 9f65fb29374ee37856dbad847b4e121aab72b510 upstream Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2] as SSBD (Speculative Store Bypass Disable). Hence changing it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD. [ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 9378c64a76a07988b933383d947f919fc8a1d4ac Author: Kees Cook Date: Thu May 3 14:37:54 2018 -0700 x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass commit f21b53b20c754021935ea43364dbf53778eeba32 upstream Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this. [ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ] Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit a92b6ffb737fe3214251d3827bddefc012ead5a6 Author: Thomas Gleixner Date: Fri May 4 15:12:06 2018 +0200 seccomp: Move speculation migitation control to arch code commit 8bf37d8c067bb7eb8e7c381bdadf9bd89182b6bc upstream The migitation control is simpler to implement in architecture code as it avoids the extra function call to check the mode. Aside of that having an explicit seccomp enabled mode in the architecture mitigations would require even more workarounds. Move it into architecture code and provide a weak function in the seccomp code. Remove the 'which' argument as this allows the architecture to decide which mitigations are relevant for seccomp. Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 2f9083d1c00e05c1ae128d7b1b683067ea07d0b7 Author: Kees Cook Date: Thu May 3 14:56:12 2018 -0700 seccomp: Add filter flag to opt-out of SSB mitigation commit 00a02d0c502a06d15e07b857f8ff921e3e402675 upstream If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 04832dda9484916a728d2524cef5088b3d567626 Author: Thomas Gleixner Date: Fri May 4 09:40:03 2018 +0200 seccomp: Use PR_SPEC_FORCE_DISABLE commit b849a812f7eb92e96d1c8239b06581b2cfd8b275 upstream Use PR_SPEC_FORCE_DISABLE in seccomp() because seccomp does not allow to widen restrictions. Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 080b78edb23cadf8125118ea9be4390a737b1aea Author: Thomas Gleixner Date: Thu May 3 22:09:15 2018 +0200 prctl: Add force disable speculation commit 356e4bfff2c5489e016fdb925adbf12a1e3950ee upstream For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra state preserving of the prctl state for SSB is a preparatory step for EBPF dymanic speculation control. Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 6672d85a43c4aab30840a3654abec7aead1150cd Author: Kees Cook Date: Thu May 3 15:03:30 2018 -0700 x86/bugs: Make boot modes __ro_after_init commit f9544b2b076ca90d887c5ae5d74fab4c21bb7c13 upstream There's no reason for these to be changed after boot. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 62fffc7129840290e191569d02e406d356037804 Author: Kees Cook Date: Tue May 1 15:07:31 2018 -0700 seccomp: Enable speculation flaw mitigations commit 5c3070890d06ff82eecb808d02d2ca39169533ef upstream When speculation flaw mitigations are opt-in (via prctl), using seccomp will automatically opt-in to these protections, since using seccomp indicates at least some level of sandboxing is desired. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 7074687d3a6539d5204c1c8310e98d1714094b2f Author: Kees Cook Date: Tue May 1 15:31:45 2018 -0700 proc: Provide details on speculation flaw mitigations commit fae1fa0fc6cca8beee3ab8ed71d54f9a78fa3f64 upstream As done with seccomp and no_new_privs, also show speculation flaw mitigation state in /proc/$pid/status. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 280ea3678f2465851c4f71c248fad54f9f2ca711 Author: Kees Cook Date: Tue May 1 15:19:04 2018 -0700 nospec: Allow getting/setting on non-current task commit 7bbf1373e228840bb0295a2ca26d548ef37f448e upstream Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than current. This is needed both for /proc/$pid/status queries and for seccomp (since thread-syncing can trigger seccomp in non-current threads). Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 54d97c1d92a10428f4fbe1b39d9cc668b7bc505a Author: Thomas Gleixner Date: Sun Apr 29 15:26:40 2018 +0200 x86/speculation: Add prctl for Speculative Store Bypass mitigation commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream Add prctl based control for Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted): There are multiple levels of impact of Speculative Store Bypass: 1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code 2) Native code process. No protection inside the process at this level. 3) Kernel. 4) Between processes. The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it already has escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows to enable the protection selectively without affecting overall system performance. Based on an initial patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 556f9159f47d0bec7e10a16c39af2e471d64d1db Author: Thomas Gleixner Date: Sun Apr 29 15:21:42 2018 +0200 x86/process: Allow runtime control of Speculative Store Bypass commit 885f82bfbc6fefb6664ea27965c3ab9ac4194b8c upstream The Speculative Store Bypass vulnerability can be mitigated with the Reduced Data Speculation (RDS) feature. To allow finer grained control of this eventually expensive mitigation a per task mitigation control is required. Add a new TIF_RDS flag and put it into the group of TIF flags which are evaluated for mismatch in switch_to(). If these bits differ in the previous and the next task, then the slow path function __switch_to_xtra() is invoked. Implement the TIF_RDS dependent mitigation control in the slow path. If the prctl for controlling Speculative Store Bypass is disabled or no task uses the prctl then there is no overhead in the switch_to() fast path. Update the KVM related speculation control functions to take TID_RDS into account as well. Based on a patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit a06b21c754e819fa7cff7665ce9dba68dc874d2a Author: Thomas Gleixner Date: Sun Apr 29 15:20:11 2018 +0200 prctl: Add speculation control prctls commit b617cfc858161140d69cc0b5cc211996b557a1c7 upstream Add two new prctls to control aspects of speculation related vulnerabilites and their mitigations to provide finer grained control over performance impacting mitigations. PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature which is selected with arg2 of prctl(2). The return value uses bit 0-2 with the following meaning: Bit Define Description 0 PR_SPEC_PRCTL Mitigation can be controlled per task by PR_SET_SPECULATION_CTRL 1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is disabled 2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is enabled If all bits are 0 the CPU is not affected by the speculation misfeature. If PR_SPEC_PRCTL is set, then the per task control of the mitigation is available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation misfeature will fail. PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which is selected by arg2 of prctl(2) per task. arg3 is used to hand in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE. The common return values are: EINVAL prctl is not implemented by the architecture or the unused prctl() arguments are not 0 ENODEV arg2 is selecting a not supported speculation misfeature PR_SET_SPECULATION_CTRL has these additional return values: ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE ENXIO prctl control of the selected speculation misfeature is disabled The first supported controlable speculation misfeature is PR_SPEC_STORE_BYPASS. Add the define so this can be shared between architectures. Based on an initial patch from Tim Chen and mostly rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 2cd9a9a41a70ce5bdad4b82850cf3cbbedd8f484 Author: Thomas Gleixner Date: Sun Apr 29 15:01:37 2018 +0200 x86/speculation: Create spec-ctrl.h to avoid include hell commit 28a2775217b17208811fa43a9e96bd1fdf417b86 upstream Having everything in nospec-branch.h creates a hell of dependencies when adding the prctl based switching mechanism. Move everything which is not required in nospec-branch.h to spec-ctrl.h and fix up the includes in the relevant files. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit efa66ff263de7f5bb2655240c3a95d5a802e8289 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:25 2018 -0400 x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest commit da39556f66f5cfe8f9c989206974f1cb16ca5d7c upstream Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 57c8073bcd42ac63751c6150a32cec9621e32f61 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:24 2018 -0400 x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested commit 764f3c21588a059cd783c6ba0734d4db2d72822d upstream AMD does not need the Speculative Store Bypass mitigation to be enabled. The parameters for this are already available and can be done via MSR C001_1020. Each family uses a different bit in that MSR for this. [ tglx: Expose the bit mask via a variable and move the actual MSR fiddling into the bugs code as that's the right thing to do and also required to prepare for dynamic enable/disable ] Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 88e65eda8b0db4245a6ee8eea873a307824f057c Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:23 2018 -0400 x86/bugs: Whitelist allowed SPEC_CTRL MSR values commit 1115a859f33276fe8afb31c60cf9d8e657872558 upstream Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the future (or in fact use different MSRs for the same functionality). As such a run-time mechanism is required to whitelist the appropriate MSR values. [ tglx: Made the variable __ro_after_init ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 67fc823943c0ecc1f7aba293ef264c42413a7082 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:22 2018 -0400 x86/bugs/intel: Set proper CPU features and setup RDS commit 772439717dbf703b39990be58d8d4e3e4ad0598a upstream Intel CPUs expose methods to: - Detect whether RDS capability is available via CPUID.7.0.EDX[31], - The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS. - MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS. With that in mind if spec_store_bypass_disable=[auto,on] is selected set at boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it. Note that this does not fix the KVM case where the SPEC_CTRL is exposed to guests which can muck with it, see patch titled : KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS. And for the firmware (IBRS to be set), see patch titled: x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits [ tglx: Distangled it from the intel implementation and kept the call order ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit f3e6aff543a560ac8b9373cd50a8427443ab181f Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:21 2018 -0400 x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation commit 24f7fc83b9204d20f878c57cb77d261ae825e033 upstream Contemporary high performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation" which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack. As a first step to mitigate against such attacks, provide two boot command line control knobs: nospec_store_bypass_disable spec_store_bypass_disable=[off,auto,on] By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows: - auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation. - on - disable Speculative Store Bypass - off - enable Speculative Store Bypass [ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 64656a6bb5a0203c7dbd5ca49788d1177d385c2f Author: Konrad Rzeszutek Wilk Date: Sat Apr 28 22:34:17 2018 +0200 x86/cpufeatures: Add X86_FEATURE_RDS commit 0cc5fa00b0a88dad140b4e5c2cead9951ad36822 upstream Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU supports Reduced Data Speculation. [ tglx: Split it out from a later patch ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 569e3b16770b6d3c8ea08bb41678473f786868a3 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:20 2018 -0400 x86/bugs: Expose /sys/../spec_store_bypass commit c456442cd3a59eeb1d60293c26cbe2ff2c4e42cf upstream Add the sysfs file for the new vulerability. It does not do much except show the words 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 2460962b14b78b47ebfeb744bd9e09d813c8236d Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:19 2018 -0400 x86/bugs, KVM: Support the combination of guest and host IBRS commit 5cf687548705412da47c9cec342fd952d71ed3d5 upstream A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This allows to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 4fa760f200941e88187c0241ce5df72e8ec9cd97 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:18 2018 -0400 x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits commit 1b86883ccb8d5d9506529d42dbe1a5257cb30b18 upstream The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all the other bits as reserved. The Intel SDM glossary defines reserved as implementation specific - aka unknown. As such at bootup this must be taken it into account and proper masking for the bits in use applied. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Made x86_spec_ctrl_base __ro_after_init ] Suggested-by: Jon Masters Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 0e303bbda22ac4a655f0a2bfdd51cda209562ddb Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:17 2018 -0400 x86/bugs: Concentrate bug reporting into a separate function commit d1059518b4789cabe34bb4b714d07e6089c82ca1 upstream Those SysFS functions have a similar preamble, as such make common code to handle them. Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit d1ee580200e9937cc4e3f0ff1d45c3cfb2532f9e Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:16 2018 -0400 x86/bugs: Concentrate bug detection into a separate function commit 4a28bfe3267b68e22c663ac26185aa16c9b879ef upstream Combine the various logic which goes through all those x86_cpu_id matching structures in one function. Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 2cd883a4cc87871db17dbc52398a58321af209b1 Author: Linus Torvalds Date: Tue May 1 15:55:51 2018 +0200 x86/nospec: Simplify alternative_msr_write() commit 1aa7a5735a41418d8e01fa7c9565eb2657e2ea3f upstream The macro is not type safe and I did look for why that "g" constraint for the asm doesn't work: it's because the asm is more fundamentally wrong. It does movl %[val], %%eax but "val" isn't a 32-bit value, so then gcc will pass it in a register, and generate code like movl %rsi, %eax and gas will complain about a nonsensical 'mov' instruction (it's moving a 64-bit register to a 32-bit one). Passing it through memory will just hide the real bug - gcc still thinks the memory location is 64-bit, but the "movl" will only load the first 32 bits and it all happens to work because x86 is little-endian. Convert it to a type safe inline function with a little trick which hands the feature into the ALTERNATIVE macro. Signed-off-by: Linus Torvalds Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 7f9dbf2e3fc93d9d14838456e0bdb53cea0e04d8 Author: Liu Bo Date: Wed May 16 01:37:36 2018 +0800 btrfs: fix reading stale metadata blocks after degraded raid1 mounts commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream. If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e. btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child Unfortunately, the parent lock got released before reading child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk. A simple PoC is included in btrfs/124, 0. A two-disk raid1 btrfs, 1. Right after mkfs.btrfs, block A is allocated to be device tree's root. 2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet. 3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list. 4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem. 5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A. 6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk. To avoid the problem, parent transid needs to be passed to read_tree_block(). In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child. This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 by 'gen'. Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Liu Bo Reviewed-by: Filipe Manana Reviewed-by: Qu Wenruo [ update changelog ] Signed-off-by: David Sterba Signed-off-by: Nikolay Borisov Signed-off-by: Greg Kroah-Hartman commit 07703adbc067040e9b2b3c5d0c1e6cc08c0cfa78 Author: Nikolay Borisov Date: Fri Apr 27 12:21:53 2018 +0300 btrfs: Fix delalloc inodes invalidation during transaction abort commit fe816d0f1d4c31c4c31d42ca78a87660565fc800 upstream. When a transaction is aborted btrfs_cleanup_transaction is called to cleanup all the various in-flight bits and pieces which migth be active. One of those is delalloc inodes - inodes which have dirty pages which haven't been persisted yet. Currently the process of freeing such delalloc inodes in exceptional circumstances such as transaction abort boiled down to calling btrfs_invalidate_inodes whose sole job is to invalidate the dentries for all inodes related to a root. This is in fact wrong and insufficient since such delalloc inodes will likely have pending pages or ordered-extents and will be linked to the sb->s_inode_list. This means that unmounting a btrfs instance with an aborted transaction could potentially lead inodes/their pages visible to the system long after their superblock has been freed. This in turn leads to a "use-after-free" situation once page shrink is triggered. This situation could be simulated by running generic/019 which would cause such inodes to be left hanging, followed by generic/176 which causes memory pressure and page eviction which lead to touching the freed super block instance. This situation is additionally detected by the unmount code of VFS with the following message: "VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..." Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); in free_fs_root for the same reason. This patch aims to rectify the sitaution by doing the following: 1. Change btrfs_destroy_delalloc_inodes so that it calls invalidate_inode_pages2 for every inode on the delalloc list, this ensures that all the pages of the inode are released. This function boils down to calling btrfs_releasepage. During test I observed cases where inodes on the delalloc list were having an i_count of 0, so this necessitates using igrab to be sure we are working on a non-freed inode. 2. Since calling btrfs_releasepage might queue delayed iputs move the call out to btrfs_cleanup_transaction in btrfs_error_commit_super before calling run_delayed_iputs for the last time. This is necessary to ensure that delayed iputs are run. Note: this patch is tagged for 4.14 stable but the fix applies to older versions too but needs to be backported manually due to conflicts. CC: stable@vger.kernel.org # 4.14.x: 2b8773313494: btrfs: Split btrfs_del_delalloc_inode into 2 functions CC: stable@vger.kernel.org # 4.14.x Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba [ add comment to igrab ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit c1bdbf9411f94285743708d1daa25e6d58374adc Author: Nikolay Borisov Date: Fri Apr 27 12:21:51 2018 +0300 btrfs: Split btrfs_del_delalloc_inode into 2 functions commit 2b8773313494ede83a26fb372466e634564002ed upstream. This is in preparation of fixing delalloc inodes leakage on transaction abort. Also export the new function. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Reviewed-by: Anand Jain Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit f0c4445251851006d5b86b28dee278a9d05dca2d Author: Anand Jain Date: Thu May 17 15:16:51 2018 +0800 btrfs: fix crash when trying to resume balance without the resume flag commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream. We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance() only, which isn't called during the remount. So when resuming from the paused balance we hit the bug: kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8 Reproducer: On a mounted filesystem: btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async(). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Anand Jain Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 1804ea9e8fab1ed99438eeef1be3d5b0ed9ff9a1 Author: Misono Tomohiro Date: Tue May 15 16:51:26 2018 +0900 btrfs: property: Set incompat flag if lzo/zstd compression is set commit 1a63c198ddb810c790101d693c7071cca703b3c7 upstream. Incompat flag of LZO/ZSTD compression should be set at: 1. mount time (-o compress/compress-force) 2. when defrag is done 3. when property is set Currently 3. is missing and this commit adds this. This could lead to a filesystem that uses ZSTD but is not marked as such. If a kernel without a ZSTD support encounteres a ZSTD compressed extent, it will handle that but this could be confusing to the user. Typically the filesystem is mounted with the ZSTD option, but the discrepancy can arise when a filesystem is never mounted with ZSTD and then the property on some file is set (and some new extents are written). A simple mount with -o compress=zstd will fix that up on an unpatched kernel. Same goes for LZO, but this has been around for a very long time (2.6.37) so it's unlikely that a pre-LZO kernel would be used. Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Tomohiro Misono Reviewed-by: Anand Jain Reviewed-by: David Sterba [ add user visible impact ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 1f0bf6658743532daa32169b360ca4d0d5f6d3f6 Author: Robbie Ko Date: Mon May 14 10:51:34 2018 +0800 Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting commit 6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2 upstream. [BUG] btrfs incremental send BUG happens when creating a snapshot of snapshot that is being used by send. [REASON] The problem can happen if while we are doing a send one of the snapshots used (parent or send) is snapshotted, because snapshoting implies COWing the root of the source subvolume/snapshot. 1. When doing an incremental send, the send process will get the commit roots from the parent and send snapshots, and add references to them through extent_buffer_get(). 2. When a snapshot/subvolume is snapshotted, its root node is COWed (transaction.c:create_pending_snapshot()). 3. COWing releases the space used by the node immediately, through: __btrfs_cow_block() --btrfs_free_tree_block() ----btrfs_add_free_space(bytenr of node) 4. Because send doesn't hold a transaction open, it's possible that the transaction used to create the snapshot commits, switches the commit root and the old space used by the previous root node gets assigned to some other node allocation. Allocation of a new node will use the existing extent buffer found in memory, which we previously got a reference through extent_buffer_get(), and allow the extent buffer's content (pages) to be modified: btrfs_alloc_tree_block --btrfs_reserve_extent ----find_free_extent (get bytenr of old node) --btrfs_init_new_buffer (use bytenr of old node) ----btrfs_find_create_tree_block ------alloc_extent_buffer --------find_extent_buffer (get old node) 5. So send can access invalid memory content and have unpredictable behaviour. [FIX] So we fix the problem by copying the commit roots of the send and parent snapshots and use those copies. CallTrace looks like this: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1861! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721 ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000 RIP: 0010:[] read_node_slot+0x108/0x110 [btrfs] RSP: 0018:ffff88041b723b68 EFLAGS: 00010246 RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000 RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66 R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890 R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001 FS: 00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000) CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880 ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50 ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400 Call Trace: [] ? tree_move_down.isra.33+0x27/0x50 [btrfs] [] ? tree_advance+0xb5/0xc0 [btrfs] [] ? btrfs_compare_trees+0x2d4/0x760 [btrfs] [] ? finish_inode_if_needed+0x870/0x870 [btrfs] [] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs] [] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs] [] ? handle_pte_fault+0x373/0x990 [] ? atomic_notifier_call_chain+0x16/0x20 [] ? set_task_cpu+0xb6/0x1d0 [] ? handle_mm_fault+0x143/0x2a0 [] ? __do_page_fault+0x1d0/0x500 [] ? check_preempt_curr+0x57/0x90 [] ? do_vfs_ioctl+0x4aa/0x990 [] ? do_fork+0x113/0x3b0 [] ? trace_hardirqs_off_thunk+0x3a/0x6c [] ? SyS_ioctl+0x88/0xa0 [] ? system_call_fastpath+0x16/0x1b ---[ end trace 29576629ee80b2e1 ]--- Fixes: 7069830a9e38 ("Btrfs: add btrfs_compare_trees function") CC: stable@vger.kernel.org # 3.6+ Signed-off-by: Robbie Ko Reviewed-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 492ed01a8c6028f28b331b1cdd47068ec4653a10 Author: Filipe Manana Date: Fri May 11 16:42:42 2018 +0100 Btrfs: fix xattr loss after power failure commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream. If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up not logging all its xattrs. This results in deleting all xattrs when replying the log after a power failure. Trivial reproducer $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ sync $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar $ So fix this by making sure all xattrs are logged if we log a file's inode item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor BTRFS_INODE_COPY_EVERYTHING were set in the inode. Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: # 4.2+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit f0eeac5ef7364ae3c764e30c8754d8d89b2ddb16 Author: Masami Hiramatsu Date: Sun May 13 05:04:29 2018 +0100 ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions commit 0d73c3f8e7f6ee2aab1bb350f60c180f5ae21a2c upstream. Since do_undefinstr() uses get_user to get the undefined instruction, it can be called before kprobes processes recursive check. This can cause an infinit recursive exception. Prohibit probing on get_user functions. Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu Cc: stable@vger.kernel.org Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 23ec31c531c9f3d1fb48cc0a318f33bb1eb76cbc Author: Masami Hiramatsu Date: Sun May 13 05:04:10 2018 +0100 ARM: 8770/1: kprobes: Prohibit probing on optimized_callback commit 70948c05fdde0aac32f9667856a88725c192fa40 upstream. Prohibit probing on optimized_callback() because it is called from kprobes itself. If we put a kprobes on it, that will cause a recursive call loop. Mark it NOKPROBE_SYMBOL. Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu Cc: stable@vger.kernel.org Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 7fc53e4641486e7d77b9a0d73bbff32ec0d934d7 Author: Masami Hiramatsu Date: Sun May 13 05:03:54 2018 +0100 ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed commit 69af7e23a6870df2ea6fa79ca16493d59b3eebeb upstream. Since get_kprobe_ctlblk() uses smp_processor_id() to access per-cpu variable, it hits smp_processor_id sanity check as below. [ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 7.007859] caller is debug_smp_processor_id+0x20/0x24 [ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1 [ 7.008890] Hardware name: Generic DT based system [ 7.009917] [] (unwind_backtrace) from [] (show_stack+0x20/0x24) [ 7.010473] [] (show_stack) from [] (dump_stack+0x84/0x98) [ 7.010990] [] (dump_stack) from [] (check_preemption_disabled+0x138/0x13c) [ 7.011592] [] (check_preemption_disabled) from [] (debug_smp_processor_id+0x20/0x24) [ 7.012214] [] (debug_smp_processor_id) from [] (optimized_callback+0x2c/0xe4) [ 7.013077] [] (optimized_callback) from [] (0xbf0021b0) To fix this issue, call get_kprobe_ctlblk() right after irq-disabled since that disables preemption. Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu Cc: stable@vger.kernel.org Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit ac91a763c3959a2e510fbc5c3fbac64f9ffb506c Author: Dexuan Cui Date: Tue May 15 19:52:50 2018 +0000 tick/broadcast: Use for_each_cpu() specially on UP kernels commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream. for_each_cpu() unintuitively reports CPU0 as set independent of the actual cpumask content on UP kernels. This causes an unexpected PIT interrupt storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as a result, the virtual machine can suffer from a strange random delay of 1~20 minutes during boot-up, and sometimes it can hang forever. Protect if by checking whether the cpumask is empty before entering the for_each_cpu() loop. [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ] Signed-off-by: Dexuan Cui Signed-off-by: Thomas Gleixner Cc: Josh Poulson Cc: "Michael Kelley (EOSG)" Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: stable@vger.kernel.org Cc: Rakib Mullick Cc: Jork Loeser Cc: Greg Kroah-Hartman Cc: Andrew Morton Cc: KY Srinivasan Cc: Linus Torvalds Cc: Alexey Dobriyan Cc: Dmitry Vyukov Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Signed-off-by: Greg Kroah-Hartman commit 0e905442ce89ca71892da62223854a931c5a66e6 Author: Dmitry Safonov Date: Fri May 18 00:35:10 2018 +0100 x86/mm: Drop TS_COMPAT on 64-bit exec() syscall commit acf46020012ccbca1172e9c7aeab399c950d9212 upstream. The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and for 32bit mm->mmap_compat_base. exec() calls mmap() which in turn uses in_compat_syscall() to check whether the mapping is for a 32bit or a 64bit task. The decision is made on the following criteria: ia32 child->thread.status & TS_COMPAT x32 child->pt_regs.orig_ax & __X32_SYSCALL_BIT ia64 !ia32 && !x32 __set_personality_x32() was dropping TS_COMPAT flag, but set_personality_64bit() has kept compat syscall flag making in_compat_syscall() return true during the first exec() syscall. Which in result has user-visible effects, mentioned by Alexey: 1) It breaks ASAN $ gcc -fsanitize=address wrap.c -o wrap-asan $ ./wrap32 ./wrap-asan true ==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING. ==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range. ==1217==Process memory map follows: 0x000000400000-0x000000401000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000600000-0x000000601000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000601000-0x000000602000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x0000f7dbd000-0x0000f7de2000 /lib64/ld-2.27.so 0x0000f7fe2000-0x0000f7fe3000 /lib64/ld-2.27.so 0x0000f7fe3000-0x0000f7fe4000 /lib64/ld-2.27.so 0x0000f7fe4000-0x0000f7fe5000 0x7fed9abff000-0x7fed9af54000 0x7fed9af54000-0x7fed9af6b000 /lib64/libgcc_s.so.1 [snip] 2) It doesn't seem to be great for security if an attacker always knows that ld.so is going to be mapped into the first 4GB in this case (the same thing happens for PIEs as well). The testcase: $ cat wrap.c int main(int argc, char *argv[]) { execvp(argv[1], &argv[1]); return 127; } $ gcc wrap.c -o wrap $ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE AT_BASE: 0x7f63b8309000 AT_BASE: 0x7faec143c000 AT_BASE: 0x7fbdb25fa000 $ gcc -m32 wrap.c -o wrap32 $ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE AT_BASE: 0xf7eff000 AT_BASE: 0xf7cee000 AT_BASE: 0x7f8b9774e000 Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec") Reported-by: Alexey Izbyshev Bisected-by: Alexander Monakov Investigated-by: Andy Lutomirski Signed-off-by: Dmitry Safonov Signed-off-by: Thomas Gleixner Reviewed-by: Cyrill Gorcunov Cc: Borislav Petkov Cc: Alexander Monakov Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Andy Lutomirski Cc: "H. Peter Anvin" Cc: Cyrill Gorcunov Cc: "Kirill A. Shutemov" Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com Signed-off-by: Greg Kroah-Hartman commit 8999ad72a8e5fb12b0612778ced226baf16a6bbb Author: Thomas Gleixner Date: Thu May 17 14:36:39 2018 +0200 x86/apic/x2apic: Initialize cluster ID properly commit fed71f7d98795ed0fa1d431910787f0f4a68324f upstream. Rick bisected a regression on large systems which use the x2apic cluster mode for interrupt delivery to the commit wich reworked the cluster management. The problem is caused by a missing initialization of the clusterid field in the shared cluster data structures. So all structures end up with cluster ID 0 which only allows sharing between all CPUs which belong to cluster 0. All other CPUs with a cluster ID > 0 cannot share the data structure because they cannot find existing data with their cluster ID. This causes malfunction with IPIs because IPIs are sent to the wrong cluster and the caller waits for ever that the target CPU handles the IPI. Add the missing initialization when a upcoming CPU is the first in a cluster so that the later booting CPUs can find the data and share it for proper operation. Fixes: 023a611748fd ("x86/apic/x2apic: Simplify cluster management") Reported-by: Rick Warner Bisected-by: Rick Warner Signed-off-by: Thomas Gleixner Tested-by: Rick Warner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman commit f531f47af4ed6d285848b62370a42170d054cb37 Author: Masami Hiramatsu Date: Sun May 13 05:04:16 2018 +0100 ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr commit eb0146daefdde65665b7f076fbff7b49dade95b9 upstream. Prohibit kprobes on do_undefinstr because kprobes on arm is implemented by undefined instruction. This means if we probe do_undefinstr(), it can cause infinit recursive exception. Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu Cc: stable@vger.kernel.org Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 2a4ff868963616de734988e12fb8f35863f7b1ce Author: Ard Biesheuvel Date: Fri May 4 07:59:58 2018 +0200 efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream. Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires us to define all struct definitions carefully when it comes to pointer sizes. 'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes. Tested-by: Hans de Goede Signed-off-by: Ard Biesheuvel Cc: Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 92f7e3a9f2805d278c257e54ad7d53d2fc5f338c Author: Dave Hansen Date: Wed May 9 10:13:58 2018 -0700 x86/pkeys: Do not special case protection key 0 commit 2fa9d1cfaf0e02f8abef0757002bff12dfcfa4e6 upstream. mm_pkey_is_allocated() treats pkey 0 as unallocated. That is inconsistent with the manpages, and also inconsistent with mm->context.pkey_allocation_map. Stop special casing it and only disallow values that are actually bad (< 0). The end-user visible effect of this is that you can now use mprotect_pkey() to set pkey=0. This is a bit nicer than what Ram proposed[1] because it is simpler and removes special-casing for pkey 0. On the other hand, it does allow applications to pkey_free() pkey-0, but that's just a silly thing to do, so we are not going to protect against it. The scenario that could happen is similar to what happens if you free any other pkey that is in use: it might get reallocated later and used to protect some other data. The most likely scenario is that pkey-0 comes back from pkey_alloc(), an access-disable or write-disable bit is set in PKRU for it, and the next stack access will SIGSEGV. It's not horribly different from if you mprotect()'d your stack or heap to be unreadable or unwritable, which is generally very foolish, but also not explicitly prevented by the kernel. 1. http://lkml.kernel.org/r/1522112702-27853-1-git-send-email-linuxram@us.ibm.com Signed-off-by: Dave Hansen Cc: Andrew Morton p Cc: Dave Hansen Cc: Linus Torvalds Cc: Michael Ellermen Cc: Peter Zijlstra Cc: Ram Pai Cc: Shuah Khan Cc: Thomas Gleixner Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Fixes: 58ab9a088dda ("x86/pkeys: Check against max pkey to avoid overflows") Link: http://lkml.kernel.org/r/20180509171358.47FD785E@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b16aec9ee8b6fe3463e7203b963d71ead090f9a4 Author: Dave Hansen Date: Wed May 9 10:13:51 2018 -0700 x86/pkeys: Override pkey when moving away from PROT_EXEC commit 0a0b152083cfc44ec1bb599b57b7aab41327f998 upstream. I got a bug report that the following code (roughly) was causing a SIGSEGV: mprotect(ptr, size, PROT_EXEC); mprotect(ptr, size, PROT_NONE); mprotect(ptr, size, PROT_READ); *ptr = 100; The problem is hit when the mprotect(PROT_EXEC) is implicitly assigned a protection key to the VMA, and made that key ACCESS_DENY|WRITE_DENY. The PROT_NONE mprotect() failed to remove the protection key, and the PROT_NONE-> PROT_READ left the PTE usable, but the pkey still in place and left the memory inaccessible. To fix this, we ensure that we always "override" the pkee at mprotect() if the VMA does not have execute-only permissions, but the VMA has the execute-only pkey. We had a check for PROT_READ/WRITE, but it did not work for PROT_NONE. This entirely removes the PROT_* checks, which ensures that PROT_NONE now works. Reported-by: Shakeel Butt Signed-off-by: Dave Hansen Cc: Andrew Morton Cc: Dave Hansen Cc: Linus Torvalds Cc: Michael Ellermen Cc: Peter Zijlstra Cc: Ram Pai Cc: Shuah Khan Cc: Thomas Gleixner Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Fixes: 62b5f7d013f ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") Link: http://lkml.kernel.org/r/20180509171351.084C5A71@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 7dad6e35ee735dcbec3c3f31383f36da9a74fe18 Author: Coly Li Date: Thu May 17 23:33:26 2018 +0800 bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n commit 1c1a2ee1b53b006754073eefc65d2b2cedb5264b upstream. Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()") returns the return value of debugfs_create_dir() to bcache_init(). When CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes bcache_init() failedi. This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n, so bcache can continue to work for the kernels which don't have debugfs enanbled. Changelog: v4: Add Acked-by from Kent Overstreet. v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS. v2: Remove a warning information v1: Initial version. Fixes: Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()") Cc: stable@vger.kernel.org Signed-off-by: Coly Li Reported-by: Massimo B. Reported-by: Kai Krakow Tested-by: Kai Krakow Acked-by: Kent Overstreet Signed-off-by: Jens Axboe Signed-off-by: Kai Krakow Signed-off-by: Greg Kroah-Hartman commit 6ad253a1d8cee7d6f7ce0ef82c4cc8eae766f8d6 Author: Martin Schwidefsky Date: Tue Apr 24 11:18:49 2018 +0200 s390: remove indirect branch from do_softirq_own_stack commit 9f18fff63cfd6f559daa1eaae60640372c65f84b upstream. The inline assembly to call __do_softirq on the irq stack uses an indirect branch. This can be replaced with a normal relative branch. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 6f128bcaf0eb70245593755ca97ca59f0b75d8d7 Author: Julian Wiedmann Date: Wed May 2 08:28:34 2018 +0200 s390/qdio: don't release memory in qdio_setup_irq() commit 2e68adcd2fb21b7188ba449f0fab3bee2910e500 upstream. Calling qdio_release_memory() on error is just plain wrong. It frees the main qdio_irq struct, when following code still uses it. Also, no other error path in qdio_establish() does this. So trust callers to clean up via qdio_free() if some step of the QDIO initialization fails. Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Cc: #v2.6.27+ Signed-off-by: Julian Wiedmann Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 3fe3662130df1bef8cd8127124bc98336e9f5162 Author: Hendrik Brueckner Date: Thu May 3 15:56:15 2018 +0200 s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream. Correct a trinity finding for the perf_event_open() system call with a perf event attribute structure that uses a frequency but has the sampling frequency set to zero. This causes a FP divide exception during the sample rate initialization for the hardware sampling facility. Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility") Cc: stable@vger.kernel.org # 3.14+ Reviewed-by: Heiko Carstens Signed-off-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 06a0947dfa005076863ed84e8a246fdee2ee5fd7 Author: Julian Wiedmann Date: Wed May 2 08:48:43 2018 +0200 s390/qdio: fix access to uninitialized qdio_q fields commit e521813468f786271a87e78e8644243bead48fad upstream. Ever since CQ/QAOB support was added, calling qdio_free() straight after qdio_alloc() results in qdio_release_memory() accessing uninitialized memory (ie. q->u.out.use_cq and q->u.out.aobs). Followed by a kmem_cache_free() on the random AOB addresses. For older kernels that don't have 6e30c549f6ca, the same applies if qdio_establish() fails in the DEV_STATE_ONLINE check. While initializing q->u.out.use_cq would be enough to fix this particular bug, the more future-proof change is to just zero-alloc the whole struct. Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: #v3.2+ Signed-off-by: Julian Wiedmann Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit b1dc581290de71886c820d40a53e37c69468ed6e Author: Michel Thierry Date: Mon May 14 09:54:45 2018 -0700 drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk commit b579f924a90f42fa561afd8201514fc216b71949 upstream. Factor in clear values wherever required while updating destination min/max. References: HSDES#1604444184 Signed-off-by: Michel Thierry Cc: mesa-dev@lists.freedesktop.org Cc: Mika Kuoppala Cc: Oscar Mateo Reviewed-by: Mika Kuoppala Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20180510200708.18097-1-michel.thierry@intel.com Cc: stable@vger.kernel.org Cc: Joonas Lahtinen Link: https://patchwork.freedesktop.org/patch/msgid/20180514165445.9198-1-michel.thierry@intel.com (backported from commit 0c79f9cb77eae28d48a4f9fc1b3341aacbbd260c) Signed-off-by: Joonas Lahtinen Signed-off-by: Greg Kroah-Hartman commit d81fc95a71d15e14d00282bad072c0d0e0cd7057 Author: Pavel Tatashin Date: Fri May 18 16:09:13 2018 -0700 mm: don't allow deferred pages with NEED_PER_CPU_KM commit ab1e8d8960b68f54af42b6484b5950bd13a4054b upstream. It is unsafe to do virtual to physical translations before mm_init() is called if struct page is needed in order to determine the memory section number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init() we initialize struct pages for all the allocated memory when deferred struct pages are used. My recent fix in commit c9e97a1997 ("mm: initialize pages on demand during boot") exposed this problem, because it greatly reduced number of pages that are initialized before mm_init(), but the problem existed even before my fix, as Fengguang Wu found. Below is a more detailed explanation of the problem. We initialize struct pages in four places: 1. Early in boot a small set of struct pages is initialized to fill the first section, and lower zones. 2. During mm_init() we initialize "struct pages" for all the memory that is allocated, i.e reserved in memblock. 3. Using on-demand logic when pages are allocated after mm_init call (when memblock is finished) 4. After smp_init() when the rest free deferred pages are initialized. The problem occurs if we try to do va to phys translation of a memory between steps 1 and 2. Because we have not yet initialized struct pages for all the reserved pages, it is inherently unsafe to do va to phys if the translation itself requires access of "struct page" as in case of this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP The following path exposes the problem: start_kernel() trap_init() setup_cpu_entry_areas() setup_cpu_entry_area(cpu) get_cpu_gdt_paddr(cpu) per_cpu_ptr_to_phys(addr) pcpu_addr_to_page(addr) virt_to_page(addr) pfn_to_page(__pa(addr) >> PAGE_SHIFT) We disable this path by not allowing NEED_PER_CPU_KM with deferred struct pages feature. The problems are discussed in these threads: http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Signed-off-by: Pavel Tatashin Acked-by: Michal Hocko Reviewed-by: Andrew Morton Cc: Steven Sistare Cc: Daniel Jordan Cc: Mel Gorman Cc: Fengguang Wu Cc: Dennis Zhou Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3daa84b7a5dc723ce55f8fca9659028e8d173d38 Author: Ross Zwisler Date: Fri May 18 16:09:06 2018 -0700 radix tree: fix multi-order iteration race commit 9f418224e8114156d995b98fa4e0f4fd21f685fe upstream. Fix a race in the multi-order iteration code which causes the kernel to hit a GP fault. This was first seen with a production v4.15 based kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD DAX entries. The race has to do with how we tear down multi-order sibling entries when we are removing an item from the tree. Remember for example that an order 2 entry looks like this: struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] where 'entry' is in some slot in the struct radix_tree_node, and the three slots following 'entry' contain sibling pointers which point back to 'entry.' When we delete 'entry' from the tree, we call : radix_tree_delete() radix_tree_delete_item() __radix_tree_delete() replace_slot() replace_slot() first removes the siblings in order from the first to the last, then at then replaces 'entry' with NULL. This means that for a brief period of time we end up with one or more of the siblings removed, so: struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] This causes an issue if you have a reader iterating over the slots in the tree via radix_tree_for_each_slot() while only under rcu_read_lock()/rcu_read_unlock() protection. This is a common case in mm/filemap.c. The issue is that when __radix_tree_next_slot() => skip_siblings() tries to skip over the sibling entries in the slots, it currently does so with an exact match on the slot directly preceding our current slot. Normally this works: V preceding slot struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] ^ current slot This lets you find the first sibling, and you skip them all in order. But in the case where one of the siblings is NULL, that slot is skipped and then our sibling detection is interrupted: V preceding slot struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] ^ current slot This means that the sibling pointers aren't recognized since they point all the way back to 'entry', so we think that they are normal internal radix tree pointers. This causes us to think we need to walk down to a struct radix_tree_node starting at the address of 'entry'. In a real running kernel this will crash the thread with a GP fault when you try and dereference the slots in your broken node starting at 'entry'. We fix this race by fixing the way that skip_siblings() detects sibling nodes. Instead of testing against the preceding slot we instead look for siblings via is_sibling_entry() which compares against the position of the struct radix_tree_node.slots[] array. This ensures that sibling entries are properly identified, even if they are no longer contiguous with the 'entry' they point to. Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators") Signed-off-by: Ross Zwisler Reported-by: CR, Sapthagirish Reviewed-by: Jan Kara Cc: Matthew Wilcox Cc: Christoph Hellwig Cc: Dan Williams Cc: Dave Chinner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f05284aa53ce6476683732c6bd246cebd682f0f2 Author: Matthew Wilcox Date: Fri May 18 16:08:44 2018 -0700 lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly commit 1e3054b98c5415d5cb5f8824fc33b548ae5644c3 upstream. I had neglected to increment the error counter when the tests failed, which made the tests noisy when they fail, but not actually return an error code. Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au Fixes: 3cc78125a081 ("lib/test_bitmap.c: add optimisation tests") Signed-off-by: Matthew Wilcox Signed-off-by: Michael Ellerman Reported-by: Michael Ellerman Tested-by: Michael Ellerman Reviewed-by: Kees Cook Cc: Yury Norov Cc: Andy Shevchenko Cc: Geert Uytterhoeven Cc: [4.13+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0ef05f1ff077cb95672286ee9aef9c37f3d0bb1e Author: Miquel Raynal Date: Tue Apr 24 17:45:06 2018 +0200 cpufreq: armada-37xx: driver relies on cpufreq-dt commit 0cf442c6bcf572e04f5690340d5b8e62afcee2ca upstream. Armada-37xx driver registers a cpufreq-dt driver. Not having CONFIG_CPUFREQ_DT selected leads to a silent abort during the probe. Prevent that situation by having the former depending on the latter. Fixes: 92ce45fb875d7 (cpufreq: Add DVFS support for Armada 37xx) Cc: 4.16+ # 4.16+ Signed-off-by: Miquel Raynal Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 1ff5be16439ad7718dd96f484427514cd53ecee1 Author: Haneen Mohammed Date: Fri May 11 07:15:42 2018 +0300 drm: Match sysfs name in link removal to link creation commit 7f6df440b8623c441c42d070bf592e2d2c1fa9bb upstream. This patch matches the sysfs name used in the unlinking with the linking function. Otherwise, remove_compat_control_link() fails to remove sysfs created by create_compat_control_link() in drm_dev_register(). Fixes: 6449b088dd51 ("drm: Add fake controlD* symlinks for backwards compat") Cc: Dave Airlie Cc: Alex Deucher Cc: Emil Velikov Cc: David Herrmann Cc: Greg Kroah-Hartman Cc: Daniel Vetter Cc: Gustavo Padovan Cc: Maarten Lankhorst Cc: Sean Paul Cc: David Airlie Cc: dri-devel@lists.freedesktop.org Cc: # v4.10+ Signed-off-by: Haneen Mohammed [seanpaul added Fixes and Cc tags] Signed-off-by: Sean Paul Link: https://patchwork.freedesktop.org/patch/msgid/20180511041542.GA4253@haneen-vb Signed-off-by: Greg Kroah-Hartman commit fa9a6a7953726aa0e4d254513b3eeddf8f5e43d7 Author: Nicholas Piggin Date: Tue May 15 01:59:47 2018 +1000 powerpc/powernv: Fix NVRAM sleep in invalid context when crashing commit c1d2a31397ec51f0370f6bd17b19b39152c263cb upstream. Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 19b393c981b59c0ea2c3a19d987c109a7ba23d58 Author: Boris Brezillon Date: Wed May 9 09:13:58 2018 +0200 mtd: rawnand: marvell: Fix read logic for layouts with ->nchunks > 2 commit 90d617633368ab97a2c7571c6e66dad54f39228d upstream. The code is doing monolithic reads for all chunks except the last one which is wrong since a monolithic read will issue the READ0+ADDRS+READ_START sequence. It not only takes longer because it forces the NAND chip to reload the page content into its internal cache, but by doing that we also reset the column pointer to 0, which means we'll always read the first chunk instead of moving to the next one. Rework the code to do a monolithic read only for the first chunk, then switch to naked reads for all intermediate chunks and finally issue a last naked read for the last chunk. Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver Cc: stable@vger.kernel.org Reported-by: Chris Packham Signed-off-by: Boris Brezillon Tested-by: Chris Packham Acked-by: Miquel Raynal Signed-off-by: Greg Kroah-Hartman commit 76d1897a6aa0eb444d9b92856bf34bee50195fd7 Author: Alexander Monakov Date: Sat Apr 28 16:56:06 2018 +0300 i2c: designware: fix poll-after-enable regression commit 06cb616b1bca7080824acfedb3d4c898e7a64836 upstream. Not all revisions of DW I2C controller implement the enable status register. On platforms where that's the case (e.g. BG2CD and SPEAr ARM SoCs), waiting for enable will time out as reading the unimplemented register yields zero. It was observed that reading the IC_ENABLE_STATUS register once suffices to avoid getting it stuck on Bay Trail hardware, so replace polling with one dummy read of the register. Fixes: fba4adbbf670 ("i2c: designware: must wait for enable") Signed-off-by: Alexander Monakov Tested-by: Ben Gardner Acked-by: Jarkko Nikula Signed-off-by: Wolfram Sang Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit 0dd0cd42c8cf3d44cb9274b44ede1e24759d24a0 Author: Maxime Chevallier Date: Wed Apr 25 20:19:47 2018 +0200 ARM64: dts: marvell: armada-cp110: Add mg_core_clk for ethernet node commit f43194c1447c9536efb0859c2f3f46f6bf2b9154 upstream. Marvell PPv2.2 controller present on CP-110 need the extra "mg_core_clk" clock to avoid system hangs when powering some network interfaces up. This issue appeared after a recent clock rework on Armada 7K/8K platforms. This commit adds the new clock and updates the documentation accordingly. [gregory.clement: use the real first commit to fix and add the cc:stable flag] Fixes: e3af9f7c6ece ("RM64: dts: marvell: armada-cp110: Fix clock resources for various node") Cc: Signed-off-by: Maxime Chevallier Signed-off-by: Gregory CLEMENT Signed-off-by: Greg Kroah-Hartman commit 819a37765295fff4a12d82708f8e8fecb0c45f6d Author: Maxime Chevallier Date: Wed Apr 25 13:07:31 2018 +0200 ARM64: dts: marvell: armada-cp110: Add clocks for the xmdio node commit a057344806d035cb9ac991619fa07854e807562d upstream. The Marvell XSMI controller needs 3 clocks to operate correctly : - The MG clock (clk 5) - The MG Core clock (clk 6) - The GOP clock (clk 18) This commit adds them, to avoid system hangs when using these interfaces. [gregory.clement: use the real first commit to fix and add the cc:stable flag] Fixes: f66b2aff46ea ("arm64: dts: marvell: add xmdio nodes for 7k/8k") Cc: Signed-off-by: Maxime Chevallier Signed-off-by: Gregory CLEMENT Signed-off-by: Greg Kroah-Hartman commit 4304ac6b7df07544e19282495395b4b98d58932c Author: kbuild test robot Date: Sat Jan 20 04:27:58 2018 +0800 netfilter: nf_tables: nf_tables_obj_lookup_byhandle() can be static commit ae0662f84b105776734cb089703a7bf834bac195 upstream. Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle") Signed-off-by: Fengguang Wu Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 0e995e27be65ef0378dbcb25d322119568fc1ec8 Author: Florian Westphal Date: Tue Apr 10 09:30:27 2018 +0200 netfilter: nf_tables: can't fail after linking rule into active rule list commit 569ccae68b38654f04b6842b034aa33857f605fe upstream. rules in nftables a free'd using kfree, but protected by rcu, i.e. we must wait for a grace period to elapse. Normal removal patch does this, but nf_tables_newrule() doesn't obey this rule during error handling. It calls nft_trans_rule_add() *after* linking rule, and, if that fails to allocate memory, it unlinks the rule and then kfree() it -- this is unsafe. Switch order -- first add rule to transaction list, THEN link it to public list. Note: nft_trans_rule_add() uses GFP_KERNEL; it will not fail so this is not a problem in practice (spotted only during code review). Fixes: 0628b123c96d12 ("netfilter: nfnetlink: add batch support and use it from nf_tables") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit fb876de0520d1e46e490faca94e9360b398be1f2 Author: Florian Westphal Date: Tue Apr 10 09:00:24 2018 +0200 netfilter: nf_tables: free set name in error path commit 2f6adf481527c8ab8033c601f55bfb5b3712b2ac upstream. set->name must be free'd here in case ops->init fails. Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 3c079e40eea50b3f2168648922e80bc942c81cce Author: Jann Horn Date: Wed Apr 4 21:03:21 2018 +0200 tee: shm: fix use-after-free via temporarily dropped reference commit bb765d1c331f62b59049d35607ed2e365802bef9 upstream. Bump the file's refcount before moving the reference into the fd table, not afterwards. The old code could drop the file's refcount to zero for a short moment before calling get_file() via get_dma_buf(). This code can only be triggered on ARM systems that use Linaro's OP-TEE. Fixes: 967c9cca2cc5 ("tee: generic TEE subsystem") Signed-off-by: Jann Horn Signed-off-by: Jens Wiklander Signed-off-by: Greg Kroah-Hartman commit 7f8538e799e975e5f6028167418a208891d1c5f1 Author: Guenter Roeck Date: Fri May 4 13:01:32 2018 -0700 x86/amd_nb: Add support for Raven Ridge CPUs commit f9bc6b2dd9cf025f827f471769e1d88b527bfb91 upstream. Add Raven Ridge root bridge and data fabric PCI IDs. This is required for amd_pci_dev_to_node_id() and amd_smn_read(). Cc: stable@vger.kernel.org # v4.16+ Tested-by: Gabriel Craciunescu Acked-by: Thomas Gleixner Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit f19fce0400cf0ae1d14e85540fcdf65a7b5b5f7c Author: Steven Rostedt (VMware) Date: Tue May 15 22:24:52 2018 -0400 vsprintf: Replace memory barrier with static_key for random_ptr_key update commit 85f4f12d51397f1648e1f4350f77e24039b82d61 upstream. Reviewing Tobin's patches for getting pointers out early before entropy has been established, I noticed that there's a lone smp_mb() in the code. As with most lone memory barriers, this one appears to be incorrectly used. We currently basically have this: get_random_bytes(&ptr_key, sizeof(ptr_key)); /* * have_filled_random_ptr_key==true is dependent on get_random_bytes(). * ptr_to_id() needs to see have_filled_random_ptr_key==true * after get_random_bytes() returns. */ smp_mb(); WRITE_ONCE(have_filled_random_ptr_key, true); And later we have: if (unlikely(!have_filled_random_ptr_key)) return string(buf, end, "(ptrval)", spec); /* Missing memory barrier here. */ hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key); As the CPU can perform speculative loads, we could have a situation with the following: CPU0 CPU1 ---- ---- load ptr_key = 0 store ptr_key = random smp_mb() store have_filled_random_ptr_key load have_filled_random_ptr_key = true BAD BAD BAD! (you're so bad!) Because nothing prevents CPU1 from loading ptr_key before loading have_filled_random_ptr_key. But this race is very unlikely, but we can't keep an incorrect smp_mb() in place. Instead, replace the have_filled_random_ptr_key with a static_branch not_filled_random_ptr_key, that is initialized to true and changed to false when we get enough entropy. If the update happens in early boot, the static_key is updated immediately, otherwise it will have to wait till entropy is filled and this happens in an interrupt handler which can't enable a static_key, as that requires a preemptible context. In that case, a work_queue is used to enable it, as entropy already took too long to establish in the first place waiting a little more shouldn't hurt anything. The benefit of using the static key is that the unlikely branch in vsprintf() now becomes a nop. Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home Cc: stable@vger.kernel.org Fixes: ad67b74d2469d ("printk: hash addresses printed with %p") Acked-by: Linus Torvalds Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 3cd04d9568311a3cd1b8211d4f8c33172caeef9e Author: Steven Rostedt (VMware) Date: Wed May 9 14:36:09 2018 -0400 tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} commit 45dd9b0666a162f8e4be76096716670cf1741f0e upstream. Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero data size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see if a function is hit or not, simply make that function noinline and use function tracer filtering. Worse yet, the hack used was: __array(char, x, 0) Which creates a static string of zero in length. There's assumptions about such constructs in ftrace that this is a dynamic string that is nul terminated. This is not the case with these tracepoints and can cause problems in various parts of ftrace. Nuke the trace events! Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.") Reviewed-by: Juergen Gross Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit e2af063a44d1068cf18125448f88ca90abb7a7b7 Author: Halil Pasic Date: Tue Apr 24 13:26:56 2018 +0200 vfio: ccw: fix cleanup if cp_prefetch fails commit d66a7355717ec903d455277a550d930ba13df4a8 upstream. If the translation of a channel program fails, we may end up attempting to clean up (free, unpin) stuff that never got translated (and allocated, pinned) in the first place. By adjusting the lengths of the chains accordingly (so the element that failed, and all subsequent elements are excluded) cleanup activities based on false assumptions can be avoided. Let's make sure cp_free works properly after cp_prefetch returns with an error by setting ch_len of a ccw chain to the number of the translated CCWs on that chain. Cc: stable@vger.kernel.org #v4.12+ Acked-by: Pierre Morel Reviewed-by: Dong Jia Shi Signed-off-by: Halil Pasic Signed-off-by: Dong Jia Shi Message-Id: <20180423110113.59385-2-bjsdjshi@linux.vnet.ibm.com> [CH: fixed typos] Signed-off-by: Cornelia Huck Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 08f3f89de20a33ffa658ebfcad1633bf9705434e Author: Guenter Roeck Date: Fri May 4 13:01:33 2018 -0700 hwmon: (k10temp) Use API function to access System Management Network commit 3b031622f598481970400519bd5abc2a16708282 upstream. The SMN (System Management Network) on Family 17h AMD CPUs is also accessed from other drivers, specifically EDAC. Accessing it directly is racy. On top of that, accessing the SMN through root bridge 00:00 is wrong on multi-die CPUs and may result in reading the temperature from the wrong die. Use available API functions to fix the problem. For this to work, add dependency on AMD_NB. Also change the Raven Ridge PCI device ID to point to Data Fabric Function 3, since this ID is used by the API functions to find the CPU node. Cc: stable@vger.kernel.org # v4.16+ Tested-by: Gabriel Craciunescu Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 2515baf4c2be4b406d655e79bfc787d7f5a4ab82 Author: Guenter Roeck Date: Sun Apr 29 08:08:24 2018 -0700 hwmon: (k10temp) Fix reading critical temperature register commit 40626a1bf657eef557fcee9e1b8ef5b4f5b56dcd upstream. The HTC (Hardware Temperature Control) register has moved for recent chips. Cc: stable@vger.kernel.org # v4.16+ Tested-by: Gabriel Craciunescu Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 3cd00d1c13ab8c8b63fc48850e6c03e14476a3be Author: Andre Przywara Date: Fri May 11 15:20:14 2018 +0100 KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock commit bf308242ab98b5d1648c3663e753556bef9bec01 upstream. kvm_read_guest() will eventually look up in kvm_memslots(), which requires either to hold the kvm->slots_lock or to be inside a kvm->srcu critical section. In contrast to x86 and s390 we don't take the SRCU lock on every guest exit, so we have to do it individually for each kvm_read_guest() call. Provide a wrapper which does that and use that everywhere. Note that ending the SRCU critical section before returning from the kvm_read_guest() wrapper is safe, because the data has been *copied*, so we don't need to rely on valid references to the memslot anymore. Cc: Stable # 4.8+ Reported-by: Jan Glauber Signed-off-by: Andre Przywara Acked-by: Christoffer Dall Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 3ffdd5080fd08d348dd6116e2fa303f56f09e991 Author: Andre Przywara Date: Fri May 11 15:20:15 2018 +0100 KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls commit 711702b57cc3c50b84bd648de0f1ca0a378805be upstream. kvm_read_guest() will eventually look up in kvm_memslots(), which requires either to hold the kvm->slots_lock or to be inside a kvm->srcu critical section. In contrast to x86 and s390 we don't take the SRCU lock on every guest exit, so we have to do it individually for each kvm_read_guest() call. Use the newly introduced wrapper for that. Cc: Stable # 4.12+ Reported-by: Jan Glauber Signed-off-by: Andre Przywara Acked-by: Christoffer Dall Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit fe4e206dd92c83200493cfa1418a73b8fb7afc4c Author: Andre Przywara Date: Fri May 11 15:20:13 2018 +0100 KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity commit 9c4188762f7fee032abf8451fd9865a9abfc5516 upstream. Apparently the development of update_affinity() overlapped with the promotion of irq_lock to be _irqsave, so the patch didn't convert this lock over. This will make lockdep complain. Fix this by disabling IRQs around the lock. Cc: stable@vger.kernel.org Fixes: 08c9fd042117 ("KVM: arm/arm64: vITS: Add a helper to update the affinity of an LPI") Reported-by: Jan Glauber Signed-off-by: Andre Przywara Acked-by: Christoffer Dall Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 9cea0befe253f7c0bce108a36ee386a2f570f325 Author: Andre Przywara Date: Fri May 11 15:20:12 2018 +0100 KVM: arm/arm64: Properly protect VGIC locks from IRQs commit 388d4359680b56dba82fe2ffca05871e9fd2b73e upstream. As Jan reported [1], lockdep complains about the VGIC not being bullet proof. This seems to be due to two issues: - When commit 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context") promoted irq_lock and ap_list_lock to _irqsave, we forgot two instances of irq_lock. lockdeps seems to pick those up. - If a lock is _irqsave, any other locks we take inside them should be _irqsafe as well. So the lpi_list_lock needs to be promoted also. This fixes both issues by simply making the remaining instances of those locks _irqsave. One irq_lock is addressed in a separate patch, to simplify backporting. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html Cc: stable@vger.kernel.org Fixes: 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context") Reported-by: Jan Glauber Acked-by: Christoffer Dall Signed-off-by: Andre Przywara Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 28208a6e244636cc1a11f72a49bbe044edad9d0f Author: Sean Christopherson Date: Mon Apr 30 10:01:06 2018 -0700 KVM: vmx: update sec exec controls for UMIP iff emulating UMIP commit 64f7a11586ab9262f00b8b6eceef6d8154921bd8 upstream. Update SECONDARY_EXEC_DESC for UMIP emulation if and only UMIP is actually being emulated. Skipping the VMCS update eliminates unnecessary VMREAD/VMWRITE when UMIP is supported in hardware, and on platforms that don't have SECONDARY_VM_EXEC_CONTROL. The latter case resolves a bug where KVM would fill the kernel log with warnings due to failed VMWRITEs on older platforms. Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP") Cc: stable@vger.kernel.org #4.16 Reported-by: Paolo Zeppegno Suggested-by: Paolo Bonzini Suggested-by: Radim KrÄmář Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit bde66cabb70768fc84915cde05de39185c50a1ab Author: Kamal Dasu Date: Thu Apr 26 14:48:01 2018 -0400 spi: bcm-qspi: Always read and set BSPI_MAST_N_BOOT_CTRL commit 602805fb618b018b7a41fbb3f93c1992b078b1ae upstream. Always confirm the BSPI_MAST_N_BOOT_CTRL bit when enabling or disabling BSPI transfers. Fixes: 4e3b2d236fe00 ("spi: bcm-qspi: Add BSPI spi-nor flash controller driver") Signed-off-by: Kamal Dasu Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit d350c43f59b69ac1c7414e3158fe20dfd3817a2d Author: Kamal Dasu Date: Thu Apr 26 14:48:00 2018 -0400 spi: bcm-qspi: Avoid setting MSPI_CDRAM_PCS for spi-nor master commit 5eb9a07a4ae1008b67d8bcd47bddb3dae97456b7 upstream. Added fix for probing of spi-nor device non-zero chip selects. Set MSPI_CDRAM_PCS (peripheral chip select) with spi master for MSPI controller and not for MSPI/BSPI spi-nor master controller. Ensure setting of cs bit in chip select register on chip select change. Fixes: fa236a7ef24048 ("spi: bcm-qspi: Add Broadcom MSPI driver") Signed-off-by: Kamal Dasu Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit d17adca045cfb8c7ceea87c9b1d7fa6196ed4171 Author: Andy Shevchenko Date: Thu Apr 19 19:53:32 2018 +0300 spi: pxa2xx: Allow 64-bit DMA commit efc4a13724b852ddaa3358402a8dec024ffbcb17 upstream. Currently the 32-bit device address only is supported for DMA. However, starting from Intel Sunrisepoint PCH the DMA address of the device FIFO can be 64-bit. Change the respective variable to be compatible with DMA engine expectations, i.e. to phys_addr_t. Fixes: 34cadd9c1bcb ("spi: pxa2xx: Add support for Intel Sunrisepoint") Signed-off-by: Andy Shevchenko Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit eba775a2ceabfa68d3cdde9478779729e041658b Author: Wenwen Wang Date: Sat May 5 13:38:03 2018 -0500 ALSA: control: fix a redundant-copy issue commit 3f12888dfae2a48741c4caa9214885b3aaf350f9 upstream. In snd_ctl_elem_add_compat(), the fields of the struct 'data' need to be copied from the corresponding fields of the struct 'data32' in userspace. This is achieved by invoking copy_from_user() and get_user() functions. The problem here is that the 'type' field is copied twice. One is by copy_from_user() and one is by get_user(). Given that the 'type' field is not used between the two copies, the second copy is *completely* redundant and should be removed for better performance and cleanup. Also, these two copies can cause inconsistent data: as the struct 'data32' resides in userspace and a malicious userspace process can race to change the 'type' field between the two copies to cause inconsistent data. Depending on how the data is used in the future, such an inconsistency may cause potential security risks. For above reasons, we should take out the second copy. Signed-off-by: Wenwen Wang Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit fe66ea44d8bcd53624e016f1fab6d657ab6d98fa Author: Hans de Goede Date: Tue May 8 09:27:46 2018 +0200 ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist commit c8beccc19b92f5172994c0732db689c08f4f98e5 upstream. Power-saving is causing loud plops on the Lenovo C50 All in one, add it to the blacklist. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1572975 Signed-off-by: Hans de Goede Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit bbf79c520d454153b457a4e7647640afdc727bf8 Author: Jeremy Soller Date: Mon May 7 09:28:45 2018 -0600 ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup commit 2f0d520a1a73555ac51c19cd494493f60b4c1cea upstream. This adds support for the P950ER, which has the same required fixup as the P950HR, but has a different PCI ID. Signed-off-by: Jeremy Soller Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 54cd5e6af589fe36cfec2cd127546a4a24b091ce Author: Federico Cuello Date: Wed May 9 00:13:38 2018 +0200 ALSA: usb: mixer: volume quirk for CM102-A+/102S+ commit 21493316a3c4598f308d5a9fa31cc74639c4caff upstream. Currently it's not possible to set volume lower than 26% (it just mutes). Also fixes this warning: Warning! Unlikely big volume range (=9472), cval->res is probably wrong. [13] FU [PCM Playback Volume] ch = 2, val = -9473/-1/1 , and volume works fine for full range. Signed-off-by: Federico Cuello Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit f190d83597d038d792a753a7df35f0e06082966c Author: Shuah Khan (Samsung OSG) Date: Tue May 15 17:57:23 2018 -0600 usbip: usbip_host: fix bad unlock balance during stub_probe() commit c171654caa875919be3c533d3518da8be5be966e upstream. stub_probe() calls put_busid_priv() in an error path when device isn't found in the busid_table. Fix it by making put_busid_priv() safe to be called with null struct bus_id_priv pointer. This problem happens when "usbip bind" is run without loading usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver and when usbip_host is modprobed, stub_probe() runs and doesn't find the device in its busid table and calls put_busid_priv(0 with null bus_id_priv pointer. usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip! [ 367.359679] ===================================== [ 367.359681] WARNING: bad unlock balance detected! [ 367.359683] 4.17.0-rc4+ #5 Not tainted [ 367.359685] ------------------------------------- [ 367.359688] modprobe/2768 is trying to release lock ( [ 367.359689] ================================================================== [ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110 [ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768 [ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5 Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) Cc: stable Signed-off-by: Greg Kroah-Hartman commit 8fa17d2b620c615a5a1723184e968275dc26a896 Author: Shuah Khan (Samsung OSG) Date: Mon May 14 20:49:58 2018 -0600 usbip: usbip_host: fix NULL-ptr deref and use-after-free errors commit 22076557b07c12086eeb16b8ce2b0b735f7a27e7 upstream. usbip_host updates device status without holding lock from stub probe, disconnect and rebind code paths. When multiple requests to import a device are received, these unprotected code paths step all over each other and drive fails with NULL-ptr deref and use-after-free errors. The driver uses a table lock to protect the busid array for adding and deleting busids to the table. However, the probe, disconnect and rebind paths get the busid table entry and update the status without holding the busid table lock. Add a new finer grain lock to protect the busid entry. This new lock will be held to search and update the busid entry fields from get_busid_idx(), add_match_busid() and del_match_busid(). match_busid_show() does the same to access the busid entry fields. get_busid_priv() changed to return the pointer to the busid entry holding the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind() call put_busid_priv() to release the busid lock before returning. This changes fixes the unprotected code paths eliminating the race conditions in updating the busid entries. Reported-by: Jakub Jirasek Signed-off-by: Shuah Khan (Samsung OSG) Cc: stable Signed-off-by: Greg Kroah-Hartman commit cefcc95797dcfd9a51c10a9af0bdfc0931ba32d6 Author: Shuah Khan (Samsung OSG) Date: Mon Apr 30 16:17:20 2018 -0600 usbip: usbip_host: run rebind from exit when module is removed commit 7510df3f29d44685bab7b1918b61a8ccd57126a9 upstream. After removing usbip_host module, devices it releases are left without a driver. For example, when a keyboard or a mass storage device are bound to usbip_host when it is removed, these devices are no longer bound to any driver. Fix it to run device_attach() from the module exit routine to restore the devices to their original drivers. This includes cleanup changes and moving device_attach() code to a common routine to be called from rebind_store() and usbip_host_exit(). Signed-off-by: Shuah Khan (Samsung OSG) Cc: stable Signed-off-by: Greg Kroah-Hartman commit db89eb5a075af79dc1a4163eac384b44a0bd9a46 Author: Shuah Khan (Samsung OSG) Date: Mon Apr 30 16:17:19 2018 -0600 usbip: usbip_host: delete device from busid_table after rebind commit 1e180f167d4e413afccbbb4a421b48b2de832549 upstream. Device is left in the busid_table after unbind and rebind. Rebind initiates usb bus scan and the original driver claims the device. After rescan the device should be deleted from the busid_table as it no longer belongs to usbip_host. Fix it to delete the device after device_attach() succeeds. Signed-off-by: Shuah Khan (Samsung OSG) Cc: stable Signed-off-by: Greg Kroah-Hartman commit 982fec8bb2945bf8f7a3d5c2fa4abf32683abf15 Author: Shuah Khan Date: Wed Apr 11 18:13:30 2018 -0600 usbip: usbip_host: refine probe and disconnect debug msgs to be useful commit 28b68acc4a88dcf91fd1dcf2577371dc9bf574cc upstream. Refine probe and disconnect debug msgs to be useful and say what is in progress. Signed-off-by: Shuah Khan Cc: stable Signed-off-by: Greg Kroah-Hartman commit 31c270e51c39c07c2b021e16f0d766e8c4d6cdf9 Author: Mathias Nyman Date: Mon May 14 11:57:23 2018 +0300 xhci: Fix USB3 NULL pointer dereference at logical disconnect. commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e upstream. Hub driver will try to disable a USB3 device twice at logical disconnect, racing with xhci_free_dev() callback from the first port disable. This can be triggered with "udisksctl power-off --block-device " or by writing "1" to the "remove" sysfs file for a USB3 device in 4.17-rc4. USB3 devices don't have a similar disabled link state as USB2 devices, and use a U3 suspended link state instead. In this state the port is still enabled and connected. hub_port_connect() first disconnects the device, then later it notices that device is still enabled (due to U3 states) it will try to disable the port again (set to U3). The xhci_free_dev() called during device disable is async, so checking for existing xhci->devs[i] when setting link state to U3 the second time was successful, even if device was being freed. The regression was caused by, and whole thing revealed by, Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned. and causes a NULL pointer dereference the second time we try to set U3. Fix this by checking xhci->devs[i]->udev exists before setting link state. The original patch went to stable so this fix needs to be applied there as well. Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") Cc: Reported-by: Jordan Glover Tested-by: Jordan Glover Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman