commit c65ed08dcc049da5668962a63a25f8296665a2b4 Author: Greg Kroah-Hartman Date: Fri Jan 6 11:16:53 2017 +0100 Linux 4.8.16 commit 645897231f960590220144b06d1f994b7eb88326 Author: Ming Lei Date: Sun Jul 10 19:27:36 2016 +0800 driver core: fix race between creating/querying glue dir and its cleanup commit cebf8fd16900fdfd58c0028617944f808f97fe50 upstream. The global mutex of 'gdp_mutex' is used to serialize creating/querying glue dir and its cleanup. Turns out it isn't a perfect way because part(kobj_kset_leave()) of the actual cleanup action() is done inside the release handler of the glue dir kobject. That means gdp_mutex has to be held before releasing the last reference count of the glue dir kobject. This patch moves glue dir's cleanup after kobject_del() in device_del() for avoiding the race. Cc: Yijing Wang Reported-by: Chandra Sekhar Lingutla Signed-off-by: Ming Lei Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman commit f199bdbaab37585ff6912dfb5524cf2a0ef06a05 Author: Greg Kroah-Hartman Date: Wed Jan 4 18:29:16 2017 +0100 Revert "netfilter: move nat hlist_head to nf_conn" This reverts commit 7c9664351980aaa6a4b8837a314360b3a4ad382a as it is not working properly. Please move to 4.9 to get the full fix. Reported-by: Pablo Neira Ayuso Cc: Florian Westphal Signed-off-by: Greg Kroah-Hartman commit 99d6d4e0c50c6e64e3cca11dc77538cadcf3b2e2 Author: Greg Kroah-Hartman Date: Wed Jan 4 18:27:19 2017 +0100 Revert "netfilter: nat: convert nat bysrc hash to rhashtable" This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1 as it is not working properly. Please move to 4.9 to get the full fix. Reported-by: Pablo Neira Ayuso Cc: Florian Westphal Signed-off-by: Greg Kroah-Hartman commit 774225699b4dbb86bebd94341c6a96bf5f4d9cb4 Author: AKASHI Takahiro Date: Mon Aug 22 15:55:24 2016 +0900 arm64: mark reserved memblock regions explicitly in iomem commit e7cd190385d17790cc3eb3821b1094b00aacf325 upstream. Kdump(kexec-tools) parses /proc/iomem to identify all the memory regions on the system. Since the current kernel names "nomap" regions, like UEFI runtime services code/data, as "System RAM," kexec-tools sets up elf core header to include them in a crash dump file (/proc/vmcore). Then crash dump kernel parses UEFI memory map again, re-marks those regions as "nomap" and does not create a memory mapping for them unlike the other areas of System RAM. In this case, copying /proc/vmcore through copy_oldmem_page() on crash dump kernel will end up with a kernel abort, as reported in [1]. This patch names all the "nomap" regions explicitly as "reserved" so that we can exclude them from a crash dump file. acpi_os_ioremap() must also be modified because those regions have WB attributes [2]. Apart from kdump, this change also matches x86's use of acpi (and /proc/iomem). [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/448186.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450089.html Reviewed-by: Catalin Marinas Tested-by: James Morse Reviewed-by: James Morse Signed-off-by: AKASHI Takahiro Signed-off-by: Will Deacon Cc: Matthias Brugger Signed-off-by: Greg Kroah-Hartman commit 587e89bd56c1f161876f2a25d1f252e8d85f22a6 Author: Eric Sandeen Date: Mon Dec 5 12:31:06 2016 +1100 xfs: set AGI buffer type in xlog_recover_clear_agi_bucket commit 6b10b23ca94451fae153a5cc8d62fd721bec2019 upstream. xlog_recover_clear_agi_bucket didn't set the type to XFS_BLFT_AGI_BUF, so we got a warning during log replay (or an ASSERT on a debug build). XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 Fix this, as was done in f19b872b for 2 other locations with the same problem. Signed-off-by: Eric Sandeen Reviewed-by: Brian Foster Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Greg Kroah-Hartman commit 959e363eaf14509bb7085e0bd6ded2a45a4fccb5 Author: Julien Grall Date: Wed Dec 7 12:24:40 2016 +0000 arm/xen: Use alloc_percpu rather than __alloc_percpu commit 24d5373dda7c00a438d26016bce140299fae675e upstream. The function xen_guest_init is using __alloc_percpu with an alignment which are not power of two. However, the percpu allocator never supported alignments which are not power of two and has always behaved incorectly in thise case. Commit 3ca45a4 "percpu: ensure requested alignment is power of two" introduced a check which trigger a warning [1] when booting linux-next on Xen. But in reality this bug was always present. This can be fixed by replacing the call to __alloc_percpu with alloc_percpu. The latter will use an alignment which are a power of two. [1] [ 0.023921] illegal size (48) or align (48) for percpu allocation [ 0.024167] ------------[ cut here ]------------ [ 0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0 [ 0.024584] Modules linked in: [ 0.024708] [ 0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc7-next-20161128 #473 [ 0.025012] Hardware name: Foundation-v8A (DT) [ 0.025162] task: ffff80003d870000 task.stack: ffff80003d844000 [ 0.025351] PC is at pcpu_alloc+0x88/0x6c0 [ 0.025490] LR is at pcpu_alloc+0x88/0x6c0 [ 0.025624] pc : [] lr : [] pstate: 60000045 [ 0.025830] sp : ffff80003d847cd0 [ 0.025946] x29: ffff80003d847cd0 x28: 0000000000000000 [ 0.026147] x27: 0000000000000000 x26: 0000000000000000 [ 0.026348] x25: 0000000000000000 x24: 0000000000000000 [ 0.026549] x23: 0000000000000000 x22: 00000000024000c0 [ 0.026752] x21: ffff000008e97000 x20: 0000000000000000 [ 0.026953] x19: 0000000000000030 x18: 0000000000000010 [ 0.027155] x17: 0000000000000a3f x16: 00000000deadbeef [ 0.027357] x15: 0000000000000006 x14: ffff000088f79c3f [ 0.027573] x13: ffff000008f79c4d x12: 0000000000000041 [ 0.027782] x11: 0000000000000006 x10: 0000000000000042 [ 0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c [ 0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84 [ 0.028419] x5 : 0000000000000005 x4 : 0000000000000000 [ 0.028628] x3 : 0000000000000000 x2 : 000000000000017f [ 0.028840] x1 : ffff80003d870000 x0 : 0000000000000035 [ 0.029056] [ 0.029152] ---[ end trace 0000000000000000 ]--- [ 0.029297] Call trace: [ 0.029403] Exception stack(0xffff80003d847b00 to 0xffff80003d847c30) [ 0.029621] 7b00: 0000000000000030 0001000000000000 ffff80003d847cd0 ffff00000818e678 [ 0.029901] 7b20: 0000000000000002 0000000000000004 ffff000008f7c060 0000000000000035 [ 0.030153] 7b40: ffff000008f79000 ffff000008c4cd88 ffff80003d847bf0 ffff000008101778 [ 0.030402] 7b60: 0000000000000030 0000000000000000 ffff000008e97000 00000000024000c0 [ 0.030647] 7b80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.030895] 7ba0: 0000000000000035 ffff80003d870000 000000000000017f 0000000000000000 [ 0.031144] 7bc0: 0000000000000000 0000000000000005 ffff000008f79c84 6120757063726570 [ 0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40 0000000000000042 0000000000000006 [ 0.031643] 7c00: 0000000000000041 ffff000008f79c4d ffff000088f79c3f 0000000000000006 [ 0.031877] 7c20: 00000000deadbeef 0000000000000a3f [ 0.032051] [] pcpu_alloc+0x88/0x6c0 [ 0.032229] [] __alloc_percpu+0x18/0x20 [ 0.032409] [] xen_guest_init+0x174/0x2f4 [ 0.032591] [] do_one_initcall+0x38/0x130 [ 0.032783] [] kernel_init_freeable+0xe0/0x248 [ 0.032995] [] kernel_init+0x10/0x100 [ 0.033172] [] ret_from_fork+0x10/0x50 Reported-by: Wei Chen Link: https://lkml.org/lkml/2016/11/28/669 Signed-off-by: Julien Grall Signed-off-by: Stefano Stabellini Reviewed-by: Stefano Stabellini Signed-off-by: Greg Kroah-Hartman commit 6fbd3fb6c4df723a30d6f90a2553d68e0b431b23 Author: Boris Ostrovsky Date: Mon Nov 21 09:56:06 2016 -0500 xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing commit 30faaafdfa0c754c91bac60f216c9f34a2bfdf7e upstream. Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT. check_vma_flags __get_user_pages __get_user_pages_locked __get_user_pages_unlocked get_user_pages_fast iov_iter_get_pages dio_refill_pages do_direct_IO do_blockdev_direct_IO do_blockdev_direct_IO ext4_direct_IO_read generic_file_read_iter aio_run_iocb (which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Reported-by: Olaf Hering Suggested-by: Hugh Dickins Signed-off-by: Boris Ostrovsky Acked-by: Hugh Dickins Tested-by: Olaf Hering Signed-off-by: Juergen Gross Signed-off-by: Greg Kroah-Hartman commit 883f12a2058375498d581d6101de98bd59bb5552 Author: Jason Gunthorpe Date: Wed Oct 26 16:28:45 2016 -0600 tpm xen: Remove bogus tpm_chip_unregister commit 1f0f30e404b3d8f4597a2d9b77fba55452f8fd0e upstream. tpm_chip_unregister can only be called after tpm_chip_register. devm manages the allocation so no unwind is needed here. Fixes: afb5abc262e96 ("tpm: two-phase chip management functions") Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman commit 8419f5215db37bdf63c7398f90b3c04723f4c5b4 Author: Douglas Anderson Date: Wed Dec 14 15:05:49 2016 -0800 kernel/debug/debug_core.c: more properly delay for secondary CPUs commit 2d13bb6494c807bcf3f78af0e96c0b8615a94385 upstream. We've got a delay loop waiting for secondary CPUs. That loop uses loops_per_jiffy. However, loops_per_jiffy doesn't actually mean how many tight loops make up a jiffy on all architectures. It is quite common to see things like this in the boot log: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=24000) In my case I was seeing lots of cases where other CPUs timed out entering the debugger only to print their stack crawls shortly after the kdb> prompt was written. Elsewhere in kgdb we already use udelay(), so that should be safe enough to use to implement our timeout. We'll delay 1 ms for 1000 times, which should give us a full second of delay (just like the old code wanted) but allow us to notice that we're done every 1 ms. [akpm@linux-foundation.org: simplifications, per Daniel] Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org Signed-off-by: Douglas Anderson Reviewed-by: Daniel Thompson Cc: Jason Wessel Cc: Brian Norris Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 63b33e0885d63bbe1ed8b6f05f36d7e812549492 Author: Christian Lamparter Date: Mon Nov 14 02:11:16 2016 +0100 watchdog: qcom: fix kernel panic due to external abort on non-linefetch commit f06f35c66fdbd5ac38901a3305ce763a0cd59375 upstream. This patch fixes a off-by-one in the "watchdog: qcom: add option for standalone watchdog not in timer block" patch that causes the following panic on boot: > Unhandled fault: external abort on non-linefetch (0x1008) at 0xc8874002 > pgd = c0204000 > [c8874002] *pgd=87806811, *pte=0b017653, *ppte=0b017453 > Internal error: : 1008 [#1] SMP ARM > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.8.6 #0 > Hardware name: Generic DT based system > PC is at 0xc02222f4 > LR is at 0x1 > pc : [] lr : [<00000001>] psr: 00000113 > sp : c782fc98 ip : 00000003 fp : 00000000 > r10: 00000004 r9 : c782e000 r8 : c04ab98c > r7 : 00000001 r6 : c8874002 r5 : c782fe00 r4 : 00000002 > r3 : 00000000 r2 : c782fe00 r1 : 00100000 r0 : c8874002 > Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none > Control: 10c5387d Table: 8020406a DAC: 00000051 > Process swapper/0 (pid: 1, stack limit = 0xc782e210) > Stack: (0xc782fc98 to 0xc7830000) > [...] The WDT_STS (status) needs to be translated via wdt_addr as well. fixes: f0d9d0f4b44a ("watchdog: qcom: add option for standalone watchdog not in timer block") Signed-off-by: Christian Lamparter Reviewed-by: Guenter Roeck Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit bf902ead61d8987423961c926f52994534e2422b Author: Alexander Usyskin Date: Tue Nov 8 17:55:52 2016 +0200 watchdog: mei_wdt: request stop on reboot to prevent false positive event commit 9eff1140a82db8c5520f76e51c21827b4af670b3 upstream. Systemd on reboot enables shutdown watchdog that leaves the watchdog device open to ensure that even if power down process get stuck the platform reboots nonetheless. The iamt_wdt is an alarm-only watchdog and can't reboot system, but the FW will generate an alarm event reboot was completed in time, as the watchdog is not automatically disabled during power cycle. So we should request stop watchdog on reboot to eliminate wrong alarm from the FW. Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Reviewed-by: Guenter Roeck Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 2f826a72ea8beae0fdae7c8c50ac52c118eb0aac Author: Konstantin Khlebnikov Date: Wed Dec 14 15:04:04 2016 -0800 kernel/watchdog: use nmi registers snapshot in hardlockup handler commit 4d1f0fb096aedea7bb5489af93498a82e467c480 upstream. NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ. Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP pointing to the code interrupted by IRQ which was interrupted by NMI. NULL isn't a problem: in this case watchdog calls dump_stack() and prints full stack trace including NMI. But if we're stuck in IRQ handler then NMI watchlog will print stack trace without IRQ part at all. This patch uses registers snapshot passed into NMI handler as arguments: these registers point exactly to the instruction interrupted by NMI. Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup") Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz Signed-off-by: Konstantin Khlebnikov Cc: Jiri Kosina Cc: Ulrich Obergfell Cc: Aaron Tomlin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit bbf23f00d55e1bfbe105d72780b1242dc9772c6d Author: Pavel Shilovsky Date: Tue Nov 29 16:14:43 2016 -0800 CIFS: Fix a possible memory corruption in push locks commit e3d240e9d505fc67f8f8735836df97a794bbd946 upstream. If maxBuf is not 0 but less than a size of SMB2 lock structure we can end up with a memory corruption. Signed-off-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman commit 9f1f5076149a534723d7293097656a621f477015 Author: Pavel Shilovsky Date: Tue Nov 29 11:30:58 2016 -0800 CIFS: Fix missing nls unload in smb2_reconnect() commit 4772c79599564bd08ee6682715a7d3516f67433f upstream. Acked-by: Sachin Prabhu Signed-off-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman commit ff04da387c10b6bf7b510392742c8cd46c130fd6 Author: Pavel Shilovsky Date: Fri Nov 4 11:50:31 2016 -0700 CIFS: Fix a possible memory corruption during reconnect commit 53e0e11efe9289535b060a51d4cf37c25e0d0f2b upstream. We can not unlock/lock cifs_tcp_ses_lock while walking through ses and tcon lists because it can corrupt list iterator pointers and a tcon structure can be released if we don't hold an extra reference. Fix it by moving a reconnect process to a separate delayed work and acquiring a reference to every tcon that needs to be reconnected. Also do not send an echo request on newly established connections. Signed-off-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman commit 6cb589c7529f1c7bdb923ad49ec758e29a0ed345 Author: Takashi Iwai Date: Fri Nov 25 16:54:06 2016 +0100 ASoC: intel: Fix crash at suspend/resume without card registration commit 2fc995a87f2efcd803438f07bfecd35cc3d90d32 upstream. When ASoC Intel SST Medfield driver is probed but without codec / card assigned, it causes an Oops and freezes the kernel at suspend/resume, PM: Suspending system (freeze) Suspending console(s) (use no_console_suspend to debug) BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [] sst_soc_prepare+0x19/0xa0 [snd_soc_sst_mfld_platform] Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 1552 Comm: systemd-sleep Tainted: G W 4.9.0-rc6-1.g5f5c2ad-default #1 Call Trace: [] dpm_prepare+0x209/0x460 [] dpm_suspend_start+0x11/0x60 [] suspend_devices_and_enter+0xb2/0x710 [] pm_suspend+0x30e/0x390 [] state_store+0x8a/0x90 [] kobj_attr_store+0xf/0x20 [] sysfs_kf_write+0x37/0x40 [] kernfs_fop_write+0x11c/0x1b0 [] __vfs_write+0x28/0x140 [] ? apparmor_file_permission+0x18/0x20 [] ? security_file_permission+0x3b/0xc0 [] vfs_write+0xb5/0x1a0 [] SyS_write+0x46/0xa0 [] entry_SYSCALL_64_fastpath+0x1e/0xad Add proper NULL checks in the PM code of mdfld driver. Signed-off-by: Takashi Iwai Acked-by: Vinod Koul Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 769c0922d4a622981228c05d68934883660fbffd Author: Benjamin Marzinski Date: Wed Nov 30 17:56:14 2016 -0600 dm space map metadata: fix 'struct sm_metadata' leak on failed create commit 314c25c56c1ee5026cf99c570bdfe01847927acb upstream. In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski Acked-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit ab10ab0a2a3d4ef58209cc51e7bbe6fd5937c024 Author: Heinz Mauelshagen Date: Tue Nov 29 22:37:30 2016 +0100 dm raid: fix discard support regression commit 11e2968478edc07a75ee1efb45011b3033c621c2 upstream. Commit ecbfb9f118 ("dm raid: add raid level takeover support") moved the configure_discard_support() call from raid_ctr() to raid_preresume(). Enabling/disabling discard _must_ happen during table load (through the .ctr hook). Fix this regression by moving the configure_discard_support() call back to raid_ctr(). Fixes: ecbfb9f118 ("dm raid: add raid level takeover support") Signed-off-by: Heinz Mauelshagen Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 454b98d3f2b57b633e7d4820fc1acbe2096f5166 Author: Bart Van Assche Date: Fri Nov 11 17:05:27 2016 -0800 dm rq: fix a race condition in rq_completed() commit d15bb3a6467e102e60d954aadda5fb19ce6fd8ec upstream. It is required to hold the queue lock when calling blk_run_queue_async() to avoid that a race between blk_run_queue_async() and blk_cleanup_queue() is triggered. Signed-off-by: Bart Van Assche Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 26011e67339e1a7b2244077468f8413cf26e956b Author: Ondrej Kozina Date: Wed Nov 2 15:02:08 2016 +0100 dm crypt: mark key as invalid until properly loaded commit 265e9098bac02bc5e36cda21fdbad34cb5b2f48d upstream. In crypt_set_key(), if a failure occurs while replacing the old key (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag set. Otherwise, the crypto layer would have an invalid key that still has DM_CRYPT_KEY_VALID flag set. Signed-off-by: Ondrej Kozina Reviewed-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit bd5fcd18c8c3c032d58d5ffdb2513d136a5450c8 Author: Wei Yongjun Date: Mon Aug 8 14:09:27 2016 +0000 dm flakey: return -EINVAL on interval bounds error in flakey_ctr() commit bff7e067ee518f9ed7e1cbc63e4c9e01670d0b71 upstream. Fix to return error code -EINVAL instead of 0, as is done elsewhere in this function. Fixes: e80d1c805a3b ("dm: do not override error code returned from dm_get_device()") Signed-off-by: Wei Yongjun Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 1ca66d6a19d231ee635296e636439e704cde1a47 Author: Bart Van Assche Date: Wed Dec 7 16:56:06 2016 -0800 dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device commit 301fc3f5efb98633115bd887655b19f42c6dfaa8 upstream. When dm_table_set_type() is used by a target to establish a DM table's type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath) the DM core must go on to verify that the devices in the table are compatible with the established type. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Signed-off-by: Bart Van Assche Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit d948d3b1e4168a06eaf7336084f4e7ba81f358bf Author: Mike Snitzer Date: Wed Nov 23 13:51:09 2016 -0500 dm table: fix 'all_blk_mq' inconsistency when an empty table is loaded commit 6936c12cf809850180b24947271b8f068fdb15e9 upstream. An earlier DM multipath table could have been build ontop of underlying devices that were all using blk-mq. In that case, if that active multipath table is replaced with an empty DM multipath table (that reflects all paths have failed) then it is important that the 'all_blk_mq' state of the active table is transfered to the new empty DM table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a request that isn't needed by the DM multipath target when it is to issue IO to an underlying blk-mq device. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Reported-by: Bart Van Assche Tested-by: Bart Van Assche Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 45f631113b364999eb13fff78cb75610957acc87 Author: Bart Van Assche Date: Fri Oct 28 17:18:48 2016 -0700 blk-mq: Do not invoke .queue_rq() for a stopped queue commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream. The meaning of the BLK_MQ_S_STOPPED flag is "do not call .queue_rq()". Hence modify blk_mq_make_request() such that requests are queued instead of issued if a queue has been stopped. Reported-by: Ming Lei Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig Reviewed-by: Ming Lei Reviewed-by: Hannes Reinecke Reviewed-by: Johannes Thumshirn Reviewed-by: Sagi Grimberg Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit e3742a15d6cdad8a22cea66feb3ad44618085072 Author: Stephen Boyd Date: Wed Nov 30 16:21:25 2016 +0530 PM / OPP: Pass opp_table to dev_pm_opp_put_regulator() commit 91291d9ad92faa65a56a9a19d658d8049b78d3d4 upstream. Joonyoung Shim reported an interesting problem on his ARM octa-core Odoroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator() was failing for a struct device for which dev_pm_opp_set_regulator() is called earlier. This happened because an earlier call to dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file) removed all the entries from opp_table->dev_list apart from the last CPU device in the cpumask of CPUs sharing the OPP. But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator() routines get CPU device for the first CPU in the cpumask. And so the OPP core failed to find the OPP table for the struct device. This patch attempts to fix this problem by returning a pointer to the opp_table from dev_pm_opp_set_regulator() and using that as the parameter to dev_pm_opp_put_regulator(). This ensures that the dev_pm_opp_put_regulator() doesn't fail to find the opp table. Note that similar design problem also exists with other dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and so we don't need to update them for now. Reported-by: Joonyoung Shim Signed-off-by: Stephen Boyd Signed-off-by: Viresh Kumar [ Viresh: Wrote commit log and tested on exynos 5250 ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 8b63a922ac07d59fdaa091d3329d8d61a1d4d245 Author: Felipe Balbi Date: Wed Sep 28 12:33:31 2016 +0300 usb: gadget: composite: always set ep->mult to a sensible value commit eaa496ffaaf19591fe471a36cef366146eeb9153 upstream. ep->mult is supposed to be set to Isochronous and Interrupt Endapoint's multiplier value. This value is computed from different places depending on the link speed. If we're dealing with HighSpeed, then it's part of bits [12:11] of wMaxPacketSize. This case wasn't taken into consideration before. While at that, also make sure the ep->mult defaults to one so drivers can use it unconditionally and assume they'll never multiply ep->maxpacket to zero. Cc: Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit d4f4b2e659f51b859c1d9370ea652a9d19c68b87 Author: Mel Gorman Date: Mon Dec 12 16:44:41 2016 -0800 mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted commit a6de734bc002fe2027ccc074fbbd87d72957b7a4 upstream. Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") will allow the per-cpu list counter to be out of sync with the per-cpu list contents if a struct page is corrupted. The consequence is an infinite loop if the per-cpu lists get fully drained by free_pcppages_bulk because all the lists are empty but the count is positive. The infinite loop occurs here do { batch_free++; if (++migratetype == MIGRATE_PCPTYPES) migratetype = 0; list = &pcp->lists[migratetype]; } while (list_empty(list)); What the user sees is a bad page warning followed by a soft lockup with interrupts disabled in free_pcppages_bulk(). This patch keeps the accounting in sync. Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Acked-by: Michal Hocko Acked-by: Hillf Danton Cc: Christoph Lameter Cc: Johannes Weiner Cc: Jesper Dangaard Brouer Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0927d281a2fc563963a10f4a293e0a8c911a8d00 Author: Shaohua Li Date: Mon Dec 12 16:41:50 2016 -0800 mm/vmscan.c: set correct defer count for shrinker commit 5f33a0803bbd781de916f5c7448cbbbbc763d911 upstream. Our system uses significantly more slab memory with memcg enabled with the latest kernel. With 3.10 kernel, slab uses 2G memory, while with 4.6 kernel, 6G memory is used. The shrinker has problem. Let's see we have two memcg for one shrinker. In do_shrink_slab: 1. Check cg1. nr_deferred = 0, assume total_scan = 700. batch size is 1024, then no memory is freed. nr_deferred = 700 2. Check cg2. nr_deferred = 700. Assume freeable = 20, then total_scan = 10 or 40. Let's assume it's 10. No memory is freed. nr_deferred = 10. The deferred share of cg1 is lost in this case. kswapd will free no memory even run above steps again and again. The fix makes sure one memcg's deferred share isn't lost. Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.com Signed-off-by: Shaohua Li Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3e0ef1b8e0cc9179f498da7892af8ed6621c9a2b Author: Solganik Alexander Date: Sun Oct 30 10:35:15 2016 +0200 nvmet: Fix possible infinite loop triggered on hot namespace removal commit e4fcf07cca6a3b6c4be00df16f08be894325eaa3 upstream. When removing a namespace we delete it from the subsystem namespaces list with list_del_init which allows us to know if it is enabled or not. The problem is that list_del_init initialize the list next and does not respect the RCU list-traversal we do on the IO path for locating a namespace. Instead we need to use list_del_rcu which is allowed to run concurrently with the _rcu list-traversal primitives (keeps list next intact) and guarantees concurrent nvmet_find_naespace forward progress. By changing that, we cannot rely on ns->dev_link for knowing if the namspace is enabled, so add enabled indicator entry to nvmet_ns for that. Signed-off-by: Sagi Grimberg Signed-off-by: Solganik Alexander Signed-off-by: Greg Kroah-Hartman commit 6290a3bcd3c3af6d1851782e81beda8bb4f80c9a Author: Omar Sandoval Date: Mon Nov 14 14:56:17 2016 -0800 loop: return proper error from loop_queue_rq() commit b4a567e8114327518c09f5632339a5954ab975a3 upstream. ->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not an errno. Fixes: f4aa4c7bbac6 ("block: loop: convert to per-device workqueue") Signed-off-by: Omar Sandoval Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit bf0f02079039146620f90125efe8ae778db9a729 Author: Jaegeuk Kim Date: Wed Nov 23 10:51:17 2016 -0800 f2fs: fix overflow due to condition check order commit e87f7329bbd6760c2acc4f1eb423362b08851a71 upstream. In the last ilen case, i was already increased, resulting in accessing out- of-boundary entry of do_replace and blkaddr. Fix to check ilen first to exit the loop. Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 154d83a8384eb5b8747008fcad8696a3a2845abf Author: Nicolai Stange Date: Sun Nov 20 19:57:23 2016 +0100 f2fs: set ->owner for debugfs status file's file_operations commit 05e6ea2685c964db1e675a24a4f4e2adc22d2388 upstream. The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall Reported-by: Martin Brandenburg Signed-off-by: Nicolai Stange Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 67e5239c83669686f5a2502386787650917b1481 Author: Jaegeuk Kim Date: Fri Dec 2 15:11:32 2016 -0800 Revert "f2fs: use percpu_counter for # of dirty pages in inode" commit 204706c7accfabb67b97eef9f9a28361b6201199 upstream. This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76. The perpcu_counter doesn't provide atomicity in single core and consume more DRAM. That incurs fs_mark test failure due to ENOMEM. Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit d06eaf28f1bb562ea3ec362e87d040539df9933f Author: Sergey Karamov Date: Sat Dec 10 17:54:58 2016 -0500 ext4: do not perform data journaling when data is encrypted commit 73b92a2a5e97d17cc4d5c4fe9d724d3273fb6fd2 upstream. Currently data journalling is incompatible with encryption: enabling both at the same time has never been supported by design, and would result in unpredictable behavior. However, users are not precluded from turning on both features simultaneously. This change programmatically replaces data journaling for encrypted regular files with ordered data journaling mode. Background: Journaling encrypted data has not been supported because it operates on buffer heads of the page in the page cache. Namely, when the commit happens, which could be up to five seconds after caching, the commit thread uses the buffer heads attached to the page to copy the contents of the page to the journal. With encryption, it would have been required to keep the bounce buffer with ciphertext for up to the aforementioned five seconds, since the page cache can only hold plaintext and could not be used for journaling. Alternatively, it would be required to setup the journal to initiate a callback at the commit time to perform deferred encryption - in this case, not only would the data have to be written twice, but it would also have to be encrypted twice. This level of complexity was not justified for a mode that in practice is very rarely used because of the overhead from the data journalling. Solution: If data=journaled has been set as a mount option for a filesystem, or if journaling is enabled on a regular file, do not perform journaling if the file is also encrypted, instead fall back to the data=ordered mode for the file. Rationale: The intent is to allow seamless and proper filesystem operation when journaling and encryption have both been enabled, and have these two conflicting features gracefully resolved by the filesystem. Fixes: 4461471107b7 Signed-off-by: Sergey Karamov Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit e33673bef6f05183cded16c8e14b1180cf137ae4 Author: Dan Carpenter Date: Sat Dec 10 09:56:01 2016 -0500 ext4: return -ENOMEM instead of success commit 578620f451f836389424833f1454eeeb2ffc9e9f upstream. We should set the error code if kzalloc() fails. Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 3664877022eab014a669ccb13221f119b7e8dc8d Author: Darrick J. Wong Date: Sat Dec 10 09:55:01 2016 -0500 ext4: reject inodes with negative size commit 7e6e1ef48fc02f3ac5d0edecbb0c6087cd758d58 upstream. Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] Fixes: a48380f769df (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 1bfcffbb88017c09c1a469507616c88a27037757 Author: Theodore Ts'o Date: Fri Nov 18 13:37:47 2016 -0500 ext4: add sanity checking to count_overhead() commit c48ae41bafe31e9a66d8be2ced4e42a6b57fa814 upstream. The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 9689eb99ce0f2ea904f7959022e6c1fdee158db1 Author: Theodore Ts'o Date: Fri Nov 18 13:24:26 2016 -0500 ext4: fix in-superblock mount options processing commit 5aee0f8a3f42c94c5012f1673420aee96315925a upstream. Fix a large number of problems with how we handle mount options in the superblock. For one, if the string in the superblock is long enough that it is not null terminated, we could run off the end of the string and try to interpret superblocks fields as characters. It's unlikely this will cause a security problem, but it could result in an invalid parse. Also, parse_options is destructive to the string, so in some cases if there is a comma-separated string, it would be modified in the superblock. (Fortunately it only happens on file systems with a 1k block size.) Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 52a9daa3d5c0c8a4649d10adb2d1ec780e667ec9 Author: Theodore Ts'o Date: Fri Nov 18 13:28:30 2016 -0500 ext4: use more strict checks for inodes_per_block on mount commit cd6bb35bf7f6d7d922509bf50265383a0ceabe96 upstream. Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger Signed-off-by: Greg Kroah-Hartman commit 7505584356d2b681b235340c0944780bfa62edb0 Author: Chandan Rajendra Date: Mon Nov 14 21:26:26 2016 -0500 ext4: fix stack memory corruption with 64k block size commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 upstream. The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 86efd99f0ebeb42df96587e450fbf098b5400371 Author: Chandan Rajendra Date: Mon Nov 14 21:04:37 2016 -0500 ext4: fix mballoc breakage with 64k block size commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 upstream. 'border' variable is set to a value of 2 times the block size of the underlying filesystem. With 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger Signed-off-by: Greg Kroah-Hartman commit 8022387d236a7586726ca1bccc97d0447f8b020d Author: Alex Porosanu Date: Wed Nov 9 10:46:11 2016 +0200 crypto: caam - fix AEAD givenc descriptors commit d128af17876d79b87edf048303f98b35f6a53dbc upstream. The AEAD givenc descriptor relies on moving the IV through the output FIFO and then back to the CTX2 for authentication. The SEQ FIFO STORE could be scheduled before the data can be read from OFIFO, especially since the SEQ FIFO LOAD needs to wait for the SEQ FIFO LOAD SKIP to finish first. The SKIP takes more time when the input is SG than when it's a contiguous buffer. If the SEQ FIFO LOAD is not scheduled before the STORE, the DECO will hang waiting for data to be available in the OFIFO so it can be transferred to C2. In order to overcome this, first force transfer of IV to C2 by starting the "cryptlen" transfer first and then starting to store data from OFIFO to the output buffer. Fixes: 1acebad3d8db8 ("crypto: caam - faster aead implementation") Signed-off-by: Alex Porosanu Signed-off-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit ade692b8f1f5cb0e47ae08f04fe24bb0cc2767d5 Author: Eric W. Biederman Date: Mon Nov 14 18:48:07 2016 -0600 ptrace: Capture the ptracer's creds not PT_PTRACE_CAP commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream. When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was overlooked. This can result in incorrect behavior when an application like strace traces an exec of a setuid executable. Further PT_PTRACE_CAP does not have enough information for making good security decisions as it does not report which user namespace the capability is in. This has already allowed one mistake through insufficient granulariy. I found this issue when I was testing another corner case of exec and discovered that I could not get strace to set PT_PTRACE_CAP even when running strace as root with a full set of caps. This change fixes the above issue with strace allowing stracing as root a setuid executable without disabling setuid. More fundamentaly this change allows what is allowable at all times, by using the correct information in it's decision. Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12") Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit 23d179acb363b0670ac698c342ec9a18b3d8759a Author: Linus Torvalds Date: Wed Dec 14 12:45:25 2016 -0800 vfs,mm: fix return value of read() at s_maxbytes commit d05c5f7ba164aed3db02fb188c26d0dd94f5455b upstream. We truncated the possible read iterator to s_maxbytes in commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()"), but our end condition handling was wrong: it's not an error to try to read at the end of the file. Reading past the end should return EOF (0), not EINVAL. See for example https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1649342 http://lists.gnu.org/archive/html/bug-coreutils/2016-12/msg00008.html where a md5sum of a maximally sized file fails because the final read is exactly at s_maxbytes. Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Reported-by: Joseph Salisbury Cc: Wei Fang Cc: Christoph Hellwig Cc: Dave Chinner Cc: Al Viro Cc: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e45692fa1aea06676449b63ef3c2b6e1e72b7578 Author: Eric W. Biederman Date: Thu Oct 13 21:23:16 2016 -0500 mm: Add a user_ns owner to mm_struct and fix ptrace permission checks commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream. During exec dumpable is cleared if the file that is being executed is not readable by the user executing the file. A bug in ptrace_may_access allows reading the file if the executable happens to enter into a subordinate user namespace (aka clone(CLONE_NEWUSER), unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER). This problem is fixed with only necessary userspace breakage by adding a user namespace owner to mm_struct, captured at the time of exec, so it is clear in which user namespace CAP_SYS_PTRACE must be present in to be able to safely give read permission to the executable. The function ptrace_may_access is modified to verify that the ptracer has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns. This ensures that if the task changes it's cred into a subordinate user namespace it does not become ptraceable. The function ptrace_attach is modified to only set PT_PTRACE_CAP when CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of PT_PTRACE_CAP is to be a flag to note that whatever permission changes the task might go through the tracer has sufficient permissions for it not to be an issue. task->cred->user_ns is always the same as or descendent of mm->user_ns. Which guarantees that having CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks credentials. To prevent regressions mm->dumpable and mm->user_ns are not considered when a task has no mm. As simply failing ptrace_may_attach causes regressions in privileged applications attempting to read things such as /proc//stat Acked-by: Kees Cook Tested-by: Cyrill Gorcunov Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces") Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit 04804d83d483e2b503e842a56acb36e4502187eb Author: NeilBrown Date: Mon Dec 12 08:21:51 2016 -0700 block_dev: don't test bdev->bd_contains when it is not stable commit bcc7f5b4bee8e327689a4d994022765855c807ff upstream. bdev->bd_contains is not stable before calling __blkdev_get(). When __blkdev_get() is called on a parition with ->bd_openers == 0 it sets bdev->bd_contains = bdev; which is not correct for a partition. After a call to __blkdev_get() succeeds, ->bd_openers will be > 0 and then ->bd_contains is stable. When FMODE_EXCL is used, blkdev_get() calls bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim() This call happens before __blkdev_get() is called, so ->bd_contains is not stable. So bd_may_claim() cannot safely use ->bd_contains. It currently tries to use it, and this can lead to a BUG_ON(). This happens when a whole device is already open with a bd_holder (in use by dm in my particular example) and two threads race to open a partition of that device for the first time, one opening with O_EXCL and one without. The thread that doesn't use O_EXCL gets through blkdev_get() to __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev; Immediately thereafter the other thread, using FMODE_EXCL, calls bd_start_claiming() from blkdev_get(). This should fail because the whole device has a holder, but because bdev->bd_contains == bdev bd_may_claim() incorrectly reports success. This thread continues and blocks on bd_mutex. The first thread then sets bdev->bd_contains correctly and drops the mutex. The thread using FMODE_EXCL then continues and when it calls bd_may_claim() again in: BUG_ON(!bd_may_claim(bdev, whole, holder)); The BUG_ON fires. Fix this by removing the dependency on ->bd_contains in bd_may_claim(). As bd_may_claim() has direct access to the whole device, it can simply test if the target bdev is the whole device. Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block") Signed-off-by: NeilBrown Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 52d69727a441ff37527ede38706002e1ac911a12 Author: Aleksa Sarai Date: Wed Dec 21 16:26:24 2016 +1100 fs: exec: apply CLOEXEC before changing dumpable task flags commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream. If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc//fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Reported-by: Michael Crosby Signed-off-by: Aleksa Sarai Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit 781e976ac2c67fe45bacff8c7ca98129f94b2fb7 Author: Eric W. Biederman Date: Wed Nov 16 22:06:51 2016 -0600 exec: Ensure mm->user_ns contains the execed files commit f84df2a6f268de584a201e8911384a2d244876e3 upstream. When the user namespace support was merged the need to prevent ptrace from revealing the contents of an unreadable executable was overlooked. Correct this oversight by ensuring that the executed file or files are in mm->user_ns, by adjusting mm->user_ns. Use the new function privileged_wrt_inode_uidgid to see if the executable is a member of the user namespace, and as such if having CAP_SYS_PTRACE in the user namespace should allow tracing the executable. If not update mm->user_ns to the parent user namespace until an appropriate parent is found. Reported-by: Jann Horn Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.") Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit fc1d3e5f3cd0a9f3bf173c19d321102e406f8d2f Author: Wang Xiaoguang Date: Thu Oct 13 09:23:39 2016 +0800 btrfs: make file clone aware of fatal signals commit 69ae5e4459e43e56f03d0987e865fbac2b05af2a upstream. Indeed this just make the behavior similar to xfs when process has fatal signals pending, and it'll make fstests/generic/298 happy. Signed-off-by: Wang Xiaoguang Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 8c59356c12322d26e597156558b6361536c8ca86 Author: Filipe Manana Date: Mon Sep 19 10:57:40 2016 +0100 Btrfs: fix incremental send failure caused by balance commit d5e84fd8d0634d056248b67463b42f6c85896a19 upstream. Commit 951555856b88 ("Btrfs: send, don't bug on inconsistent snapshots") removed some BUG_ON() statements (replacing them with returning errors to user space and logging error messages) when a snapshot is in an inconsistent state due to failures to update a delayed inode item (ENOMEM or ENOSPC) after adding/updating/deleting references, xattrs or file extent items. However there is a case, when no errors happen, where a file extent item can be modified without having the corresponding inode item updated. This case happens during balance under very specific timings, when relocation is in the stage where it updates data pointers and a leaf that contains file extent items is COWed. When that happens file extent items get their disk_bytenr field updated to a new value that reflects the post relocation logical address of the extent, without updating their respective inode items (as there is nothing that needs to be updated on them). This is performed at relocation.c:replace_file_extents() through relocation.c:btrfs_reloc_cow_block(). So make an incremental send deal with this case and don't do any processing for a file extent item that got its disk_bytenr field updated by relocation, since the extent's data is the same as the one pointed by the file extent item in the parent snapshot. After the recent commit mentioned above this case resulted in EIO errors returned to user space (and an error message logged to dmesg/syslog) when doing an incremental send, while before it, it resulted in hitting a BUG_ON leading to the following trace: [ 952.206705] ------------[ cut here ]------------ [ 952.206714] kernel BUG at ../fs/btrfs/send.c:5653! [ 952.206719] Internal error: Oops - BUG: 0 [#1] SMP [ 952.209854] Modules linked in: st dm_mod nls_utf8 isofs fuse nf_log_ipv6 xt_pkttype xt_physdev br_netfilter nf_log_ipv4 nf_log_common xt_LOG xt_limit ebtable_filter ebtables af_packet bridge stp llc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables xfs libcrc32c nls_iso8859_1 nls_cp437 vfat fat joydev aes_ce_blk ablk_helper cryptd snd_intel8x0 aes_ce_cipher snd_ac97_codec ac97_bus snd_pcm ghash_ce sha2_ce sha1_ce snd_timer snd virtio_net soundcore btrfs xor sr_mod cdrom hid_generic usbhid raid6_pq virtio_blk virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_mmio xhci_pci xhci_hcd usbcore usb_common virtio_pci virtio_ring virtio drm sg efivarfs [ 952.228333] Supported: Yes [ 952.228908] CPU: 0 PID: 12779 Comm: snapperd Not tainted 4.4.14-50-default #1 [ 952.230329] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 952.231683] task: ffff800058e94100 ti: ffff8000d866c000 task.ti: ffff8000d866c000 [ 952.233279] PC is at changed_cb+0x9f4/0xa48 [btrfs] [ 952.234375] LR is at changed_cb+0x58/0xa48 [btrfs] [ 952.236552] pc : [] lr : [] pstate: 80000145 [ 952.238049] sp : ffff8000d866fa20 [ 952.238732] x29: ffff8000d866fa20 x28: 0000000000000019 [ 952.239840] x27: 00000000000028d5 x26: 00000000000024a2 [ 952.241008] x25: 0000000000000002 x24: ffff8000e66e92f0 [ 952.242131] x23: ffff8000b8c76800 x22: ffff800092879140 [ 952.243238] x21: 0000000000000002 x20: ffff8000d866fb78 [ 952.244348] x19: ffff8000b8f8c200 x18: 0000000000002710 [ 952.245607] x17: 0000ffff90d42480 x16: ffff800000237dc0 [ 952.246719] x15: 0000ffff90de7510 x14: ab000c000a2faf08 [ 952.247835] x13: 0000000000577c2b x12: ab000c000b696665 [ 952.248981] x11: 2e65726f632f6966 x10: 652d34366d72612f [ 952.250101] x9 : 32627572672f746f x8 : ab000c00092f1671 [ 952.251352] x7 : 8000000000577c2b x6 : ffff800053eadf45 [ 952.252468] x5 : 0000000000000000 x4 : ffff80005e169494 [ 952.253582] x3 : 0000000000000004 x2 : ffff8000d866fb78 [ 952.254695] x1 : 000000000003e2a3 x0 : 000000000003e2a4 [ 952.255803] [ 952.256150] Process snapperd (pid: 12779, stack limit = 0xffff8000d866c020) [ 952.257516] Stack: (0xffff8000d866fa20 to 0xffff8000d8670000) [ 952.258654] fa20: ffff8000d866fae0 ffff7ffffc308fc0 ffff800092879140 ffff8000e66e92f0 [ 952.260219] fa40: 0000000000000035 ffff800055de6000 ffff8000b8c76800 ffff8000d866fb78 [ 952.261745] fa60: 0000000000000002 00000000000024a2 00000000000028d5 0000000000000019 [ 952.263269] fa80: ffff8000d866fae0 ffff7ffffc3090f0 ffff8000d866fae0 ffff7ffffc309128 [ 952.264797] faa0: ffff800092879140 ffff8000e66e92f0 0000000000000035 ffff800055de6000 [ 952.268261] fac0: ffff8000b8c76800 ffff8000d866fb78 0000000000000002 0000000000001000 [ 952.269822] fae0: ffff8000d866fbc0 ffff7ffffc39ecfc ffff8000b8f8c200 ffff8000b8f8c368 [ 952.271368] fb00: ffff8000b8f8c378 ffff800055de6000 0000000000000001 ffff8000ecb17500 [ 952.272893] fb20: ffff8000b8c76800 ffff800092879140 ffff800062b6d000 ffff80007a9e2470 [ 952.274420] fb40: ffff8000b8f8c208 0000000005784000 ffff8000580a8000 ffff8000b8f8c200 [ 952.276088] fb60: ffff7ffffc39d488 00000002b8f8c368 0000000000000000 000000000003e2a4 [ 952.280275] fb80: 000000000000006c ffff7ffffc39ec00 000000000003e2a4 000000000000006c [ 952.283219] fba0: ffff8000b8f8c300 0000000000000100 0000000000000001 ffff8000ecb17500 [ 952.286166] fbc0: ffff8000d866fcd0 ffff7ffffc3643c0 ffff8000f8842700 0000ffff8ffe9278 [ 952.289136] fbe0: 0000000040489426 ffff800055de6000 0000ffff8ffe9278 0000000040489426 [ 952.292083] fc00: 000000000000011d 000000000000001d ffff80007a9e4598 ffff80007a9e43e8 [ 952.294959] fc20: ffff8000b8c7693f 0000000000003b24 0000000000000019 ffff8000b8f8c218 [ 952.301161] fc40: 00000001d866fc70 ffff8000b8c76800 0000000000000128 ffffffffffffff84 [ 952.305749] fc60: ffff800058e941ff 0000000000003a58 ffff8000d866fcb0 ffff8000000f7390 [ 952.308875] fc80: 000000000000012a 0000000000010290 ffff8000d866fc00 000000000000007b [ 952.311915] fca0: 0000000000010290 ffff800046c1b100 74732d7366727462 000001006d616572 [ 952.314937] fcc0: ffff8000fffc4100 cb88537fdc8ba60e ffff8000d866fe10 ffff8000002499e8 [ 952.318008] fce0: 0000000040489426 ffff8000f8842700 0000ffff8ffe9278 ffff80007a9e4598 [ 952.321321] fd00: 0000ffff8ffe9278 0000000040489426 000000000000011d 000000000000001d [ 952.324280] fd20: ffff80000072c000 ffff8000d866c000 ffff8000d866fda0 ffff8000000e997c [ 952.327156] fd40: ffff8000fffc4180 00000000000031ed ffff8000fffc4180 ffff800046c1b7d4 [ 952.329895] fd60: 0000000000000140 0000ffff907ea170 000000000000011d 00000000000000dc [ 952.334641] fd80: ffff80000072c000 ffff8000d866c000 0000000000000000 0000000000000002 [ 952.338002] fda0: ffff8000d866fdd0 ffff8000000ebacc ffff800046c1b080 ffff800046c1b7d4 [ 952.340724] fdc0: ffff8000d866fdf0 ffff8000000db67c 0000000000000040 ffff800000e69198 [ 952.343415] fde0: 0000ffff8ffea790 00000000000031ed ffff8000d866fe20 ffff800000254000 [ 952.346101] fe00: 000000000000001d 0000000000000004 ffff8000d866fe90 ffff800000249d3c [ 952.348980] fe20: ffff8000f8842700 0000000000000000 ffff8000f8842701 0000000000000008 [ 952.351696] fe40: ffff8000d866fe70 0000000000000008 ffff8000d866fe90 ffff800000249cf8 [ 952.354387] fe60: ffff8000f8842700 0000ffff8ffe9170 ffff8000f8842701 0000000000000008 [ 952.357083] fe80: 0000ffff8ffe9278 ffff80008ff85500 0000ffff8ffe90c0 ffff800000085c84 [ 952.359800] fea0: 0000000000000000 0000ffff8ffe9170 ffffffffffffffff 0000ffff90d473bc [ 952.365351] fec0: 0000000000000000 0000000000000015 0000000000000008 0000000040489426 [ 952.369550] fee0: 0000ffff8ffe9278 0000ffff907ea790 0000ffff907ea170 0000ffff907ea790 [ 952.372416] ff00: 0000ffff907ea170 0000000000000000 000000000000001d 0000000000000004 [ 952.375223] ff20: 0000ffff90a32220 00000000003d0f00 0000ffff907ea0a0 0000ffff8ffe8f30 [ 952.378099] ff40: 0000ffff9100f554 0000ffff91147000 0000ffff91117bc0 0000ffff90d473b0 [ 952.381115] ff60: 0000ffff9100f620 0000ffff880069b0 0000ffff8ffe9170 0000ffff8ffe91a0 [ 952.384003] ff80: 0000ffff8ffe9160 0000ffff8ffe9140 0000ffff88006990 0000ffff8ffe9278 [ 952.386860] ffa0: 0000ffff88008a60 0000ffff8ffe9480 0000ffff88014ca0 0000ffff8ffe90c0 [ 952.389654] ffc0: 0000ffff910be8e8 0000ffff8ffe90c0 0000ffff90d473bc 0000000000000000 [ 952.410986] ffe0: 0000000000000008 000000000000001d 6e2079747265706f 72616d223d656d61 [ 952.415497] Call trace: [ 952.417403] [] changed_cb+0x9f4/0xa48 [btrfs] [ 952.420023] [] btrfs_compare_trees+0x500/0x6b0 [btrfs] [ 952.422759] [] btrfs_ioctl_send+0xb4c/0xe10 [btrfs] [ 952.425601] [] btrfs_ioctl+0x374/0x29a4 [btrfs] [ 952.428031] [] do_vfs_ioctl+0x33c/0x600 [ 952.430360] [] SyS_ioctl+0x90/0xa4 [ 952.432552] [] el0_svc_naked+0x38/0x3c [ 952.434803] Code: 2a1503e0 17fffdac b9404282 17ffff28 (d4210000) [ 952.437457] ---[ end trace 9afd7090c466cf15 ]--- Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit 02fffa116bda3353e41954e48ca5253f970b6c4d Author: Josef Bacik Date: Fri Sep 23 13:23:28 2016 +0200 Btrfs: don't BUG() during drop snapshot commit 4867268c57ff709a7b6b86ae6f6537d846d1443a upstream. Really there's lots of things that can go wrong here, kill all the BUG_ON()'s and replace the logic ones with ASSERT()'s and return EIO instead. Signed-off-by: Josef Bacik [ switched to btrfs_err, errors go to common label ] Reviewed-by: Liu Bo Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 0f2e022db23739f537ca2abc66aae1e5ba9205e8 Author: Anand Jain Date: Thu Sep 22 12:56:13 2016 +0800 btrfs: fix a possible umount deadlock commit 0ccd05285e7f5a8e297e1d6dfc41e7c65757d6fa upstream. btrfs_show_devname() is using the device_list_mutex, sometimes a call to blkdev_put() leads vfs calling into this func. So call blkdev_put() outside of device_list_mutex, as of now. [ 983.284212] ====================================================== [ 983.290401] [ INFO: possible circular locking dependency detected ] [ 983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted [ 983.302081] ------------------------------------------------------- [ 983.308357] umount/21720 is trying to acquire lock: [ 983.313243] (&bdev->bd_mutex){+.+.+.}, at: [] blkdev_put+0x31/0x150 [ 983.321264] [ 983.321264] but task is already holding lock: [ 983.327101] (&fs_devs->device_list_mutex){+.+...}, at: [] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.337839] [ 983.337839] which lock already depends on the new lock. [ 983.337839] [ 983.346024] [ 983.346024] the existing dependency chain (in reverse order) is: [ 983.353512] -> #4 (&fs_devs->device_list_mutex){+.+...}: [ 983.359096] [] lock_acquire+0x1bc/0x1f0 [ 983.365143] [] mutex_lock_nested+0x65/0x350 [ 983.371521] [] btrfs_show_devname+0x36/0x1f0 [btrfs] [ 983.378710] [] show_vfsmnt+0x4e/0x150 [ 983.384593] [] m_show+0x17/0x20 [ 983.389957] [] seq_read+0x2b5/0x3b0 [ 983.395669] [] __vfs_read+0x28/0x100 [ 983.401464] [] vfs_read+0xab/0x150 [ 983.407080] [] SyS_read+0x52/0xb0 [ 983.412609] [] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.419617] -> #3 (namespace_sem){++++++}: [ 983.424024] [] lock_acquire+0x1bc/0x1f0 [ 983.430074] [] down_write+0x49/0x80 [ 983.435785] [] lock_mount+0x67/0x1c0 [ 983.441582] [] do_add_mount+0x32/0xf0 [ 983.447458] [] finish_automount+0x5a/0xc0 [ 983.453682] [] follow_managed+0x1b3/0x2a0 [ 983.459912] [] lookup_fast+0x300/0x350 [ 983.465875] [] path_openat+0x3a7/0xaa0 [ 983.471846] [] do_filp_open+0x85/0xe0 [ 983.477731] [] do_sys_open+0x14c/0x1f0 [ 983.483702] [] SyS_open+0x1e/0x20 [ 983.489240] [] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.496254] -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}: [ 983.501798] [] lock_acquire+0x1bc/0x1f0 [ 983.507855] [] down_write+0x49/0x80 [ 983.513558] [] start_creating+0x87/0x100 [ 983.519703] [] debugfs_create_dir+0x17/0x100 [ 983.526195] [] bdi_register+0x93/0x210 [ 983.532165] [] bdi_register_owner+0x43/0x70 [ 983.538570] [] device_add_disk+0x1fb/0x450 [ 983.544888] [] loop_add+0x1e6/0x290 [ 983.550596] [] loop_init+0x10b/0x14f [ 983.556394] [] do_one_initcall+0xa7/0x180 [ 983.562618] [] kernel_init_freeable+0x1cc/0x266 [ 983.569370] [] kernel_init+0xe/0x100 [ 983.575166] [] ret_from_fork+0x1f/0x40 [ 983.581131] -> #1 (loop_index_mutex){+.+.+.}: [ 983.585801] [] lock_acquire+0x1bc/0x1f0 [ 983.591858] [] mutex_lock_nested+0x65/0x350 [ 983.598256] [] lo_open+0x1f/0x60 [ 983.603704] [] __blkdev_get+0x123/0x400 [ 983.609757] [] blkdev_get+0x34a/0x350 [ 983.615639] [] blkdev_open+0x64/0x80 [ 983.621428] [] do_dentry_open+0x1c6/0x2d0 [ 983.627651] [] vfs_open+0x69/0x80 [ 983.633181] [] path_openat+0x834/0xaa0 [ 983.639152] [] do_filp_open+0x85/0xe0 [ 983.645035] [] do_sys_open+0x14c/0x1f0 [ 983.650999] [] SyS_open+0x1e/0x20 [ 983.656535] [] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.663541] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 983.668107] [] __lock_acquire+0x1003/0x17b0 [ 983.674510] [] lock_acquire+0x1bc/0x1f0 [ 983.680561] [] mutex_lock_nested+0x65/0x350 [ 983.686967] [] blkdev_put+0x31/0x150 [ 983.692761] [] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.699699] [] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.707178] [] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 983.714380] [] close_ctree+0x265/0x340 [btrfs] [ 983.721061] [] btrfs_put_super+0x19/0x20 [btrfs] [ 983.727908] [] generic_shutdown_super+0x6f/0x100 [ 983.734744] [] kill_anon_super+0x16/0x30 [ 983.740888] [] btrfs_kill_super+0x1e/0x130 [btrfs] [ 983.747909] [] deactivate_locked_super+0x49/0x80 [ 983.754745] [] deactivate_super+0x5d/0x70 [ 983.760977] [] cleanup_mnt+0x5c/0x80 [ 983.766773] [] __cleanup_mnt+0x12/0x20 [ 983.772738] [] task_work_run+0x7e/0xc0 [ 983.778708] [] exit_to_usermode_loop+0x7e/0xb4 [ 983.785373] [] syscall_return_slowpath+0xbb/0xd0 [ 983.792212] [] entry_SYSCALL_64_fastpath+0xbf/0xc1 [ 983.799225] [ 983.799225] other info that might help us debug this: [ 983.799225] [ 983.807291] Chain exists of: &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex [ 983.816521] Possible unsafe locking scenario: [ 983.816521] [ 983.822489] CPU0 CPU1 [ 983.827043] ---- ---- [ 983.831599] lock(&fs_devs->device_list_mutex); [ 983.836289] lock(namespace_sem); [ 983.842268] lock(&fs_devs->device_list_mutex); [ 983.849478] lock(&bdev->bd_mutex); [ 983.853127] [ 983.853127] *** DEADLOCK *** [ 983.853127] [ 983.859113] 3 locks held by umount/21720: [ 983.863145] #0: (&type->s_umount_key#35){++++..}, at: [] deactivate_super+0x55/0x70 [ 983.872713] #1: (uuid_mutex){+.+.+.}, at: [] btrfs_close_devices+0x23/0xa0 [btrfs] [ 983.882206] #2: (&fs_devs->device_list_mutex){+.+...}, at: [] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.893422] [ 983.893422] stack backtrace: [ 983.897824] CPU: 6 PID: 21720 Comm: umount Not tainted 4.8.0-rc5-ceph-00023-g1b39cec2 #1 [ 983.905958] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [ 983.913492] 0000000000000000 ffff8c8a53c17a38 ffffffff91429521 ffffffff9260f4f0 [ 983.921018] ffffffff92642760 ffff8c8a53c17a88 ffffffff911b2b04 0000000000000050 [ 983.928542] ffffffff9237d620 ffff8c8a5294aee0 ffff8c8a5294aeb8 ffff8c8a5294aee0 [ 983.936072] Call Trace: [ 983.938545] [] dump_stack+0x85/0xc4 [ 983.943715] [] print_circular_bug+0x1fb/0x20c [ 983.949748] [] __lock_acquire+0x1003/0x17b0 [ 983.955613] [] lock_acquire+0x1bc/0x1f0 [ 983.961123] [] ? blkdev_put+0x31/0x150 [ 983.966550] [] mutex_lock_nested+0x65/0x350 [ 983.972407] [] ? blkdev_put+0x31/0x150 [ 983.977832] [] blkdev_put+0x31/0x150 [ 983.983101] [] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.989500] [] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.996415] [] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 984.003068] [] close_ctree+0x265/0x340 [btrfs] [ 984.009189] [] ? evict_inodes+0x15e/0x170 [ 984.014881] [] btrfs_put_super+0x19/0x20 [btrfs] [ 984.021176] [] generic_shutdown_super+0x6f/0x100 [ 984.027476] [] kill_anon_super+0x16/0x30 [ 984.033082] [] btrfs_kill_super+0x1e/0x130 [btrfs] [ 984.039548] [] deactivate_locked_super+0x49/0x80 [ 984.045839] [] deactivate_super+0x5d/0x70 [ 984.051525] [] cleanup_mnt+0x5c/0x80 [ 984.056774] [] __cleanup_mnt+0x12/0x20 [ 984.062201] [] task_work_run+0x7e/0xc0 [ 984.067625] [] exit_to_usermode_loop+0x7e/0xb4 [ 984.073747] [] syscall_return_slowpath+0xbb/0xd0 [ 984.080038] [] entry_SYSCALL_64_fastpath+0xbf/0xc1 Reported-by: Ilya Dryomov Signed-off-by: Anand Jain Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 65563ab7271bf12683fcd4d668d1b37878fb795a Author: Liu Bo Date: Tue Sep 13 19:02:27 2016 -0700 Btrfs: fix memory leak in do_walk_down commit a958eab0ed7fdc1b977bc25d3af6efedaa945488 upstream. The extent buffer 'next' needs to be free'd conditionally. Signed-off-by: Liu Bo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 364b85c57d253c95498f57c9c02a430063883798 Author: Jeff Mahoney Date: Tue Sep 20 08:50:21 2016 -0400 btrfs: clean the old superblocks before freeing the device commit cea67ab92d3d4da9f2b4141d87cb8664757daca0 upstream. btrfs_rm_device frees the block device but then re-opens it using the saved device name. A race exists between the close and the re-open that allows the block size to be changed. The result is getting stuck forever in the reclaim loop in __getblk_slow. This patch moves the superblock cleanup before closing the block device, which is also consistent with other callers. We also don't need a private copy of dev_name as the whole routine operates under the uuid_mutex. Signed-off-by: Jeff Mahoney Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 6a6e9276f3f5e6670309470a712a10220bc9480a Author: Josef Bacik Date: Fri Sep 2 15:25:43 2016 -0400 Btrfs: don't leak reloc root nodes on error commit 6bdf131fac2336adb1a628f992ba32384f653a55 upstream. We don't track the reloc roots in any sort of normal way, so the only way the root/commit_root nodes get free'd is if the relocation finishes successfully and the reloc root is deleted. Fix this by free'ing them in free_reloc_roots. Thanks, Signed-off-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 4d3d9b59d963d35f71e5e04cf3b312c9d6547f60 Author: Liu Bo Date: Wed Sep 14 08:51:46 2016 -0700 Btrfs: return gracefully from balance if fs tree is corrupted commit 3561b9db70928f207be4570b48fc19898eeaef54 upstream. When relocating tree blocks, we firstly get block information from back references in the extent tree, we then search fs tree to try to find all parents of a block. However, if fs tree is corrupted, eg. if there're some missing items, we could come across these WARN_ONs and BUG_ONs. This makes us print some error messages and return gracefully from balance. Signed-off-by: Liu Bo Reviewed-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit a6522e487b2c3c4834470ae6eea9a24bedcb7127 Author: Liu Bo Date: Thu Aug 25 18:08:27 2016 -0700 Btrfs: bail out if block group has different mixed flag commit 49303381f19ab16a371a061b67e783d3f570d56e upstream. Currently we allow inconsistence about mixed flag (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA). We'd get ENOSPC if block group has mixed flag and btrfs doesn't. If that happens, we have one space_info with mixed flag and another space_info only with BTRFS_BLOCK_GROUP_METADATA, and global_block_rsv.space_info points to the latter one, but all bytes from block_group contributes to the mixed space_info, thus all the allocation will fail with ENOSPC. This adds a check for the above case. Reported-by: Vegard Nossum Signed-off-by: Liu Bo [ updated message ] Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit d7839adcb0e310aef3a81a3992dcc3e1c8768e5f Author: Liu Bo Date: Wed Aug 3 12:33:01 2016 -0700 Btrfs: fix memory leak in reading btree blocks commit 2571e739677f1e4c0c63f5ed49adcc0857923625 upstream. So we can read a btree block via readahead or intentional read, and we can end up with a memory leak when something happens as follows, 1) readahead starts to read block A but does not wait for read completion, 2) btree_readpage_end_io_hook finds that block A is corrupted, and it needs to clear all block A's pages' uptodate bit. 3) meanwhile an intentional read kicks in and checks block A's pages' uptodate to decide which page needs to be read. 4) when some pages have the uptodate bit during 3)'s check so 3) doesn't count them for eb->io_pages, but they are later cleared by 2) so we has to readpage on the page, we get the wrong eb->io_pages which results in a memory leak of this block. This fixes the problem by firstly getting all pages's locking and then checking pages' uptodate bit. t1(readahead) t2(readahead endio) t3(the following read) read_extent_buffer_pages end_bio_extent_readpage for pg in eb: for page 0,1,2 in eb: if pg is uptodate: btree_readpage_end_io_hook(pg) num_reads++ if uptodate: eb->io_pages = num_reads SetPageUptodate(pg) _______________ for pg in eb: for page 3 in eb: read_extent_buffer_pages if pg is NOT uptodate: btree_readpage_end_io_hook(pg) for pg in eb: __extent_read_full_page(pg) sanity check reports something wrong if pg is uptodate: clear_extent_buffer_uptodate(eb) num_reads++ for pg in eb: eb->io_pages = num_reads ClearPageUptodate(page) _______________ for pg in eb: if pg is NOT uptodate: __extent_read_full_page(pg) So t3's eb->io_pages is not consistent with the number of pages it's reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative number so that we're not able to free the eb. Signed-off-by: Liu Bo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 1a087cd869c6ac9e9d476df26ae527fd68d6c7e9 Author: Richard Watts Date: Fri Dec 2 23:14:38 2016 +0200 clk: ti: omap36xx: Work around sprz319 advisory 2.1 commit 035cd485a47dda64f25ccf8a90b11a07d0b7aa7a upstream. The OMAP36xx DPLL5, driving EHCI USB, can be subject to a long-term frequency drift. The frequency drift magnitude depends on the VCO update rate, which is inversely proportional to the PLL divider. The kernel DPLL configuration code results in a high value for the divider, leading to a long term drift high enough to cause USB transmission errors. In the worst case the USB PHY's ULPI interface can stop responding, breaking USB operation completely. This manifests itself on the Beagleboard xM by the LAN9514 reporting 'Cannot enable port 2. Maybe the cable is bad?' in the kernel log. Errata sprz319 advisory 2.1 documents PLL values that minimize the drift. Use them automatically when DPLL5 is used for USB operation, which we detect based on the requested clock rate. The clock framework will still compute the PLL parameters and resulting rate as usual, but the PLL M and N values will then be overridden. This can result in the effective clock rate being slightly different than the rate cached by the clock framework, but won't cause any adverse effect to USB operation. Signed-off-by: Richard Watts [Upported from v3.2 to v4.9] Signed-off-by: Laurent Pinchart Tested-by: Ladislav Michl Signed-off-by: Stephen Boyd Cc: Adam Ford Signed-off-by: Greg Kroah-Hartman commit 2b96c4b19e0a243475811c71f2b1c96fed0a7b11 Author: Kai-Heng Feng Date: Tue Dec 6 16:56:27 2016 +0800 ALSA: hda: when comparing pin configurations, ignore assoc in addition to seq commit 5e0ad0d8747f3e4803a9c3d96d64dd7332506d3c upstream. Commit [64047d7f4912 ALSA: hda - ignore the assoc and seq when comparing pin configurations] intented to ignore both seq and assoc at pin comparing, but it only ignored seq. So that commit may still fail to match pins on some machines. Change the bitmask to also ignore assoc. v2: Use macro to do bit masking. Thanks to Hui Wang for the analysis. Fixes: 64047d7f4912 ("ALSA: hda - ignore the assoc and seq when comparing...") Signed-off-by: Kai-Heng Feng Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit dc8144f49988b7e3f9ae40e9db21acdf87671f64 Author: Takashi Iwai Date: Tue Dec 6 11:55:17 2016 +0100 ALSA: hda - Gate the mic jack on HP Z1 Gen3 AiO commit f73cd43ac3b41c0f09a126387f302bbc0d9c726d upstream. HP Z1 Gen3 AiO with Conexant codec doesn't give an unsolicited event to the headset mic pin upon the jack plugging, it reports only to the headphone pin. It results in the missing mic switching. Let's fix up by simply gating the jack event. Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 685c4db17890f4af236fdb9402131a3c943150b5 Author: Hui Wang Date: Wed Nov 23 16:05:38 2016 +0800 ALSA: hda - fix headset-mic problem on a Dell laptop commit 989dbe4a30728c047316ab87e5fa8b609951ce7c upstream. This group of new pins is not in the pin quirk table yet, adding them to the pin quirk table to fix the headset-mic problem. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit b757fc019fc98ff2eed472d9e9bc2686aabad02d Author: Hui Wang Date: Wed Nov 23 16:05:37 2016 +0800 ALSA: hda - ignore the assoc and seq when comparing pin configurations commit 64047d7f4912de1769d1bf0d34c6322494b13779 upstream. More and more pin configurations have been adding to the pin quirk table, lots of them are only different from assoc and seq, but they all apply to the same QUIRK_FIXUP, if we don't compare assoc and seq when matching pin configurations, it will greatly reduce the pin quirk table size. We have tested this change on a couple of Dell laptops, it worked well. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 38c6095f48fe2b69c884e53e9eaaa43d584efdec Author: Sven Hahne Date: Fri Nov 25 14:16:43 2016 +0100 ALSA: hda/ca0132 - Add quirk for Alienware 15 R2 2016 commit b5337cfe067e96b8a98699da90c7dcd2bec21133 upstream. I'm using an Alienware 15 R2 and had to use the alienware quirks to get my headphone output working. I fixed it by adding, SND_PCI_QUIRK(0x1028, 0x0708, "Alienware 15 R2 2016", QUIRK_ALIENWARE) to the patch. Signed-off-by: Sven Hahne Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 5496ec423449d6415a1c14381dfa2a9056101d03 Author: Jussi Laako Date: Mon Nov 28 11:27:45 2016 +0200 ALSA: hiface: Fix M2Tech hiFace driver sampling rate change commit 995c6a7fd9b9212abdf01160f6ce3193176be503 upstream. Sampling rate changes after first set one are not reflected to the hardware, while driver and ALSA think the rate has been changed. Fix the problem by properly stopping the interface at the beginning of prepare call, allowing new rate to be set to the hardware. This keeps the hardware in sync with the driver. Signed-off-by: Jussi Laako Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit abf549a8b9693419c1a076eda3c859d646a2d887 Author: Con Kolivas Date: Fri Dec 9 15:15:57 2016 +1100 ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to volume_control_quirks commit 82ffb6fc637150b279f49e174166d2aa3853eaf4 upstream. The Logitech QuickCam Communicate Deluxe/S7500 microphone fails with the following warning. [ 6.778995] usb 2-1.2.2.2: Warning! Unlikely big volume range (=3072), cval->res is probably wrong. [ 6.778996] usb 2-1.2.2.2: [5] FU [Mic Capture Volume] ch = 1, val = 4608/7680/1 Adding it to the list of devices in volume_control_quirks makes it work properly, fixing related typo. Signed-off-by: Con Kolivas Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 6b0a56e9aaf3df0e310f8f0524b5fd9e45f89d09 Author: Krzysztof Opasiak Date: Thu Dec 1 19:14:37 2016 +0100 usbip: vudc: fix: Clear already_seen flag also for ep0 commit 3e448e13a662fb20145916636127995cbf37eb83 upstream. ep_list inside gadget structure doesn't contain ep0. It is stored separately in ep0 field. This causes an urb hang if gadget driver decides to delay setup handling. On host side this is visible as timeout error when setting configuration. This bug can be reproduced using for example any gadget with mass storage function. Fixes: abdb29574322 ("usbip: vudc: Add vudc_transfer") Signed-off-by: Krzysztof Opasiak Acked-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit a9143e58d3cc6181c3d7d14c5553c71de18f0625 Author: Alan Stern Date: Fri Oct 21 16:49:07 2016 -0400 USB: UHCI: report non-PME wakeup signalling for Intel hardware commit ccdb6be9ec6580ef69f68949ebe26e0fb58a6fb0 upstream. The UHCI controllers in Intel chipsets rely on a platform-specific non-PME mechanism for wakeup signalling. They can generate wakeup signals even though they don't support PME. We need to let the USB core know this so that it will enable runtime suspend for UHCI controllers. Signed-off-by: Alan Stern Signed-off-by: Bjorn Helgaas Acked-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman commit 4422a26936aee327c2c9dd692ee84b6dda4773c7 Author: Felipe Balbi Date: Wed Sep 28 10:38:11 2016 +0300 usb: gadget: composite: correctly initialize ep->maxpacket commit e8f29bb719b47a234f33b0af62974d7a9521a52c upstream. usb_endpoint_maxp() returns wMaxPacketSize in its raw form. Without taking into consideration that it also contains other bits reserved for isochronous endpoints. This patch fixes one occasion where this is a problem by making sure that we initialize ep->maxpacket only with lower 10 bits of the value returned by usb_endpoint_maxp(). Note that seperate patches will be necessary to audit all call sites of usb_endpoint_maxp() and make sure that usb_endpoint_maxp() only returns lower 10 bits of wMaxPacketSize. Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit be8f1c44177ad666729e917b0a7ad1eef6336157 Author: Peter Chen Date: Tue Nov 8 10:10:44 2016 +0800 usb: gadget: f_uac2: fix error handling at afunc_bind commit f1d3861d63a5d79b8968a02eea1dcb01bb684e62 upstream. The current error handling flow uses incorrect goto label, fix it Fixes: d12a8727171c ("usb: gadget: function: Remove redundant usb_free_all_descriptors") Signed-off-by: Peter Chen Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit 79d5a30e37abb360b914e4048d06d3a6434e1a82 Author: Mathias Nyman Date: Thu Nov 17 11:14:14 2016 +0200 usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices commit 37be66767e3cae4fd16e064d8bb7f9f72bf5c045 upstream. USB-3 does not have any link state that will avoid negotiating a connection with a plugged-in cable but will signal the host when the cable is unplugged. For USB-3 we used to first set the link to Disabled, then to RxDdetect to be able to detect cable connects or disconnects. But in RxDetect the connected device is detected again and eventually enabled. Instead set the link into U3 and disable remote wakeups for the device. This is what Windows does, and what Alan Stern suggested. Cc: Alan Stern Acked-by: Alan Stern Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman commit e3dfbc8904b335de4f96fb813dc338ef95e6c86a Author: Felipe Balbi Date: Thu Sep 22 11:01:01 2016 +0300 usb: dwc3: gadget: set PCM1 field of isochronous-first TRBs commit 6b9018d4c1e5c958625be94a160a5984351d4632 upstream. In case of High-Speed, High-Bandwidth endpoints, we need to tell DWC3 that we have more than one packet per interval. We do that by setting PCM1 field of Isochronous-First TRB. Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit 269edaef820f6f03e48564236fba5f898eb47b6f Author: Nathaniel Quillin Date: Mon Dec 5 06:53:00 2016 -0800 USB: cdc-acm: add device id for GW Instek AFG-125 commit 301216044e4c27d5a7323c1fa766266fad00db5e upstream. Add device-id entry for GW Instek AFG-125, which has a byte swapped bInterfaceSubClass (0x20). Signed-off-by: Nathaniel Quillin Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman commit 741523f3da82d6b32764a7fcb2d789b39597699f Author: Johan Hovold Date: Tue Nov 29 16:55:01 2016 +0100 USB: serial: kl5kusb105: fix open error path commit 6774d5f53271d5f60464f824748995b71da401ab upstream. Kill urbs and disable read before returning from open on failure to retrieve the line state. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit f34b7e027eeba14c811dce51bf848bfd69562abe Author: Giuseppe Lippolis Date: Tue Dec 6 21:24:19 2016 +0100 USB: serial: option: add dlink dwm-158 commit d8a12b7117b42fd708f1e908498350232bdbd5ff upstream. Adding registration for 3G modem DWM-158 in usb-serial-option Signed-off-by: Giuseppe Lippolis Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit e47e81c4e705ad5b5167e4ebd318e811776b9e19 Author: Daniele Palmas Date: Thu Dec 1 16:38:39 2016 +0100 USB: serial: option: add support for Telit LE922A PIDs 0x1040, 0x1041 commit 5b09eff0c379002527ad72ea5ea38f25da8a8650 upstream. This patch adds support for PIDs 0x1040, 0x1041 of Telit LE922A. Since the interface positions are the same than the ones used for other Telit compositions, previous defined blacklists are used. Signed-off-by: Daniele Palmas Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 3aa3cb940878cfe7d94c9eefc979dfd1b48f905d Author: Filipe Manana Date: Thu Nov 24 02:09:04 2016 +0000 Btrfs: fix qgroup rescan worker initialization commit 8d9eddad19467b008e0c881bc3133d7da94b7ec1 upstream. We were setting the qgroup_rescan_running flag to true only after the rescan worker started (which is a task run by a queue). So if a user space task starts a rescan and immediately after asks to wait for the rescan worker to finish, this second call might happen before the rescan worker task starts running, in which case the rescan wait ioctl returns immediatley, not waiting for the rescan worker to finish. This was making the fstest btrfs/022 fail very often. Fixes: d2c609b834d6 (btrfs: properly track when rescan worker is running) Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 691ea6c7eb1ef1ccc236b5f35876fbde9798b1ac Author: Filipe Manana Date: Wed Nov 23 16:21:18 2016 +0000 Btrfs: fix emptiness check for dirtied extent buffers at check_leaf() commit f177d73949bf758542ca15a1c1945bd2e802cc65 upstream. We can not simply use the owner field from an extent buffer's header to get the id of the respective tree when the extent buffer is from a relocation tree. When we create the root for a relocation tree we leave (on purpose) the owner field with the same value as the subvolume's tree root (we do this at ctree.c:btrfs_copy_root()). So we must ignore extent buffers from relocation trees, which have the BTRFS_HEADER_FLAG_RELOC flag set, because otherwise we will always consider the extent buffer as not being the root of the tree (the root of original subvolume tree is always different from the root of the respective relocation tree). This lead to assertion failures when running with the integrity checker enabled (CONFIG_BTRFS_FS_CHECK_INTEGRITY=y) such as the following: [ 643.393409] BTRFS critical (device sdg): corrupt leaf, non-root leaf's nritems is 0: block=38506496, root=260, slot=0 [ 643.397609] BTRFS info (device sdg): leaf 38506496 total ptrs 0 free space 3995 [ 643.407075] assertion failed: 0, file: fs/btrfs/disk-io.c, line: 4078 [ 643.408425] ------------[ cut here ]------------ [ 643.409112] kernel BUG at fs/btrfs/ctree.h:3419! [ 643.409773] invalid opcode: 0000 [#1] PREEMPT SMP [ 643.410447] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq ppdev psmouse acpi_cpufreq parport_pc evdev parport tpm_tis tpm_tis_core pcspkr serio_raw i2c_piix4 sg tpm i2c_core button processor loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring scsi_mod virtio e1000 floppy [ 643.414356] CPU: 11 PID: 32726 Comm: btrfs Not tainted 4.8.0-rc8-btrfs-next-35+ #1 [ 643.414356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 643.414356] task: ffff880145e95b00 task.stack: ffff88014826c000 [ 643.414356] RIP: 0010:[] [] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP: 0018:ffff88014826fa28 EFLAGS: 00010292 [ 643.414356] RAX: 0000000000000039 RBX: ffff88014e2d7c38 RCX: 0000000000000001 [ 643.414356] RDX: ffff88023f4d2f58 RSI: ffffffff81806c63 RDI: 00000000ffffffff [ 643.414356] RBP: ffff88014826fa28 R08: 0000000000000001 R09: 0000000000000000 [ 643.414356] R10: ffff88014826f918 R11: ffffffff82f3c5ed R12: ffff880172910000 [ 643.414356] R13: ffff880233992230 R14: ffff8801a68a3310 R15: fffffffffffffff8 [ 643.414356] FS: 00007f9ca305e8c0(0000) GS:ffff88023f4c0000(0000) knlGS:0000000000000000 [ 643.414356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 643.414356] CR2: 00007f9ca3071000 CR3: 000000015d01b000 CR4: 00000000000006e0 [ 643.414356] Stack: [ 643.414356] ffff88014826fa50 ffffffffa02d655a 000000000000000a ffff88014e2d7c38 [ 643.414356] 0000000000000000 ffff88014826faa8 ffffffffa02b72f3 ffff88014826fab8 [ 643.414356] 00ffffffa03228e4 0000000000000000 0000000000000000 ffff8801bbd4e000 [ 643.414356] Call Trace: [ 643.414356] [] btrfs_mark_buffer_dirty+0xdf/0xe5 [btrfs] [ 643.414356] [] btrfs_copy_root+0x18a/0x1d1 [btrfs] [ 643.414356] [] create_reloc_root+0x72/0x1ba [btrfs] [ 643.414356] [] btrfs_init_reloc_root+0x7b/0xa7 [btrfs] [ 643.414356] [] record_root_in_trans+0xdf/0xed [btrfs] [ 643.414356] [] btrfs_record_root_in_trans+0x50/0x6a [btrfs] [ 643.414356] [] create_subvol+0x472/0x773 [btrfs] [ 643.414356] [] btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [] ? btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [] ? preempt_count_add+0x65/0x68 [ 643.414356] [] ? __mnt_want_write+0x62/0x77 [ 643.414356] [] btrfs_ioctl_snap_create_transid+0xce/0x187 [btrfs] [ 643.414356] [] btrfs_ioctl_snap_create+0x67/0x81 [btrfs] [ 643.414356] [] btrfs_ioctl+0x508/0x20dd [btrfs] [ 643.414356] [] ? __this_cpu_preempt_check+0x13/0x15 [ 643.414356] [] ? handle_mm_fault+0x976/0x9ab [ 643.414356] [] ? arch_local_irq_save+0x9/0xc [ 643.414356] [] vfs_ioctl+0x18/0x34 [ 643.414356] [] do_vfs_ioctl+0x581/0x600 [ 643.414356] [] ? entry_SYSCALL_64_fastpath+0x5/0xa8 [ 643.414356] [] ? trace_hardirqs_on_caller+0x17b/0x197 [ 643.414356] [] SyS_ioctl+0x57/0x79 [ 643.414356] [] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 643.414356] [] ? trace_hardirqs_off_caller+0x3f/0xaa [ 643.414356] Code: 89 83 88 00 00 00 31 c0 5b 41 5c 41 5d 5d c3 55 89 f1 48 c7 c2 98 bc 35 a0 48 89 fe 48 c7 c7 05 be 35 a0 48 89 e5 e8 13 46 dd e0 <0f> 0b 55 89 f1 48 c7 c2 9f d3 35 a0 48 89 fe 48 c7 c7 7a d5 35 [ 643.414356] RIP [] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP [ 643.468267] ---[ end trace 6a1b3fb1a9d7d6e3 ]--- This can be easily reproduced by running xfstests with the integrity checker enabled. Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item) Signed-off-by: Filipe Manana Reviewed-by: Liu Bo Signed-off-by: Greg Kroah-Hartman commit 0695d8b10a88b31fa316e8d2da5978c3416e5430 Author: David Sterba Date: Tue Nov 1 14:21:23 2016 +0100 btrfs: store and load values of stripes_min/stripes_max in balance status item commit ed0df618b1b06d7431ee4d985317fc5419a5d559 upstream. The balance status item contains currently known filter values, but the stripes filter was unintentionally not among them. This would mean, that interrupted and automatically restarted balance does not apply the stripe filters. Fixes: dee32d0ac3719ef8d640efaf0884111df444730f Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 80f7d2836e37b2c76aeedffdc2803f05450887b5 Author: Filipe Manana Date: Tue Nov 1 11:23:31 2016 +0000 Btrfs: fix relocation incorrectly dropping data references commit 054570a1dc94de20e7a612cddcc5a97db9c37b6f upstream. During relocation of a data block group we create a relocation tree for each fs/subvol tree by making a snapshot of each tree using btrfs_copy_root() and the tree's commit root, and then setting the last snapshot field for the fs/subvol tree's root to the value of the current transaction id minus 1. However this can lead to relocation later dropping references that it did not create if we have qgroups enabled, leaving the filesystem in an inconsistent state that keeps aborting transactions. Lets consider the following example to explain the problem, which requires qgroups to be enabled. We are relocating data block group Y, we have a subvolume with id 258 that has a root at level 1, that subvolume is used to store directory entries for snapshots and we are currently at transaction 3404. When committing transaction 3404, we have a pending snapshot and therefore we call btrfs_run_delayed_items() at transaction.c:create_pending_snapshot() in order to create its dentry at subvolume 258. This results in COWing leaf A from root 258 in order to add the dentry. Note that leaf A also contains file extent items referring to extents from some other block group X (we are currently relocating block group Y). Later on, still at create_pending_snapshot() we call qgroup_account_snapshot(), which switches the commit root for root 258 when it calls switch_commit_roots(), so now the COWed version of leaf A, lets call it leaf A', is accessible from the commit root of tree 258. At the end of qgroup_account_snapshot(), we call record_root_in_trans() with 258 as its argument, which results in btrfs_init_reloc_root() being called, which in turn calls relocation.c:create_reloc_root() in order to create a relocation tree associated to root 258, which results in assigning the value of 3403 (which is the current transaction id minus 1 = 3404 - 1) to the last_snapshot field of root 258. When creating the relocation tree root at ctree.c:btrfs_copy_root() we add a shared reference for leaf A', corresponding to the relocation tree's root, when we call btrfs_inc_ref() against the COWed root (a copy of the commit root from tree 258), which is at level 1. So at this point leaf A' has 2 references, one normal reference corresponding to root 258 and one shared reference corresponding to the root of the relocation tree. Transaction 3404 finishes its commit and transaction 3405 is started by relocation when calling merge_reloc_root() for the relocation tree associated to root 258. In the meanwhile leaf A' is COWed again, in response to some filesystem operation, when we are still at transaction 3405. However when we COW leaf A', at ctree.c:update_ref_for_cow(), we call btrfs_block_can_be_shared() in order to figure out if other trees refer to the leaf and if any such trees exists, add a full back reference to leaf A' - but btrfs_block_can_be_shared() incorrectly returns false because the following condition is false: btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item) which evaluates to 3404 <= 3403. So after leaf A' is COWed, it stays with only one reference, corresponding to the shared reference we created when we called btrfs_copy_root() to create the relocation tree's root and btrfs_inc_ref() ends up not being called for leaf A' nor we end up setting the flag BTRFS_BLOCK_FLAG_FULL_BACKREF in leaf A'. This results in not adding shared references for the extents from block group X that leaf A' refers to with its file extent items. Later, after merging the relocation root we do a call to to btrfs_drop_snapshot() in order to delete the relocation tree. This ends up calling do_walk_down() when path->slots[1] points to leaf A', which results in calling btrfs_lookup_extent_info() to get the number of references for leaf A', which is 1 at this time (only the shared reference exists) and this value is stored at wc->refs[0]. After this walk_up_proc() is called when wc->level is 0 and path->nodes[0] corresponds to leaf A'. Because the current level is 0 and wc->refs[0] is 1, it does call btrfs_dec_ref() against leaf A', which results in removing the single references that the extents from block group X have which are associated to root 258 - the expectation was to have each of these extents with 2 references - one reference for root 258 and one shared reference related to the root of the relocation tree, and so we would drop only the shared reference (because leaf A' was supposed to have the flag BTRFS_BLOCK_FLAG_FULL_BACKREF set). This leaves the filesystem in an inconsistent state as we now have file extent items in a subvolume tree that point to extents from block group X without references in the extent tree. So later on when we try to decrement the references for these extents, for example due to a file unlink operation, truncate operation or overwriting ranges of a file, we fail because the expected references do not exist in the extent tree. This leads to warnings and transaction aborts like the following: [ 588.965795] ------------[ cut here ]------------ [ 588.965815] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x432/0x5b0 [btrfs] [ 588.965816] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg [ 588.965831] CPU: 2 PID: 2479 Comm: kworker/u8:7 Not tainted 4.7.3-3-default-fdm+ #1 [ 588.965832] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 588.965844] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 588.965845] 0000000000000000 ffff8802263bfa28 ffffffff813af542 0000000000000000 [ 588.965847] 0000000000000000 ffff8802263bfa68 ffffffff81081e8b 0000065900000000 [ 588.965848] ffff8801db2af000 000000012bbe2000 0000000000000000 ffff880215703b48 [ 588.965849] Call Trace: [ 588.965852] [] dump_stack+0x63/0x81 [ 588.965854] [] __warn+0xcb/0xf0 [ 588.965855] [] warn_slowpath_null+0x1d/0x20 [ 588.965863] [] lookup_inline_extent_backref+0x432/0x5b0 [btrfs] [ 588.965865] [] ? trace_clock_local+0x10/0x30 [ 588.965867] [] ? rb_reserve_next_event+0x6f/0x460 [ 588.965875] [] insert_inline_extent_backref+0x55/0xd0 [btrfs] [ 588.965882] [] __btrfs_inc_extent_ref.isra.55+0x8f/0x240 [btrfs] [ 588.965890] [] __btrfs_run_delayed_refs+0x74a/0x1260 [btrfs] [ 588.965892] [] ? cpuacct_charge+0x86/0xa0 [ 588.965900] [] btrfs_run_delayed_refs+0x9f/0x2c0 [btrfs] [ 588.965908] [] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 588.965918] [] btrfs_scrubparity_helper+0xca/0x350 [btrfs] [ 588.965928] [] btrfs_extent_refs_helper+0xe/0x10 [btrfs] [ 588.965930] [] process_one_work+0x1f3/0x4e0 [ 588.965931] [] worker_thread+0x48/0x4e0 [ 588.965932] [] ? process_one_work+0x4e0/0x4e0 [ 588.965934] [] kthread+0xc9/0xe0 [ 588.965936] [] ret_from_fork+0x1f/0x40 [ 588.965937] [] ? kthread_worker_fn+0x170/0x170 [ 588.965938] ---[ end trace 34e5232c933a1749 ]--- [ 588.966187] ------------[ cut here ]------------ [ 588.966196] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:2966 btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs] [ 588.966196] BTRFS: Transaction aborted (error -5) [ 588.966197] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg [ 588.966206] CPU: 2 PID: 2479 Comm: kworker/u8:7 Tainted: G W 4.7.3-3-default-fdm+ #1 [ 588.966207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 588.966217] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 588.966217] 0000000000000000 ffff8802263bfc98 ffffffff813af542 ffff8802263bfce8 [ 588.966219] 0000000000000000 ffff8802263bfcd8 ffffffff81081e8b 00000b96345ee000 [ 588.966220] ffffffffa021ae1c ffff880215703b48 00000000000005fe ffff8802345ee000 [ 588.966221] Call Trace: [ 588.966223] [] dump_stack+0x63/0x81 [ 588.966224] [] __warn+0xcb/0xf0 [ 588.966225] [] warn_slowpath_fmt+0x4f/0x60 [ 588.966233] [] btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs] [ 588.966241] [] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 588.966250] [] btrfs_scrubparity_helper+0xca/0x350 [btrfs] [ 588.966259] [] btrfs_extent_refs_helper+0xe/0x10 [btrfs] [ 588.966260] [] process_one_work+0x1f3/0x4e0 [ 588.966261] [] worker_thread+0x48/0x4e0 [ 588.966263] [] ? process_one_work+0x4e0/0x4e0 [ 588.966264] [] kthread+0xc9/0xe0 [ 588.966265] [] ret_from_fork+0x1f/0x40 [ 588.966267] [] ? kthread_worker_fn+0x170/0x170 [ 588.966268] ---[ end trace 34e5232c933a174a ]--- [ 588.966269] BTRFS: error (device sda2) in btrfs_run_delayed_refs:2966: errno=-5 IO failure [ 588.966270] BTRFS info (device sda2): forced readonly This was happening often on openSUSE and SLE systems using btrfs as the root filesystem (with its default layout where multiple subvolumes are used) where balance happens in the background triggered by a cron job and snapshots are automatically created before/after package installations, upgrades and removals. The issue could be triggered simply by running the following loop on the first system boot post installation: while true; do zypper -n in nfs-kernel-server zypper -n rm nfs-kernel-server done (If we were fast enough and made that loop before the cron job triggered a balance operation and the balance finished) So fix by setting the last_snapshot field of the root to the value of the generation of its commit root. Like this btrfs_block_can_be_shared() behaves correctly for the case where the relocation root is created during a transaction commit and for the case where it's created before a transaction commit. Fixes: 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting when creating snapshot) Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik Signed-off-by: Greg Kroah-Hartman commit f1b268d7a8e23c001e992681907fd1eacd53d15a Author: Robbie Ko Date: Fri Oct 7 17:30:47 2016 +0800 Btrfs: fix tree search logic when replaying directory entry deletes commit 2a7bf53f577e49c43de4ffa7776056de26db65d9 upstream. If a log tree has a layout like the following: leaf N: ... item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8 dir log end 1275809046 leaf N + 1: item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8 dir log end 18446744073709551615 ... When we pass the value 1275809046 + 1 as the parameter start_ret to the function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we end up with path->slots[0] having the value 239 (points to the last item of leaf N, item 240). Because the dir log item in that position has an offset value smaller than *start_ret (1275809046 + 1) we need to move on to the next leaf, however the logic for that is wrong since it compares the current slot to the number of items in the leaf, which is smaller and therefore we don't lookup for the next leaf but instead we set the slot to point to an item that does not exist, at slot 240, and we later operate on that slot which has unexpected content or in the worst case can result in an invalid memory access (accessing beyond the last page of leaf N's extent buffer). So fix the logic that checks when we need to lookup at the next leaf by first incrementing the slot and only after to check if that slot is beyond the last item of the current leaf. Signed-off-by: Robbie Ko Reviewed-by: Filipe Manana Fixes: e02119d5a7b4 (Btrfs: Add a write ahead tree log to optimize synchronous operations) Signed-off-by: Filipe Manana [Modified changelog for clarity and correctness] Signed-off-by: Greg Kroah-Hartman commit 65553a02a3136fe351e218fe4da48f07e5450574 Author: Robbie Ko Date: Fri Oct 28 10:48:26 2016 +0800 Btrfs: fix deadlock caused by fsync when logging directory entries commit ec125cfb7ae2157af3dd45dd8abe823e3e233eec upstream. While logging new directory entries, at tree-log.c:log_new_dir_dentries(), after we call btrfs_search_forward() we get a leaf with a read lock on it, and without unlocking that leaf we can end up calling btrfs_iget() to get an inode pointer. The later (btrfs_iget()) can end up doing a read-only search on the same tree again, if the inode is not in memory already, which ends up causing a deadlock if some other task in the meanwhile started a write search on the tree and is attempting to write lock the same leaf that btrfs_search_forward() locked while holding write locks on upper levels of the tree blocking the read search from btrfs_iget(). In this scenario we get a deadlock. So fix this by releasing the search path before calling btrfs_iget() at tree-log.c:log_new_dir_dentries(). Example trace of such deadlock: [ 4077.478852] kworker/u24:10 D ffff88107fc90640 0 14431 2 0x00000000 [ 4077.486752] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [ 4077.494346] ffff880ffa56bad0 0000000000000046 0000000000009000 ffff880ffa56bfd8 [ 4077.502629] ffff880ffa56bfd8 ffff881016ce21c0 ffffffffa06ecb26 ffff88101a5d6138 [ 4077.510915] ffff880ebb5173b0 ffff880ffa56baf8 ffff880ebb517410 ffff881016ce21c0 [ 4077.519202] Call Trace: [ 4077.528752] [] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs] [ 4077.536049] [] ? wake_up_atomic_t+0x30/0x30 [ 4077.542574] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs] [ 4077.550171] [] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs] [ 4077.558252] [] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs] [ 4077.566140] [] ? add_delayed_data_ref+0xe2/0x150 [btrfs] [ 4077.573928] [] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs] [ 4077.582399] [] ? __set_extent_bit+0x4c0/0x5c0 [btrfs] [ 4077.589896] [] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs] [ 4077.599632] [] ? start_transaction+0x8d/0x470 [btrfs] [ 4077.607134] [] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs] [ 4077.615329] [] ? process_one_work+0x142/0x3d0 [ 4077.622043] [] ? worker_thread+0x109/0x3b0 [ 4077.628459] [] ? manage_workers.isra.26+0x270/0x270 [ 4077.635759] [] ? kthread+0xaf/0xc0 [ 4077.641404] [] ? kthread_create_on_node+0x110/0x110 [ 4077.648696] [] ? ret_from_fork+0x58/0x90 [ 4077.654926] [] ? kthread_create_on_node+0x110/0x110 [ 4078.358087] kworker/u24:15 D ffff88107fcd0640 0 14436 2 0x00000000 [ 4078.365981] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [ 4078.373574] ffff880ffa57fad0 0000000000000046 0000000000009000 ffff880ffa57ffd8 [ 4078.381864] ffff880ffa57ffd8 ffff88103004d0a0 ffffffffa06ecb26 ffff88101a5d6138 [ 4078.390163] ffff880fbeffc298 ffff880ffa57faf8 ffff880fbeffc2f8 ffff88103004d0a0 [ 4078.398466] Call Trace: [ 4078.408019] [] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs] [ 4078.415322] [] ? wake_up_atomic_t+0x30/0x30 [ 4078.421844] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs] [ 4078.429438] [] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs] [ 4078.437518] [] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs] [ 4078.445404] [] ? add_delayed_data_ref+0xe2/0x150 [btrfs] [ 4078.453194] [] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs] [ 4078.461663] [] ? __set_extent_bit+0x4c0/0x5c0 [btrfs] [ 4078.469161] [] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs] [ 4078.478893] [] ? start_transaction+0x8d/0x470 [btrfs] [ 4078.486388] [] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs] [ 4078.494561] [] ? process_one_work+0x142/0x3d0 [ 4078.501278] [] ? pwq_activate_delayed_work+0x27/0x40 [ 4078.508673] [] ? worker_thread+0x109/0x3b0 [ 4078.515098] [] ? manage_workers.isra.26+0x270/0x270 [ 4078.522396] [] ? kthread+0xaf/0xc0 [ 4078.528032] [] ? kthread_create_on_node+0x110/0x110 [ 4078.535325] [] ? ret_from_fork+0x58/0x90 [ 4078.541552] [] ? kthread_create_on_node+0x110/0x110 [ 4079.355824] user-space-program D ffff88107fd30640 0 32020 1 0x00000000 [ 4079.363716] ffff880eae8eba10 0000000000000086 0000000000009000 ffff880eae8ebfd8 [ 4079.372003] ffff880eae8ebfd8 ffff881016c162c0 ffffffffa06ecb26 ffff88101a5d6138 [ 4079.380294] ffff880fbed4b4c8 ffff880eae8eba38 ffff880fbed4b528 ffff881016c162c0 [ 4079.388586] Call Trace: [ 4079.398134] [] ? btrfs_tree_lock+0x85/0x2f0 [btrfs] [ 4079.405431] [] ? wake_up_atomic_t+0x30/0x30 [ 4079.411955] [] ? btrfs_lock_root_node+0x2b/0x40 [btrfs] [ 4079.419644] [] ? btrfs_search_slot+0xa03/0xb10 [btrfs] [ 4079.427237] [] ? btrfs_buffer_uptodate+0x52/0x70 [btrfs] [ 4079.435041] [] ? generic_bin_search.constprop.38+0x80/0x190 [btrfs] [ 4079.443897] [] ? btrfs_insert_empty_items+0x74/0xd0 [btrfs] [ 4079.451975] [] ? copy_items+0x128/0x850 [btrfs] [ 4079.458890] [] ? btrfs_log_inode+0x629/0xbf3 [btrfs] [ 4079.466292] [] ? btrfs_log_inode_parent+0xc61/0xf30 [btrfs] [ 4079.474373] [] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs] [ 4079.482161] [] ? btrfs_sync_file+0x20d/0x330 [btrfs] [ 4079.489558] [] ? do_fsync+0x4c/0x80 [ 4079.495300] [] ? SyS_fdatasync+0xa/0x10 [ 4079.501422] [] ? system_call_fastpath+0x16/0x1b [ 4079.508334] user-space-program D ffff88107fc30640 0 32021 1 0x00000004 [ 4079.516226] ffff880eae8efbf8 0000000000000086 0000000000009000 ffff880eae8effd8 [ 4079.524513] ffff880eae8effd8 ffff881030279610 ffffffffa06ecb26 ffff88101a5d6138 [ 4079.532802] ffff880ebb671d88 ffff880eae8efc20 ffff880ebb671de8 ffff881030279610 [ 4079.541092] Call Trace: [ 4079.550642] [] ? btrfs_tree_lock+0x85/0x2f0 [btrfs] [ 4079.557941] [] ? wake_up_atomic_t+0x30/0x30 [ 4079.564463] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs] [ 4079.572058] [] ? btrfs_truncate_inode_items+0x168/0xb90 [btrfs] [ 4079.580526] [] ? join_transaction.isra.15+0x1e/0x3a0 [btrfs] [ 4079.588701] [] ? start_transaction+0x8d/0x470 [btrfs] [ 4079.596196] [] ? block_rsv_add_bytes+0x16/0x50 [btrfs] [ 4079.603789] [] ? btrfs_truncate+0xe9/0x2e0 [btrfs] [ 4079.610994] [] ? btrfs_setattr+0x30b/0x410 [btrfs] [ 4079.618197] [] ? notify_change+0x1dc/0x680 [ 4079.624625] [] ? aa_path_perm+0xd4/0x160 [ 4079.630854] [] ? do_truncate+0x5b/0x90 [ 4079.636889] [] ? do_sys_ftruncate.constprop.15+0x10a/0x160 [ 4079.644869] [] ? SyS_fcntl+0x5b/0x570 [ 4079.650805] [] ? system_call_fastpath+0x16/0x1b [ 4080.410607] user-space-program D ffff88107fc70640 0 32028 12639 0x00000004 [ 4080.418489] ffff880eaeccbbe0 0000000000000086 0000000000009000 ffff880eaeccbfd8 [ 4080.426778] ffff880eaeccbfd8 ffff880f317ef1e0 ffffffffa06ecb26 ffff88101a5d6138 [ 4080.435067] ffff880ef7e93928 ffff880f317ef1e0 ffff880eaeccbc08 ffff880f317ef1e0 [ 4080.443353] Call Trace: [ 4080.452920] [] ? btrfs_tree_read_lock+0xdd/0x190 [btrfs] [ 4080.460703] [] ? wake_up_atomic_t+0x30/0x30 [ 4080.467225] [] ? btrfs_read_lock_root_node+0x2b/0x40 [btrfs] [ 4080.475400] [] ? btrfs_search_slot+0x801/0xb10 [btrfs] [ 4080.482994] [] ? btrfs_clean_one_deleted_snapshot+0xe0/0xe0 [btrfs] [ 4080.491857] [] ? btrfs_lookup_inode+0x26/0x90 [btrfs] [ 4080.499353] [] ? kmem_cache_alloc+0xaf/0xc0 [ 4080.505879] [] ? btrfs_iget+0xd5/0x5d0 [btrfs] [ 4080.512696] [] ? btrfs_get_token_64+0x104/0x120 [btrfs] [ 4080.520387] [] ? btrfs_log_inode_parent+0xbdf/0xf30 [btrfs] [ 4080.528469] [] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs] [ 4080.536258] [] ? btrfs_sync_file+0x20d/0x330 [btrfs] [ 4080.543657] [] ? do_fsync+0x4c/0x80 [ 4080.549399] [] ? SyS_fdatasync+0xa/0x10 [ 4080.555534] [] ? system_call_fastpath+0x16/0x1b Signed-off-by: Robbie Ko Reviewed-by: Filipe Manana Fixes: 2f2ff0ee5e43 (Btrfs: fix metadata inconsistencies after directory fsync) Signed-off-by: Filipe Manana [Modified changelog for clarity and correctness] Signed-off-by: Greg Kroah-Hartman commit 361e82137a2d43e0ddc87012ff66b7b80ebe0d33 Author: Liu Bo Date: Fri Sep 2 12:35:34 2016 -0700 Btrfs: fix BUG_ON in btrfs_mark_buffer_dirty commit ef85b25e982b5bba1530b936e283ef129f02ab9d upstream. This can only happen with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y. Commit 1ba98d0 ("Btrfs: detect corruption when non-root leaf has zero item") assumes that a leaf is its root when leaf->bytenr == btrfs_root_bytenr(root), however, we should not use btrfs_root_bytenr(root) since it's mainly got updated during committing transaction. So the check can fail when doing COW on this leaf while it is a root. This changes to use "if (leaf == btrfs_root_node(root))" instead, just like how we check whether leaf is a root in __btrfs_cow_block(). Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item) Reported-by: Jeff Mahoney Signed-off-by: Liu Bo Reviewed-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit 562de9c7ce2400e495557f543395063f3d0a9ac3 Author: Maxim Patlasov Date: Mon Dec 12 14:32:44 2016 -0800 btrfs: limit async_work allocation and worker func duration commit 2939e1a86f758b55cdba73e29397dd3d94df13bc upstream. Problem statement: unprivileged user who has read-write access to more than one btrfs subvolume may easily consume all kernel memory (eventually triggering oom-killer). Reproducer (./mkrmdir below essentially loops over mkdir/rmdir): [root@kteam1 ~]# cat prep.sh DEV=/dev/sdb mkfs.btrfs -f $DEV mount $DEV /mnt for i in `seq 1 16` do mkdir /mnt/$i btrfs subvolume create /mnt/SV_$i ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2` mount -t btrfs -o subvolid=$ID $DEV /mnt/$i chmod a+rwx /mnt/$i done [root@kteam1 ~]# sh prep.sh [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0 kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0 kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0 kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0 The huge numbers above come from insane number of async_work-s allocated and queued by btrfs_wq_run_delayed_node. The problem is caused by btrfs_wq_run_delayed_node() queuing more and more works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The worker func (btrfs_async_run_delayed_root) processes at least BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery works as expected while the list is almost empty. As soon as it is getting bigger, worker func starts to process more than one item at a time, it takes longer, and the chances to have async_works queued more than needed is getting higher. The problem above is worsened by another flaw of delayed-inode implementation: if async_work was queued in a throttling branch (number of items >= BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that the func occupies CPU infinitely (up to 30sec in my experiments): while the func is trying to drain the list, the user activity may add more and more items to the list. The patch fixes both problems in straightforward way: refuse queuing too many works in btrfs_wq_run_delayed_node and bail out of worker func if at least BTRFS_DELAYED_WRITEBACK items are processed. Changed in v2: remove support of thresh == NO_THRESHOLD. Signed-off-by: Maxim Patlasov Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman commit f080d70948382b2d0c680c0c65dd518cd7cf0b14 Author: Jens Axboe Date: Fri Nov 11 18:28:50 2016 -0700 aoe: fix crash in page count manipulation commit 0cbc72a1781250f373327dd7e306e33859a42154 upstream. aoeblk contains some mysterious code, that wants to elevate the bio vec page counts while it's under IO. That is not needed, it's fragile, and it's causing kernel oopses for some. Reported-by: Tested-by: Don Koch Tested-by: Tested-by: Don Koch Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman