Skip to content
This repository was archived by the owner on Jan 10, 2023. It is now read-only.

Revert Patch: Noise at DSD Playback start.#2

Open
karan889 wants to merge 5872 commits intoprojectceladon:masterfrom
karan889:master
Open

Revert Patch: Noise at DSD Playback start.#2
karan889 wants to merge 5872 commits intoprojectceladon:masterfrom
karan889:master

Conversation

@karan889
Copy link
Contributor

This patch reverts the patch
"ALSA: usb-audio: Eliminate noise at the start of DSD
playback". This is causing a 5sec delay at the USB
audio reconnect. This issue is also seen for video
streaming.

Test: Connect USB Head-set and play the audio.
Disconnect and reconnect the Headset.
Check the audio at reconnect immdiately
playing the paused audio stream

Change-Id: I8e45d79e239aa65d3a6be7ada2b11bbc09271a87
Signed-off-by: Karan B karanx.bhagoji@intel.com
Tracked-On: https://jira01.devtools.intel.com/browse/OAM-63318

plbossart and others added 30 commits April 13, 2018 00:37
The code scans all components looking for the default name
i2c-10EC5670:00, which of course doesn't work in platforms
where the BIOS uses a different HID such as Dell 5585

Since we already have the correct information available, just
use the actual codec name and length.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit e45e391)
For now reason the current card name is a ridiculous
'cherrytrailcraudio'. This isn't very useful or self-explanatory,
change to driver name cht-bsw-rt5672.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 57e9e87)
Signed-off-by: Ryan Lee <ryans.lee@maximintegrated.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 4eee202)
Signed-off-by: Ryan Lee <ryans.lee@maximintegrated.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d4a8bce)
…OS driver

Remove dead code which was missed in previous changes. This is not
visible with git diff but there is a test+return on the same condition
just above.

Also reuse local variable instead of fetching the jack information twice.

Tested on Acer R11 (cyan)

Fixes: 3bbda5a ('ASoC: Intel: cht_bsw_max98090_ti: Fix jack initialization')
Cc: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 0087f20)
Return proper error instead of 0 if the revision does not match.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d6e18b8)
Move set_sysclk to codec level and people can use it at both
codec and dai level.

Signed-off-by: Bard Liao <bardliao@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit fe01e5e)
Move set_pll function to codec level and people can use it at both
codec and dai level.

Signed-off-by: Bard Liao <bardliao@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit c8a04b5)
Create a helper function to remove duplicate code used in machine drivers

Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 7c761b5)
Same as for other codecs, enable MCLK by default. When it is not
present, e.g. on MinnowBoard B3 since it's not routed on the LSE
connector, we fall back to blck-based clocking.

The DMIC quirks are also fixed, there is a single DMIC input of the
codec.

reorder variables in reverse x-mas tree as suggested by Andy

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 02c0a3b)
Reorder variable names, change MCLK test, change quirks
No functional change

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 17b5273)
Reorder variable names, change MCLK test, change quirks
No functional change

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 6cdf01a)
Reorder variable names, change tests
No functional change

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 22a317a)
get_codec_dai() is not used, remove it

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 8a2eca5)
Remove duplicate code with a common helper in all Intel machine drivers.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit dfb6ec7)
For some reason the Atom/HiFi2 machine drivers use an id=1 instead
of zero as done on all other platforms. This gets in the way of
topology-based matching, realign for consistency. This should
not have any functional impact on existing solutions with don't rely
on topology.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 149f775)
The hard-coded compressed dailinks are not supported using
publicly-available firmwares, which creates unnecessary user
confusion [1]. Even if the firmware was available, the mainline
code does not have the required .dynamic=1 and .dpcm_playback=1
fields so probably never worked as is, and last and they conflict
with topology-defined streams.

Remove them and move on. This can be re-enabled with SOF later
in a more flexible manner.

[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2017-August/124868.html

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 2193eb9)
The pointer dma_dev_name is assigned but never read, it is redundant
and can therefore be removed.

Cleans up clang warning:
sound/soc/intel/common/sst-firmware.c:288:3: warning: Value stored to
'dma_dev_name' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 271ef65)
Introduce a new custom dapm routes map to quirk platforms with the
internal mic connected to IN2P.

Signed-off-by: Carlo Caione <carlo@endlessm.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 39712db)
ACPI support is not specific to the Intel/SST driver. Move the enumeration
and matching code which is not hardware-dependent to sound/soc and rename
relevant sst_acpi_ structures and functions with snd_soc_acpi_ prefix

soc-acpi.h is protected by a #ifndef __LINUX_SND_SOC_ACPI_H for
consistency with all other SoC .h files:

grep -L __LINUX include/sound/soc* | wc -l
0
grep __LINUX include/sound/soc* | wc -l
14

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 7feb2f7)
Before we add new fields for SOF support, move to C99 syntax
as done for atom/sst and legacy hsw/bdw code
No functional change

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 928c8a8)
To prepare for SOF integration, we need new fields in the machine table.
It is intended that the same table is used for both closed-source and
open-source firmware to avoid repeating ACPI-related information
multiple times
No functional change

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 976b5a0)
First step of cleaning, move all tables to soc-acpi-intel-match module.
The tables remain in separate files per platform to keep them
manageable. Skylake+ platforms are still handled elsewhere since
there is no conflict with SOF for now, but this will have to be
handled at a later point.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 6004640)
Prepare for SOF integration, no functional change

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit c1a8ed6)
split Kconfig to prepare for reuse of machine drivers for
SOF support
no functional change or edits

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit ea2851b)
This file is a mess, order by generation with more recent last

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d4d86b8)
Introduce more logical dependencies, with the SOC selected first and the
relevant machine drivers are exposed.
The same mechanism will be used for SOF support.

Also select SND_SOC_ACPI_INTEL_MATCH for all machine drivers

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit f6a118a)
Make all Intel audio drivers dependent on X86 to avoid compilation
errors for s390 and xtensa architectures.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 468663a)
MICBIAS widget type has been deprecated. Convert it to a SUPPLY widget.

Signed-off-by: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit be96fc5)
Enable jack detection for the RT5651 codec on the JD* pins.

Signed-off-by: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 80bbe4a)
swaroopbalan pushed a commit that referenced this pull request Aug 16, 2018
[ Upstream commit 4f4616c ]

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
swaroopbalan pushed a commit that referenced this pull request Aug 16, 2018
[ Upstream commit 7e2556e ]

syzbot found that the following sequence produces a LOCKDEP splat [1]

ip link add bond10 type bond
ip link add bond11 type bond
ip link set bond11 master bond10

To fix this, we can use the already provided nest_level.

This patch also provides correct nesting for dev->addr_list_lock

[1]
WARNING: possible recursive locking detected
4.18.0-rc6+ #167 Not tainted
--------------------------------------------
syz-executor751/4439 is trying to acquire lock:
(____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
(____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426

but task is already holding lock:
(____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
(____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&bond->stats_lock)->rlock);
  lock(&(&bond->stats_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by syz-executor751/4439:
 #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
 #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215

stack backtrace:
CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
 check_deadlock kernel/locking/lockdep.c:1809 [inline]
 validate_chain kernel/locking/lockdep.c:2405 [inline]
 __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
 spin_lock include/linux/spinlock.h:310 [inline]
 bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
 dev_get_stats+0x10f/0x470 net/core/dev.c:8316
 bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432
 dev_get_stats+0x10f/0x470 net/core/dev.c:8316
 rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169
 rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611
 rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268
 rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300
 rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline]
 rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716
 notifier_call_chain+0x180/0x390 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
 call_netdevice_notifiers net/core/dev.c:1753 [inline]
 netdev_features_change net/core/dev.c:1321 [inline]
 netdev_change_features+0xb3/0x110 net/core/dev.c:7759
 bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120
 bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755
 bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528
 dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327
 dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493
 sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992
 sock_ioctl+0x30d/0x680 net/socket.c:1093
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440859
Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859
RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390
R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
swaroopbalan pushed a commit that referenced this pull request Aug 16, 2018
commit 89da619 upstream.

Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Huang Chong <huang.chong@zte.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Aug 30, 2018
commit 4e1a720 upstream.

slub debug reported:

[  440.648642] =============================================================================
[  440.648649] BUG kmalloc-1024 (Tainted: G    BU     O   ): Poison overwritten
[  440.648651] -----------------------------------------------------------------------------

[  440.648655] INFO: 0xe70f4bec-0xe70f4bec. First byte 0x6a instead of 0x6b
[  440.648665] INFO: Allocated in sk_prot_alloc+0x6b/0xc6 age=33155 cpu=1 pid=1047
[  440.648671] 	___slab_alloc.constprop.24+0x1fc/0x292
[  440.648675] 	__slab_alloc.isra.18.constprop.23+0x1c/0x25
[  440.648677] 	__kmalloc+0xb6/0x17f
[  440.648680] 	sk_prot_alloc+0x6b/0xc6
[  440.648683] 	sk_alloc+0x1e/0xa1
[  440.648700] 	sco_sock_alloc.constprop.6+0x26/0xaf [bluetooth]
[  440.648716] 	sco_connect_cfm+0x166/0x281 [bluetooth]
[  440.648731] 	hci_conn_request_evt.isra.53+0x258/0x281 [bluetooth]
[  440.648746] 	hci_event_packet+0x28b/0x2326 [bluetooth]
[  440.648759] 	hci_rx_work+0x161/0x291 [bluetooth]
[  440.648764] 	process_one_work+0x163/0x2b2
[  440.648767] 	worker_thread+0x1a9/0x25c
[  440.648770] 	kthread+0xf8/0xfd
[  440.648774] 	ret_from_fork+0x2e/0x38
[  440.648779] INFO: Freed in __sk_destruct+0xd3/0xdf age=3815 cpu=1 pid=1047
[  440.648782] 	__slab_free+0x4b/0x27a
[  440.648784] 	kfree+0x12e/0x155
[  440.648787] 	__sk_destruct+0xd3/0xdf
[  440.648790] 	sk_destruct+0x27/0x29
[  440.648793] 	__sk_free+0x75/0x91
[  440.648795] 	sk_free+0x1c/0x1e
[  440.648810] 	sco_sock_kill+0x5a/0x5f [bluetooth]
[  440.648825] 	sco_conn_del+0x8e/0xba [bluetooth]
[  440.648840] 	sco_disconn_cfm+0x3a/0x41 [bluetooth]
[  440.648855] 	hci_event_packet+0x45e/0x2326 [bluetooth]
[  440.648868] 	hci_rx_work+0x161/0x291 [bluetooth]
[  440.648872] 	process_one_work+0x163/0x2b2
[  440.648875] 	worker_thread+0x1a9/0x25c
[  440.648877] 	kthread+0xf8/0xfd
[  440.648880] 	ret_from_fork+0x2e/0x38
[  440.648884] INFO: Slab 0xf4718580 objects=27 used=27 fp=0x  (null) flags=0x40008100
[  440.648886] INFO: Object 0xe70f4b88 @offset=19336 fp=0xe70f54f8

When KASAN was enabled, it reported:

[  210.096613] ==================================================================
[  210.096634] BUG: KASAN: use-after-free in ex_handler_refcount+0x5b/0x127
[  210.096641] Write of size 4 at addr ffff880107e17160 by task kworker/u9:1/2040

[  210.096651] CPU: 1 PID: 2040 Comm: kworker/u9:1 Tainted: G     U     O    4.14.47-20180606+ #2
[  210.096654] Hardware name: , BIOS 2017.01-00087-g43e04de 08/30/2017
[  210.096693] Workqueue: hci0 hci_rx_work [bluetooth]
[  210.096698] Call Trace:
[  210.096711]  dump_stack+0x46/0x59
[  210.096722]  print_address_description+0x6b/0x23b
[  210.096729]  ? ex_handler_refcount+0x5b/0x127
[  210.096736]  kasan_report+0x220/0x246
[  210.096744]  ex_handler_refcount+0x5b/0x127
[  210.096751]  ? ex_handler_clear_fs+0x85/0x85
[  210.096757]  fixup_exception+0x8c/0x96
[  210.096766]  do_trap+0x66/0x2c1
[  210.096773]  do_error_trap+0x152/0x180
[  210.096781]  ? fixup_bug+0x78/0x78
[  210.096817]  ? hci_debugfs_create_conn+0x244/0x26a [bluetooth]
[  210.096824]  ? __schedule+0x113b/0x1453
[  210.096830]  ? sysctl_net_exit+0xe/0xe
[  210.096837]  ? __wake_up_common+0x343/0x343
[  210.096843]  ? insert_work+0x107/0x163
[  210.096850]  invalid_op+0x1b/0x40
[  210.096888] RIP: 0010:hci_debugfs_create_conn+0x244/0x26a [bluetooth]
[  210.096892] RSP: 0018:ffff880094a0f970 EFLAGS: 00010296
[  210.096898] RAX: 0000000000000000 RBX: ffff880107e170e8 RCX: ffff880107e17160
[  210.096902] RDX: 000000000000002f RSI: ffff88013b80ed40 RDI: ffffffffa058b940
[  210.096906] RBP: ffff88011b2b0578 R08: 00000000852f0ec9 R09: ffffffff81cfcf9b
[  210.096909] R10: 00000000d21bdad7 R11: 0000000000000001 R12: ffff8800967b0488
[  210.096913] R13: ffff880107e17168 R14: 0000000000000068 R15: ffff8800949c0008
[  210.096920]  ? __sk_destruct+0x2c6/0x2d4
[  210.096959]  hci_event_packet+0xff5/0x7de2 [bluetooth]
[  210.096969]  ? __local_bh_enable_ip+0x43/0x5b
[  210.097004]  ? l2cap_sock_recv_cb+0x158/0x166 [bluetooth]
[  210.097039]  ? hci_le_meta_evt+0x2bb3/0x2bb3 [bluetooth]
[  210.097075]  ? l2cap_ertm_init+0x94e/0x94e [bluetooth]
[  210.097093]  ? xhci_urb_enqueue+0xbd8/0xcf5 [xhci_hcd]
[  210.097102]  ? __accumulate_pelt_segments+0x24/0x33
[  210.097109]  ? __accumulate_pelt_segments+0x24/0x33
[  210.097115]  ? __update_load_avg_se.isra.2+0x217/0x3a4
[  210.097122]  ? set_next_entity+0x7c3/0x12cd
[  210.097128]  ? pick_next_entity+0x25e/0x26c
[  210.097135]  ? pick_next_task_fair+0x2ca/0xc1a
[  210.097141]  ? switch_mm_irqs_off+0x346/0xb4f
[  210.097147]  ? __switch_to+0x769/0xbc4
[  210.097153]  ? compat_start_thread+0x66/0x66
[  210.097188]  ? hci_conn_check_link_mode+0x1cd/0x1cd [bluetooth]
[  210.097195]  ? finish_task_switch+0x392/0x431
[  210.097228]  ? hci_rx_work+0x154/0x487 [bluetooth]
[  210.097260]  hci_rx_work+0x154/0x487 [bluetooth]
[  210.097269]  process_one_work+0x579/0x9e9
[  210.097277]  worker_thread+0x68f/0x804
[  210.097285]  kthread+0x31c/0x32b
[  210.097292]  ? rescuer_thread+0x70c/0x70c
[  210.097299]  ? kthread_create_on_node+0xa3/0xa3
[  210.097306]  ret_from_fork+0x35/0x40

[  210.097314] Allocated by task 2040:
[  210.097323]  kasan_kmalloc.part.1+0x51/0xc7
[  210.097328]  __kmalloc+0x17f/0x1b6
[  210.097335]  sk_prot_alloc+0xf2/0x1a3
[  210.097340]  sk_alloc+0x22/0x297
[  210.097375]  sco_sock_alloc.constprop.7+0x23/0x202 [bluetooth]
[  210.097410]  sco_connect_cfm+0x2d0/0x566 [bluetooth]
[  210.097443]  hci_conn_request_evt.isra.53+0x6d3/0x762 [bluetooth]
[  210.097476]  hci_event_packet+0x85e/0x7de2 [bluetooth]
[  210.097507]  hci_rx_work+0x154/0x487 [bluetooth]
[  210.097512]  process_one_work+0x579/0x9e9
[  210.097517]  worker_thread+0x68f/0x804
[  210.097523]  kthread+0x31c/0x32b
[  210.097529]  ret_from_fork+0x35/0x40

[  210.097533] Freed by task 2040:
[  210.097539]  kasan_slab_free+0xb3/0x15e
[  210.097544]  kfree+0x103/0x1a9
[  210.097549]  __sk_destruct+0x2c6/0x2d4
[  210.097584]  sco_conn_del.isra.1+0xba/0x10e [bluetooth]
[  210.097617]  hci_event_packet+0xff5/0x7de2 [bluetooth]
[  210.097648]  hci_rx_work+0x154/0x487 [bluetooth]
[  210.097653]  process_one_work+0x579/0x9e9
[  210.097658]  worker_thread+0x68f/0x804
[  210.097663]  kthread+0x31c/0x32b
[  210.097670]  ret_from_fork+0x35/0x40

[  210.097676] The buggy address belongs to the object at ffff880107e170e8
 which belongs to the cache kmalloc-1024 of size 1024
[  210.097681] The buggy address is located 120 bytes inside of
 1024-byte region [ffff880107e170e8, ffff880107e174e8)
[  210.097683] The buggy address belongs to the page:
[  210.097689] page:ffffea00041f8400 count:1 mapcount:0 mapping:          (null) index:0xffff880107e15b68 compound_mapcount: 0
[  210.110194] flags: 0x8000000000008100(slab|head)
[  210.115441] raw: 8000000000008100 0000000000000000 ffff880107e15b68 0000000100170016
[  210.115448] raw: ffffea0004a47620 ffffea0004b48e20 ffff88013b80ed40 0000000000000000
[  210.115451] page dumped because: kasan: bad access detected

[  210.115454] Memory state around the buggy address:
[  210.115460]  ffff880107e17000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  210.115465]  ffff880107e17080: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
[  210.115469] >ffff880107e17100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  210.115472]                                                        ^
[  210.115477]  ffff880107e17180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  210.115481]  ffff880107e17200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  210.115483] ==================================================================

And finally when BT_DBG() and ftrace was enabled it showed:

       <...>-14979 [001] ....   186.104191: sco_sock_kill <-sco_sock_close
       <...>-14979 [001] ....   186.104191: sco_sock_kill <-sco_sock_release
       <...>-14979 [001] ....   186.104192: sco_sock_kill: sk ef0497a0 state 9
       <...>-14979 [001] ....   186.104193: bt_sock_unlink <-sco_sock_kill
kworker/u9:2-792   [001] ....   186.104246: sco_sock_kill <-sco_conn_del
kworker/u9:2-792   [001] ....   186.104248: sco_sock_kill: sk ef0497a0 state 9
kworker/u9:2-792   [001] ....   186.104249: bt_sock_unlink <-sco_sock_kill
kworker/u9:2-792   [001] ....   186.104250: sco_sock_destruct <-__sk_destruct
kworker/u9:2-792   [001] ....   186.104250: sco_sock_destruct: sk ef0497a0
kworker/u9:2-792   [001] ....   186.104860: hci_conn_del <-hci_event_packet
kworker/u9:2-792   [001] ....   186.104864: hci_conn_del: hci0 hcon ef0484c0 handle 266

Only in the failed case, sco_sock_kill() gets called with the same sock
pointer two times. Add a check for SOCK_DEAD to avoid continue killing
a socket which has already been killed.

Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Sep 5, 2018
…sfers

[ Upstream commit d52e4d0 ]

This bug happens only when the UDC needs to sleep during usb_ep_dequeue,
as is the case for (at least) dwc3.

[  382.200896] BUG: scheduling while atomic: screen/1808/0x00000100
[  382.207124] 4 locks held by screen/1808:
[  382.211266]  #0:  (rcu_callback){....}, at: [<c10b4ff0>] rcu_process_callbacks+0x260/0x440
[  382.219949]  #1:  (rcu_read_lock_sched){....}, at: [<c1358ba0>] percpu_ref_switch_to_atomic_rcu+0xb0/0x130
[  382.230034]  #2:  (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>] free_ioctx_users+0x23/0xd0
[  382.230096]  #3:  (&(&ffs->eps_lock)->rlock){....}, at: [<f81e7710>] ffs_aio_cancel+0x20/0x60 [usb_f_fs]
[  382.230160] Modules linked in: usb_f_fs libcomposite configfs bnep btsdio bluetooth ecdh_generic brcmfmac brcmutil intel_powerclamp coretemp dwc3 kvm_intel ulpi udc_core kvm irqbypass crc32_pclmul crc32c_intel pcbc dwc3_pci aesni_intel aes_i586 crypto_simd cryptd ehci_pci ehci_hcd gpio_keys usbcore basincove_gpadc industrialio usb_common
[  382.230407] CPU: 1 PID: 1808 Comm: screen Not tainted 4.14.0-edison+ #117
[  382.230416] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[  382.230425] Call Trace:
[  382.230438]  <SOFTIRQ>
[  382.230466]  dump_stack+0x47/0x62
[  382.230498]  __schedule_bug+0x61/0x80
[  382.230522]  __schedule+0x43/0x7a0
[  382.230587]  schedule+0x5f/0x70
[  382.230625]  dwc3_gadget_ep_dequeue+0x14c/0x270 [dwc3]
[  382.230669]  ? do_wait_intr_irq+0x70/0x70
[  382.230724]  usb_ep_dequeue+0x19/0x90 [udc_core]
[  382.230770]  ffs_aio_cancel+0x37/0x60 [usb_f_fs]
[  382.230798]  kiocb_cancel+0x31/0x40
[  382.230822]  free_ioctx_users+0x4d/0xd0
[  382.230858]  percpu_ref_switch_to_atomic_rcu+0x10a/0x130
[  382.230881]  ? percpu_ref_exit+0x40/0x40
[  382.230904]  rcu_process_callbacks+0x2b3/0x440
[  382.230965]  __do_softirq+0xf8/0x26b
[  382.231011]  ? __softirqentry_text_start+0x8/0x8
[  382.231033]  do_softirq_own_stack+0x22/0x30
[  382.231042]  </SOFTIRQ>
[  382.231071]  irq_exit+0x45/0xc0
[  382.231089]  smp_apic_timer_interrupt+0x13c/0x150
[  382.231118]  apic_timer_interrupt+0x35/0x3c
[  382.231132] EIP: __copy_user_ll+0xe2/0xf0
[  382.231142] EFLAGS: 00210293 CPU: 1
[  382.231154] EAX: bfd4508c EBX: 00000004 ECX: 00000003 EDX: f3d8fe50
[  382.231165] ESI: f3d8fe51 EDI: bfd4508d EBP: f3d8fe14 ESP: f3d8fe08
[  382.231176]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  382.231265]  core_sys_select+0x25f/0x320
[  382.231346]  ? __wake_up_common_lock+0x62/0x80
[  382.231399]  ? tty_ldisc_deref+0x13/0x20
[  382.231438]  ? ldsem_up_read+0x1b/0x40
[  382.231459]  ? tty_ldisc_deref+0x13/0x20
[  382.231479]  ? tty_write+0x29f/0x2e0
[  382.231514]  ? n_tty_ioctl+0xe0/0xe0
[  382.231541]  ? tty_write_unlock+0x30/0x30
[  382.231566]  ? __vfs_write+0x22/0x110
[  382.231604]  ? security_file_permission+0x2f/0xd0
[  382.231635]  ? rw_verify_area+0xac/0x120
[  382.231677]  ? vfs_write+0x103/0x180
[  382.231711]  SyS_select+0x87/0xc0
[  382.231739]  ? SyS_write+0x42/0x90
[  382.231781]  do_fast_syscall_32+0xd6/0x1a0
[  382.231836]  entry_SYSENTER_32+0x47/0x71
[  382.231848] EIP: 0xb7f75b05
[  382.231857] EFLAGS: 00000246 CPU: 1
[  382.231868] EAX: ffffffda EBX: 00000400 ECX: bfd4508c EDX: bfd4510c
[  382.231878] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfd45020
[  382.231889]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  382.232281] softirq: huh, entered softirq 9 RCU c10b4d90 with preempt_count 00000100, exited with 00000000?

Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit a5ba1d9 upstream.

We have reports of the following crash:

    PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
    #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
    #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
    #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
    #3 [ffff88085c6db860] no_context at ffffffff81050b8f
    #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
    #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
    #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
    #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
    #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
    [exception RIP: uart_put_char+149]
    RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
    RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
    RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
    RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
    R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
    R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
    #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
    #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
    #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
    #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
    #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
    #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
    #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
    #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
    #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
    #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
    #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
    #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f​

after slogging through some dissasembly:

ffffffff814b6720 <uart_put_char>:
ffffffff814b6720:	55                   	push   %rbp
ffffffff814b6721:	48 89 e5             	mov    %rsp,%rbp
ffffffff814b6724:	48 83 ec 20          	sub    $0x20,%rsp
ffffffff814b6728:	48 89 1c 24          	mov    %rbx,(%rsp)
ffffffff814b672c:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
ffffffff814b6731:	4c 89 6c 24 10       	mov    %r13,0x10(%rsp)
ffffffff814b6736:	4c 89 74 24 18       	mov    %r14,0x18(%rsp)
ffffffff814b673b:	e8 b0 8e 58 00       	callq  ffffffff81a3f5f0 <mcount>
ffffffff814b6740:	4c 8b a7 88 02 00 00 	mov    0x288(%rdi),%r12
ffffffff814b6747:	45 31 ed             	xor    %r13d,%r13d
ffffffff814b674a:	41 89 f6             	mov    %esi,%r14d
ffffffff814b674d:	49 83 bc 24 70 01 00 	cmpq   $0x0,0x170(%r12)
ffffffff814b6754:	00 00
ffffffff814b6756:	49 8b 9c 24 80 01 00 	mov    0x180(%r12),%rbx
ffffffff814b675d:	00
ffffffff814b675e:	74 2f                	je     ffffffff814b678f <uart_put_char+0x6f>
ffffffff814b6760:	48 89 df             	mov    %rbx,%rdi
ffffffff814b6763:	e8 a8 67 58 00       	callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
ffffffff814b6768:	41 8b 8c 24 78 01 00 	mov    0x178(%r12),%ecx
ffffffff814b676f:	00
ffffffff814b6770:	89 ca                	mov    %ecx,%edx
ffffffff814b6772:	f7 d2                	not    %edx
ffffffff814b6774:	41 03 94 24 7c 01 00 	add    0x17c(%r12),%edx
ffffffff814b677b:	00
ffffffff814b677c:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b6782:	75 23                	jne    ffffffff814b67a7 <uart_put_char+0x87>
ffffffff814b6784:	48 89 c6             	mov    %rax,%rsi
ffffffff814b6787:	48 89 df             	mov    %rbx,%rdi
ffffffff814b678a:	e8 e1 64 58 00       	callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
ffffffff814b678f:	44 89 e8             	mov    %r13d,%eax
ffffffff814b6792:	48 8b 1c 24          	mov    (%rsp),%rbx
ffffffff814b6796:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
ffffffff814b679b:	4c 8b 6c 24 10       	mov    0x10(%rsp),%r13
ffffffff814b67a0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff814b67a5:	c9                   	leaveq
ffffffff814b67a6:	c3                   	retq
ffffffff814b67a7:	49 8b 94 24 70 01 00 	mov    0x170(%r12),%rdx
ffffffff814b67ae:	00
ffffffff814b67af:	48 63 c9             	movslq %ecx,%rcx
ffffffff814b67b2:	41 b5 01             	mov    $0x1,%r13b
ffffffff814b67b5:	44 88 34 0a          	mov    %r14b,(%rdx,%rcx,1)
ffffffff814b67b9:	41 8b 94 24 78 01 00 	mov    0x178(%r12),%edx
ffffffff814b67c0:	00
ffffffff814b67c1:	83 c2 01             	add    $0x1,%edx
ffffffff814b67c4:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b67ca:	41 89 94 24 78 01 00 	mov    %edx,0x178(%r12)
ffffffff814b67d1:	00
ffffffff814b67d2:	eb b0                	jmp    ffffffff814b6784 <uart_put_char+0x64>
ffffffff814b67d4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff814b67db:	00 00 00 00 00

for our build, this is crashing at:

    circ->buf[circ->head] = c;

Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.

Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.

To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.

v2: switch to locking uport->lock on allocation/deallocation instead of
    locking the per-port mutex in uart_put_char. Note that since
    uport->lock is a spin lock, we have to switch the allocation to
    GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
    GFP_KERNEL

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit d1c392c upstream.

I hit the following splat in my tests:

------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
 do_idle+0x33/0x202
 cpu_startup_entry+0x61/0x63
 start_secondary+0x18e/0x1ed
 startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---

After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

  do_idle {

    [interrupts enabled]

    <interrupt> [interrupts disabled]
	TRACE_IRQS_OFF [lockdep says irqs off]
	[...]
	TRACE_IRQS_IRET
	    test if pt_regs say return to interrupts enabled [yes]
	    TRACE_IRQS_ON [lockdep says irqs are on]

	    <nmi>
		nmi_enter() {
		    printk_nmi_enter() [traced by ftrace]
		    [ hit ftrace breakpoint ]
		    <breakpoint exception>
			TRACE_IRQS_OFF [lockdep says irqs off]
			[...]
			TRACE_IRQS_IRET [return from breakpoint]
			   test if pt_regs say interrupts enabled [no]
			   [iret back to interrupt]
	   [iret back to code]

    tick_nohz_idle_enter() {

	lockdep_assert_irqs_enabled() [lockdep say no!]

Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.

Cc: stable@vger.kernel.org
Fixes: 42a0bb3 ("printk/nmi: generic solution for safe printk in NMI")
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
[ Upstream commit e04e7a7 ]

This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the related 3 paths which show the deadlock:

path #1:
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #2:
    schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #3:
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path #1 finishes, path #2 can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path #1, there is a line
"object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path #2 can start to run.

Next, path #2 offloads the subchannal's initialization to a workqueue,
i.e. path #3, so we can end up in a deadlock situation like this:

Path #2 gets the device lock, and is trying to get the rtnl lock;
Path #3 gets the rtnl lock and is waiting for all the subchannel messages
to be processed;
Path #1 is trying to get the device lock, but since #2 is not releasing
the device lock, path #1 has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
procedded, so #3 has to sleep with the rtnl lock held, and finally #2
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure #2 gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so #1
and #3 will not be blocked for ever.

Fixes: 8195b13 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
[ Upstream commit 46b3722 ]

We occasionaly hit following assert failure in 'perf top', when processing the
/proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

The problem is that we keep global comm_str_root list, which
is accessed by multiple threads during the 'perf top' startup
and following 2 paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only after then it removes the
struct comm_str from the list, the thread 1 can find this object on the list
with refcnt equls to 0 and hit the assert.

This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
that already dropped the refcnt to 0. For the rest of the objects we take the
refcnt before comparing its name and release it afterwards with comm_str__put,
which can also release the object completely.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
[ Upstream commit b71c69c ]

Fixes this warning that was provoked by a pairing:

[60258.016221] WARNING: possible recursive locking detected
[60258.021558] 4.15.0-RD1812-BSP #1 Tainted: G           O
[60258.027146] --------------------------------------------
[60258.032464] kworker/u5:0/70 is trying to acquire lock:
[60258.037609]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<87759073>] bt_accept_enqueue+0x3c/0x74
[60258.046863]
[60258.046863] but task is already holding lock:
[60258.052704]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88
[60258.062905]
[60258.062905] other info that might help us debug this:
[60258.069441]  Possible unsafe locking scenario:
[60258.069441]
[60258.075368]        CPU0
[60258.077821]        ----
[60258.080272]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[60258.085510]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[60258.090748]
[60258.090748]  *** DEADLOCK ***
[60258.090748]
[60258.096676]  May be due to missing lock nesting notation
[60258.096676]
[60258.103472] 5 locks held by kworker/u5:0/70:
[60258.107747]  #0:  ((wq_completion)%shdev->name#2){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc
[60258.117263]  #1:  ((work_completion)(&hdev->rx_work)){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc
[60258.126942]  #2:  (&conn->chan_lock){+.+.}, at: [<7877c8c3>] l2cap_connect+0x80/0x4f8
[60258.134806]  #3:  (&chan->lock/2){+.+.}, at: [<2e16c724>] l2cap_connect+0x8c/0x4f8
[60258.142410]  #4:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88
[60258.153043]
[60258.153043] stack backtrace:
[60258.157413] CPU: 1 PID: 70 Comm: kworker/u5:0 Tainted: G           O     4.15.0-RD1812-BSP #1
[60258.165945] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[60258.172485] Workqueue: hci0 hci_rx_work
[60258.176331] Backtrace:
[60258.178797] [<8010c9fc>] (dump_backtrace) from [<8010ccbc>] (show_stack+0x18/0x1c)
[60258.186379]  r7:80e55fe4 r6:80e55fe4 r5:20050093 r4:00000000
[60258.192058] [<8010cca4>] (show_stack) from [<809864e8>] (dump_stack+0xb0/0xdc)
[60258.199301] [<80986438>] (dump_stack) from [<8016ecc8>] (__lock_acquire+0xffc/0x11d4)
[60258.207144]  r9:5e2bb019 r8:630f974c r7:ba8a5940 r6:ba8a5ed8 r5:815b5220 r4:80fa081c
[60258.214901] [<8016dccc>] (__lock_acquire) from [<8016f620>] (lock_acquire+0x78/0x98)
[60258.222655]  r10:00000040 r9:00000040 r8:808729f0 r7:00000001 r6:00000000 r5:60050013
[60258.230491]  r4:00000000
[60258.233045] [<8016f5a8>] (lock_acquire) from [<806ee974>] (lock_sock_nested+0x64/0x88)
[60258.240970]  r7:00000000 r6:b796e870 r5:00000001 r4:b796e800
[60258.246643] [<806ee910>] (lock_sock_nested) from [<808729f0>] (bt_accept_enqueue+0x3c/0x74)
[60258.255004]  r8:00000001 r7:ba7d3c00 r6:ba7d3ea4 r5:ba7d2000 r4:b796e800
[60258.261717] [<808729b4>] (bt_accept_enqueue) from [<808aa39c>] (l2cap_sock_new_connection_cb+0x68/0x88)
[60258.271117]  r5:b796e800 r4:ba7d2000
[60258.274708] [<808aa334>] (l2cap_sock_new_connection_cb) from [<808a294c>] (l2cap_connect+0x190/0x4f8)
[60258.283933]  r5:00000001 r4:ba6dce00
[60258.287524] [<808a27bc>] (l2cap_connect) from [<808a4a14>] (l2cap_recv_frame+0x744/0x2cf8)
[60258.295800]  r10:ba6dcf24 r9:00000004 r8:b78d8014 r7:00000004 r6:bb05d000 r5:00000004
[60258.303635]  r4:bb05d008
[60258.306183] [<808a42d0>] (l2cap_recv_frame) from [<808a7808>] (l2cap_recv_acldata+0x210/0x214)
[60258.314805]  r10:b78e7800 r9:bb05d960 r8:00000001 r7:bb05d000 r6:0000000c r5:b7957a80
[60258.322641]  r4:ba6dce00
[60258.325188] [<808a75f8>] (l2cap_recv_acldata) from [<8087630c>] (hci_rx_work+0x35c/0x4e8)
[60258.333374]  r6:80e5743c r5:bb05d7c8 r4:b7957a80
[60258.338004] [<80875fb0>] (hci_rx_work) from [<8013dc7c>] (process_one_work+0x1a4/0x4fc)
[60258.346018]  r10:00000001 r9:00000000 r8:baabfef8 r7:ba997500 r6:baaba800 r5:baaa5d00
[60258.353853]  r4:bb05d7c8
[60258.356401] [<8013dad8>] (process_one_work) from [<8013e028>] (worker_thread+0x54/0x5cc)
[60258.364503]  r10:baabe038 r9:baaba834 r8:80e05900 r7:00000088 r6:baaa5d18 r5:baaba800
[60258.372338]  r4:baaa5d00
[60258.374888] [<8013dfd4>] (worker_thread) from [<801448f8>] (kthread+0x134/0x160)
[60258.382295]  r10:ba8310b8 r9:bb07dbfc r8:8013dfd4 r7:baaa5d00 r6:00000000 r5:baaa8ac0
[60258.390130]  r4:ba831080
[60258.392682] [<801447c4>] (kthread) from [<801080b4>] (ret_from_fork+0x14/0x20)
[60258.399915]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801447c4
[60258.407751]  r4:baaa8ac0 r3:baabe000

Signed-off-by: Philipp Puschmann <pp@emlix.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit b45d71f upstream.

Directories and inodes don't necessarily need to be in the same lockdep
class.  For ex, hugetlbfs splits them out too to prevent false positives
in lockdep.  Annotate correctly after new inode creation.  If its a
directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit 1816494 upstream.

This change has the following effects, in order of descreasing importance:

1) Prevent a stack buffer overflow

2) Do not append an unnecessary NULL to an anyway binary buffer, which
   is writing one byte past client_digest when caller is:
   chap_string_to_hex(client_digest, chap_r, strlen(chap_r));

The latter was found by KASAN (see below) when input value hes expected size
(32 hex chars), and further analysis revealed a stack buffer overflow can
happen when network-received value is longer, allowing an unauthenticated
remote attacker to smash up to 17 bytes after destination buffer (16 bytes
attacker-controlled and one null).  As switching to hex2bin requires
specifying destination buffer length, and does not internally append any null,
it solves both issues.

This addresses CVE-2018-14633.

Beyond this:

- Validate received value length and check hex2bin accepted the input, to log
  this rejection reason instead of just failing authentication.

- Only log received CHAP_R and CHAP_C values once they passed sanity checks.

==================================================================
BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021

CPU: 0 PID: 1021 Comm: kworker/0:0 Tainted: G           O      4.17.8kasan.sess.connops+ #2
Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014
Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod]
Call Trace:
 dump_stack+0x71/0xac
 print_address_description+0x65/0x22e
 ? chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 kasan_report.cold.6+0x241/0x2fd
 chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 chap_server_compute_md5.isra.2+0x2cb/0x860 [iscsi_target_mod]
 ? chap_binaryhex_to_asciihex.constprop.5+0x50/0x50 [iscsi_target_mod]
 ? ftrace_caller_op_ptr+0xe/0xe
 ? __orc_find+0x6f/0xc0
 ? unwind_next_frame+0x231/0x850
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? ret_from_fork+0x35/0x40
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? deref_stack_reg+0xd0/0xd0
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? is_module_text_address+0xa/0x11
 ? kernel_text_address+0x4c/0x110
 ? __save_stack_trace+0x82/0x100
 ? ret_from_fork+0x35/0x40
 ? save_stack+0x8c/0xb0
 ? 0xffffffffc1660000
 ? iscsi_target_do_login+0x155/0x8d0 [iscsi_target_mod]
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? process_one_work+0x35c/0x640
 ? worker_thread+0x66/0x5d0
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? iscsi_update_param_value+0x80/0x80 [iscsi_target_mod]
 ? iscsit_release_cmd+0x170/0x170 [iscsi_target_mod]
 chap_main_loop+0x172/0x570 [iscsi_target_mod]
 ? chap_server_compute_md5.isra.2+0x860/0x860 [iscsi_target_mod]
 ? rx_data+0xd6/0x120 [iscsi_target_mod]
 ? iscsit_print_session_params+0xd0/0xd0 [iscsi_target_mod]
 ? cyc2ns_read_begin.part.2+0x90/0x90
 ? _raw_spin_lock_irqsave+0x25/0x50
 ? memcmp+0x45/0x70
 iscsi_target_do_login+0x875/0x8d0 [iscsi_target_mod]
 ? iscsi_target_check_first_request.isra.5+0x1a0/0x1a0 [iscsi_target_mod]
 ? del_timer+0xe0/0xe0
 ? memset+0x1f/0x40
 ? flush_sigqueue+0x29/0xd0
 iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? iscsi_target_nego_release+0x80/0x80 [iscsi_target_mod]
 ? iscsi_target_restore_sock_callbacks+0x130/0x130 [iscsi_target_mod]
 process_one_work+0x35c/0x640
 worker_thread+0x66/0x5d0
 ? flush_rcu_work+0x40/0x40
 kthread+0x1a0/0x1c0
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

The buggy address belongs to the page:
page:ffffea0004243bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x17fffc000000000()
raw: 017fffc000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0004243c20 ffffea0004243ba0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801090ef680: f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2 f2 f2 f2 00
 ffff8801090ef700: f2 f2 f2 f2 f2 f2 f2 00 02 f2 f2 f2 f2 f2 f2 00
>ffff8801090ef780: 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00
                                              ^
 ffff8801090ef800: 00 f2 f2 f2 f2 f2 f2 00 00 00 00 02 f2 f2 f2 f2
 ffff8801090ef880: f2 f2 f2 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00
==================================================================

Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit 234b69e upstream.

While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
[203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
[203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
[203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
[203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
[203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
[203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
[203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
[203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
[203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
[203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775]  RSP <ffff88006ff4b818>

Joesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
commit 3e1a127 upstream.

When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So: fix this by making it so that normal hotplug handling /only/ happens
so long as the GPU is currently awake without any pending runtime PM
requests. In the event that a hotplug occurs while the device is
suspending or resuming, we can simply defer our response until the GPU
is fully runtime resumed again.

Changes since v4:
- Use a new trick I came up with using pm_runtime_get() instead of the
  hackish junk we had before

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
[ Upstream commit 3795214 ]

Fixes the following splat during boot:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 128, pid: 77, name: kworker/2:1
4 locks held by kworker/2:1/77:
 #0: (ptrval) ((wq_completion)"events"){+.+.}, at: process_one_work+0x1fc/0x8fc
 #1: (ptrval) (deferred_probe_work){+.+.}, at: process_one_work+0x1fc/0x8fc
 #2: (ptrval) (&dev->mutex){....}, at: __device_attach+0x40/0x178
 #3: (ptrval) (msm_iommu_lock){....}, at: msm_iommu_add_device+0x28/0xcc
irq event stamp: 348
hardirqs last  enabled at (347): [<c049dc18>] kfree+0xe0/0x3c0
hardirqs last disabled at (348): [<c0c35cac>] _raw_spin_lock_irqsave+0x2c/0x68
softirqs last  enabled at (0): [<c0322fd8>] copy_process.part.5+0x280/0x1a68
softirqs last disabled at (0): [<00000000>]   (null)
Preemption disabled at:
[<00000000>]   (null)
CPU: 2 PID: 77 Comm: kworker/2:1 Not tainted 4.17.0-rc5-wt-ath-01075-gaca0516bb4cf #239
Hardware name: Generic DT based system
Workqueue: events deferred_probe_work_func
[<c0314e00>] (unwind_backtrace) from [<c030fc70>] (show_stack+0x20/0x24)
[<c030fc70>] (show_stack) from [<c0c16ad8>] (dump_stack+0xa0/0xcc)
[<c0c16ad8>] (dump_stack) from [<c035a978>] (___might_sleep+0x1f8/0x2d4)
ath10k_sdio mmc2:0001:1: Direct firmware load for ath10k/QCA9377/hw1.0/board-2.bin failed with error -2
[<c035a978>] (___might_sleep) from [<c035aac4>] (__might_sleep+0x70/0xa8)
[<c035aac4>] (__might_sleep) from [<c0c3066c>] (__mutex_lock+0x50/0xb28)
[<c0c3066c>] (__mutex_lock) from [<c0c31170>] (mutex_lock_nested+0x2c/0x34)
ath10k_sdio mmc2:0001:1: board_file api 1 bmi_id N/A crc32 544289f7
[<c0c31170>] (mutex_lock_nested) from [<c052d798>] (kernfs_find_and_get_ns+0x30/0x5c)
[<c052d798>] (kernfs_find_and_get_ns) from [<c0531cc8>] (sysfs_add_link_to_group+0x28/0x58)
[<c0531cc8>] (sysfs_add_link_to_group) from [<c07ef75c>] (iommu_device_link+0x50/0xb4)
[<c07ef75c>] (iommu_device_link) from [<c07f2288>] (msm_iommu_add_device+0xa0/0xcc)
[<c07f2288>] (msm_iommu_add_device) from [<c07ec6d0>] (add_iommu_group+0x3c/0x64)
[<c07ec6d0>] (add_iommu_group) from [<c07f9d40>] (bus_for_each_dev+0x84/0xc4)
[<c07f9d40>] (bus_for_each_dev) from [<c07ec7c8>] (bus_set_iommu+0xd0/0x10c)
[<c07ec7c8>] (bus_set_iommu) from [<c07f1a68>] (msm_iommu_probe+0x5b8/0x66c)
[<c07f1a68>] (msm_iommu_probe) from [<c07feaa8>] (platform_drv_probe+0x60/0xbc)
[<c07feaa8>] (platform_drv_probe) from [<c07fc1fc>] (driver_probe_device+0x30c/0x4cc)
[<c07fc1fc>] (driver_probe_device) from [<c07fc59c>] (__device_attach_driver+0xac/0x14c)
[<c07fc59c>] (__device_attach_driver) from [<c07f9e14>] (bus_for_each_drv+0x68/0xc8)
[<c07f9e14>] (bus_for_each_drv) from [<c07fbd3c>] (__device_attach+0xe4/0x178)
[<c07fbd3c>] (__device_attach) from [<c07fc698>] (device_initial_probe+0x1c/0x20)
[<c07fc698>] (device_initial_probe) from [<c07faee8>] (bus_probe_device+0x98/0xa0)
[<c07faee8>] (bus_probe_device) from [<c07fb4f4>] (deferred_probe_work_func+0x74/0x198)
[<c07fb4f4>] (deferred_probe_work_func) from [<c0348eb4>] (process_one_work+0x2c4/0x8fc)
[<c0348eb4>] (process_one_work) from [<c03497b0>] (worker_thread+0x2c4/0x5cc)
[<c03497b0>] (worker_thread) from [<c0350d10>] (kthread+0x180/0x188)
[<c0350d10>] (kthread) from [<c03010b4>] (ret_from_fork+0x14/0x20)

Fixes: 42df43b ("iommu/msm: Make use of iommu_device_register interface")
Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
[ Upstream commit 36d4cb4 ]

The approach for adding a device to the devices_idr data structure and for
removing it is as follows:

* &dev->dev_group.cg_item is initialized before a device is added to
  devices_idr.

* If the reference count of a device drops to zero then
  target_free_device() removes the device from devices_idr.

* All devices_idr manipulations are protected by device_mutex.

This means that increasing the reference count of a device is sufficient to
prevent removal from devices_idr and also that it is safe access
dev_group.cg_item for any device that is referenced by devices_idr. Use
this to modify target_find_device() and target_for_each_device() such that
these functions no longer introduce a dependency between device_mutex and
the configfs root inode mutex.

Note: it is safe to pass a NULL pointer to config_item_put() and also to
config_item_get_unless_zero().

This patch prevents that lockdep reports the following complaint:

======================================================
WARNING: possible circular locking dependency detected
4.12.0-rc1-dbg+ #1 Not tainted
------------------------------------------------------
rmdir/12053 is trying to acquire lock:
 (device_mutex#2){+.+.+.}, at: [<ffffffffa010afce>]
target_free_device+0xae/0xf0 [target_core_mod]

but task is already holding lock:
 (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c5c30>]
vfs_rmdir+0x50/0x140

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#14){++++++}:
       lock_acquire+0x59/0x80
       down_write+0x36/0x70
       configfs_depend_item+0x3a/0xb0 [configfs]
       target_depend_item+0x13/0x20 [target_core_mod]
       target_xcopy_locate_se_dev_e4_iter+0x87/0x100 [target_core_mod]
       target_devices_idr_iter+0x16/0x20 [target_core_mod]
       idr_for_each+0x39/0xc0
       target_for_each_device+0x36/0x50 [target_core_mod]
       target_xcopy_locate_se_dev_e4+0x28/0x80 [target_core_mod]
       target_xcopy_do_work+0x2e9/0xdd0 [target_core_mod]
       process_one_work+0x1ca/0x3f0
       worker_thread+0x49/0x3b0
       kthread+0x109/0x140
       ret_from_fork+0x31/0x40

-> #0 (device_mutex#2){+.+.+.}:
       __lock_acquire+0x101f/0x11d0
       lock_acquire+0x59/0x80
       __mutex_lock+0x7e/0x950
       mutex_lock_nested+0x16/0x20
       target_free_device+0xae/0xf0 [target_core_mod]
       target_core_dev_release+0x10/0x20 [target_core_mod]
       config_item_put+0x6e/0xb0 [configfs]
       configfs_rmdir+0x1a6/0x300 [configfs]
       vfs_rmdir+0xb7/0x140
       do_rmdir+0x1f4/0x200
       SyS_rmdir+0x11/0x20
       entry_SYSCALL_64_fastpath+0x23/0xc2

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#14);
                               lock(device_mutex#2);
                               lock(&sb->s_type->i_mutex_key#14);
  lock(device_mutex#2);

 *** DEADLOCK ***

3 locks held by rmdir/12053:
 #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff811e223f>]
mnt_want_write+0x1f/0x50
 #1:  (&sb->s_type->i_mutex_key#14/1){+.+.+.}, at: [<ffffffff811cb97e>]
do_rmdir+0x15e/0x200
 #2:  (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c5c30>]
vfs_rmdir+0x50/0x140

stack backtrace:
CPU: 3 PID: 12053 Comm: rmdir Not tainted 4.12.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x86/0xcf
 print_circular_bug+0x1c7/0x220
 __lock_acquire+0x101f/0x11d0
 lock_acquire+0x59/0x80
 __mutex_lock+0x7e/0x950
 mutex_lock_nested+0x16/0x20
 target_free_device+0xae/0xf0 [target_core_mod]
 target_core_dev_release+0x10/0x20 [target_core_mod]
 config_item_put+0x6e/0xb0 [configfs]
 configfs_rmdir+0x1a6/0x300 [configfs]
 vfs_rmdir+0xb7/0x140
 do_rmdir+0x1f4/0x200
 SyS_rmdir+0x11/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc2

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
[Rebased to handle conflict withe target_find_device removal]
Signed-off-by: Mike Christie <mchristi@redhat.com>

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sgnanase pushed a commit that referenced this pull request Oct 17, 2018
Lockdep is reporting a possible circular locking dependency:

 ======================================================
 WARNING: possible circular locking dependency detected
 4.18.0-rc1-test-test+ #4 Not tainted
 ------------------------------------------------------
 user_example/766 is trying to acquire lock:
 0000000073479a0f (rdtgroup_mutex){+.+.}, at: pseudo_lock_dev_mmap

 but task is already holding lock:
 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&mm->mmap_sem){++++}:
        _copy_to_user+0x1e/0x70
        filldir+0x91/0x100
        dcache_readdir+0x54/0x160
        iterate_dir+0x142/0x190
        __x64_sys_getdents+0xb9/0x170
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 -> #1 (&sb->s_type->i_mutex_key#3){++++}:
        start_creating+0x60/0x100
        debugfs_create_dir+0xc/0xc0
        rdtgroup_pseudo_lock_create+0x217/0x4d0
        rdtgroup_schemata_write+0x313/0x3d0
        kernfs_fop_write+0xf0/0x1a0
        __vfs_write+0x36/0x190
        vfs_write+0xb7/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 -> #0 (rdtgroup_mutex){+.+.}:
        __mutex_lock+0x80/0x9b0
        pseudo_lock_dev_mmap+0x2f/0x170
        mmap_region+0x3d6/0x610
        do_mmap+0x387/0x580
        vm_mmap_pgoff+0xcf/0x110
        ksys_mmap_pgoff+0x170/0x1f0
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 other info that might help us debug this:

 Chain exists of:
   rdtgroup_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#3);
                                lock(&mm->mmap_sem);
   lock(rdtgroup_mutex);

  *** DEADLOCK ***

 1 lock held by user_example/766:
  #0: 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x110

rdtgroup_mutex is already being released temporarily during pseudo-lock
region creation to prevent the potential deadlock between rdtgroup_mutex
and mm->mmap_sem that is obtained during device_create(). Move the
debugfs creation into this area to avoid the same circular dependency.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fffb57f9c6b8285904c9a60cc91ce21591af17fe.1531332480.git.reinette.chatre@intel.com

(cherry picked from commit 2989360)
sgnanase pushed a commit that referenced this pull request Nov 14, 2018
If I attach a vfio-ccw device to my guest, I get the following warning
on the host when the host kernel is CONFIG_HARDENED_USERCOPY=y

[250757.595325] Bad or missing usercopy whitelist? Kernel memory overwrite attempt detected to SLUB object 'dma-kmalloc-512' (offset 64, size 124)!
[250757.595365] WARNING: CPU: 2 PID: 10958 at mm/usercopy.c:81 usercopy_warn+0xac/0xd8
[250757.595369] Modules linked in: kvm vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c devlink tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sunrpc dm_multipath s390_trng crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha1_s390 eadm_sch tape_3590 tape tape_class qeth_l2 qeth ccwgroup vfio_ccw vfio_mdev zcrypt_cex4 mdev vfio_iommu_type1 zcrypt vfio sha256_s390 sha_common zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod
[250757.595424] CPU: 2 PID: 10958 Comm: CPU 2/KVM Not tainted 4.18.0-derp #2
[250757.595426] Hardware name: IBM 3906 M05 780 (LPAR)
...snip regs...
[250757.595523] Call Trace:
[250757.595529] ([<0000000000349210>] usercopy_warn+0xa8/0xd8)
[250757.595535]  [<000000000032daaa>] __check_heap_object+0xfa/0x160
[250757.595540]  [<0000000000349396>] __check_object_size+0x156/0x1d0
[250757.595547]  [<000003ff80332d04>] vfio_ccw_mdev_write+0x74/0x148 [vfio_ccw]
[250757.595552]  [<000000000034ed12>] __vfs_write+0x3a/0x188
[250757.595556]  [<000000000034f040>] vfs_write+0xa8/0x1b8
[250757.595559]  [<000000000034f4e6>] ksys_pwrite64+0x86/0xc0
[250757.595568]  [<00000000008959a0>] system_call+0xdc/0x2b0
[250757.595570] Last Breaking-Event-Address:
[250757.595573]  [<0000000000349210>] usercopy_warn+0xa8/0xd8

While vfio_ccw_mdev_{write|read} validates that the input position/count
does not run over the ccw_io_region struct, the usercopy code that does
copy_{to|from}_user doesn't necessarily know this. It sees the variable
length and gets worried that it's affecting a normal kmalloc'd struct,
and generates the above warning.

Adjust how the ccw_io_region is alloc'd with a whitelist to remove this
warning. The boundary checking will continue to do its thing.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20180921204013.95804-3-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
sgnanase pushed a commit that referenced this pull request Nov 14, 2018
Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

  #0  0x00005555557bdc4a in callchain_srcline (ip=<error reading variable: Cannot access memory at address 0x38>, sym=0x0, map=0x0) at util/machine.c:2329
  #1  unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
  #2  0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
  #3  get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 <unwind_entry>, arg=0x7ffff5642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703
  #4  0x0000555555837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725
  #5  0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
  #6  thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
  #7  0x00005555557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fffffff97b8, evsel=<optimized out>, al=al@entry=0x7fffffff9750,
      max_stack=<optimized out>) at util/callchain.c:1085

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 2a9d505 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
sgnanase pushed a commit that referenced this pull request Nov 14, 2018
…inux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm fixes for 4.19, take #2

- Correctly order GICv3 SGI registers in the cp15 array
sgnanase pushed a commit that referenced this pull request Nov 14, 2018
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:

  #0  0x0000555555895c01 in bfd_demangle ()
  #1  0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
  #2  dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
  #3  0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
  #4  inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
  #5  0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
      line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
  #6  0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
      at util/srcline.c:358

So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.

While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:

  $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
  main
  /usr/include/c++/8.2.1/complex:610
  ??
  /usr/include/c++/8.2.1/complex:618
  ??
  /usr/include/c++/8.2.1/complex:675
  ??
  /usr/include/c++/8.2.1/complex:685
  main
  /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there which
should fix this issue:

https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a64489c ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
  introduced, i.e.  using a2l->funcname without checking if it is NULL,
  but this current patch fixes the current codebase, i.e. multiple csets
  were applied after a64489c before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
sgnanase pushed a commit that referenced this pull request Nov 14, 2018
Calling dm_setup_drives() before the completion of raw block
device enumeration leads to "device lookup failed" error in
dm_table_add_target() function, which in turn leads to incomplete device
mapper configuration for the given block device. Without proper DM
setup, if we try to mount the DM device, it leads to following kernel
panic in mount_root() function due to invalid DM queue setup. This patch
fixes this issue by adding a logic to wait for block device enumeration
to complete before performing the verity checks.

[    0.875069] device-mapper: table: 253:0: verity: Data device lookup failed
[    0.883103] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.891854] IP:           (null)
[    0.895453] PGD 0
[    0.895454] P4D 0
[    0.897686]
[    0.901574] Oops: 0010 [#1] PREEMPT SMP
[    0.905856] Modules linked in:
[    0.909266] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G     U  W 4.13.0-rc4-quilt-2e5dc0ac #2
[    0.919083] task: ffff968eb6a06040 task.stack: ffff9b9f40010000
[    0.925698] RIP: 0010:          (null)
[    0.929880] RSP: 0000:ffff9b9f40013a30 EFLAGS: 00010246
[    0.935714] RAX: 0000000000000000 RBX: ffff968eb63620c0 RCX: 0000000000000000
[    0.943674] RDX: 0000000000000000 RSI: ffff968eb63620c0 RDI: ffff968eb4a55370
[    0.951649] RBP: ffff9b9f40013a88 R08: 0000000000000001 R09: ffff968eb6362130
[    0.959623] R10: fffff5b2c9d28d00 R11: ffff968eb51fbec0 R12: ffff968eb4a55370
[    0.967586] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
[    0.975557] FS:  0000000000000000(0000) GS:ffff968ebfc80000(0000) knlGS:0000000000000000
[    0.984588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.990997] CR2: 0000000000000000 CR3: 0000000185212000 CR4: 00000000003406e0
[    0.998958] Call Trace:
[    1.001692]  ? generic_make_request+0x122/0x320
[    1.006753]  submit_bio+0x73/0x160
[    1.010550]  submit_bh_wbc.isra.44+0x113/0x140
[    1.015518]  __bread_gfp+0x67/0x120
[    1.019414]  ext4_fill_super+0x184/0x3880
[    1.023891]  ? vsnprintf+0x201/0x490
[    1.027885]  ? set_bdev_super+0x30/0x30
[    1.032170]  ? snprintf+0x43/0x60
[    1.035869]  mount_bdev+0x17d/0x1b0
[    1.039762]  ? ext4_calculate_overhead+0x430/0x430
[    1.045113]  ext4_mount+0x15/0x20
[    1.048813]  mount_fs+0x153/0x180
[    1.052513]  vfs_kern_mount+0x90/0x180
[    1.056699]  do_mount+0x1e0/0xd00
[    1.060397]  ? _copy_from_user+0x60/0xb0
[    1.064780]  ? memdup_user+0x53/0x80
[    1.068770]  SyS_mount+0x94/0xd0
[    1.072378]  mount_block_root+0x105/0x2c4
[    1.076857]  mount_root+0x6d/0x71
[    1.080555]  prepare_namespace+0x172/0x19f
[    1.085127]  kernel_init_freeable+0x21f/0x243
[    1.089990]  ? rest_init+0xd0/0xd0
[    1.093777]  kernel_init+0xe/0x100
[    1.097575]  ret_from_fork+0x27/0x40
[    1.101563] Code:  Bad RIP value.
[    1.105266] RIP:           (null) RSP: ffff9b9f40013a30
[    1.111089] CR2: 0000000000000000
[    1.114789] ---[ end trace 7a237ef06917bfe3 ]---
[    1.123581] Kernel panic - not syncing: Fatal exception
[    1.129510] reboot: panic mode set: p,w
[    1.133784] Kernel Offset: 0x25000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: e97f06fcd04b ("CHROMIUM: dm: boot time specification of dm=")
Change-Id: I48ac7fc8ac752b4ebac25b8b21267f9ff9ba07bd
Signed-off-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@intel.com>
sgnanase pushed a commit that referenced this pull request Nov 15, 2018
Calling dm_setup_drives() before the completion of raw block
device enumeration leads to "device lookup failed" error in
dm_table_add_target() function, which in turn leads to incomplete device
mapper configuration for the given block device. Without proper DM
setup, if we try to mount the DM device, it leads to following kernel
panic in mount_root() function due to invalid DM queue setup. This patch
fixes this issue by adding a logic to wait for block device enumeration
to complete before performing the verity checks.

[    0.875069] device-mapper: table: 253:0: verity: Data device lookup failed
[    0.883103] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.891854] IP:           (null)
[    0.895453] PGD 0
[    0.895454] P4D 0
[    0.897686]
[    0.901574] Oops: 0010 [#1] PREEMPT SMP
[    0.905856] Modules linked in:
[    0.909266] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G     U  W 4.13.0-rc4-quilt-2e5dc0ac #2
[    0.919083] task: ffff968eb6a06040 task.stack: ffff9b9f40010000
[    0.925698] RIP: 0010:          (null)
[    0.929880] RSP: 0000:ffff9b9f40013a30 EFLAGS: 00010246
[    0.935714] RAX: 0000000000000000 RBX: ffff968eb63620c0 RCX: 0000000000000000
[    0.943674] RDX: 0000000000000000 RSI: ffff968eb63620c0 RDI: ffff968eb4a55370
[    0.951649] RBP: ffff9b9f40013a88 R08: 0000000000000001 R09: ffff968eb6362130
[    0.959623] R10: fffff5b2c9d28d00 R11: ffff968eb51fbec0 R12: ffff968eb4a55370
[    0.967586] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
[    0.975557] FS:  0000000000000000(0000) GS:ffff968ebfc80000(0000) knlGS:0000000000000000
[    0.984588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.990997] CR2: 0000000000000000 CR3: 0000000185212000 CR4: 00000000003406e0
[    0.998958] Call Trace:
[    1.001692]  ? generic_make_request+0x122/0x320
[    1.006753]  submit_bio+0x73/0x160
[    1.010550]  submit_bh_wbc.isra.44+0x113/0x140
[    1.015518]  __bread_gfp+0x67/0x120
[    1.019414]  ext4_fill_super+0x184/0x3880
[    1.023891]  ? vsnprintf+0x201/0x490
[    1.027885]  ? set_bdev_super+0x30/0x30
[    1.032170]  ? snprintf+0x43/0x60
[    1.035869]  mount_bdev+0x17d/0x1b0
[    1.039762]  ? ext4_calculate_overhead+0x430/0x430
[    1.045113]  ext4_mount+0x15/0x20
[    1.048813]  mount_fs+0x153/0x180
[    1.052513]  vfs_kern_mount+0x90/0x180
[    1.056699]  do_mount+0x1e0/0xd00
[    1.060397]  ? _copy_from_user+0x60/0xb0
[    1.064780]  ? memdup_user+0x53/0x80
[    1.068770]  SyS_mount+0x94/0xd0
[    1.072378]  mount_block_root+0x105/0x2c4
[    1.076857]  mount_root+0x6d/0x71
[    1.080555]  prepare_namespace+0x172/0x19f
[    1.085127]  kernel_init_freeable+0x21f/0x243
[    1.089990]  ? rest_init+0xd0/0xd0
[    1.093777]  kernel_init+0xe/0x100
[    1.097575]  ret_from_fork+0x27/0x40
[    1.101563] Code:  Bad RIP value.
[    1.105266] RIP:           (null) RSP: ffff9b9f40013a30
[    1.111089] CR2: 0000000000000000
[    1.114789] ---[ end trace 7a237ef06917bfe3 ]---
[    1.123581] Kernel panic - not syncing: Fatal exception
[    1.129510] reboot: panic mode set: p,w
[    1.133784] Kernel Offset: 0x25000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: e97f06fcd04b ("CHROMIUM: dm: boot time specification of dm=")
Change-Id: I48ac7fc8ac752b4ebac25b8b21267f9ff9ba07bd
Signed-off-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.