linux-toradex.git/drivers/char/ipmi, branch v4.9.91

ipmi/watchdog: fix wdog hang on panic waiting for ipmi response

2018-03-24T10:00:18+00:00

[ Upstream commit 2c1175c2e8e5487233cabde358a19577562ac83e ]

Commit c49c097610fe ("ipmi: Don't call receive handler in the
panic context") means that the panic_recv_free is not called during a
panic and the atomic count does not drop to 0.

Fix this by only expecting one decrement of the atomic variable
which comes from panic_smi_free.

Signed-off-by: Robert Lippert 
Signed-off-by: Corey Minyard 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

ipmi: Stop timers before cleaning up the module

2017-12-16T15:25:46+00:00

commit 4f7f5551a760eb0124267be65763008169db7087 upstream.

System may crash after unloading ipmi_si.ko module
because a timer may remain and fire after the module cleaned up resources.

cleanup_one_si() contains the following processing.

        /*
         * Make sure that interrupts, the timer and the thread are
         * stopped and will not run again.
         */
        if (to_clean->irq_cleanup)
                to_clean->irq_cleanup(to_clean);
        wait_for_timer_and_thread(to_clean);

        /*
         * Timeouts are stopped, now make sure the interrupts are off
         * in the BMC.  Note that timers and CPU interrupts are off,
         * so no need for locks.
         */
        while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
                poll(to_clean);
                schedule_timeout_uninterruptible(1);
        }

si_state changes as following in the while loop calling poll(to_clean).

  SI_GETTING_MESSAGES
    => SI_CHECKING_ENABLES
     => SI_SETTING_ENABLES
      => SI_GETTING_EVENTS
       => SI_NORMAL

As written in the code comments above,
timers are expected to stop before the polling loop and not to run again.
But the timer is set again in the following process
when si_state becomes SI_SETTING_ENABLES.

  => poll
     => smi_event_handler
       => handle_transaction_done
          // smi_info->si_state == SI_SETTING_ENABLES
         => start_getting_events
           => start_new_msg
            => smi_mod_timer
              => mod_timer

As a result, before the timer set in start_new_msg() expires,
the polling loop may see si_state becoming SI_NORMAL
and the module clean-up finishes.

For example, hard LOCKUP and panic occurred as following.
smi_timeout was called after smi_event_handler,
kcs_event and hangs at port_inb()
trying to access I/O port after release.

    [exception RIP: port_inb+19]
    RIP: ffffffffc0473053  RSP: ffff88069fdc3d80  RFLAGS: 00000006
    RAX: ffff8806800f8e00  RBX: ffff880682bd9400  RCX: 0000000000000000
    RDX: 0000000000000ca3  RSI: 0000000000000ca3  RDI: ffff8806800f8e40
    RBP: ffff88069fdc3d80   R8: ffffffff81d86dfc   R9: ffffffff81e36426
    R10: 00000000000509f0  R11: 0000000000100000  R12: 0000000000]:000000
    R13: 0000000000000000  R14: 0000000000000246  R15: ffff8806800f8e00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 ---  ---

To fix the problem I defined a flag, timer_can_start,
as member of struct smi_info.
The flag is enabled immediately after initializing the timer
and disabled immediately before waiting for timer deletion.

Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
Signed-off-by: Yamazaki Masamitsu 
[Adjusted for recent changes in the driver.]
[Some fairly major changes went into the IPMI driver in 4.15, so this
 required a backport as the code had changed and moved to a different
 file.  The 4.14 version of this patch moved some code under an
 if statement causing it to not apply to 4.7-4.13.]
Signed-off-by: Corey Minyard 
Signed-off-by: Greg Kroah-Hartman

ipmi: fix unsigned long underflow

2017-11-24T07:33:42+00:00

commit 392a17b10ec4320d3c0e96e2a23ebaad1123b989 upstream.

When I set the timeout to a specific value such as 500ms, the timeout
event will not happen in time due to the overflow in function
check_msg_timeout:
...
	ent->timeout -= timeout_period;
	if (ent->timeout > 0)
		return;
...

The type of timeout_period is long, but ent->timeout is unsigned long.
This patch makes the type consistent.

Reported-by: Weilong Chen 
Signed-off-by: Corey Minyard 
Tested-by: Weilong Chen 
Signed-off-by: Greg Kroah-Hartman

ipmi/watchdog: fix watchdog timeout set on reboot

2017-08-07T01:59:43+00:00

commit 860f01e96981a68553f3ca49f574ff14fe955e72 upstream.

systemd by default starts watchdog on reboot and sets the timer to
ShutdownWatchdogSec=10min.  Reboot handler in ipmi_watchdog than reduces
the timer to 120s which is not enough time to boot a Xen machine with
a lot of RAM.  As a result the machine is rebooted the second time
during the long run of (XEN) Scrubbing Free RAM.....

Fix this by setting the timer to 120s only if it was previously
set to a low value.

Signed-off-by: Valentin Vidic 
Signed-off-by: Corey Minyard 
Signed-off-by: Greg Kroah-Hartman

ipmi:ssif: Add missing unlock in error branch

2017-07-27T22:08:02+00:00

commit 4495ec6d770e1bca7a04e93ac453ab6720c56c5d upstream.

When getting flags, a response to a different message would
result in a deadlock because of a missing unlock.  Add that
unlock and a comment.  Found by static analysis.

Reported-by: Dan Carpenter 
Signed-off-by: Corey Minyard 
Signed-off-by: Greg Kroah-Hartman

ipmi: use rcu lock around call to intf->handlers->sender()

2017-07-27T22:08:02+00:00

commit cdea46566bb21ce309725a024208322a409055cc upstream.

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso 
Signed-off-by: Greg Kroah-Hartman 

This was first introduced in 7ea0ed2b5be817 ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard

ipmi: Fix kernel panic at ipmi_ssif_thread()

2017-05-20T12:28:41+00:00

commit 6de65fcfdb51835789b245203d1bfc8d14cb1e06 upstream.

msg_written_handler() may set ssif_info->multi_data to NULL
when using ipmitool to write fru.

Before setting ssif_info->multi_data to NULL, add new local
pointer "data_to_send" and store correct i2c data pointer to
it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos.

Signed-off-by: Joeseph Chang 
Signed-off-by: Corey Minyard 
Signed-off-by: Greg Kroah-Hartman

ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'

2016-11-18T00:31:09+00:00

The Aspeed SoCs have two BT interfaces : one is IPMI compliant and the
other is H8S/2168 compliant.

The current ipmi/bt-bmc driver implements the IPMI version and we
should reflect its nature in the compatible node name using
'aspeed,ast2400-ibt-bmc' instead of 'aspeed,ast2400-bt-bmc'. The
latter should be used for a H8S interface driver if it is implemented
one day.

Signed-off-by: Cédric Le Goater 
Signed-off-by: Olof Johansson

ipmi: fix crash on reading version from proc after unregisted bmc

2016-10-03T14:09:47+00:00

I meet a crash, which could be reproduce:
1) while true; do cat /proc/ipmi/0/version; done
2) modprobe -rv ipmi_si ipmi_msghandler ipmi_devintf

[82761.021137] IPMI BT: req2rsp=5 secs retries=2
[82761.034524] ipmi device interface
[82761.222218] ipmi_si ipmi_si.0: Found new BMC (man_id: 0x0007db, prod_id: 0x0001, dev_id: 0x01)
[82761.222230] ipmi_si ipmi_si.0: IPMI bt interface initialized
[82903.922740] BUG: unable to handle kernel NULL pointer dereference at 00000000000002d4
[82903.930952] IP: [] smi_version_proc_show+0x18/0x40 [ipmi_msghandler]
[82903.939220] PGD 86693a067 PUD 865304067 PMD 0
[82903.943893] Thread overran stack, or stack corrupted
[82903.949034] Oops: 0000 [#1] SMP
[82903.983091] Modules linked in: ipmi_si(-) ipmi_msghandler binfmt_misc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
...
[82904.057285]  pps_core scsi_transport_sas dm_mod vfio_iommu_type1 vfio xt_sctp nf_conntrack_proto_sctp nf_nat_proto_sctp
                nf_nat nf_conntrack sctp libcrc32c [last unloaded: ipmi_devintf]
[82904.073169] CPU: 37 PID: 28089 Comm: cat Tainted: GF          O   ---- -------   3.10.0-327.28.3.el7.x86_64 #1
[82904.083373] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.22 05/16/2016
[82904.090592] task: ffff880101cc2e00 ti: ffff880369c54000 task.ti: ffff880369c54000
[82904.098414] RIP: 0010:[]  [] smi_version_proc_show+0x18/0x40 [ipmi_msghandler]
[82904.109124] RSP: 0018:ffff880369c57e70  EFLAGS: 00010203
[82904.114608] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000024688470
[82904.121912] RDX: fffffffffffffff4 RSI: ffffffffa0313404 RDI: ffff8808670ce200
[82904.129218] RBP: ffff880369c57e70 R08: 0000000000019720 R09: ffffffff81204a27
[82904.136521] R10: ffff88046f803300 R11: 0000000000000246 R12: ffff880662399700
[82904.143828] R13: 0000000000000001 R14: ffff880369c57f48 R15: ffff8808670ce200
[82904.151128] FS:  00007fb70c9ca740(0000) GS:ffff88086e340000(0000) knlGS:0000000000000000
[82904.159557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[82904.165473] CR2: 00000000000002d4 CR3: 0000000864c0c000 CR4: 00000000003407e0
[82904.172778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[82904.180084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[82904.187385] Stack:
[82904.189573]  ffff880369c57ee0 ffffffff81204f1a 00000000122a2427 0000000001426000
[82904.197392]  ffff8808670ce238 0000000000010000 0000000000000000 0000000000000fff
[82904.205198]  00000000122a2427 ffff880862079600 0000000001426000 ffff880369c57f48
[82904.212962] Call Trace:
[82904.219667]  [] seq_read+0xfa/0x3a0
[82904.224893]  [] proc_reg_read+0x3d/0x80
[82904.230468]  [] vfs_read+0x9c/0x170
[82904.235689]  [] SyS_read+0x7f/0xe0
[82904.240816]  [] system_call_fastpath+0x16/0x1b
[82904.246991] Code: 30 a0 e8 0c 6f ef e0 5b 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f
               44 00 00 48 8b 47 78 55 48 c7 c6 04 34 31 a0 48 89 e5 48 8b 40 50 <0f>
	       b6 90 d4 02 00 00 31 c0 89 d1 83 e2 0f c0 e9 04 0f b6 c9 e8
[82904.267710] RIP  [] smi_version_proc_show+0x18/0x40 [ipmi_msghandler]
[82904.276079]  RSP 
[82904.279734] CR2: 00000000000002d4
[82904.283731] ---[ end trace a69e4328b49dd7c4 ]---
[82904.328118] Kernel panic - not syncing: Fatal exception

Reading versin from /proc need bmc device struct available. So in this patch
we move add/remove_proc_entries between ipmi_bmc_register and ipmi_bmc_unregister.

Cc: Kefeng Wang 
Signed-off-by: Xie XiuQi 
Signed-off-by: Corey Minyard

ipmi/bt-bmc: remove redundant return value check of platform_get_resource()

2016-09-30T14:23:22+00:00

Remove unneeded error handling on the result of a call
to platform_get_resource() when the value is passed to
devm_ioremap_resource().

Signed-off-by: Wei Yongjun 
Acked-by: Cédric Le Goater 
Signed-off-by: Corey Minyard