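【Question title】: How to troubleshoot an Apache Storm worker crash
【Posted】: 2020-01-02 04:56:57
【Question】:

I'm running Python code (via streamparse) on Apache Storm 1.1.1 and recently noticed that the Storm workers keep crashing. Below is what I found in the worker log. I can't work out the culprit because the log doesn't give me enough clues. The topology was working fine before. Any idea where else I could start looking?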

2019-08-28 15:05:32.947 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Launched subprocess with pid 10054
2019-08-28 15:05:32.951 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Opened spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Activating spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Start checking heartbeat...
2019-08-28 15:05:32.961 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Async loop died!
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
        at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
        ... 6 more
2019-08-28 15:05:32.968 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [ERROR]
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
        at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
        ... 6 more
2019-08-28 15:05:33.009 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
        at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
        at org.apache.storm.daemon.worker$fn__5632$fn__5633.invoke(worker.clj:763) [storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$mk_executor_data$fn__4848$fn__4849.invoke(executor.clj:276) [storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2019-08-28 15:05:33.018 o.a.s.d.worker Thread-16 [INFO] Shutting down worker tmon-4-1567019114 ba5b3695-b390-4c3e-9d92-af0771f17b86 6700

【Discussion】:

    Tags: python apache-storm streamparse


    【Answer 1】:

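    Whenever I see a serializer exception in an external-process bolt (e.g. a Python bolt), I suspect that the external process is printing something to the standard output stream.

    Storm uses the bolt process's stdin/stdout for its own communication, so any logging in Python bolts should be written to stderr or to a file.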
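
    As a rough illustration of that advice, here is a minimal sketch that keeps diagnostics off stdout using only the Python standard library. The helper names `configure_stderr_logging` and `debug_print` are hypothetical and not part of Storm or streamparse; the sketch assumes the component can call them before it emits anything.

```python
import logging
import sys


def configure_stderr_logging(level=logging.INFO):
    """Route all log records to stderr so stdout stays reserved for Storm.

    Hypothetical helper for illustration; not a Storm or streamparse API.
    """
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers = [handler]  # replace any handler that might write to stdout
    root.setLevel(level)
    return root


def debug_print(*args, **kwargs):
    """Drop-in replacement for print() that defaults to stderr."""
    kwargs.setdefault("file", sys.stderr)
    print(*args, **kwargs)
```

    Calling `configure_stderr_logging()` once at start-up and replacing stray `print(...)` calls with `debug_print(...)` should be enough for code you control. Output written to stdout by third-party libraries would need to be tracked down separately.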

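    【Comments】:

    • Thanks @Re'em! I went through our Python code and didn't see any print statements. You may be right, though: at first I thought this was a network issue, but after narrowing the topology down to a single node I still see the error, so I also suspect something is wrong with the JSON serialization between these processes, as you said. I've looked at the streamparse logs and the Storm worker logs and still can't find the culprit. For more context: it was running fine before, then I deployed my changes and started seeing the errors, but even after reverting them I still get the same error, so it's probably not caused by my changes.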
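
    One way to check for stray output even when the code has no print statements is to capture the subprocess's stdout and look for anything that is not a protocol message. The sketch below assumes Storm's default JSON multilang framing, where each message is a JSON document followed by a line containing only `end`. The function name `find_non_protocol_output` is hypothetical, and how the stdout text is captured (for example by running the component by hand) is left open.

```python
import json


def find_non_protocol_output(captured_stdout):
    """Return chunks of captured stdout that are not valid multilang messages.

    Assumes the JSON multilang framing: a JSON document, then a line
    containing only "end". Hypothetical helper for illustration.
    """
    suspicious = []
    chunk = []
    for line in captured_stdout.splitlines():
        if line.strip() == "end":
            text = "\n".join(chunk).strip()
            chunk = []
            try:
                json.loads(text)
            except ValueError:
                # Not parseable as JSON: something else wrote to stdout.
                suspicious.append(text)
        else:
            chunk.append(line)
    # Anything left after the last "end" marker is also out of protocol.
    leftover = "\n".join(chunk).strip()
    if leftover:
        suspicious.append(leftover)
    return suspicious
```

    Any chunk this returns is a candidate for what breaks the serializer, for instance a warning or banner emitted by an imported library.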