【Question title】: Splunk: How to Compute Incident Duration Records?
【Posted】: 2020-09-03 16:38:57
【Question】:

I have the following events in Splunk:

_time                           Agent_Hostname      alarm               status
2020-08-23T03:04:05.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:08:17.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:09:24.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:10:31.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:10:32.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:11:12.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:19:13.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:20:10.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:21:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:22:22.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:23:29.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:25:09.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:25:58.000-0700    m50-ups.a_domain    upsAlarmOnBypass    cleared

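My question is how to compute incident duration records for each host and each alarm type. For example, from the events above I would like to derive the following algorithmically, rather than hard-coding values for this particular example: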

start                        end                          Agent_Hostname   alarm
2020-08-23T03:04:05.000-0700 2020-08-23T03:25:58.000-0700 m50-ups.a_domain upsAlarmOnBypass
2020-08-23T03:07:16.000-0700                              m50-ups.a_domain upsTrapOnBattery
2020-08-23T03:07:16.000-0700 2020-08-23T03:24:28.000-0700 m50-ups.a_domain upsAlarmInputBad
2020-08-23T03:07:39.000-0700 2020-08-23T03:25:09.000-0700 m50-ups.a_domain upsAlarmLowBattery

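Here, start is the earliest time the alarm was raised for the host, and end is the time the same alarm was cleared for that host.

My second question is how to find the longest duration among the closed spans, ignoring those that have no end time.

How can I implement this within Splunk?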

【Question comments】:

    Tags: splunk splunk-query splunk-formula splunk-calculation


    【Solution 1】:

    The transaction command can handle most of this. The one thing I couldn't get it to do is show the unfinished alarms.

    | makeresults 
    | eval _raw="time                            Agent_Hostname      alarm               status
    2020-08-23T03:04:05.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
    2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
    2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
    2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
    2020-08-23T03:08:17.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:09:24.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:10:31.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
    2020-08-23T03:10:32.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
    2020-08-23T03:11:12.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
    2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
    2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:19:13.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
    2020-08-23T03:20:10.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:21:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:22:22.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:23:29.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
    2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
    2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
    2020-08-23T03:25:09.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
    2020-08-23T03:25:58.000-0700    m50-ups.a_domain    upsAlarmOnBypass    cleared" 
    | multikv forceheader=1 
    | eval _time=strptime(time,"%Y-%m-%dT%H:%M:%S.%3N%z")
    | fields _time Agent_Hostname alarm status 
    ```Everything above just defines test data - Remove Before Flight```
    ```Omit the reverse command if events are in descending order (the default)```
    | reverse
    ```Set the start and end times based on status```
    | eval start=if(status="raised",_time, null()), end=if(status="cleared",_time, null())
    ```Define transactions based on "raised/cleared" pairs within host and alarm names```
    | transaction Agent_Hostname alarm startswith="raised" endswith="cleared"
    ```Change duration display to hh:mm:ss```
    | fieldformat duration=tostring(duration,"duration")
    | table start end Agent_Hostname alarm duration
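
The answer above leaves two parts of the question open: the unfinished alarms and the longest closed span. The following is a minimal sketch of a different technique (plain `stats` with conditional `eval` expressions, not the answer's `transaction` approach). It assumes, as the expected table in the question suggests, that one row per host and alarm is wanted, with start as the earliest "raised" time and end as the latest "cleared" time. The `makeresults` rows are a shortened copy of the question's sample data and stand in for a real base search.

```spl
| makeresults
| eval _raw="time                            Agent_Hostname      alarm               status
2020-08-23T03:04:05.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:25:58.000-0700    m50-ups.a_domain    upsAlarmOnBypass    cleared"
| multikv forceheader=1
| eval _time=strptime(time,"%Y-%m-%dT%H:%M:%S.%3N%z")
| stats min(eval(if(status="raised",_time,null()))) as start max(eval(if(status="cleared",_time,null()))) as end by Agent_Hostname alarm
| where isnotnull(start)
| eval duration=end-start
| eval start=strftime(start,"%Y-%m-%dT%H:%M:%S.%3N%z"), end=strftime(end,"%Y-%m-%dT%H:%M:%S.%3N%z")
| fieldformat duration=tostring(duration,"duration")
| table start end Agent_Hostname alarm duration
```

Alarms that were raised but never cleared keep an empty end and duration, and `where isnotnull(start)` drops alarms that only ever appear as "cleared". Note that this collapses repeated raise/clear cycles of the same alarm into one span, which matches the question's expected table but may not suit every use case. For the longest closed span, appending `| where isnotnull(end) | sort 0 -duration | head 1` should be enough, since `fieldformat` leaves the underlying duration numeric.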
    

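    【Comments】:

    • To keep events that have no matching end event, use transaction's keeporphans parameter: | transaction Agent_Hostname alarm startswith="raised" endswith="cleared" keeporphans=true
    • I tried that, but didn't get the expected results.
    • @RichG I wish I could give you extra points for showing off how powerful transaction is! A very impressive solution.
    • I was also surprised that adding keeporphans=true to transaction resulted in very few events being grouped into transactions.
    • The transaction command is powerful, but with great power comes great responsibility. It can be a resource hog, so use it only when necessary.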
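
The comments report that `keeporphans=true` did not give the expected result. One option that may be worth trying instead is the `keepevicted` parameter of `transaction`, which outputs transactions that were started but never closed and marks them with `closed_txn=0`. The fragment below is a sketch under that assumption; it replaces the `transaction` line and everything after it in the answer's search, and the `state` field is a name introduced here for illustration.

```spl
| transaction Agent_Hostname alarm startswith="raised" endswith="cleared" keepevicted=true
| eval state=if(closed_txn=1,"closed","open")
| eval duration=if(closed_txn=1,duration,null())
| fieldformat duration=tostring(duration,"duration")
| table start end Agent_Hostname alarm duration state
```

Blanking the duration of open transactions keeps them visible in the table while excluding them from any later `max(duration)` calculation. Because a repeated "raised" event can begin a new transaction, it is worth checking the rows against known incidents before relying on the result.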