【Question Title】: Sidekiq job doesn't seem to be parsing
【Posted】: 2025-12-19 03:50:15
【Question】:

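I'm trying to mimic what a previous developer did to parse XML files in my Rails app, but I'm stuck. As far as I can tell, my job completes, but nothing gets posted the way it should, so I'm guessing I'm not parsing the file correctly (although it works fine when I test it against the original file on my localhost). So, where am I going wrong?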

Here is the Sidekiq log output, just to confirm that the job runs and shows no errors during processing:

2016-05-25T13:51:04.499Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: start
2016-05-25T13:51:04.781Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: start
2016-05-25T13:51:04.797Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: done: 0.297 sec
2016-05-25T13:51:04.824Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: done: 0.043 sec

Here is my Sidekiq job file, which iterates over the compressed archive submitted through my API. The file I'm working with is nmap_poodle_scan.xml:

class ParseTestData
  include Sidekiq::Worker

  # Order matters. Parse network hosts first to ensure we uniquely identify network hosts by their mac address.
  PARSERS = {
    "network_hosts.xml" => Parsers::NetworkHostParser,
    "nmap_tcp_service_scan.xml" => Parsers::TcpServiceScanParser,
    "nmap_shellshock_scan.xml" => Parsers::ShellshockScanParser,
    "hydra.out" => Parsers::HydraParser,
    "events.log" => Parsers::EventParser,
    "nmap_poodle_scan.xml" => Parsers::PoodleScanParser
  }

  def perform(test_id)
    test = Test.find(test_id)

    gzip = if Rails.env.development?
      Zlib::GzipReader.open(test.data.path)
    else
      file = Net::HTTP.get(URI.parse(test.data.url))
      Zlib::GzipReader.new(StringIO.new(file))
    end

    # Collect entries from tarball
    entries = {}
    tar_extract = Gem::Package::TarReader.new(gzip)
    tar_extract.rewind
    tar_extract.each do |entry|
      entries[File.basename(entry.full_name)] = entry.read
    end

    # Preserve parse order by using the parser hash to initiate parser executions.
    PARSERS.each_pair do |filename, parser|
      next unless entry = entries[filename]
      parser.run!(test, entry)
    end
  end
end
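One way to check which files the job actually hands to a parser is to run the same extraction logic outside Sidekiq. The sketch below builds a small .tar.gz in memory with RubyGems' `Gem::Package::TarWriter`, then collects entries keyed by basename the way `perform` does. The archive layout (a `scan/` directory), the file contents, and the helper names `build_tarball` and `collect_entries` are assumptions for illustration. Note that `next unless` skips any `PARSERS` key with no matching entry silently, so a missing or misnamed file never raises.

```ruby
require "rubygems/package"
require "stringio"
require "zlib"

# Hypothetical helper: build a gzipped tarball in memory from { path => content }.
def build_tarball(files)
  tar_io = StringIO.new("".b)
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |path, content|
      # add_file_simple takes the size up front, so it never needs to seek the IO.
      tar.add_file_simple(path, 0o644, content.bytesize) { |io| io.write(content) }
    end
  end
  Zlib.gzip(tar_io.string)
end

# Hypothetical helper mirroring ParseTestData#perform: key each entry by its basename.
def collect_entries(gz_bytes)
  entries = {}
  gzip = Zlib::GzipReader.new(StringIO.new(gz_bytes))
  Gem::Package::TarReader.new(gzip).each do |entry|
    entries[File.basename(entry.full_name)] = entry.read
  end
  gzip.close
  entries
end

# Hypothetical subset of the PARSERS keys, in parse order.
parser_files = ["network_hosts.xml", "hydra.out", "nmap_poodle_scan.xml"]

tarball = build_tarball(
  "scan/network_hosts.xml" => "<hosts/>",
  "scan/nmap_poodle_scan.xml" => "<nmaprun/>"
)
entries = collect_entries(tarball)

parser_files.each do |filename|
  content = entries[filename]
  puts(content ? "#{filename}: #{content.bytesize} bytes" : "#{filename}: missing, skipped")
end
```

Running it prints one line per expected filename; `hydra.out` is reported as skipped because the archive does not contain it.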

An excerpt from nmap_poodle_scan.xml:

<host starttime="1464180941" endtime="1464180941"><status state="up" reason="arp-response" reason_ttl="0"/>
<address addr="10.10.10.1" addrtype="ipv4"/>
<address addr="4C:E6:76:3F:2F:77" addrtype="mac" vendor="Buffalo.inc"/>
<hostnames>
<hostname name="DD-WRT" type="PTR"/>
</hostnames>
Nmap scan report for DD-WRT (10.10.10.1)
<ports><extraports state="closed" count="996">
<extrareasons reason="resets" count="996"/>
</extraports>
<table key="CVE-2014-3566">
<elem key="title">SSL POODLE information leak</elem>
<elem key="state">VULNERABLE</elem>
<table key="ids">
<elem>OSVDB:113251</elem>
<elem>CVE:CVE-2014-3566</elem>
</table>
<table key="description">
<elem>    The SSL protocol 3.0, as used in OpenSSL through 1.0.1i and&#xa;    other products, uses nondeterministic CBC padding, which makes it easier&#xa;    for man-in-the-middle attackers to obtain cleartext data via a&#xa;    padding-oracle attack, aka the &quot;POODLE&quot; issue.</elem>
</table>
<table key="dates">
<table key="disclosure">
<elem key="year">2014</elem>
<elem key="month">10</elem>
<elem key="day">14</elem>
</table>
</table>
<elem key="disclosure">2014-10-14</elem>
<table key="check_results">
<elem>TLS_RSA_WITH_3DES_EDE_CBC_SHA</elem>
</table>
<table key="refs">
<elem>https://www.imperialviolet.org/2014/10/14/poodle.html</elem>
<elem>http://osvdb.org/113251</elem>
<elem>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3566</elem>
<elem>https://www.openssl.org/~bodo/ssl-poodle.pdf</elem>
</table>
</table>
</script></port>
</ports>
<times srtt="4665" rttvar="556" to="100000"/>
</host>

This should be handed to PoodleScanParser:

module Parsers
  class PoodleScanParser < NmapScanParser
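    def self.run!(test, content)
      # NmapScanParser reads address/hostname children of each match, so the XPath must select <host> nodes.
      super(test, content, "//host[.//ports//elem[@key='state'][contains(text(), 'VULNERABLE')]]") do |host, network_host_test|
        # A plain class has no `logger` method, so use the Rails logger explicitly.
        Rails.logger.info "Something cool"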
        IssueFinder.match(cve_id: "CVE-2014-3566").each do |issue|
          Result.generate!(network_host_test.id, issue.id)
        end
      end
    end
  end
end

PoodleScanParser inherits from NmapScanParser. I've confirmed this file works with the existing parsers, so I know it isn't the problem:

module Parsers
  class NmapScanParser

    def self.run!(test, content, xpath)
      document = Nokogiri::XML(content)
      document.remove_namespaces!

      document.xpath(xpath).each do |host|
        ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
        vendor = host.at_xpath("address[@addrtype='mac']").at_xpath("@vendor").value rescue "Unknown"
        hostname = host.at_xpath("hostnames/hostname").at_xpath("@name").value rescue "Hostname Unknown"
        os = host.at_xpath("os/osmatch").at_xpath("@name").value rescue "Unknown"
        os_vendor = host.at_xpath("os/osmatch/osclass").at_xpath("@vendor").value rescue "Unknown"

        network_host_test = NetworkHostTest.generate!(test, ip_address: ip_address, hostname: hostname, vendor: vendor, os: os, os_vendor: os_vendor)

        # If we didn't find a network host, that's because our network_hosts file didn't have this entry.
        next unless network_host_test

        yield(host, network_host_test)
      end
    end

  end
end

I've confirmed the parsing logic works on my localhost with a plain Ruby file, run with ruby poodle_parser.rb:

require 'nokogiri'

document = Nokogiri::XML(File.read("poodle_results.xml"))
document.remove_namespaces!

document.xpath("//host[.//elem/@key='state']").each do |host|
  ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
  # Leading "." keeps the search inside this host; a bare "//" would search the whole document.
  result = host.at_xpath(".//ports//elem[@key='state']").content
  puts "#{ip_address} #{result}"
end

This prints the result I expect in the terminal:

10.10.10.1 VULNERABLE

So, in the end, I expect a Result to be generated, but none is. I don't see any errors in my Rails log on localhost, and nothing in the Sidekiq log indicates an error either!


I added a logger.info line to my PoodleScanParser to check whether the parser runs at all. Assuming I did that correctly, the parser doesn't seem to be running.

【Comments】:

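  • Are you sure Sidekiq gets restarted when you deploy? A lot of problems start with that assumption being wrong.
  • @msergeant Yes, I'm restarting it manually.
  • Try running your job with .new.perform instead of .perform_async, so you can step through it with your favorite debugger (e.g. pry) instead of having it run through Sidekiq.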

Tags: ruby-on-rails ruby xml nokogiri sidekiq


【Answer 1】:

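Well, the answer had nothing to do with Sidekiq; it was the Nmap output, which Nokogiri was choking on. It turns out Nmap had added a non-XML line, "Starting Nmap 7.12", at the beginning of the XML file, so Nokogiri simply stopped there. Because Nokogiri::XML recovers from parse errors by default instead of raising, the XPath queries just matched nothing and no error appeared in any log.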

I guess the moral of the story is: make sure your XML output is what Nokogiri expects!
