【Question title】: Extract value from XML using Python bs4 and lxml
【Posted】: 2018-03-20 12:38:42
【Question description】:

How can I extract the number of listeners, <listeners>10</listeners>, from the XML file below? My code doesn't work.

import bs4
import urllib2
import lxml
SERVER = 'http://192.168.0.31:8382/admin/'
# Register HTTP Basic credentials for the Icecast admin URL
authinfo = urllib2.HTTPPasswordMgrWithDefaultRealm()
authinfo.add_password(None, SERVER, 'admin', 'mypassword')
page = 'http://192.168.0.31:8382/admin/'
handler = urllib2.HTTPBasicAuthHandler(authinfo)
myopener = urllib2.build_opener(handler)
urllib2.install_opener(myopener)  # make urlopen() use the authenticating opener
output = urllib2.urlopen(page)
print output.read()
soup = bs4.BeautifulSoup(output.read(), 'lxml')
print soup.find('listeners')

The XML is as follows:

<icestats>
<admin>icemaster@localhost</admin>
<banned_IPs>0</banned_IPs>
<build>20140902200316</build>
<client_connections>289</client_connections>
<clients>2</clients>
<connections>291</connections>
<file_connections>13</file_connections>
<host>localhost</host>
<listener_connections>0</listener_connections>
<listeners>10</listeners>
<location>Earth</location>
<outgoing_kbitrate>0</outgoing_kbitrate>
<server_id>Icecast 2.3.3-kh11</server_id>
<server_start>08/Oct/2017:08:43:08 +1100</server_start>
<source_client_connections>1</source_client_connections>
<source_relay_connections>0</source_relay_connections>
<source_total_connections>1</source_total_connections>
<sources>1</sources>
<stats>0</stats>
<stats_connections>0</stats_connections>
<stream_kbytes_read>185119</stream_kbytes_read>
<stream_kbytes_sent>0</stream_kbytes_sent>
<source mount="/listen.mp3">
<audio_codecid>2</audio_codecid>
<audio_info>bitrate=60</audio_info>
<bitrate>60</bitrate>
<connected>42056</connected>
<genre>Islam</genre>
<incoming_bitrate>35976</incoming_bitrate>
<listener_connections>0</listener_connections>
<listener_peak>0</listener_peak>
<listeners>0</listeners>
<listenurl>http://localhost:8382/listen.mp3</listenurl>
<max_listeners>unlimited</max_listeners>
<mpeg_channels>2</mpeg_channels>
<mpeg_samplerate>22050</mpeg_samplerate>
<outgoing_kbitrate>0</outgoing_kbitrate>
<public>1</public>
<queue_size>64523</queue_size>
<server_description>Quran Kareem Radio</server_description>
<server_name>Quran Kareem Radio</server_name>
<server_type>audio/mpeg</server_type>
<server_url>http://qkradio.com.au</server_url>
<slow_listeners>0</slow_listeners>
<source_ip>139.218.241.112</source_ip>
<stream_start>08/Oct/2017:08:43:16 +1100</stream_start>
<total_bytes_read>189563392</total_bytes_read>
<total_bytes_sent>0</total_bytes_sent>
<total_mbytes_sent>0</total_mbytes_sent>
<user_agent>instreamer</user_agent>
</source>
</icestats>
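For comparison, a minimal sketch using the standard library's xml.etree.ElementTree (a swapped-in parser, not bs4, chosen so the example needs no third-party packages). It runs on a trimmed copy of the XML above and shows the difference between the server-wide <listeners> (a direct child of <icestats>) and the per-mount one inside <source>.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <icestats> document from the question.
SAMPLE = """<icestats>
<clients>2</clients>
<listeners>10</listeners>
<source mount="/listen.mp3">
<listeners>0</listeners>
</source>
</icestats>"""

root = ET.fromstring(SAMPLE)  # root is the <icestats> element

# findtext() with a bare tag name only looks at direct children of <icestats>,
# so this returns the server-wide count, not the per-mount one.
total_listeners = int(root.findtext("listeners"))

# ".//listeners" searches the whole subtree and matches both elements,
# in document order.
all_values = [el.text for el in root.findall(".//listeners")]

print(total_listeners)  # 10
print(all_values)       # ['10', '0']
```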

【Question discussion】:

    Tags: python beautifulsoup lxml


    【Solution 1】:

    Try this:

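    from bs4 import BeautifulSoup
    # 'xml' selects lxml's XML parser; find_all returns every <listeners> element
    soup = BeautifulSoup(output.read(), 'xml')
    for value in soup.find_all('listeners'):
        print(value.get_text())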
    

    【Discussion】:

    • How do I handle multiple listeners items? Can you extend your code?
    • Sorry, of course you need to use find_all, like this: `soup = BeautifulSoup(output.read(), 'xml')` followed by `for value in soup.find_all('listeners'): print(value.get_text())`
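One way to handle several <listeners> items is to keep track of which element each value comes from. A minimal sketch, again using the standard library's xml.etree.ElementTree rather than bs4, and assuming the Icecast layout shown in the question (one top-level <listeners> plus one <source mount="..."> element per mount point); listener_counts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <icestats> document from the question.
SAMPLE = """<icestats>
<listeners>10</listeners>
<source mount="/listen.mp3">
<listeners>0</listeners>
</source>
</icestats>"""


def listener_counts(xml_text):
    """Return (server_total, {mount: listeners}) for an <icestats> document.

    Hypothetical helper; assumes the element layout shown in the question.
    """
    root = ET.fromstring(xml_text)
    # Direct child of <icestats>: the server-wide listener count.
    server_total = int(root.findtext("listeners", default="0"))
    # One entry per <source> element, keyed by its mount attribute.
    per_mount = {
        source.get("mount"): int(source.findtext("listeners", default="0"))
        for source in root.findall("source")
    }
    return server_total, per_mount


print(listener_counts(SAMPLE))  # (10, {'/listen.mp3': 0})
```

Converting to int at parse time makes the counts usable directly in comparisons or sums; a server with more mounts would simply produce more dictionary entries.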

    【Solution 2】:

    Use this:

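    from bs4 import BeautifulSoup
    soup = BeautifulSoup(output.read(), 'xml')  # name the parser explicitly
    soup.select('listeners')  # CSS selector; always returns a list
    # returns: [<listeners>10</listeners>, <listeners>0</listeners>]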
    

    【Discussion】:

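    • All I get is []?
    • Please check your soup object: does it contain any data? This should work.
    • Using print output.read() in the code above prints the XML file.
    • Using soup.select('connections') on their tag 52 still prints [].
    • Use .text: soup.select('connections')[0].text. Note that .select returns a list, so you have to index into it.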
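The empty [] results in this thread are most likely caused by the question's code calling output.read() twice: the first call (inside print) consumes the whole response body, so the second call hands BeautifulSoup an empty string. The sketch below demonstrates that read-once behaviour with io.BytesIO standing in for the HTTP response, and outlines a Python 3 (urllib.request) version of the question's fetch that reads the body exactly once. fetch_body is a hypothetical helper name; the URL and credentials in the usage comment are the placeholders from the question.

```python
import io
import urllib.request

# io.BytesIO stands in for the response object returned by urlopen():
# both are file-like, and read() with no argument consumes the stream.
response = io.BytesIO(b"<icestats><listeners>10</listeners></icestats>")
first = response.read()   # the whole body
second = response.read()  # stream already exhausted, so this is b''


def fetch_body(url, user, password):
    """Sketch: fetch url with HTTP Basic auth and return the body bytes, read once."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr))
    with opener.open(url) as resp:
        return resp.read()


# Usage (not executed here; needs the Icecast server from the question):
# from bs4 import BeautifulSoup
# body = fetch_body('http://192.168.0.31:8382/admin/', 'admin', 'mypassword')
# print(body)                          # inspect the raw XML
# soup = BeautifulSoup(body, 'xml')    # parse the same bytes, not a second read()
# print(soup.find('listeners').get_text())
```

Storing the body in a variable lets you both print it for debugging and parse it, without depending on the response stream still having data.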