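【Question title】: Python web scraping cannot access span elements
【Posted】: 2019-03-25 23:04:11
【Question】:

I am trying to parse data from the following link: https://www.sec.gov/Archives/edgar/data/1652707/000165270718000002/xslFormDX01/primary_doc.xml

I want to determine the industry group from the checkboxes, but I cannot access the span element in the following markup: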

<td><table border="0" summary="Table with single CheckBox"><tr>
<td class="CheckBox"><span class="FormData">X</span></td>
<td align="left" class="FormText">Other Health Care</td>
</tr></table></td>

Here is what I tried:

import csv
from datetime import datetime
from bs4 import BeautifulSoup
from selenium import webdriver

chromedriver = '/usr/local/bin/chromedriver'
browser = webdriver.Chrome(chromedriver)
# load the filing page in the browser
browser.get('https://www.sec.gov/Archives/edgar/data/1753852/000175385218000001/xslFormDX01/primary_doc.xml')
# grab the rendered HTML of the page
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'summary':'Issuer Identity Information'})
td = table.find_all('td',{'class':'FormData'})
industry = soup.find('table',{'summary':'Industry Group'})
industrylist = industry.find_all('table',{'summary':'Table with single CheckBox'})
spanelement = industrylist[10]
print(spanelement)

The result contains no span element, which is what I need in order to find the industry:

<table border="0" summary="Table with single CheckBox"><tbody><tr>
<td class="CheckBox">  </td>
<td align="left" class="FormText">Other Health Care</td>
</tr></tbody></table>

I am new to web scraping. Can anyone help?

【Comments】:

    Tags: python html selenium-webdriver web-scraping beautifulsoup


    【Solution 1】:

    Not perfect, but very close. Try the script below:

    import requests
    from bs4 import BeautifulSoup
    
    link = "https://www.sec.gov/Archives/edgar/data/1753852/000175385218000001/xslFormDX01/primary_doc.xml"
    
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
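    # a .CheckBox cell contains an "X" only when the box is ticked
    for items in soup.select("table[summary='Industry Group'] .CheckBox"):
        if "X" in items.text:
            # the label lives in the .FormText cell of the same row
            industry = items.find_parent().select_one(".FormText").text
            print(industry)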
    

    Output:

    Pooled Investment Fund
    Other Investment Fund
    Yes
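
One thing worth noting: the link in the question text (1652707) and the link in the question's code (1753852) point at different filings. The output above lists Pooled Investment Fund rather than Other Health Care for the second filing, which suggests the span is missing there simply because that box is not ticked. The following is a minimal offline sketch of the same idea, run on the two fragments quoted in the question instead of the live page. The helper name `checked_labels`, the wrapping `Industry Group` table, and the label `Unchecked Example` are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Two checkbox cells modelled on the fragments in the question:
# the first is ticked (has a span with "X"), the second is not.
# The outer table and the "Unchecked Example" label are hypothetical.
HTML = """
<table summary="Industry Group"><tr>
<td><table border="0" summary="Table with single CheckBox"><tr>
<td class="CheckBox"><span class="FormData">X</span></td>
<td align="left" class="FormText">Other Health Care</td>
</tr></table></td>
<td><table border="0" summary="Table with single CheckBox"><tr>
<td class="CheckBox">  </td>
<td align="left" class="FormText">Unchecked Example</td>
</tr></table></td>
</tr></table>
"""


def checked_labels(html):
    """Return the labels of every single-checkbox table that is ticked."""
    # "html.parser" is the parser bundled with Python, so lxml is not needed here
    soup = BeautifulSoup(html, "html.parser")
    labels = []
    for box in soup.select("table[summary='Table with single CheckBox']"):
        mark = box.select_one("td.CheckBox span.FormData")
        label = box.select_one("td.FormText")
        # an unticked box has no span at all, so mark is None
        if mark is not None and label is not None:
            if mark.get_text(strip=True) == "X":
                labels.append(label.get_text(strip=True))
    return labels


print(checked_labels(HTML))
```

Looking up the label inside the same single-checkbox table keeps each tick paired with its own text, and testing for the span directly avoids matching an "X" that happens to appear elsewhere in a cell.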
    

    【Comments】:
