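【Question title】: Unable to retrieve row data using Beautiful Soup
【Posted】: 2021-02-21 18:04:52
【Question】:

I have been trying to extract a table, but I only get the table's headings back. This is my first approach to retrieving the table.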

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = r"https://www.sec.gov/edgar/search/#/q=Women&dateRange=custom&entityName=Infosys&startdt=2010-03-01&enddt=2020-03-01"

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
table = soup.find_all("table")[1]

# Extract the column headings of the table.
rows = table.find_all('tr')
columns = []
headings = rows[0].find_all('th')
for col in headings:
    columns.append(col.text.strip())
print(columns)

# Extract all table data row by row.
all_data = []
for row in rows[1:]:
    data = row.find_all('td')
    lst = []
    for d in data:
        lst.append(d.text.strip())
    all_data.append(lst)

# Create the DataFrame from the extracted data.
ds = pd.DataFrame(all_data, columns=columns)
ds

Second approach:

ds1 = pd.read_html(url)[0]
ds1

When I search for the table, I get all the column headings inside the thead tag, but the tbody comes back empty.

table = soup.find_all("table", class_='table')
table

Output:

 [<table class="table table-hover entity-hints" id="asdf"></table>,
 <table class="table">
 <thead>
 <tr>
 <th class="filetype" id="filetype">Form &amp; File</th>
 <th class="filed">Filed</th>
 <th class="enddate">Reporting for</th>
 <th class="entity-name">Filing entity/person</th>
 <th class="cik">CIK</th>
 <th class="located">Located</th>
 <th class="incorporated">Incorporated</th>
 <th class="file-num">File number</th>
 <th class="film-num">Film number</th>
 </tr>
 </thead>
 <tbody>
 </tbody>
 </table>]

Why is the tbody tag empty?


Tags: python beautifulsoup


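【Solution 1】:

The tbody is empty because the page fills in the table with JavaScript after it loads. Everything after the # in the URL is a fragment that is never sent to the server, so requests.get only receives the page shell with the headings. The rows themselves are loaded by sending a POST request to https://efts.sec.gov/LATEST/search-index. You can scrape the data as follows: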

import json

import requests

URL = "https://efts.sec.gov/LATEST/search-index"
# Same search parameters that appear after the '#' in the page URL.
payload = {
    "q": "Women",
    "dateRange": "custom",
    "entityName": "Infosys",
    "startdt": "2010-03-01",
    "enddt": "2020-03-01",
}

# The endpoint responds with JSON, so no HTML parser is needed.
response = requests.post(URL, data=json.dumps(payload))
json_data = response.json()

fmt_string = "{:<25} {:<20} {:<20} {:<20}"
print(
    fmt_string.format("Form & File", "Filed", "Reporting for", "Filing/entity person")
)
print("-" * 100)

# Each search result is one entry in json_data["hits"]["hits"].
for hit in json_data["hits"]["hits"]:
    source = hit["_source"]
    form = source["root_form"] + source["file_type"]
    filed = source["file_date"]
    reporting_for = source["period_ending"]
    entity = source["display_names"][0].split("(CIK")[0]

    print(fmt_string.format(form, filed, reporting_for, entity))

Output:

Form & File               Filed                Reporting for        Filing/entity person
----------------------------------------------------------------------------------------------------
6-KEX-99.1 CHARTER        2016-01-14           2015-12-31           Infosys Ltd  (INFY)  
6-KEX-99.3 VOTING TRUST   2016-07-20           2016-06-30           Infosys Ltd  (INFY)  
6-KEX-99.1 CHARTER        2014-01-15           2013-12-31           Infosys Ltd  (INFY)  
6-KEX-99.1                2014-01-10           2013-12-31           Infosys Ltd  (INFY)  
6-KEX-99.1 CHARTER        2019-10-11           2019-09-30           Infosys Ltd  (INFY)  
6-KEX-99.2 BYLAWS         2019-10-16           2019-09-30           Infosys Ltd  (INFY)  
20-F20-F                  2016-05-18           2016-03-31           Infosys Ltd  (INFY)  
6-KEX-99.2                2016-01-19           2015-12-31           Infosys Ltd  (INFY)  
20-F20-F                  2019-06-19           2019-03-31           Infosys Ltd  (INFY)  
6-KEX-99.1 CHARTER        2013-12-20           2013-12-20           Infosys Ltd  (INFY)  
20-F20-F                  2017-06-12           2017-03-31           Infosys Ltd  (INFY)  
20-F20-F                  2014-05-09           2014-03-31           Infosys Ltd  (INFY)  
6-KEX-99.2 BYLAWS         2014-01-15           2013-12-31           Infosys Ltd  (INFY)  
6-KEX-99.1 CHARTER        2019-10-16           2019-09-30           Infosys Ltd  (INFY)  
20-F20-F                  2018-07-19           2018-03-31           Infosys Ltd  (INFY)  
6-K6-K                    2013-12-20           2013-12-20           Infosys Ltd  (INFY)  
6-KEX-99.1                2016-01-19           2015-12-31           Infosys Ltd  (INFY)  
6-K6-K                    2014-03-28           2014-03-28           Infosys Ltd  (INFY)  
20-F20-F                  2015-05-20           2015-03-31           Infosys Ltd  (INFY)  
6-KEX-99.3 VOTING TRUST   2010-07-16           2010-06-30           INFOSYS TECHNOLOGIES LTD  (INFY)  
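
The payload keys in the answer are the same ones that appear after the # in the search page URL from the question. As a minimal sketch, assuming other searches on that page encode their parameters the same way, the payload could be derived from the page URL with the standard library. The helper name payload_from_search_url is hypothetical and not part of any library.

```python
from urllib.parse import parse_qsl, urlsplit


def payload_from_search_url(search_url):
    """Turn the '#/key=value&...' fragment of a search-page URL into a dict.

    Hypothetical helper: it assumes the fragment is a plain query string
    preceded by a single '/'.
    """
    fragment = urlsplit(search_url).fragment  # everything after '#'
    return dict(parse_qsl(fragment.lstrip("/")))


page_url = (
    "https://www.sec.gov/edgar/search/#/q=Women&dateRange=custom"
    "&entityName=Infosys&startdt=2010-03-01&enddt=2020-03-01"
)

parts = urlsplit(page_url)
payload = payload_from_search_url(page_url)

# The server only ever sees the path; the fragment stays in the browser.
print(parts.path)
print(payload)
```

The resulting dictionary matches the one written out by hand in the answer, which also illustrates why the plain GET request never carried the search terms.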

【Comments】:

  • I want to get the link to each form, but I will manage to work that out from the table above. Thank you very much for your help.
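
For follow-ups that need more than the four printed columns, it may help to flatten each hit into a dictionary first and to look at which fields a hit carries. The sketch below is only an illustration: hits_to_rows is a hypothetical helper, and the sample response is made up in the shape the answer's code reads (including a placeholder CIK), so it is not real API output. Which field, if any, holds the document link is not shown in this thread.

```python
def hits_to_rows(json_data):
    """Flatten a search response into one dict per hit (hypothetical helper)."""
    rows = []
    for hit in json_data["hits"]["hits"]:
        source = hit["_source"]
        rows.append(
            {
                "Form & File": source["root_form"] + source["file_type"],
                "Filed": source["file_date"],
                "Reporting for": source["period_ending"],
                "Filing entity/person": source["display_names"][0]
                .split("(CIK")[0]
                .strip(),
            }
        )
    return rows


# Made-up sample with placeholder values, only to exercise the helper.
sample = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "root_form": "6-K",
                    "file_type": "EX-99.1",
                    "file_date": "2016-01-14",
                    "period_ending": "2015-12-31",
                    "display_names": ["Infosys Ltd (INFY) (CIK 0000000000)"],
                }
            }
        ]
    }
}

rows = hits_to_rows(sample)
print(rows[0])

# Listing the keys of one hit shows which other fields are available.
available_fields = sorted(sample["hits"]["hits"][0]["_source"])
print(available_fields)
```

A list of dictionaries like rows can be passed directly to pd.DataFrame(rows), which gives the DataFrame the question was originally after.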