【Question Title】: How to append data to a DataFrame
【Posted】: 2021-11-17 01:26:25
【Question Description】:

I want the data from this scrape to end up in a DataFrame that works correctly. Please fix the problems below and get the data into a DataFrame; I tried to solve it myself but could not manage it.


from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time

# Use a raw string so the backslash is not treated as an escape sequence
browser = webdriver.Chrome(r'F:\chromedriver.exe')
browser.get("https://capitalonebank2.bluematrix.com/sellside/Disclosures.action")

for title in browser.find_elements_by_css_selector('option'):
    title.click()
    time.sleep(1)
    browser.switch_to.frame(browser.find_elements_by_css_selector("iframe")[1])
    table = browser.find_element_by_css_selector("table table")

    soup = BeautifulSoup(table.get_attribute("innerHTML"), "lxml")
    ratings = {"BUY": [], "HOLD": [], "SELL": []}
    lists_ = []
    # The last three data rows hold the BUY / HOLD / SELL figures
    for row in soup.select("tr")[-4:-1]:
        info_list = row.select("td")
        count = info_list[1].text
        percent = info_list[2].text

        IBServ_count = info_list[4].text
        IBServ_percent = info_list[5].text

        lists_.append([count, percent, IBServ_count, IBServ_percent])

    ratings["BUY"] = lists_[0]
    ratings["HOLD"] = lists_[1]
    ratings["SELL"] = lists_[2]

    # Switch back out of the iframe so the <option> elements
    # can be located again on the next loop iteration
    browser.switch_to.default_content()
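Once the three rows are collected, the `ratings` dict can be turned into the requested DataFrame directly. A minimal sketch, using the figures from the answer's output table below as placeholder values (the column names are assumptions, not part of the original code):

```python
import pandas as pd

# Placeholder rows in the same shape the loop produces:
# [count, percent, IB-services count, IB-services percent]
ratings = {
    "BUY":  ["89", "67.94", "41", "46.07"],
    "HOLD": ["42", "32.06", "12", "28.57"],
    "SELL": ["0", "0.00", "0", "0.00"],
}

# One row per rating, with named columns
df = pd.DataFrame.from_dict(
    ratings,
    orient="index",
    columns=["Count", "Percent", "IBServ_Count", "IBServ_Percent"],
)
print(df)
```

Note that the dict is rebuilt on every pass of the `option` loop, so per-option results should be saved (or concatenated) before the next iteration overwrites them.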

【Discussion】:

  • So you want the missing pandas DataFrame built from what you scraped here?
  • Yes, I want a pandas DataFrame.

Tags: python dataframe web-scraping


【Solution 1】:

I tried hard to get output from your code but could not, because it contains serious errors; getting the desired result from that code as written is close to impossible. Here is my best attempt, a working solution.

Code:

from bs4 import BeautifulSoup
import time
from selenium import webdriver
import pandas as pd

data = []
driver = webdriver.Chrome('chromedriver.exe')
#driver.maximize_window()
driver.set_window_size(1920, 1080)
time.sleep(10)

url = 'https://capitalonebank2.bluematrix.com/sellside/Disclosures.action'
driver.get(url)
time.sleep(5)

title = driver.find_elements_by_xpath('//option')
title[0].click()
time.sleep(5)

driver.switch_to.frame(driver.find_elements_by_css_selector("iframe")[1])
table = driver.find_element_by_xpath('//*[@bgcolor="#ffffff"]/table/tbody/tr/td/table')
time.sleep(5)

driver.find_element_by_xpath('//*[@bgcolor="#ffffff"]//table//tr[6]')


soup = BeautifulSoup(driver.page_source, 'lxml')

trs = soup.select('table table tr')
for tr in trs[3:8]:
    # Materialise the generator so each row becomes a list of cell strings
    data.append(list(tr.stripped_strings))

# to_csv returns None, so build the DataFrame first, then save it
df = pd.DataFrame(data)
df.to_csv('table_data.csv', index=False)
#print(df)

Output:

0      1        2      3        4
0     Rating  Count  Percent  Count  Percent
1       None   None     None   None     None
2   BUY [OW]     89    67.94     41    46.07
3  HOLD [EW]     42    32.06     12    28.57
4  SELL [UW]      0     0.00      0     0.00

Output in CSV format:

Rating  Count   Percent Count   Percent
BUY [OW]    89  67.94   41  46.07
HOLD [EW]   42  32.06   12  28.57
SELL [UW]   0     0      0   0
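Since the question asks about *appending* to a DataFrame, note that the usual pattern when scraping one table per `<option>` is to collect a DataFrame per page and concatenate once at the end, rather than appending row by row. A minimal sketch, with hypothetical rows standing in for two iterations of the scraping loop:

```python
import pandas as pd

frames = []
# Hypothetical: each loop iteration yields a list of row lists
for page_rows in (
    [["BUY [OW]", 89, 67.94], ["HOLD [EW]", 42, 32.06]],
    [["SELL [UW]", 0, 0.00]],
):
    frames.append(pd.DataFrame(page_rows, columns=["Rating", "Count", "Percent"]))

# A single concat at the end is cheaper than growing a DataFrame in the loop
df = pd.concat(frames, ignore_index=True)
print(df)
```

The same `df.to_csv('table_data.csv', index=False)` call then writes the combined table in one go.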

【Discussion】:

  • Bro, the CSV format doesn't give me the data; it takes too much time.
  • On my end it works fine. I commented out the print and uncommented the CSV write. Please run it again and confirm everything works.