【Question Title】: pandas DataFrame indexing KeyError: 'Release Date'
【Posted】: 2019-03-11 16:44:47
【Question】:

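The goal of this code is to scrape a number of data tables, turn them into pandas DataFrames, drop some unnecessary columns, fix the dates, and then concatenate them into one unified DataFrame, with "Release Date" as the index of that unified DataFrame.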

Everything above works except the indexing.

Here is a sample of the data:

Release Date               U.S. 52-Week Bill Auction  Turkey Gross Domestic Product (GDP) YoY
2018-06-19 18:30:00+02:00  2.275%                     NaN
2018-07-17 18:30:00+02:00  2.335%                     NaN
2018-08-14 18:30:00+02:00  2.365%                     NaN
2018-09-10 10:00:00+02:00  NaN                        5.2%
2018-09-11 18:30:00+02:00  2.465%                     NaN
2018-10-09 18:30:00+02:00  NaN

Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
from datetime import datetime
from tzlocal import get_localzone


class DataEngine:
    def __init__(self):
        self.urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()
        self.driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
        self.wait = WebDriverWait(self.driver, 10)

    def title(self):
        names = []
        for url in self.urls:
            self.driver.get(url)
            title = self.driver.find_element_by_xpath('//*[@id="leftColumn"]/h1').text
            names.append(title)
        return names

    def table(self):
        DataFrames = []
        for url in self.urls:
            self.driver.get(url)
            while True:
                try:
                    item = self.wait.until(
                        ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
                    self.driver.execute_script("arguments[0].click();", item)
                except Exception:
                    break

            df = pd.DataFrame(columns=['Release Date', 'Time', 'Actual', 'Forecast', 'Previous'])
            pos = 0
            for table in self.wait.until(
                    ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
                data = [item.text for item in table.find_elements_by_xpath(".//*[self::td]")]
                if data:
                    df.loc[pos] = data[0:5]
                    pos += 1
            df = df.head(10)
            DataFrames.append(df)
        return DataFrames

    def date(self):

        dfs = []
        tables = self.table()
        for df in tables:
            Dates = []
            df["Date"] = df["Release Date"].apply(lambda date: date[:12]) + " " + df["Time"]
            for date in df["Date"]:
                date = datetime.strptime(date.strip(), '%b %d, %Y %H:%M')
                Dates.append(date)
            df["Date"] = Dates
            df['Date'] = df['Date'].dt.tz_localize('EST').dt.tz_convert(get_localzone())
            df = df[['Date', 'Actual', 'Forecast', 'Previous', 'Release Date', 'Time']]
            df = df.drop(df.columns[-4:], axis=1).reset_index(drop=True)

            dfs.append(df)



        return dfs

    def rename(self):
        FinalDataFrames = []
        tables = self.date()
        names = self.title()
        for name, table in zip(names, tables):
            table.rename(columns={'Date': 'Release Date', 'Actual': name}, inplace=True)
            table['Release Date'] = pd.to_datetime(table['Release Date'])
            table = table.set_index('Release Date')
            FinalDataFrames.append(table)

        return FinalDataFrames

    def finalDF(self):
        dfs = self.rename()
        df = pd.concat(dfs, axis=1, join='outer', sort=True)
        df = df.set_index('Release Date', inplace=True)
        print(df)
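For comparison, here is a minimal vectorised sketch of what `table()` and `date()` do for a single table. It assumes, as the `[:12]` slice in the code does, that each scraped `Release Date` string starts with a 12-character date such as `Jun 19, 2018`, and that `Time` holds `HH:MM`. The row values are hypothetical, and a fixed `Europe/Berlin` zone stands in for `tzlocal.get_localzone()` so the result is reproducible. It builds the frame once from a list of rows, parses all dates in one `pd.to_datetime` call, and sets the index exactly once.

```python
import pandas as pd

# Hypothetical cell texts, one list per <tr>, in the order table() collects them
rows = [
    ["Jun 19, 2018 (May)", "11:30", "2.275%", "", ""],
    ["Jul 17, 2018 (Jun)", "11:30", "2.335%", "", ""],
]

# Build the frame once from all rows instead of growing it with df.loc[pos]
df = pd.DataFrame(rows, columns=["Release Date", "Time", "Actual", "Forecast", "Previous"]).head(10)

# Parse "Mon DD, YYYY HH:MM" in one call instead of looping with strptime
stamps = pd.to_datetime(df["Release Date"].str[:12] + " " + df["Time"], format="%b %d, %Y %H:%M")

# Interpret the times as fixed-offset EST, then convert to the assumed target zone
df["Release Date"] = stamps.dt.tz_localize("EST").dt.tz_convert("Europe/Berlin")

# Keep only the date and the value, and set the index exactly once
tidy = df[["Release Date", "Actual"]].set_index("Release Date")
print(tidy)
```

With these inputs, 11:30 EST becomes 18:30+02:00, the same offset pattern seen in the data sample.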

Here is the error:

Traceback (most recent call last):
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Release Date'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Projects/Tutorial/database.py", line 96, in <module>
    DataEngine().finalDF()
  File "D:/Projects/Tutorial/database.py", line 85, in finalDF
    df = df.set_index('Release Date', inplace=True)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3909, in set_index
    level = frame[col]._values
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Release Date'
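The same failure can be reproduced without Selenium. In this minimal sketch with made-up values, the first `set_index('Release Date')` moves the column into the index, so a second `set_index` with the same name no longer finds a column to use and raises `KeyError`. The exact message text differs between pandas versions.

```python
import pandas as pd

# Hypothetical two-row frame standing in for one scraped table
df = pd.DataFrame({
    "Release Date": pd.to_datetime(["2018-06-19", "2018-07-17"]),
    "Actual": ["2.275%", "2.335%"],
})

indexed = df.set_index("Release Date")      # what rename() does
print("Release Date" in indexed.columns)    # False: it is now the index, not a column

try:
    indexed.set_index("Release Date")       # the same call finalDF() makes afterwards
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```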

【Discussion】:

Tags: python pandas selenium dataframe


【Solution 1】:

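You already set the index in rename() (which removes the Release Date column), so when you try to set the index again in finalDF(), pandas cannot find a Release Date column and raises the exception.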
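A sketch of `finalDF()` along the lines of this answer, assuming `rename()` keeps returning frames already indexed by `Release Date`: the second `set_index` is simply dropped, since `pd.concat(..., axis=1)` aligns the frames on that shared index. `final_df` and `event_frame` are hypothetical helper names, and the stand-in frames reuse a few values from the data sample.

```python
import pandas as pd

def final_df(dfs):
    """Combine frames that are already indexed by 'Release Date'."""
    # No second set_index: rename() already moved 'Release Date' into the index
    return pd.concat(dfs, axis=1, join="outer", sort=True)

def event_frame(name, stamps, values):
    """Hypothetical stand-in for one frame returned by rename()."""
    index = pd.DatetimeIndex(pd.to_datetime(stamps), name="Release Date")
    return pd.DataFrame({name: values}, index=index)

bills = event_frame("U.S. 52-Week Bill Auction",
                    ["2018-06-19 18:30:00+02:00", "2018-09-11 18:30:00+02:00"],
                    ["2.275%", "2.465%"])
gdp = event_frame("Turkey GDP (YoY)", ["2018-09-10 10:00:00+02:00"], ["5.2%"])

# Outer join on the shared index; dates missing from one event become NaN
combined = final_df([bills, gdp])
print(combined)
```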

Note:
df = df.set_index('Release Date', inplace=True) sets the index in place and returns None, so df will be None after that line runs. Either remove inplace=True or drop the assignment.
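A minimal sketch of the pitfall in this note and both fixes, using a hypothetical one-row frame: with `inplace=True`, `set_index` modifies the frame and returns `None`, so assigning its result discards the frame.

```python
import pandas as pd

def sample():
    """Hypothetical one-row frame that still has the 'Release Date' column."""
    return pd.DataFrame({"Release Date": pd.to_datetime(["2018-06-19"]), "Actual": ["2.275%"]})

# Pitfall: inplace=True mutates the frame and returns None
df = sample()
returned = df.set_index("Release Date", inplace=True)
print(returned)        # None

# Fix 1: keep inplace=True but do not assign the result
df1 = sample()
df1.set_index("Release Date", inplace=True)

# Fix 2: drop inplace=True and assign the returned frame
df2 = sample().set_index("Release Date")
```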
