【Question title】:Importing timeseries data from a .csv into a dataframe that is between two dates
【Posted】:2016-04-09 22:08:50
【Question description】:

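Is there a way to import timeseries data from a .csv only when the data falls between two dates?

The code below imports all the data from a series of .csv files, but is it possible to import only the rows between two dates?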

import glob
import os
from functools import partial, reduce

import pandas as pd

def getTimeseriesData(DataPath, startDate, endDate):
    # startDate and endDate are not used yet: every row is imported.
    colNames = ['date']

    allfiles = glob.glob(os.path.join(DataPath, "*.csv"))
    for fname in allfiles:
        # Use the file name without directory or extension as the column name.
        name = os.path.splitext(os.path.basename(fname))[0]
        colNames.append(name)
    print(colNames)

    dataframes = [pd.read_csv(fname, header=None) for fname in allfiles]

    # Outer-merge all frames on column 0 (the date).
    timeseriesData = reduce(partial(pd.merge, on=0, how='outer'), dataframes)
    timeseriesData.columns = colNames

    print(type(timeseriesData))
    return timeseriesData

【Question discussion】:

    Tags: python csv python-3.x pandas


    【Solution 1】:
    import glob
    import os

    import pandas as pd

    def getTimeseriesData(data_path, start_date, end_date):
        dfs = []
        for f_name in glob.glob(os.path.join(data_path, "*.csv")):
            # The files have no header row; column 0 is assumed to hold the date.
            df = pd.read_csv(f_name, header=None, parse_dates=[0])
            df = df.rename(columns={0: 'date'})
            # Date filter: keep rows with start_date <= date <= end_date.
            dfs.append(df.loc[(df['date'] >= start_date) & (df['date'] <= end_date), :])
        return pd.concat(dfs)
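
The answer above stacks the filtered rows of all files on top of each other, whereas the question builds one column per file. A minimal sketch of combining both ideas follows. It assumes each file has no header row and exactly two columns (date, then value); the function name `load_between` and the column names `date` and `value` are made up for illustration. Note that each file is still read in full before the rows are filtered.

```python
import glob
import os

import pandas as pd


def load_between(data_path, start_date, end_date):
    """Read every *.csv in data_path and keep rows between the two dates."""
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    series = []
    for f_name in sorted(glob.glob(os.path.join(data_path, "*.csv"))):
        # Assumption: no header row, column 0 is the date, column 1 the value.
        df = pd.read_csv(f_name, header=None, names=["date", "value"],
                         parse_dates=["date"])
        mask = df["date"].between(start, end)  # inclusive on both ends
        # Name each column after its file, as the question's code does.
        name = os.path.splitext(os.path.basename(f_name))[0]
        series.append(df.loc[mask].set_index("date")["value"].rename(name))
    # Align on the date index (outer join): one column per file.
    return pd.concat(series, axis=1).sort_index()
```

Dates that appear in only some of the files end up as NaN in the other columns, which mirrors the `how='outer'` merge in the question. If the folder contains no .csv files, `pd.concat` raises a `ValueError`, so a real implementation would probably want to check for that first.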
    

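    【Discussion】:

      【Solution 2】:

      I will give you a general answer.

      First, your dates should be stored in datetime format. If you import from Excel in a format such as "day.month.year" or "day-month-year", I use a function like this to return a datetime: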

      import datetime

      def to_date(date, split_sign):
          # Split e.g. '9.4.2016' or '09-04-16' into day, month and year.
          day, month, year = date.split(split_sign)
          if len(year) < 4:
              # Two-digit years are assumed to be in the 2000s.
              year = '20' + year
          # Return a datetime so the result compares cleanly in filter_df.
          return datetime.datetime(int(year), int(month), int(day))
      

      Pandas has a function, pandas.to_datetime, that converts dates to datetime, but it has not always worked for me.
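
One possible reason `pandas.to_datetime` appears unreliable is that, without a format, it has to guess the day/month order. The sketch below passes the format explicitly; the sample strings are invented for illustration, and `errors="coerce"` is used so that unparseable values become `NaT` rather than raising an exception.

```python
import pandas as pd

# Hypothetical day.month.year strings, including one bad entry.
raw = pd.Series(["09.04.2016", "10.04.2016", "not a date"])

# An explicit format avoids day/month guessing; errors="coerce" turns
# unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d.%m.%Y", errors="coerce")
```

The resulting column has a datetime dtype, so it can be compared directly against `datetime.datetime` values or timestamps.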

      Then filter the dataframe, passing each date as [day, month, year]:

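      def filter_df(df, date_from, date_to):
          # date_from and date_to are [day, month, year] lists of integers.
          date1 = datetime.datetime(date_from[2], date_from[1], date_from[0])
          date2 = datetime.datetime(date_to[2], date_to[1], date_to[0])
          # Keep rows whose 'date' lies in the inclusive range [date1, date2].
          return df[(df['date']>=date1) & (df['date']<=date2)]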
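
As a usage illustration, here is a small self-contained sketch that calls `filter_df` on made-up data. The sample frame and its `value` column are hypothetical, and the `date` column is assumed to already hold datetime values.

```python
import datetime

import pandas as pd


def filter_df(df, date_from, date_to):
    date1 = datetime.datetime(date_from[2], date_from[1], date_from[0])
    date2 = datetime.datetime(date_to[2], date_to[1], date_to[0])
    return df[(df['date'] >= date1) & (df['date'] <= date2)]


# Hypothetical sample data; the 'date' column must already be datetime64.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-04-01", "2016-04-09", "2016-05-01"]),
    "value": [1, 2, 3],
})

# Dates are passed as [day, month, year]; both ends are inclusive.
result = filter_df(df, [5, 4, 2016], [30, 4, 2016])
```

Only the row dated 2016-04-09 falls inside the range, so `result` contains that single row.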
      

      【Discussion】:
