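[Question title]: Merge hourly data with 15 minute data
[Posted]: 2022-11-13 16:13:29
[Question description]:

I was able to merge hourly data with 15-minute data, but only through very inefficient string manipulation (replacing the minutes with zeros, i.e. "06:15:00" -> "06:00:00"). Is there a more elegant way to merge the data?

Thanks in advance!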

import ccxt
import pandas as pd

ex = ccxt.binance({'enableRateLimit': True})

df_15m = pd.DataFrame(ex.fetch_ohlcv(symbol='BTC/USDT', timeframe='15m', limit=9), columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
df_1h = pd.DataFrame(ex.fetch_ohlcv(symbol='BTC/USDT', timeframe='1h', limit=3), columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])

df_15m = df_15m.loc[:, ['timestamp', 'close']]
df_1h = df_1h.loc[:, ['timestamp', 'close']]

df_15m['timestamp'] = pd.to_datetime(df_15m['timestamp'], unit='ms')
df_1h['timestamp'] = pd.to_datetime(df_1h['timestamp'], unit='ms')

df_15m['timestamp_h'] = df_15m['timestamp'].astype("string").str[:14] + '00:00'
df_1h.rename(columns={"timestamp": "timestamp_h"}, inplace=True)
df_1h['timestamp_h'] = df_1h['timestamp_h'].astype("string")

df_15m.rename(columns={"close": "close_15m"}, inplace=True)
df_1h.rename(columns={"close": "close_h"}, inplace=True)

print('Hourly Data:\n', df_1h, '\n')
print('15m Data:\n', df_15m, '\n')

df_merged = pd.merge(left=df_15m, right=df_1h, how='left', on=['timestamp_h'])

print('Merged Data:\n', df_merged, '\n')

Output:

Hourly Data:
            timestamp_h   close_h
0  2022-11-13 05:00:00  16853.68
1  2022-11-13 06:00:00  16684.45
2  2022-11-13 07:00:00  16731.94 

15m Data:
             timestamp  close_15m          timestamp_h
0 2022-11-13 05:00:00   16857.53  2022-11-13 05:00:00
1 2022-11-13 05:15:00   16849.16  2022-11-13 05:00:00
2 2022-11-13 05:30:00   16856.41  2022-11-13 05:00:00
3 2022-11-13 05:45:00   16853.68  2022-11-13 05:00:00
4 2022-11-13 06:00:00   16862.98  2022-11-13 06:00:00
5 2022-11-13 06:15:00   16807.98  2022-11-13 06:00:00
6 2022-11-13 06:30:00   16806.79  2022-11-13 06:00:00
7 2022-11-13 06:45:00   16684.45  2022-11-13 06:00:00
8 2022-11-13 07:00:00   16731.94  2022-11-13 07:00:00 

Merged Data:
             timestamp  close_15m          timestamp_h   close_h
0 2022-11-13 05:00:00   16857.53  2022-11-13 05:00:00  16853.68
1 2022-11-13 05:15:00   16849.16  2022-11-13 05:00:00  16853.68
2 2022-11-13 05:30:00   16856.41  2022-11-13 05:00:00  16853.68
3 2022-11-13 05:45:00   16853.68  2022-11-13 05:00:00  16853.68
4 2022-11-13 06:00:00   16862.98  2022-11-13 06:00:00  16684.45
5 2022-11-13 06:15:00   16807.98  2022-11-13 06:00:00  16684.45
6 2022-11-13 06:30:00   16806.79  2022-11-13 06:00:00  16684.45
7 2022-11-13 06:45:00   16684.45  2022-11-13 06:00:00  16684.45
8 2022-11-13 07:00:00   16731.94  2022-11-13 07:00:00  16731.94

[Question discussion]:

    Tags: python pandas dataframe bitcoin ccxt


    [Solution 1]:

    Example

    df

    data = [['2022-11-13 05:00:00', 16853.68],
            ['2022-11-13 06:00:00', 16684.45],
            ['2022-11-13 07:00:00', 16731.94]]
    df = pd.DataFrame(data, columns=['timestamp_h', 'close_h'])
    
        timestamp_h         close_h
    0   2022-11-13 05:00:00 16853.6800
    1   2022-11-13 06:00:00 16684.4500
    2   2022-11-13 07:00:00 16731.9400
    

    df1

    data1 = [['2022-11-13 05:00:00', 16857.53],
             ['2022-11-13 05:15:00', 16849.16],
             ['2022-11-13 05:30:00', 16856.41],
             ['2022-11-13 05:45:00', 16853.68],
             ['2022-11-13 06:00:00', 16862.98],
             ['2022-11-13 06:15:00', 16807.98],
             ['2022-11-13 06:30:00', 16806.79],
             ['2022-11-13 06:45:00', 16684.45],
             ['2022-11-13 07:00:00', 16731.94]]
    df1 = pd.DataFrame(data1, columns=['timestamp', 'close'])
    
        timestamp           close
    0   2022-11-13 05:00:00 16857.5300
    1   2022-11-13 05:15:00 16849.1600
    2   2022-11-13 05:30:00 16856.4100
    3   2022-11-13 05:45:00 16853.6800
    4   2022-11-13 06:00:00 16862.9800
    5   2022-11-13 06:15:00 16807.9800
    6   2022-11-13 06:30:00 16806.7900
    7   2022-11-13 06:45:00 16684.4500
    8   2022-11-13 07:00:00 16731.9400
    



    First

    Create a timestamp_h column (datetime dtype) in df1 by truncating each timestamp to the start of its hour

    df1.assign(timestamp_h=pd.PeriodIndex(df1['timestamp'], freq='1H').to_timestamp())
    

    Output:

        timestamp           close       timestamp_h
    0   2022-11-13 05:00:00 16857.5300  2022-11-13 05:00:00
    1   2022-11-13 05:15:00 16849.1600  2022-11-13 05:00:00
    2   2022-11-13 05:30:00 16856.4100  2022-11-13 05:00:00
    3   2022-11-13 05:45:00 16853.6800  2022-11-13 05:00:00
    4   2022-11-13 06:00:00 16862.9800  2022-11-13 06:00:00
    5   2022-11-13 06:15:00 16807.9800  2022-11-13 06:00:00
    6   2022-11-13 06:30:00 16806.7900  2022-11-13 06:00:00
    7   2022-11-13 06:45:00 16684.4500  2022-11-13 06:00:00
    8   2022-11-13 07:00:00 16731.9400  2022-11-13 07:00:00
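As a possible alternative to the `PeriodIndex` round trip, the same hourly bucket can be sketched with `Series.dt.floor`. This is not part of the original answer; it assumes the `timestamp` column is first converted with `pd.to_datetime`, and the three-row frame and the name `floored` are only illustrative.

```python
import pandas as pd

# Illustrative subset of the 15-minute data (timestamps start out as strings)
df1 = pd.DataFrame({
    'timestamp': ['2022-11-13 05:00:00',
                  '2022-11-13 05:15:00',
                  '2022-11-13 06:45:00'],
    'close': [16857.53, 16849.16, 16684.45],
})

# Parse the strings, then round each timestamp down to the start of its hour
ts = pd.to_datetime(df1['timestamp'])
floored = df1.assign(timestamp_h=ts.dt.floor('h'))

print(floored)
```

Both approaches produce a datetime column, so either one can serve as the merge key against the hourly frame.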
    

    Second

    Convert timestamp_h in df to a datetime dtype so that it matches the key created in df1

    df.assign(timestamp_h=pd.to_datetime(df['timestamp_h']))
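The conversion matters because pandas generally refuses to merge a string key against a datetime key. A minimal sketch of checking the dtypes before merging follows; the two-row frame and the name `df_converted` are assumptions for illustration. It also shows that `assign` returns a new DataFrame and leaves `df` itself unchanged.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Illustrative subset of the hourly data, with timestamps stored as strings
df = pd.DataFrame({
    'timestamp_h': ['2022-11-13 05:00:00', '2022-11-13 06:00:00'],
    'close_h': [16853.68, 16684.45],
})

# assign() returns a copy with the converted column; df is not modified
df_converted = df.assign(timestamp_h=pd.to_datetime(df['timestamp_h']))

print(is_datetime64_any_dtype(df['timestamp_h']))            # original column
print(is_datetime64_any_dtype(df_converted['timestamp_h']))  # converted column
```

To keep the converted column, the result has to be assigned to a name or chained directly into the merge.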
    

    Finally

    Merge (the full code, combining the first and second steps)

    (df1
     .assign(timestamp_h=(pd.PeriodIndex(df1['timestamp'], freq='H').to_timestamp()))
     .merge(df.assign(timestamp_h=pd.to_datetime(df['timestamp_h']))))
    

    Output:

        timestamp           close       timestamp_h         close_h
    0   2022-11-13 05:00:00 16857.5300  2022-11-13 05:00:00 16853.6800
    1   2022-11-13 05:15:00 16849.1600  2022-11-13 05:00:00 16853.6800
    2   2022-11-13 05:30:00 16856.4100  2022-11-13 05:00:00 16853.6800
    3   2022-11-13 05:45:00 16853.6800  2022-11-13 05:00:00 16853.6800
    4   2022-11-13 06:00:00 16862.9800  2022-11-13 06:00:00 16684.4500
    5   2022-11-13 06:15:00 16807.9800  2022-11-13 06:00:00 16684.4500
    6   2022-11-13 06:30:00 16806.7900  2022-11-13 06:00:00 16684.4500
    7   2022-11-13 06:45:00 16684.4500  2022-11-13 06:00:00 16684.4500
    8   2022-11-13 07:00:00 16731.9400  2022-11-13 07:00:00 16731.9400
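A different technique that avoids building the helper key by hand is `pd.merge_asof`, which matches each 15-minute row to the most recent hourly row at or before it. This is a sketch of that alternative, not the answer's method; it assumes both frames are sorted by time and that their keys are already datetimes. The names `hourly`, `quarter_hourly`, and the 59-minute `tolerance` are illustrative choices.

```python
import pandas as pd

hourly = pd.DataFrame({
    'timestamp_h': pd.to_datetime(['2022-11-13 05:00:00',
                                   '2022-11-13 06:00:00',
                                   '2022-11-13 07:00:00']),
    'close_h': [16853.68, 16684.45, 16731.94],
})

quarter_hourly = pd.DataFrame({
    'timestamp': pd.to_datetime(['2022-11-13 05:00:00', '2022-11-13 05:15:00',
                                 '2022-11-13 05:30:00', '2022-11-13 05:45:00',
                                 '2022-11-13 06:00:00', '2022-11-13 06:15:00',
                                 '2022-11-13 06:30:00', '2022-11-13 06:45:00',
                                 '2022-11-13 07:00:00']),
    'close': [16857.53, 16849.16, 16856.41, 16853.68, 16862.98,
              16807.98, 16806.79, 16684.45, 16731.94],
})

# For each 15-minute row, take the latest hourly row whose timestamp_h is
# not later than timestamp; the tolerance keeps a missing hourly bar from
# being silently replaced by an older one.
merged = pd.merge_asof(
    quarter_hourly, hourly,
    left_on='timestamp', right_on='timestamp_h',
    direction='backward',
    tolerance=pd.Timedelta(minutes=59),
)

print(merged)
```

With complete data this gives the same pairing as the exact-key merge; the two differ mainly in how they behave when an hourly bar is missing.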
    

    [Discussion]:
