Custom datetime parsing to combine date and time after reading csv - Pandas

【Question title】: Custom datetime parsing to combine date and time after reading csv - Pandas
【Posted】: 2016-11-02 14:56:49
【Question】:

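While reading a text file, I ran into an odd format in which the date and the time are held in separate columns, as shown below (the file is tab-delimited).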

        temp
        room 1
Date    Time    simulation
Fri, 01/Jan 00:30   11.94
    01:30   12
    02:30   12.04
    03:30   12.06
    04:30   12.08
    05:30   12.09
    06:30   11.99
    07:30   12.01
    08:30   12.29
    09:30   12.46
    10:30   12.35
    11:30   12.25
    12:30   12.19
    13:30   12.12
    14:30   12.04
    15:30   11.96
    16:30   11.9
    17:30   11.92
    18:30   11.87
    19:30   11.79
    20:30   12
    21:30   12.16
    22:30   12.27
    23:30   12.3
Sat, 02/Jan 00:30   12.25
    01:30   12.19
    02:30   12.14
    03:30   12.11
etc.

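I would like to:

parse the date and time from the two columns ([0],[1]);
shift all timestamps back by 30 minutes, i.e. replace :30 with :00.

I used the following code: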

timeparse = lambda x: pd.datetime.strptime(x.replace(':30',':00'), '%H:%M')

df = pd.read_csv('Chart_1.txt',
    sep='\t',
    skiprows=1,
    date_parser=timeparse,
    parse_dates=['Time'],
    header=1)

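This parses the time but not the date (obviously, since that is what I told it to do). Also, skipping rows is useful for finding the Date and Time headers, but it discards the temp and room 1 headers that I need.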
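
One way the temp and room 1 header lines might be kept is to read the file in two passes: first only the header lines, then the data. The following is only a sketch. The in-memory sample is a hypothetical reconstruction of the layout shown above, with assumed tab positions, and not the actual Chart_1.txt.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample mimicking the layout in the question;
# the exact tab positions are an assumption.
raw = (
    "\t\ttemp\n"
    "\t\troom 1\n"
    "Date\tTime\tsimulation\n"
    "Fri, 01/Jan\t00:30\t11.94\n"
    "\t01:30\t12\n"
    "Sat, 02/Jan\t00:30\t12.25\n"
)

# Pass 1: read only the three header lines and treat none of them as a header.
head = pd.read_csv(io.StringIO(raw), sep='\t', header=None, nrows=3)
# The third column holds the labels stacked above the data column.
labels = head.iloc[:, 2].tolist()

# Pass 2: skip the two extra header lines so that the
# 'Date / Time / simulation' line supplies the column names.
df = pd.read_csv(io.StringIO(raw), sep='\t', skiprows=2)

print(labels)
print(df)
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path in both calls; the labels collected in the first pass can then be reattached to the data column as needed.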

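【Comments】:

There is a problem with the tabs in the copy of your data - I cannot tell where they are and where they are not. Could you upload a sample of your file and share it via wetransfer, gdocs or dropbox?
Or another question - is the data from row 5 onward parsed correctly into the Time and simulation columns or not?
@jezrael, I have uploaded the file to [dropbox.com/s/o1b7aa83s2mh0km/Chart_1.txt?dl=0] (dropbox)

Tags: csv parsing datetime pandas time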

【Solution 1】:

You can use:

import pandas as pd


df = pd.read_csv('Chart_1.txt', sep='\t')
# store the temperature label (the name of the third column) in temp
temp = df.columns[2]
print (temp)
Dry resultant temperature (°C)

# store the aps name (second row, third column) in aps
aps = df.iloc[1, 2]
print (aps)
AE4854c_Campshill_openings reduced_communal areas increased openings2.aps

# build a mask from the first column: True where the value contains '/', i.e. a date
mask = df.iloc[:, 0].str.contains('/',na=False)
# shift all rows that do NOT contain a date one column to the right
df1 = df[~mask].shift(1, axis=1)
# select the rows that contain a date
df2 = df[mask]
# concatenate df1 and df2, then sort the index to restore the original row order
df = pd.concat([df1, df2]).sort_index()
# assign new column names:
# the first 3 are custom, the rest come from the first row (fourth column to the end)
df.columns = ['date','time','no name'] + df.iloc[0, 3:].tolist()
# drop the first 2 rows
df = df[2:]
# forward-fill the NaN values in the date column
df.date = df.date.ffill()
# convert the date column to datetime
df.date = pd.to_datetime(df.date, format='%a, %d/%b')
# replace the 30 minutes with 00
df.time = df.time.str.replace(':30', ':00')

print (df.head())
       date   time no name 3F_T09_SE_SW_Bed1 GF_office_S GF_office_W_tea  \
2 1900-01-01  00:00   11.94             11.47       14.72           16.66   
3 1900-01-01  01:00   12.00             11.63       14.83           16.69   
4 1900-01-01  02:00   12.04             11.73       14.85           16.68   
5 1900-01-01  03:00   12.06             11.80       14.83           16.65   
6 1900-01-01  04:00   12.08             11.84       14.79           16.62   

  GF_Act.Room GF_Communal areas GF_Reception GF_Ent Lobby   ...    \
2       17.41             12.74        12.93        10.85   ...     
3       17.45             12.74        13.14        11.00   ...     
4       17.44             12.71        13.23        11.09   ...     
5       17.41             12.68        13.27        11.16   ...     
6       17.36             12.65        13.28        11.21   ...     

  2F_S01_SE_SW_Bedroom 2F_S01_SE_SW_Int Circ 2F_S01_SE_SW_Storage_int circ  \
2                12.58                 12.17                         12.54   
3                12.64                 12.22                         12.49   
4                12.68                 12.27                         12.48   
5                12.70                 12.30                         12.49   
6                12.71                 12.31                         12.51   

  GF_G01_SE_SW_Bedroom GF_G01_SE_SW_Storage_Bed 3F_T09_SE_SW_Bathroom  \
2                14.51                    14.61                 11.49   
3                14.55                    14.59                 11.50   
4                14.56                    14.59                 11.52   
5                14.55                    14.58                 11.54   
6                14.54                    14.57                 11.56   

  3F_T09_SE_SW_Circ 3F_T09_SE_SW_Storage_int circ GF_Lounge GF_Cafe  
2             11.52                         11.38     12.83   12.86  
3             11.56                         11.35     13.03   13.03  
4             11.61                         11.36     13.13   13.13  
5             11.65                         11.39     13.17   13.17  
6             11.68                         11.42     13.18   13.18  

[5 rows x 31 columns]
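
The result above still holds the date and the time in two separate columns, and the year defaults to 1900 because the file carries no year. A minimal sketch of one way to merge them into a single datetime index follows. The small frame is a hypothetical stand-in for df, and the year 2016 is an assumption: 1 January 2016 did fall on a Friday, which matches the file, but the source does not state the year.

```python
import pandas as pd

# Hypothetical stand-in for the frame built above: 'date' is already datetime
# (year 1900, as left by format='%a, %d/%b'), 'time' is 'HH:MM' text.
df = pd.DataFrame({
    'date': pd.to_datetime(['1900-01-01', '1900-01-01', '1900-01-02']),
    'time': ['00:00', '01:00', '00:00'],
    'no name': ['11.94', '12', '12.25'],
})

# Turn 'HH:MM' into 'HH:MM:00' so pd.to_timedelta accepts it,
# then add the time of day to the date.
df['datetime'] = df['date'] + pd.to_timedelta(df['time'] + ':00')

# Assumption: the readings belong to 2016 (not stated in the source).
df['datetime'] = df['datetime'].apply(lambda ts: ts.replace(year=2016))

# Use the combined timestamp as the index and drop the two source columns.
df = df.set_index('datetime').drop(columns=['date', 'time'])

# The values were read as text, so convert them to numbers.
df['no name'] = pd.to_numeric(df['no name'])

print(df)
```

Adding a timedelta keeps the operation vectorized; only the year replacement goes row by row, and it can be dropped if the 1900 placeholder year is acceptable.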

【Discussion】: