python pandas date time output dates the same: answers

【Question title】: python pandas date time output dates the same
【Posted】: 2019-03-01 13:12:26
【Question】:

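The goal of this code is to read a CSV file that has five columns, ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous'], where the 'Release Date' column contains dates in two shapes:

• Sep 09, 2018 (Aug)

• Sep 24, 2018

Because the two shapes don't match, I can't simply grab the date as-is. So I decided to build a new column from the 'Release Date' and 'Time' columns and add it to the original DataFrame.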
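Keeping only the first 12 characters is what makes the two shapes agree. A minimal sketch, assuming the dates look like the two hypothetical samples below ("Mon DD, YYYY", optionally followed by " (Mon)"); the final `pd.to_datetime` call is an optional extra step, not part of the question's code:

```python
import pandas as pd

# Hypothetical sample values in the two shapes described above
release_dates = pd.Series(["Sep 09, 2018 (Aug)", "Sep 24, 2018"])

# The first 12 characters drop the optional " (Aug)" suffix,
# so both shapes end up as "Mon DD, YYYY"
trimmed = release_dates.str[:12]
print(trimmed.tolist())  # ['Sep 09, 2018', 'Sep 24, 2018']

# Once the shapes agree, one explicit format parses every value
parsed = pd.to_datetime(trimmed, format="%b %d, %Y")
print(parsed.dt.day.tolist())  # [9, 24]
```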

I tried this code:

import pandas as pd
df = pd.read_csv(r"C:\Users\Sayed\Desktop\script\data.csv")
for date, time in zip(df['Release Date'], df['Time']):
    Date = date[:12] + ' ' + time
    df['Date'] = Date
print(df.head())

But I got this output, where every row of the new Date column holds the same value:

Release Date Time Actual Forecast Previous Date
Oct 15, 2018 (Sep) 21:30 0.5% 0.7% Feb 01, 1996 05:00
Sep 09, 2018 (Aug) 21:30 0.7% 0.5% 0.3% Feb 01, 1996 05:00
Aug 08, 2018 (Jul) 21:30 0.3% 0.2% -0.1% Feb 01, 1996 05:00
Jul 09, 2018 (Jun) 21:30 -0.1% 0.1% -0.2% Feb 01, 1996 05:00
Jun 08, 2018 (May) 21:30 -0.2% -0.1% -0.2% Feb 01, 1996 05:00

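【Question discussion】:

You haven't told us what the original data looks like.
@IMCoins The same, but without the Date column.
What do you want your output to look like? Also, looping over the rows of a DataFrame is not a good idea. Try using df.apply instead.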

Tags: python

【Solution 1】:

This line:

df['Date'] = Date

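assigns a single string to the whole 'Date' column on every iteration of the loop, so every row is overwritten each time. When the loop ends, every row holds the value from the last iteration.

Try building the value per row with a function passed to df.apply through a lambda instead. You will also notice a performance improvement: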
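A minimal sketch that reproduces the problem on a small hypothetical three-row frame standing in for data.csv. It also shows one way to repair the loop itself (collecting the values in a list comprehension and assigning once), which is an alternative and not the approach this answer proposes:

```python
import pandas as pd

# Hypothetical three-row frame standing in for data.csv
df = pd.DataFrame({
    "Release Date": ["Oct 15, 2018 (Sep)", "Sep 09, 2018 (Aug)", "Sep 24, 2018"],
    "Time": ["21:30", "21:30", "05:00"],
})

# Pattern from the question: each pass assigns ONE string to the whole
# column, and pandas broadcasts that scalar to every row
for date, time in zip(df["Release Date"], df["Time"]):
    df["Date"] = date[:12] + " " + time
buggy = df["Date"].tolist()
print(buggy)  # every row holds 'Sep 24, 2018 05:00'

# Repairing the loop: collect one value per row, then assign the list once
df["Date"] = [date[:12] + " " + time
              for date, time in zip(df["Release Date"], df["Time"])]
print(df["Date"].tolist())
# ['Oct 15, 2018 21:30', 'Sep 09, 2018 21:30', 'Sep 24, 2018 05:00']
```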

def GetDate(row):
    return row['Release Date'][:12] + ' ' + row['Time']

df['Date'] = df.apply(lambda x: GetDate(x), axis=1)
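A usage sketch of the GetDate approach on hypothetical sample data. With axis=1, df.apply hands each row to the function as a Series, so the column names can be used as keys. The lambda wrapper is optional; passing GetDate directly gives the same result:

```python
import pandas as pd

def GetDate(row):
    # With axis=1, apply passes one row at a time, as a Series
    return row["Release Date"][:12] + " " + row["Time"]

# Hypothetical sample data with both date shapes
df = pd.DataFrame({
    "Release Date": ["Sep 09, 2018 (Aug)", "Sep 24, 2018"],
    "Time": ["21:30", "21:30"],
})

# Equivalent to df.apply(lambda x: GetDate(x), axis=1)
df["Date"] = df.apply(GetDate, axis=1)
print(df["Date"].tolist())  # ['Sep 09, 2018 21:30', 'Sep 24, 2018 21:30']
```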

【Discussion】:

【Solution 2】:

Your loop is wrong and unnecessary.

Try this:

df["Date"] = df["Release Date"].apply(lambda x: x[:12]) + " " + df["Time"]
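A sketch of why this one-liner works, on hypothetical sample data whose index labels are deliberately not 0, 1, ... : Series.apply calls the lambda once per string of a single column, the result keeps the DataFrame's index, and "+" then lines it up label by label with df["Time"]:

```python
import pandas as pd

# Hypothetical sample data with a non-default index
df = pd.DataFrame(
    {"Release Date": ["Oct 15, 2018 (Sep)", "Sep 24, 2018"],
     "Time": ["21:30", "05:00"]},
    index=[10, 20],
)

# The sliced Series keeps labels 10 and 20, so "+" pairs each date with
# the Time value from the same row
df["Date"] = df["Release Date"].apply(lambda x: x[:12]) + " " + df["Time"]
print(df["Date"].tolist())  # ['Oct 15, 2018 21:30', 'Sep 24, 2018 05:00']
```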

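【Discussion】:

【Solution 3】:

I don't like the .apply() method in pandas because it is not efficient.

Here is another solution that handles the problem efficiently. I also ran a benchmark to show that .apply() is inefficient. With large data, you should use it only when necessary.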

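df['Date'] = df['Release Date'].str[:12] + ' ' + df['Time']

This line means: for every row of the 'Release Date' column, the .str accessor keeps the characters at positions 0 to 12 (12 excluded); then a space is appended, followed by the 'Time' column (implicitly, all of its rows).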
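The .str accessor matters here: slicing a Series directly with [:12] selects its first 12 rows and leaves each string untouched, while .str[:12] cuts every string to 12 characters. A sketch on a hypothetical 15-row column (more than 12 rows, so the difference shows):

```python
import pandas as pd

# Hypothetical 15-row columns
dates = pd.Series(["Sep 09, 2018 (Aug)"] * 15)
times = pd.Series(["21:30"] * 15)

rows = dates[:12]       # first 12 ROWS, strings unchanged
chars = dates.str[:12]  # first 12 CHARACTERS of every string

print(len(rows), rows.iloc[0])    # 12 Sep 09, 2018 (Aug)
print(len(chars), chars.iloc[0])  # 15 Sep 09, 2018

# Adding the 12-row slice to a 15-row column aligns on the index,
# so the 3 rows missing from the slice become NaN
combined = rows + " " + times
print(combined.isna().sum())  # 3
```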

import pandas as pd
import timeit
from matplotlib import pyplot as plt

def IMCoins(df):
    df['Date'] = df['Release Date'].str[:12] + ' ' + df['Time']

def petezurich(df):
    df['Date'] = df['Release Date'].apply(lambda x: x[:12]) + ' ' + df['Time']

def benchmark(x_ticks, time_arr_1, time_arr_2):
    """ Displays difference between all the time_arr.
    """
    X = range(len(time_arr_1))

    plt.figure()
    plt.plot(X, time_arr_1, marker='o', color='g', label='IMCoins')
    plt.plot(X, time_arr_2, marker='o', color='r', label='petezurich')
    plt.ylabel('Time in seconds')
    plt.xlabel('Number of elements to iterate on')
    plt.xticks( [nb for nb in range(len(x_ticks))], x_ticks, rotation=30)
    plt.legend()
    plt.tight_layout()
    plt.show()

if __name__ == '__main__':
    #   Number of times timeit runs each function.
    n_iter = 10

    #   Starting number of rows in the DataFrame.
    n_elements = 10

    #   Number of times n_elements gets multiplied by factor.
    n_increase = 7
    factor = 10

    time_arr_1, time_arr_2, x_ticks = [], [], []
    for idx in range(n_increase):
        #   Preparing data inside the loop because we need to
        #   increase its size.
        data = {
            'Release Date' : ['a' * 20 for _ in range(n_elements)],
            'Time' : ['b' * 10 for _ in range(n_elements)]
        }
        df = pd.DataFrame(data)

        #   Check that both functions give the same result, each on its own copy.
        df_1, df_2 = df.copy(), df.copy()
        IMCoins(df_1)
        petezurich(df_2)
        assert df_1['Date'].equals(df_2['Date']), 'results are different'

        t1 = timeit.timeit(stmt = 'IMCoins(df)',
                           setup = 'from __main__ import df, IMCoins',
                           number= n_iter)
        time_arr_1.append(t1)

        t2 = timeit.timeit(stmt = 'petezurich(df)',
                           setup = 'from __main__ import df, petezurich',
                           number = n_iter)
        time_arr_2.append(t2)

        #   We want to correctly display the number of elements computed on
        #   the plot later.
        x_ticks.append(n_elements)

        # In order to increase the data...
        n_elements *= factor

    benchmark(x_ticks, time_arr_1, time_arr_2)

【Discussion】: