【Question title】: Creating a column in a dataframe based on inputs of other columns
【Posted】: 2020-07-10 23:15:54
【Question description】:

I want to create a new column in the df below that returns 1 once C > 55 day high. Each subsequent row should also show 1, even if C…

import pandas as pd

df = pd.DataFrame([
    ['2020-01-01 01:01:00', 7147.69, 7163.32, 7147.69],
    ['2020-01-01 01:02:00', 7158.31, 7163.32, 7147.69],
    ['2020-01-01 01:03:00', 7157.08, 7163.32, 7147.69],
    ['2020-01-01 01:04:00', 7157.01, 7163.32, 7147.69],
    ['2020-01-01 01:05:00', 7159.85, 7163.32, 7147.69],
    ['2020-01-01 01:06:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:07:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:08:00', 7161.28, 7162.03, 7147.69],
    ['2020-01-01 01:09:00', 7161.29, 7162.03, 7147.69],
], columns=['date', 'C', '55 day high', '20 day low'])

Dataframe:

                           C  55 day high  20 day low
date                                                 
2020-01-01 01:01:00  7147.69      7163.32     7147.69
2020-01-01 01:02:00  7158.31      7163.32     7147.69
2020-01-01 01:03:00  7157.08      7163.32     7147.69
2020-01-01 01:04:00  7157.01      7163.32     7147.69
2020-01-01 01:05:00  7159.85      7163.32     7147.69
2020-01-01 01:06:00  7161.29      7163.32     7147.69
2020-01-01 01:07:00  7161.29      7163.32     7147.69
2020-01-01 01:08:00  7161.28      7162.03     7147.69
2020-01-01 01:09:00  7161.29      7162.03     7147.69

【Question discussion】:

  • Please include the desired output.

Tags: python function dataframe


【Solution 1】:

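I can't think of a way to do this with apply(), but you can use iterrows() to loop over the rows, check the conditions, collect the results in a list, and then assign that list to a new dataframe column. Note that I've changed the sample data to simulate the start and stop conditions you described: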

import pandas as pd 
df = pd.DataFrame([
    ['2020-01-01 01:01:00', 7147.69, 7163.32, 7147.69],
    ['2020-01-01 01:02:00', 7158.31, 7163.32, 7147.69],
    ['2020-01-01 01:03:00', 7164.08, 7163.32, 7147.69],
    ['2020-01-01 01:04:00', 7157.01, 7163.32, 7147.69],
    ['2020-01-01 01:05:00', 7159.85, 7163.32, 7147.69],
    ['2020-01-01 01:06:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:07:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:08:00', 7145.28, 7162.03, 7147.69],
    ['2020-01-01 01:09:00', 7161.29, 7162.03, 7147.69],
], columns=['date', 'C', '55 day high', '20 day low'])

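new_col = []
state = 0  # 0 = signal off, 1 = signal on
for _, row in df.iterrows():
    # Start condition: C breaks above the 55 day high
    if row['C'] > row['55 day high']:
        state = 1
    # Stop condition: C falls below the 20 day low (wins if both fire)
    if row['C'] < row['20 day low']:
        state = 0
    new_col.append(state)

df['result'] = new_col
df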

                  date        C  55 day high  20 day low  result
0  2020-01-01 01:01:00  7147.69      7163.32     7147.69       0
1  2020-01-01 01:02:00  7158.31      7163.32     7147.69       0
2  2020-01-01 01:03:00  7164.08      7163.32     7147.69       1
3  2020-01-01 01:04:00  7157.01      7163.32     7147.69       1
4  2020-01-01 01:05:00  7159.85      7163.32     7147.69       1
5  2020-01-01 01:06:00  7161.29      7163.32     7147.69       1
6  2020-01-01 01:07:00  7161.29      7163.32     7147.69       1
7  2020-01-01 01:08:00  7145.28      7162.03     7147.69       0
8  2020-01-01 01:09:00  7161.29      7162.03     7147.69       0
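Because iterrows() can be slow on long dataframes, the same start/stop state can likely be computed without looping row by row. The sketch below is an alternative technique, not part of the original answer: it marks rows where the start condition fires with 1 and rows where the stop condition fires with 0, leaves every other row as NaN, and then forward-fills so each row inherits the most recent event. It assumes a reasonably recent pandas and NumPy and reuses the altered sample data above; `events` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Altered sample data from Solution 1
df = pd.DataFrame([
    ['2020-01-01 01:01:00', 7147.69, 7163.32, 7147.69],
    ['2020-01-01 01:02:00', 7158.31, 7163.32, 7147.69],
    ['2020-01-01 01:03:00', 7164.08, 7163.32, 7147.69],
    ['2020-01-01 01:04:00', 7157.01, 7163.32, 7147.69],
    ['2020-01-01 01:05:00', 7159.85, 7163.32, 7147.69],
    ['2020-01-01 01:06:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:07:00', 7161.29, 7163.32, 7147.69],
    ['2020-01-01 01:08:00', 7145.28, 7162.03, 7147.69],
    ['2020-01-01 01:09:00', 7161.29, 7162.03, 7147.69],
], columns=['date', 'C', '55 day high', '20 day low'])

# NaN means "no event on this row, keep the previous state"
events = pd.Series(np.nan, index=df.index)
events[df['C'] > df['55 day high']] = 1  # start condition
events[df['C'] < df['20 day low']] = 0   # stop condition, assigned last so it wins ties like the loop

# Carry the latest event forward; rows before any event default to 0
df['result'] = events.ffill().fillna(0).astype(int)
print(df['result'].tolist())  # → [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

Assigning the stop condition after the start condition mirrors the order of the two `if` statements in the loop, so a row that meets both ends up as 0 in either version.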

【Discussion】:

    【Solution 2】:

    This may help. I'm not sure my logic is right here, but hopefully it gets you closer to a solution.

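    def logic(C, H, L):
        # Note: C > H and C < L can only both hold when H < L,
        # i.e. when the 55 day high is below the 20 day low
        if C > H and C < L:
            return 1
        if C < L:
            return 0
        else:
            return -1

    for C, H, L in df[['C', '55 day high', '20 day low']].itertuples(index=False):
        print(logic(C, H, L))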
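The `logic()` function above looks at one row at a time, so it cannot remember that a signal was switched on in an earlier row. One way to keep the row-wise style while still carrying that memory is to pass the previous state into the function and fold over the rows with `itertools.accumulate` (its `initial` argument needs Python 3.8+). This is a sketch under those assumptions: `next_state` is a hypothetical helper name, the start/stop rules are the ones used in Solution 1 rather than anything this answer stated, and the numbers are the altered sample data from Solution 1.

```python
from itertools import accumulate

import pandas as pd

# Numeric columns of the altered sample data from Solution 1
df = pd.DataFrame({
    'C': [7147.69, 7158.31, 7164.08, 7157.01, 7159.85,
          7161.29, 7161.29, 7145.28, 7161.29],
    '55 day high': [7163.32] * 7 + [7162.03] * 2,
    '20 day low': [7147.69] * 9,
})


def next_state(prev, row):
    """Hypothetical helper: new signal from the previous signal and one (C, H, L) row."""
    C, H, L = row
    if C < L:      # stop condition checked first so it wins ties, as in Solution 1
        return 0
    if C > H:      # start condition
        return 1
    return prev    # neither fired: keep the previous signal


rows = df[['C', '55 day high', '20 day low']].itertuples(index=False)
# initial=0 prepends the starting state, so drop it with [1:]
df['result'] = list(accumulate(rows, next_state, initial=0))[1:]
print(df['result'].tolist())  # → [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

This is still a Python-level loop under the hood, so it is mainly a readability choice rather than a speed-up over iterrows().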
    

    【Discussion】:
