【Posted】:2023-02-05 23:12:22
【Question】:
I am working with the following df:
timestamp   conversationId  UserId   MessageId     tpMessage  Message
1614578324  ceb9004ae9d3    1c376ef  5bbd34859329  question   Where do you live?
1614578881  ceb9004ae9d3    1c376ef  d3b5d3884152  answer     Brooklyn
1614583764  ceb9004ae9d3    1c376ef  0e4501fcd61f  question   What's your name?
1614590885  ceb9004ae9d3    1c376ef  97d841b79ff7  answer     Phill
1614594952  ceb9004ae9d3    1c376ef  11ed3fd24767  question   What's your gender?
1614602036  ceb9004ae9d3    1c376ef  601538860004  answer     Male
1614602581  ceb9004ae9d3    1c376ef  8bc8d9089609  question   How old are you?
1614606219  ceb9004ae9d3    1c376ef  a2bd45e64b7c  answer     35
1614606240  jto9034pe0i5    1c489rl  o6bd35e64b5j  question   What's your name?
1614606250  jto9034pe0i5    1c489rl  96jd89i55b72  answer     Robert
1614606267  jto9034pe0i5    1c489rl  33yd1445d6ut  answer     Brandom
1614606287  jto9034pe0i5    1c489rl  b7q489iae77t  answer     Connor
I need to "split" the timestamp column in two based on the tpMessage column. The conditions I am using are:
df['ts_question'] = np.where(df['tpMessage']=='question', df['timestamp'],0)
df['ts_answer'] = np.where(df['tpMessage']=='answer', df['timestamp'],0)
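Wherever the condition does not match, this puts a 0 in the corresponding column, so every row has a 0 in one of the two new columns. After that I am stuck on how to proceed.
My goal is to get this output: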
ts_question  ts_answer   conversationId  UserId
1614578324   1614578881  ceb9004ae9d3    1c376ef
1614583764   1614590885  ceb9004ae9d3    1c376ef
1614594952   1614602036  ceb9004ae9d3    1c376ef
1614602581   1614606219  ceb9004ae9d3    1c376ef
1614606240   1614606250  jto9034pe0i5    1c489rl
1614606240   1614606267  jto9034pe0i5    1c489rl
1614606240   1614606287  jto9034pe0i5    1c489rl
Note that a single question can have one or more answers, as with "What's your name?" in the second conversation.
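One possible way to reach the output above, offered as a sketch rather than a definitive answer: keep the timestamp only on question rows, forward-fill it within each conversation, then keep the answer rows. This assumes the rows are in chronological order and that each answer belongs to the most recent question in the same conversationId; neither assumption is stated in the question. The variable names `is_question`, `ts_question` and `out` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    (1614578324, "ceb9004ae9d3", "1c376ef", "5bbd34859329", "question", "Where do you live?"),
    (1614578881, "ceb9004ae9d3", "1c376ef", "d3b5d3884152", "answer", "Brooklyn"),
    (1614583764, "ceb9004ae9d3", "1c376ef", "0e4501fcd61f", "question", "What's your name?"),
    (1614590885, "ceb9004ae9d3", "1c376ef", "97d841b79ff7", "answer", "Phill"),
    (1614594952, "ceb9004ae9d3", "1c376ef", "11ed3fd24767", "question", "What's your gender?"),
    (1614602036, "ceb9004ae9d3", "1c376ef", "601538860004", "answer", "Male"),
    (1614602581, "ceb9004ae9d3", "1c376ef", "8bc8d9089609", "question", "How old are you?"),
    (1614606219, "ceb9004ae9d3", "1c376ef", "a2bd45e64b7c", "answer", "35"),
    (1614606240, "jto9034pe0i5", "1c489rl", "o6bd35e64b5j", "question", "What's your name?"),
    (1614606250, "jto9034pe0i5", "1c489rl", "96jd89i55b72", "answer", "Robert"),
    (1614606267, "jto9034pe0i5", "1c489rl", "33yd1445d6ut", "answer", "Brandom"),
    (1614606287, "jto9034pe0i5", "1c489rl", "b7q489iae77t", "answer", "Connor"),
]
df = pd.DataFrame(
    rows,
    columns=["timestamp", "conversationId", "UserId", "MessageId", "tpMessage", "Message"],
)

# Assumption: chronological order decides which question an answer belongs to.
df = df.sort_values("timestamp", kind="stable")

# Keep the timestamp on question rows (NaN elsewhere), then carry it forward
# within each conversation so every answer row sees its preceding question.
is_question = df["tpMessage"].eq("question")
ts_question = df["timestamp"].where(is_question).groupby(df["conversationId"]).ffill()

out = (
    df.assign(ts_question=ts_question)
    .loc[df["tpMessage"].eq("answer")]           # one output row per answer
    .rename(columns={"timestamp": "ts_answer"})
    .dropna(subset=["ts_question"])              # answers with no earlier question
    .astype({"ts_question": "int64"})            # ffill went through float/NaN
    [["ts_question", "ts_answer", "conversationId", "UserId"]]
    .reset_index(drop=True)
)
print(out)
```

Using NaN instead of 0 as the placeholder is what makes the forward-fill possible; with the `np.where(..., 0)` version the zeros would first have to be turned back into missing values.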
【Comments】:
-
You can use the apply function and pass it a lambda function that takes a row as its argument. See here
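A rough reading of what this comment suggests, as a sketch only: `DataFrame.apply` with `axis=1` calls the lambda once per row. The three-row frame below is a cut-down copy of the second conversation, and returning `None` in place of the question's 0 is an assumption made here so the gaps show up as missing values.

```python
import pandas as pd

# Cut-down sample: one question followed by two answers.
df = pd.DataFrame({
    "timestamp": [1614606240, 1614606250, 1614606267],
    "tpMessage": ["question", "answer", "answer"],
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df["ts_question"] = df.apply(
    lambda row: row["timestamp"] if row["tpMessage"] == "question" else None,
    axis=1,
)
df["ts_answer"] = df.apply(
    lambda row: row["timestamp"] if row["tpMessage"] == "answer" else None,
    axis=1,
)
print(df)
```

This only reproduces the split from the question row by row; it does not pair answers with their question. Row-wise `apply` is also generally slower than the vectorized `np.where` already used in the question, so it is mainly useful when the per-row logic cannot be expressed column-wise.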