【Question title】:Changing a pandas dataframe column value according to conditions
【Posted】:2023-02-21 00:34:32
【Question】:

I have a pandas dataframe containing reviews. For each review, I have its individual words and a score for each word, as follows:

import pandas as pd
df = pd.DataFrame({
    "review_num": [1,1,1,1,1,2,2,2],
    "review": ["This is the first review","This is the first review","This is the first review","This is the first review","This is the first review",
               "And another one","And another one","And another one"],
    "token_num":[1,2,3,4,5,1,2,3],
    "token":["This","is","the","first","review","And","another","one"],
    "score":[0.3,-0.6,0.5,0.4,0.2,-0.7,0.5,0.4]
})

#The initial dataframe====================================================
#   review_num                    review  token_num    token  score
#0           1  This is the first review          1     This    0.3
#1           1  This is the first review          2       is   -0.6
#2           1  This is the first review          3      the    0.5
#3           1  This is the first review          4    first    0.4
#4           1  This is the first review          5   review    0.2
#5           2           And another one          1      And   -0.7
#6           2           And another one          2  another    0.5
#7           2           And another one          3      one    0.4

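I need to modify each review according to the following rules:
1- For each review, modify the word with the highest score.
2- If the highest-scoring word contains the character "t", replace "t" with "f".
3- If it does not contain "t", move on to the next word (the one with the next-highest score).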

The expected result is the following dataframe:


# == the modified df ============================================================
#  review_num            initial_review                     Modified_review
#0           1    This is the first review             This is the firsf review
#1           2           And another one                     And anofher one

Can someone help me with this? Thanks.

【Comments】:

  • The modified word in the first review should be "the", not "first".

Tags: python pandas dataframe machine-learning


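【Solution 1】:

You can pre-filter the rows whose token contains "t", then use groupby.idxmax to get the highest-scoring row per review, and finally use a list comprehension to do the replacement and join the result back onto the original columns: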

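# Keep only the rows whose token contains a lowercase 't' (case-sensitive)
m = df['token'].str.contains('t')
# Per review, the index label of the highest-scoring remaining token
idx = df[m].groupby('review_num')['score'].idxmax()

# Swap 't' for 'f' in the selected token and substitute it into the review text
out = df.loc[idx, ['review_num', 'review']].join(
    pd.DataFrame({'Modified_review': [txt.replace(w, w.replace('t', 'f'))
                                      for w, txt in zip(df.loc[idx, 'token'],
                                                    df.loc[idx, 'review'])]
                  }, index=idx)
)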

Output:

   review_num                    review           Modified_review
2           1  This is the first review  This is fhe first review
6           2           And another one           And anofher one
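
Note that txt.replace(w, ...) works on substrings, so a selected token such as "the" would also be replaced inside a longer word like "other" if the review contained one. As a possible variant (a minimal sketch, not part of the original answer), the review can be rebuilt from its tokens instead, so that only the selected token changes. This assumes each review is exactly its tokens joined by single spaces, which holds for the sample data but would drop punctuation or extra whitespace in real text. The names has_t and tokens are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "review_num": [1, 1, 1, 1, 1, 2, 2, 2],
    "review": ["This is the first review"] * 5 + ["And another one"] * 3,
    "token_num": [1, 2, 3, 4, 5, 1, 2, 3],
    "token": ["This", "is", "the", "first", "review", "And", "another", "one"],
    "score": [0.3, -0.6, 0.5, 0.4, 0.2, -0.7, 0.5, 0.4],
})

# Rows whose token contains a lowercase "t" (case-sensitive, as in the answer above).
has_t = df["token"].str.contains("t", regex=False)

# Index label of the highest-scoring "t" token in each review.
idx = df[has_t].groupby("review_num")["score"].idxmax().to_numpy()

# Replace "t" with "f" only in the selected tokens; other rows stay untouched.
tokens = df["token"].copy()
tokens.loc[idx] = tokens.loc[idx].str.replace("t", "f", regex=False)

# Assumption: a review equals its tokens joined by single spaces, in token_num order.
out = (
    df.assign(token=tokens)
      .sort_values(["review_num", "token_num"])
      .groupby("review_num")
      .agg(initial_review=("review", "first"),
           Modified_review=("token", " ".join))
      .reset_index()
)
print(out)
```

On the sample data this yields the same modified reviews as the output above ("This is fhe first review" and "And anofher one"), with one row per review_num.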
