Get the sum of the counts and subtract the counts in Python (answers)

【Question title】: Get the sum of the count and do the subtract of the count python
【Posted】: 2018-10-17 08:02:50
【Question description】:

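I want to get the total count of the keywords minus the total count of the opposite words, and then return the sentence. This is what I have: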

df = pd.read_excel('C:/Test.xlsx')
df.drop_duplicates(['Content'],inplace=True)
a = df['Content'].str.lower()
searchfor =['heating','lagging',... and 100+words]
opposite = ['no heating','no lagging',...and 100+words]
b = a[a.str.contains(searchfor)]
c = a[a.str.contains(opposite)]

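For example, in Content I have the sentences ['The phone is heating but not lagging', 'The phone is not heating and not lagging', ...]. The first sentence contains 2 words from searchfor and 1 from opposite. The second sentence contains 2 words from searchfor and 2 from opposite. What I want to do is count the total number of words from searchfor and from opposite, then subtract the opposite total from the searchfor total. If the result is zero, return the sentence.

Here is what I tried, but it does not work: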

d = c.str.split()
def check_it(sentences):
   find_words = []
   for word in searchfor:
        if word in sentences:
            find_words.append(d.count(word))
   return sentences
d = d.apply(lambda x:check_it(x))

I then tried another def for the check. It does not work and gives me an error.

I would appreciate it if anyone could help.

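【Question discussion】:

Can you provide the error message?
It says "Level lagging must be same as name (None)". But I suspect the problem is in my check_it function.
What I really want is to sum the keywords from searchfor and from opposite, and then do the subtraction.

Tags: python pandas lambda split count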

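【Solution 1】:

[Using Python 3, requires Pandas]

It would be best to see a sample of your actual data; however, I assume your dataframe looks like the sample below (please correct me if that is not the case):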

+-----+----------------------------------------+
|index|content                                 |
+-----+----------------------------------------+
|0    |the phone is heating but not lagging    |
|1    |the phone is not heating and not lagging|
+-----+----------------------------------------+

We now create a function to be used inside a lambda, as follows:

def get_difference_of_keywords(content_string, searchfor, opposite):
    # Count how many keywords from each list occur in the sentence (substring match).
    searchfor_matches = len([keyword for keyword in searchfor if keyword in content_string])
    opposite_matches = len([keyword for keyword in opposite if keyword in content_string])
    difference = searchfor_matches - opposite_matches
    # Return the difference as a string when it is non-zero, else the sentence itself.
    if difference != 0:
        return str(difference)
    return content_string

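This uses Python list comprehensions to get the number of matches for 'searchfor' and 'opposite', then returns the difference if it is non-zero, or the original input sentence if the difference is zero.

Note: I have converted the returned number to a string when the difference is non-zero, so that the new column does not contain mixed data types. This is optional and up to you.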
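
The function above counts each keyword at most once per sentence, because `keyword in content_string` only tests for presence. If the number of occurrences matters, a minimal sketch using `str.count` might look like the following. The keyword lists are illustrative assumptions (the real lists in the question are elided), and `count_difference` is a hypothetical helper name, not part of the original answer.

```python
# Illustrative keyword lists; the real lists in the question are elided.
searchfor = ['heating', 'lagging']
opposite = ['not heating', 'not lagging']


def count_difference(content_string, searchfor, opposite):
    """Return total searchfor occurrences minus total opposite occurrences."""
    # str.count counts non-overlapping occurrences of the substring.
    searchfor_total = sum(content_string.count(keyword) for keyword in searchfor)
    opposite_total = sum(content_string.count(keyword) for keyword in opposite)
    return searchfor_total - opposite_total


sentences = [
    'the phone is heating but not lagging',
    'the phone is not heating and not lagging',
]
differences = [count_difference(s, searchfor, opposite) for s in sentences]
print(differences)
```

Note that a plain substring match also finds 'heating' inside 'not heating', which is why the second sentence scores 2 for searchfor and 2 for opposite.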

We then apply the function above:

df['get_difference_result'] = df.apply(
    lambda row: get_difference_of_keywords(row['content'], searchfor, opposite),
    axis=1
)

This produces the following result:

+-----+----------------------------------------+----------------------------------------+
|index|content                                 |get_difference_result                   |
+-----+----------------------------------------+----------------------------------------+
|0    |the phone is heating but not lagging    |1                                       |
|1    |the phone is not heating and not lagging|the phone is not heating and not lagging|
+-----+----------------------------------------+----------------------------------------+
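
If only the sentences with a zero difference are needed, the same idea can be sketched without a row-wise `apply`, using `Series.str.count` with a regular expression built from each list. This is a sketch under assumptions: the sample data and keyword lists below are illustrative (the real lists are elided in the question), and the variable names are hypothetical.

```python
import re

import pandas as pd

# Illustrative data and keyword lists; the real lists in the question are elided.
df = pd.DataFrame({
    'content': [
        'the phone is heating but not lagging',
        'the phone is not heating and not lagging',
    ]
})
searchfor = ['heating', 'lagging']
opposite = ['not heating', 'not lagging']

# Join the keywords into one alternation pattern, escaping regex metacharacters.
searchfor_pattern = '|'.join(re.escape(keyword) for keyword in searchfor)
opposite_pattern = '|'.join(re.escape(keyword) for keyword in opposite)

# Series.str.count returns the number of pattern matches in each string.
difference = (
    df['content'].str.count(searchfor_pattern)
    - df['content'].str.count(opposite_pattern)
)

# Keep only the sentences whose counts cancel out.
balanced = df.loc[difference == 0, 'content']
print(balanced.tolist())
```

Because the alternation is matched left to right, a keyword that is a prefix of another keyword in the same list can shadow the longer one, so the counts may differ from per-keyword counting in that case.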

【Discussion】:

This was very helpful. Thank you.
Glad I could help! If this solved your problem, would you mind accepting my answer?