Comparing two columns and filtering rows with neighboring classes: answers

【Question title】: Comparing two columns and filtering columns with neighboring classes
【Posted】: 2020-06-17 22:08:49
【Question description】:

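The classes here range from "eight" to "twenty", and the numbers are written out as words. After the classifier predicted the classes, I built a table of the rows where the prediction does not equal the actual value. Now I want a table containing only the rows where the classifier missed by exactly one neighboring class. For example, from the table above I only want the rows: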

       predictions  actual
8013   fifteen      sixteen
5146   sixteen      seventeen
5691   seventeen    sixteen
13255  sixteen      fifteen
13921  thirteen     fourteen
13077  fourteen     fifteen

【Question discussion】:

Tags: python pandas indexing filter

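【Solution 1】:

You can convert the number words to int with the code from this question: Is there a way to convert number words to Integers?

Alternatively, if your range is limited, you can do it by hand with two dictionaries, like this: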

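# Each class maps to its lower (prev) or upper (next) neighbor; extend both for the remaining classes.
prev_dict = {'sixteen': 'fifteen', 'seventeen': 'sixteen'}
next_dict = {'sixteen': 'seventeen'}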

Then:

# Keep rows whose prediction is the class just below or just above the actual class.
predict[(predict['predictions'] == predict['actual'].map(prev_dict)) | (predict['predictions'] == predict['actual'].map(next_dict))]
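
The two dictionaries do not have to be typed out by hand. Below is a minimal sketch, assuming the classes run from "eight" to "twenty" as described in the question and that the column is named `predictions` as in the question's table. The `classes` list and the small `predict` frame are illustrative, not the asker's real data.

```python
import pandas as pd

# Ordered class labels; assumed to match the range described in the question.
classes = ["eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
           "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]

# Map each class to its lower / upper neighbor.
prev_dict = dict(zip(classes[1:], classes[:-1]))
next_dict = dict(zip(classes[:-1], classes[1:]))

# Small illustrative frame (hypothetical rows).
predict = pd.DataFrame(
    {"predictions": ["fifteen", "sixteen", "nineteen", "eight"],
     "actual": ["sixteen", "seventeen", "fourteen", "nine"]},
    index=[8013, 5146, 13921, 42],
)

# Series.map looks up every element of the column in the dictionary.
is_prev = predict["predictions"] == predict["actual"].map(prev_dict)
is_next = predict["predictions"] == predict["actual"].map(next_dict)
neighbors = predict[is_prev | is_next]
print(neighbors)
```

Building the dictionaries from one ordered list keeps them consistent with each other and avoids missing keys.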

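【Discussion】:

Great! That certainly answers my question in one way. But if I convert the words to integers, how would the code get the result from that point on?
Also, when I run that code I get the error "'Series' objects are mutable, thus they cannot be hashed". Why does that happen? Thanks in advance.
I got it wrong; it only works with the two dictionaries. The error occurred because the dictionary did not recognize the key: the code was trying to use a Series as the key. I have edited my answer, but jezrael's answer is better.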
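
The error mentioned in the comments typically appears when a whole Series is used as a dictionary key, because a dictionary lookup hashes its key and a Series cannot be hashed. The sketch below contrasts that with `Series.map`, which looks up each element separately. The names `actual` and `lookup_error` are illustrative, and the exact code the commenter ran is not shown in the thread, so this is only a likely reproduction.

```python
import pandas as pd

prev_dict = {"sixteen": "fifteen", "seventeen": "sixteen"}
actual = pd.Series(["sixteen", "seventeen"])

# Indexing the dict with the whole Series tries to hash the Series and fails.
try:
    prev_dict[actual]
    lookup_error = None
except TypeError as exc:
    lookup_error = exc

# Series.map performs the lookup element by element, so it succeeds.
mapped = actual.map(prev_dict)
print(type(lookup_error).__name__)
print(mapped.tolist())
```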

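【Solution 2】:

Convert both columns to numbers, then filter with boolean indexing: add 1 to and subtract 1 from the actual column, compare each result with the predictions using Series.eq, and chain the two masks with | for bitwise OR: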

print (df)
      predictions     actual
8013      fifteen     twenty
5146      sixteen  seventeen
5691    seventeen    sixteen
13255     sixteen    fifteen
13921    nineteen   fourteen
13077    fourteen    fifteen

#https://stackoverflow.com/a/493788/2901002
def text2int(textnum, numwords={}):
    # The mutable default is used as a cache: the lookup table is built on the first call only.
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        # Each word maps to a (scale, increment) pair.
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):
            numwords[word] = (1, idx)
        for idx, word in enumerate(tens):
            numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales):
            numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        # Scales above "hundred" close the current group and add it to the total.
        if scale > 100:
            result += current
            current = 0

    return result + current

p = df['predictions'].apply(text2int) 
a = df['actual'].apply(text2int) 

df1 = df[p.eq(a+1) | p.eq(a-1)]

Or:

df1 = df[(p == a+1) | (p == a-1)]

print (df1)
      predictions     actual
5146      sixteen  seventeen
5691    seventeen    sixteen
13255     sixteen    fifteen
13077    fourteen    fifteen
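
The two comparisons can also be collapsed into a single test on the absolute difference. The sketch below assumes the classes are limited to "eight" through "twenty", so a plain dictionary (the hypothetical `word_to_int`) stands in for `text2int`; the frame reproduces the sample data printed above.

```python
import pandas as pd

# Limited-range lookup table; assumed to cover every class in the data.
words = ["eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]
word_to_int = {word: number for number, word in enumerate(words, start=8)}

df = pd.DataFrame(
    {"predictions": ["fifteen", "sixteen", "seventeen", "sixteen", "nineteen", "fourteen"],
     "actual": ["twenty", "seventeen", "sixteen", "fifteen", "fourteen", "fifteen"]},
    index=[8013, 5146, 5691, 13255, 13921, 13077],
)

p = df["predictions"].map(word_to_int)
a = df["actual"].map(word_to_int)

# |p - a| == 1 holds exactly when p == a + 1 or p == a - 1.
df1 = df[(p - a).abs().eq(1)]
print(df1)
```

Once both columns are integers, widening the tolerance is a one-character change, for example `.le(2)` to also accept misses of up to two classes (which would include exact matches if any were present).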

【Discussion】: