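【Question title】: Comparing two columns and filtering columns with neighboring classes
【Posted】: 2020-06-17 22:08:49
【Question】:

The classes here range from "eight" to "twenty", and the numbers are written out as words. When the classifier predicts the classes, I get a table of the cases where the prediction does not equal the actual value. Now I want a table of only the cases where the classifier missed by exactly one neighboring class. For example, from that table I want only these rows: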

    predictions   actual
8013  fifteen     sixteen
5146  sixteen     seventeen
5691  seventeen   sixteen
13255 sixteen     fifteen
13921 thirteen    fourteen
13077 fourteen    fifteen

【Question comments】:

    Tags: python pandas indexing filter


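    【Solution 1】:

    You can convert the number words to int with the code from: Is there a way to convert number words to Integers?

    Alternatively, if your range is limited, you can do it by hand with two dictionaries, like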

    # Each dictionary maps a class label to its neighbor (only a few labels shown).
    prev_dict = {'sixteen': 'fifteen', 'seventeen': 'sixteen'}
    next_dict = {'sixteen': 'seventeen'}
    

    然后:

    predict[(predict['predictions'] == predict['actual'].map(prev_dict)) | (predict['predictions'] == predict['actual'].map(next_dict))]
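
    Typing both dictionaries by hand gets tedious for a longer range. As a minimal sketch, they can be generated from the ordered list of labels. The `classes` list below is an assumption based on the "eight" to "twenty" range mentioned in the question; adjust it to the real label set.

```python
# Ordered class labels, assumed from the question ("eight" through "twenty").
classes = [
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]

# Pair each label with its neighbor in the ordered list.
prev_dict = dict(zip(classes[1:], classes[:-1]))  # label -> label one below
next_dict = dict(zip(classes[:-1], classes[1:]))  # label -> label one above

print(prev_dict["sixteen"])  # fifteen
print(next_dict["sixteen"])  # seventeen
print("eight" in prev_dict)  # False, the lowest class has no lower neighbor
```

    Labels missing from a dictionary map to NaN in `Series.map`, so the edge classes simply never match on that side.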
    

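    【Comments】:

    • Great! This certainly answers my question one way. But if I convert the words to integers instead, what would the code look like to get the result from that point?
    • Also, when I run that code I get the error "'Series' objects are mutable, thus they cannot be hashed". Why is that? Thanks in advance.
    • I got it wrong; it only works with two dictionaries. The error came from the dictionary not recognizing the key, because the code tried to use the Series itself as a key. I have edited my answer, but jezrael's answer is better.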
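
    The hashing error from the comments can be reproduced in a few lines. This is a sketch that assumes the earlier version of the answer indexed the dictionary with the whole column, which is what the last comment suggests; the sample values are made up for illustration.

```python
import pandas as pd

prev_dict = {"sixteen": "fifteen", "seventeen": "sixteen"}
actual = pd.Series(["sixteen", "seventeen"])

# Indexing a dict with a whole Series asks Python to hash the Series,
# which pandas refuses because a Series is mutable.
try:
    prev_dict[actual]
    raised = False
except TypeError:
    raised = True

# Series.map looks up each element separately instead.
mapped = actual.map(prev_dict)

print(raised)           # True
print(mapped.tolist())  # ['fifteen', 'sixteen']
```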
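    【Solution 2】:

    Convert both columns to numbers, then filter with boolean indexing: add 1 to and subtract 1 from the actual column, compare each result with the predictions using Series.eq to test for equality, and chain the two conditions with | for bitwise OR: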

    print (df)
          predictions     actual
    8013      fifteen     twenty
    5146      sixteen  seventeen
    5691    seventeen    sixteen
    13255     sixteen    fifteen
    13921    nineteen   fourteen
    13077    fourteen    fifteen
    

    # https://stackoverflow.com/a/493788/2901002
    def text2int(textnum, numwords={}):
        # The mutable default is deliberate: the lookup table is built once and reused.
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion"]

            # Each word maps to a (scale, increment) pair.
            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        current = result = 0
        for word in textnum.split():
            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current
    

    p = df['predictions'].apply(text2int) 
    a = df['actual'].apply(text2int) 
    
    df1 = df[p.eq(a+1) | p.eq(a-1)]
    

    Or:

    df1 = df[(p == a+1) | (p == a-1)]
    

    print (df1)
          predictions     actual
    5146      sixteen  seventeen
    5691    seventeen    sixteen
    13255     sixteen    fifteen
    13077    fourteen    fifteen
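
    When the label set is small and known, the same filter can be written without a general word parser. The sketch below is a variant, not the method from the answer: it assumes the classes run from "eight" to "twenty" as stated in the question, maps each label to its rank with a hypothetical `rank` dictionary, and keeps rows where the absolute difference is exactly 1.

```python
import pandas as pd

# Assumed label set, taken from the range mentioned in the question.
classes = [
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]
rank = {name: i for i, name in enumerate(classes, start=8)}

# Same sample data as in the answer above.
df = pd.DataFrame(
    {
        "predictions": ["fifteen", "sixteen", "seventeen", "sixteen", "nineteen", "fourteen"],
        "actual": ["twenty", "seventeen", "sixteen", "fifteen", "fourteen", "fifteen"],
    },
    index=[8013, 5146, 5691, 13255, 13921, 13077],
)

p = df["predictions"].map(rank)
a = df["actual"].map(rank)

# A neighboring class differs by exactly one step in either direction.
df1 = df[(p - a).abs().eq(1)]
print(df1)
```

    A label missing from `rank` becomes NaN after `map`, so the comparison is False and that row is dropped rather than raising an error.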
    

    【Comments】:
