[Question title]: Replacing 'NA's in a nested list
[Posted]: 2020-01-04 20:22:02
[Question]:

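I'm trying to do the following: determine whether there are any 'NA' values in a nested list and, if so, replace each one with the mean of the other elements in the same sublist. The elements of the list should be floats. For example: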

[["1.2","3.1","0.2"],["44.0","NA","90.0"]] 

应该返回

[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]

The code below works, although it is long-winded:

def convert_data(data):
    first = []
    second = []
    third = []
    fourth = []
    count = 0
    for i in data:
        for y in i:
            if 'NA' not in i:
                y = float(y)
                first.append(y)
            elif 'NA' in i:
                a = i.index('NA')
                second.append(y)
    second[a] = 0

    for q in second:
        q = float(q)
        third.append(q)
        count+= q

    length = len(third)
    count = count/(length-1)
    third[a] = count
    fourth.extend([first,third])
    return fourth

data = [["1.2","3.1","0.2"],["44.0","NA","90.0"]]
convert_data(data)

For example:

data = [["1.2","3.1","0.2"],["44.0","NA","90.0"]] 
convert_data(data)

returns the desired output:

[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]

But if the 'NA' is in the first list, for example

data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]

then it doesn't work. Can someone explain how to fix this?

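[Comments]:

  • It's crazy that some people suggest you need to import a third-party numerical package to handle a simple calculation like this.
  • @BenQuigley That depends entirely on the rest of the OP's program, though. If they perform a lot of operations on what is clearly tabular data, they should use a more suitable tool.
  • OP, I think we need more information from you to answer this properly. Besides what I mentioned above about the rest of your program and data: for example, do the sublists always contain 3 elements?
  • I forgot to add: what should happen if a sublist consists entirely of 'NA' values? While we're on the subject, using the string 'NA' like this is probably a bad idea.
  • @AMC It should return 0. The code Bhosale Shrikant shared works like a charm. As for the elements, there aren't always 3 per list. My code does work, but only as long as each list has no more than one 'NA'.

Tags: python nested-loops nested-lists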


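[Solution 1]:
data_var = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]


def replace_na_with_mean(list_entry):
    for i in range(len(list_entry)):
        index_list = []  # positions of 'NA' in the original sublist
        m = 0            # how many 'NA' values have been deleted so far
        while 'NA' in list_entry[i]:
            # Later 'NA's shift left after each deletion, so add m back
            index_list.append(list_entry[i].index('NA') + m)
            del list_entry[i][list_entry[i].index('NA')]
            m += 1
        if list_entry[i]:
            for n in range(len(list_entry[i])):
                list_entry[i][n] = float(list_entry[i][n])
        if index_list:
            # Mean of the remaining numbers, or 0 if the sublist was all 'NA'
            if list_entry[i]:
                avg = sum(list_entry[i]) / len(list_entry[i])
            else:
                avg = 0
            # Re-insert the mean at each original position (ascending order)
            for l in index_list:
                list_entry[i].insert(l, avg)
    return list_entry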


print(replace_na_with_mean(data_var))
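Solution 1 records each 'NA' position as `index('NA') + m`. For that to be a position in the original sublist, `m` has to count the deletions made so far, because every deletion shifts the later elements one place to the left. The standalone sketch below compares the two bookkeeping choices on a row with two separated 'NA's; the helper `na_positions` is hypothetical and exists only for this illustration.

```python
def na_positions(values, use_offset):
    """Hypothetical helper: delete every 'NA' and record where each one was."""
    work = list(values)
    positions = []
    removed = 0
    while "NA" in work:
        idx = work.index("NA")
        # idx is a position in the shrinking list; adding `removed`
        # maps it back to a position in the original list
        positions.append(idx + removed if use_offset else idx)
        del work[idx]
        removed += 1
    return work, positions


row = ["1", "NA", "2", "NA"]
rest, with_offset = na_positions(row, use_offset=True)
_, without_offset = na_positions(row, use_offset=False)
print(with_offset)     # [1, 3]
print(without_offset)  # [1, 2]

# Re-inserting a marker at the offset positions, in ascending order,
# restores the original layout.
restored = list(rest)
for p in with_offset:
    restored.insert(p, "MEAN")
print(restored)        # ['1', 'MEAN', '2', 'MEAN']
```

With the un-offset positions `[1, 2]`, the second mean would be inserted right next to the first one instead of at the end. Recording the positions before modifying the list at all, as Solution 5 does with its `index_na` list, avoids this bookkeeping entirely.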


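    [Solution 2]:

    I suggest using pandas, since this kind of operation is exactly what pandas was built for. What you want takes only a few lines:

    import numpy as np
    import pandas as pd
    data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]
    # one column per original sublist; "NA" -> NaN, then parse to float64
    df = pd.DataFrame(data).T.replace("NA", np.nan).astype(float)
    # fill each column's NaN with that column's mean, then transpose back
    res = df.fillna(df.mean()).T.values.tolist()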
    

    This returns the desired output:

    [[1.2, 0.7, 0.2], [44.0, 67.0, 90.0]]
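A variant sketch, assuming pandas is installed and that every sublist has the same length (pandas pads shorter rows with NaN, and those padding cells would then be filled too). It works row by row instead of transposing twice, and it also covers the all-'NA' case from the comments, where the expected result is 0. Note that `errors="coerce"` turns *any* unparsable string into NaN, not just "NA".

```python
import pandas as pd

data = [["1.2", "NA", "0.2"], ["44.0", "67.00", "90.0"], ["NA", "NA", "NA"]]

# Parse every column; strings that are not numbers (here "NA") become NaN
df = pd.DataFrame(data).apply(pd.to_numeric, errors="coerce")

# Fill each row's NaNs with that row's mean (mean() skips NaN by default).
# An all-NaN row has a NaN mean, so the final fillna(0) turns it into zeros.
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1).fillna(0)

res = filled.values.tolist()
print(res)
```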
    

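    By the way, in this simple case your code computes the right values for me, although the two sublists come back in swapped order: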

    convert_data(data)
    > [[44.0, 67.0, 90.0], [1.2, 0.7, 0.2]]
    

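    In more complex cases it will start failing or giving wrong results. For example, if a nested list contains more than one "NA" value, you get a ValueError: only the first "NA" is replaced, so you end up trying to convert the string "NA" to a float.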
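A small reproduction of the two failure modes, assuming the question's `convert_data` is copied verbatim (only the spacing around `+=` and `/` is normalized): with 'NA' in the first sublist the values are right but the sublists are returned in swapped order, because the NA-free sublist always goes into `first`; and with two 'NA's in one sublist, `i.index('NA')` finds only the first one, so the second reaches `float()` and raises ValueError.

```python
def convert_data(data):
    # The question's function, copied so its behaviour can be observed
    first = []
    second = []
    third = []
    fourth = []
    count = 0
    for i in data:
        for y in i:
            if 'NA' not in i:
                y = float(y)
                first.append(y)
            elif 'NA' in i:
                a = i.index('NA')
                second.append(y)
    second[a] = 0

    for q in second:
        q = float(q)
        third.append(q)
        count += q

    length = len(third)
    count = count / (length - 1)
    third[a] = count
    fourth.extend([first, third])
    return fourth


# 1) 'NA' in the first sublist: the rows come back swapped
swapped = convert_data([["1.2", "NA", "0.2"], ["44.0", "67.00", "90.0"]])
print(swapped)

# 2) Two 'NA's in one sublist: the second one is passed to float()
try:
    convert_data([["1.2", "3.1", "0.2"], ["44.0", "NA", "NA"]])
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```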


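      [Solution 3]:

      This should do the trick, using numpy. It assumes every sublist has the same length, so that the data forms a 2-D array:

      import numpy as np
      
      x=[["1.2","3.1","0.2"],["44.0","NA","90.0"]] 
      
      #convert to float ("NA" -> "nan" -> NaN)
      x=np.char.replace(np.array(x), "NA", "nan").astype(float)
      
      #replace each NaN with the mean of its own row
      mask=np.isnan(x)
      row_means=np.nanmean(x, axis=1)
      x=np.where(mask, row_means[:, None], x)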
      

      Output:

      [[ 1.2  3.1  0.2]
       [44.  67.  90. ]]
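One gap in the numpy version: for a row that is entirely 'NA', `np.nanmean` returns NaN (and emits a RuntimeWarning), whereas the comments ask for 0. A sketch that computes the row means by hand and guards the division, assuming the data has already been converted to a 2-D float array with NaN for the missing values:

```python
import numpy as np

x = np.array([[1.2, np.nan, 0.2],
              [44.0, np.nan, 90.0],
              [np.nan, np.nan, np.nan]])

mask = np.isnan(x)
counts = (~mask).sum(axis=1)                # non-missing values per row
sums = np.where(mask, 0.0, x).sum(axis=1)   # sum of non-missing values per row

# Divide only where a row has at least one value; all-missing rows keep 0
means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

filled = np.where(mask, means[:, None], x)
print(filled.tolist())
```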
      


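        [Solution 4]:

        One reason your code ends up more complicated than it needs to be is that you're trying to solve the "nested list" problem from the start. Really, you just need a function that handles a single list of numeric strings containing some "NA" values; then you can apply that function to each item of the outer list.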

        def float_or_average(list_of_num_strings):
            # First, convert every item that you can to a number. You need to do this
            # before you can handle even ONE "NA" value, because the "NA" values need
            # to be replaced with the average of all the numbers in the collection.
            # So for now, convert ["1.2", "NA", "2.0"] to [1.2, "NA", 2.0]
        
            parsed = []
        
            # While we're at it, let's record the sum of the floats and their count,
            # so that we can compute that average.
            numeric_sum = 0.0
            numeric_count = 0
        
            for item in list_of_num_strings:
                if item == "NA":
                    parsed.append(item)
                else:
                    floating_point_value = float(item)
                    parsed.append(floating_point_value)
                    numeric_sum += floating_point_value
                    numeric_count += 1
            # Now we can calculate the average (0.0 if every item was "NA"):
            average = numeric_sum / numeric_count if numeric_count else 0.0
        
            # And replace the "NA" values with it.
            for i, item in enumerate(parsed):
                if item == "NA":
                    parsed[i] = average
            return parsed
            # Or, with a list comprehension (replacing the previous four lines of
            # code):
            # return [number if number != "NA" else average for number in parsed]
        
        
        # Using this function on a nested list is as easy as
        example_data = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]
        parsed_nested_list = []
        for sublist in example_data:
            parsed_nested_list.append(float_or_average(sublist))
        # Or, using a list comprehension (replacing the previous three lines of code):
        parsed_nested_list = [float_or_average(sublist) for sublist in example_data]
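The same per-sublist idea can also be written more compactly with two list comprehensions. The name `fill_na` is only illustrative; as the comments request, it returns 0 for a sublist that is all 'NA', and it builds a new list instead of modifying its input:

```python
def fill_na(row, missing="NA"):
    """Return a new list of floats with each `missing` entry replaced by the mean."""
    values = [float(item) for item in row if item != missing]
    # Mean of the parsed numbers; 0.0 when the sublist has no numbers at all
    mean = sum(values) / len(values) if values else 0.0
    return [mean if item == missing else float(item) for item in row]


example_data = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]
print([fill_na(sublist) for sublist in example_data])
# [[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]
print(fill_na(["1", "NA", "2", "NA"]))  # [1.0, 1.5, 2.0, 1.5]
print(fill_na(["NA", "NA"]))            # [0.0, 0.0]
```

The trade-off is that each numeric string is parsed twice; the single loop in `float_or_average` avoids that.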
        


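          [Solution 5]:
          def convert_data(data):
          
              for lst in data:
          
                  total = 0.0
                  index_na = list()
          
                  for elem in range(len(lst)):
          
                      if lst[elem] != 'NA':
                          total += float(lst[elem])
                          lst[elem] = float(lst[elem])
                      else:
                          index_na.append(elem)
          
                  if len(index_na) > 0:
                      # mean of the non-'NA' values, or 0 if the sublist is all 'NA'
                      n_values = len(lst) - len(index_na)
                      mean = total / n_values if n_values else 0.0
          
                      for i in index_na:
                          # the mean is rounded to 2 decimal places
                          lst[i] = float("{0:.2f}".format(mean))
          
              # note: the sublists in `data` are modified in place
              return data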
          
