替换嵌套列表中的 'NA'答案

【问题标题】：Replacing 'NA's in a nested list替换嵌套列表中的 'NA'
【发布时间】：2020-01-04 20:22:02
【问题描述】：

我正在尝试执行以下操作：确定嵌套列表中是否存在“NA”值，如果存在，则将其替换为列表中其他元素之和的平均值。列表的元素应该是浮点数。例如：

[["1.2","3.1","0.2"],["44.0","NA","90.0"]]

应该返回

[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]

下面的代码虽然冗长且冗长，但可以正常工作：

def convert_data(data):
    first = []
    second = []
    third = []
    fourth = []
    count = 0
    for i in data:
        for y in i:
            if 'NA' not in i:
                y = float(y)
                first.append(y)
            elif 'NA' in i:
                a = i.index('NA')
                second.append(y)
    second[a] = 0

    for q in second:
        q = float(q)
        third.append(q)
        count+= q

    length = len(third)
    count = count/(length-1)
    third[a] = count
    fourth.extend([first,third])
    return fourth

data = [["1.2","3.1","0.2"],["44.0","NA","90.0"]]
convert_data(data)

例如：

data = [["1.2","3.1","0.2"],["44.0","NA","90.0"]] 
convert_data(data)

返回所需的输出：

[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]

但如果“NA”在第一个列表中，例如

data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]

那么它不会。有人可以解释一下如何解决这个问题吗？

【问题讨论】：

有些人建议您需要导入第三方计算包来处理这样的简单计算，这太疯狂了。
@BenQuigley 然而，这完全取决于 OP 程序的其余部分。如果他们对明显是表格数据的内容执行大量操作，他们应该使用更合适的工具。
OP，我认为我们需要您提供更多信息才能正确回答此问题。除了我上面提到的关于你的程序和数据的其余部分，例如，子列表是否总是包含 3 个元素？
我忘了补充：如果子列表完全由 'NA' 值组成会怎样？当我们讨论这个话题时，像这样使用字符串'NA' 可能是个坏主意。
@AMC 它应该返回 0。 Bhosale Shrikant 分享的代码就像一个魅力。关于元素，它们不必在列表中始终为 3。我的确实有效，但每个列表不超过一个“NA”。

标签： python nested-loops nested-lists

【解决方案1】：

data_var = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]


def replace_na_with_mean(list_entry):
    for i in range(len(list_entry)):
        index_list = []
        m = 0
        while 'NA' in list_entry[i]:
            index_list.append(list_entry[i].index('NA') + m)
            del list_entry[i][list_entry[i].index('NA')]
        if list_entry[i]:
            for n in range(len(list_entry[i])):
                list_entry[i][n] = float(list_entry[i][n])
        if index_list:
            if list_entry[i]:
                avg = sum(list_entry[i]) / len(list_entry[i])
            else:
                avg = 0
            for l in index_list:
                list_entry[i].insert(l, avg)
    return list_entry


print(replace_na_with_mean(data_var))

【讨论】：

【解决方案2】：

我建议使用 pandas 的功能，因为这些类型的操作正是 pandas 的开发目的。只需几行代码即可轻松实现您想要的：

import pandas as pd
data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]
df = pd.DataFrame(data).T.replace("NA", pd.np.nan).astype('<f8')
res = df.fillna(df.mean()).T.values.tolist()

返回想要的输出：

[[1.2, 0.7, 0.2], [44.0, 67.0, 90.0]]

顺便说一句，在这种简单的情况下，您的代码对我来说效果很好：

convert_data(data)
> [[44.0, 67.0, 90.0], [1.2, 0.7, 0.2]]

在更复杂的情况下，它肯定会开始失败或给出错误的结果，例如如果嵌套列表中有超过 1 个 "NA" 值，则会出现 ValueError 异常（您将尝试将字符串转换为浮点数）。

【讨论】：

【解决方案3】：

这应该可以解决问题，使用 numpy:

import numpy as np

x=[["1.2","3.1","0.2"],["44.0","NA","90.0"]] 

#convert to float
x=np.char.replace(np.array(x), "NA", "nan").astype(np.float)

#replace nan-s with mean
mask=x.astype(str)=="nan"
x[mask]=np.nanmean(x, axis=1)[mask.any(axis=1)]

输出：

[[ 1.2  3.1  0.2]
 [44.  67.  90. ]]

【讨论】：

【解决方案4】：

您的代码最终有点过于复杂的一个原因是您试图从解决“嵌套列表”问题开始。但实际上，您只需要一个函数来处理具有一些“NA”值的数字字符串列表，然后您可以将该函数应用于列表中的每个项目。

def float_or_average(list_of_num_strings):
    # First, convert every item that you can to a number. You need to do this
    # before you can handle even ONE "NA" value, because the "NA" values need
    # to be replaced with the average of all the numbers in the collection.
    # So for now, convert ["1.2", "NA", "2.0"] to [1.2, "NA", 2.0]

    parsed = []

    # While we're at it, let's record the sum of the floats and their count,
    # so that we can compute that average.
    numeric_sum = 0.0
    numeric_count = 0

    for item in list_of_num_strings:
        if item == "NA":
            parsed.append(item)
        else:
            floating_point_value = float(item)
            parsed.append(floating_point_value)
            numeric_sum += floating_point_value
            numeric_count += 1
    # Now we can calculate the average:
    average = numeric_sum / numeric_count

    # And replace the "NA" values with them.
    for i, item in enumerate(parsed):
        if item == "NA":
            parsed[i] == average
    return parsed
    # Or, with a list comprehension (replacing the previous four lines of
    # code):
    return [number if number != "NA" else average for number in parsed]


# Using this function on a nested list is as easy as
example_data = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]
parsed_nested_list = []
for sublist in example_data:
    parsed_nested_list.append(float_or_average(sublist))
# Or, using a list comprehension (replacing the previous three lines of code):
parsed_nested_list = [float_or_average(sublist) for sublist in example_data]

【讨论】：

【解决方案5】：

def convert_data(data):

    for lst in data:

        sum = 0
        index_na = list()

        for elem in range(len(lst)):

            if lst[elem] != 'NA':
                sum += float(lst[elem])
                lst[elem] = float(lst[elem])
            else:
                index_na.append(elem)

        if len(index_na) > 0:
            len_values = sum / (len(lst)-len(index_na))

            for i in index_na:
                lst[i] = float("{0:.2f}".format(len_values))

    return data

【讨论】：