【Question Title】: Modifying column of 2d list while iterating over it in python
【Posted】: 2017-01-09 16:38:04
【Question】:

I'm trying to write a function that converts all non-numerical columns in a dataset to numerical form.

The dataset is a list of lists.

Here is my code:

def handle_non_numerical_data(data):
    def convert_to_numbers(data, index):
        items = []
        column = [line[0] for line in data]
        for item in column:
            if item not in items:
                items.append(item)
        [line[0] = items.index(line[0]) for line in data]
        return new_data

    for value in data[0]:
        if isinstance(value, str):
            convert_to_numbers(data, data[0].index(value))

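Apparently [line[0] = items.index(line[0]) for line in data] is invalid syntax, and I can't figure out how to modify the first column of data while iterating over it.

I can't use numpy, because the data won't be in numerical form until after this function has run.

How do I do this? Why is it so complicated? I feel like this should be much simpler than it is...

In other words, I want to turn this: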

[[M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

into this:

[[0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[0,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[1,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

Note that the first column has been changed from strings to numbers.

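【Comments】:

  • Just use your list comprehension to create a new list and replace the old one: my_list = [comprehension for row in my_list]
  • You need to move the opening [ to before items, i.e. line[0] = [items.index(line[0]) for line in data]. But I wonder whether pandas could handle this more efficiently. It would help if you shared a snippet of the input data and the desired output.
  • It's not easy to understand what you're trying to do. Where do you convert to numbers? Are you saying that if a value is a string you want it replaced with that string's index, and otherwise left as is?
  • Can you edit the question to include sample input and the expected output?
  • @N1B4I I've added the input and the expected output.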
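
For reference, the comprehension in the question fails because assignment is a statement, not an expression, so it cannot appear inside a list comprehension. Below is a minimal sketch of the two usual workarounds, assuming each string should be coded by the order in which it first appears (the helper name first_seen_items and the shortened rows are made up for illustration):

```python
def first_seen_items(data, index=0):
    """Collect the distinct values of one column, in first-seen order."""
    items = []
    for line in data:
        if line[index] not in items:
            items.append(line[index])
    return items


data = [['M', 0.455, 15], ['M', 0.35, 7], ['F', 0.53, 9]]
items = first_seen_items(data)

# Option 1: build a new list with a comprehension (no assignment needed).
new_data = [[items.index(line[0])] + line[1:] for line in data]

# Option 2: modify the rows in place with an ordinary for loop.
for line in data:
    line[0] = items.index(line[0])

print(new_data)  # [[0, 0.455, 15], [0, 0.35, 7], [1, 0.53, 9]]
print(data)      # same content, but the original rows were mutated
```

Option 1 leaves the original rows untouched, while Option 2 avoids copying them.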

Tags: python list type-conversion


【Solution 1】:

Solution

data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

values = {'M': 0, 'F': 1}

new_data = [[values.get(val, val) for val in line] for line in data]
new_data

Output:

[[0, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15],
 [0, 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7],
 [1, 0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9]]

Explanation

You can take advantage of a Python dictionary and its get method.

These are the values for the strings:

values = {'M': 0, 'F': 1}

You can also add more strings, such as I, with corresponding values.

If the string is a key in values, you get its value from the dict:

>>> values.get('M', 'M')
0 

Otherwise, you get the original value back:

>>> values.get(10, 10)
10

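【Comments】:

  • This replaces every 'M' and 'F' in a row with its predetermined value, which may be undesirable in some cases. It also can't accommodate unknown strings by assigning them unique integers.
  • @LoganByers What else would make sense? M and F are both at index 0. You can always add more key-value pairs to values, e.g. {'M': 0, 'F': 1, 'I': 2}.
  • It's not a big deal, but if another column were encoded as {'A', 'B', 'C', 'D', 'E', 'F', ...}, the F in that column would be replaced as well.
  • Maybe that is what's wanted. OP: "...converts all non-numerical columns in a dataset to numerical form."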
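
If replacing matches in every column is not wanted, the same get-based idea can be limited to a single column. This is only a sketch of one way to address the concern raised above; the function name encode_column and the sample rows are hypothetical and not part of the original answer:

```python
def encode_column(data, index, values):
    """Return a copy of data with only column `index` mapped through `values`."""
    return [
        [values.get(val, val) if i == index else val
         for i, val in enumerate(line)]
        for line in data
    ]


# The 'F' in column 1 belongs to a different coding scheme and must stay as is.
data = [['M', 'F', 0.455],
        ['F', 'A', 0.35]]

new_data = encode_column(data, 0, {'M': 0, 'F': 1})
print(new_data)  # [[0, 'F', 0.455], [1, 'A', 0.35]]
```

The original data is left unchanged, as in the comprehension from the answer.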
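【Solution 2】:

Aside from the indexing (I'm not sure how it is supposed to work in your example), you can create a dictionary that maps letters to numbers. Something like this should work.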

raw_data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
            ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
            ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

def handle_non_numerical_data(data):
    mapping = {'M': 0, 'F': 1, 'I': 2}

    for item in data:
        if isinstance(item[0], str):
            item[0] = mapping.get(item[0], -1) # Returns -1 if letter not found
    return data

run = handle_non_numerical_data(raw_data)
print(run)

【Comments】:

【Solution 3】:

This answer uses a dict to store the encoding from str to int. The dict can be preloaded, and it can be inspected after the data has been replaced.

# MODIFIES DATA IN PLACE
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

coding_dict = {} # can also preload this {'M': 0, 'F':1}
for row in data:
    if row[0] not in coding_dict:
        coding_dict[row[0]] = len(coding_dict)
    row[0] = coding_dict[row[0]]
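
The same approach can be extended to every string column, which is what the question originally asked for. The following is a rough sketch rather than part of the original answer; it assumes each column should get its own independent coding, and the name encode_string_columns and the sample rows are invented for illustration:

```python
def encode_string_columns(data):
    """Encode every string value in place; return the codes used per column."""
    codings = {}  # column index -> {string: integer code}
    for row in data:
        for i, value in enumerate(row):
            if isinstance(value, str):
                codes = codings.setdefault(i, {})
                if value not in codes:
                    # Next unused integer for this column.
                    codes[value] = len(codes)
                row[i] = codes[value]
    return codings


data = [['M', 0.455, 'x'],
        ['M', 0.35, 'y'],
        ['F', 0.53, 'x']]

codings = encode_string_columns(data)
print(data)     # [[0, 0.455, 0], [0, 0.35, 1], [1, 0.53, 0]]
print(codings)  # {0: {'M': 0, 'F': 1}, 2: {'x': 0, 'y': 1}}
```

Keeping one dict per column means an 'F' in one column never collides with an 'F' in another, and the returned codings can be used later to decode the integers.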
    

【Comments】:
