Separation by tab when reading from txt-file not working properly

【Question title】: Separation by tab when reading from txt-file not working properly
【Posted】: 2021-11-23 18:36:04
【Question description】:

My main program contains this code:

def readfile(file):
  with open(file, encoding="utf-8") as file:
    list = []
    for row in file:
      temp = row.split("\t")
      temp[1] = temp[1].strip()
      list.append(temp)
    return list

I want to read a txt file in this format:

21-10-22  2348.84
21-10-25  2330.13
21-10-26  2344.20
21-10-27  2323.17
21-10-28  2313.24
21-10-29  2290.85
21-11-01  2302.26
21-11-02  2302.67
21-11-03  2317.67
21-11-04  2330.07
21-11-05  2324.90
21-11-08  2331.84
21-11-09  2327.12
21-11-10  2331.42
21-11-11  2346.46
21-11-12  2365.45
21-11-15  2374.47
21-11-16  2385.63
21-11-17  2384.10
21-11-18  2373.04
21-11-19  2373.92
21-11-22  2368.71

I want to return a list containing every value in the right-hand column of the text file. But when I run print(readfile("file.txt")), it just prints "['21-10-22 2348.84\n']" and then

Traceback (most recent call last):
  File "main.py", line 51, in readfile
    temp[1] = temp[1].strip()
IndexError: list index out of range

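Why is only the first line stored in the list? What is the error in the code? I can't find it...

【Comments】:

It looks like at least one of the lines does not contain a \t character, so row.split("\t") returns a list of length 1. That means temp[1] does not exist.
Are you sure there are tabs? The exception is raised on the very first line.
Well, the sample text file shown has no tabs; it uses double spaces as the separator. Either change the text file to use tabs, or just use row.split() and see.
Does this answer your question? Python: Splitting txt file by tab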
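
The diagnosis in the comments can be checked directly. A minimal sketch, assuming the line is separated by two spaces and contains no tab, as in the sample shown in the question (the name `row` mirrors the question's code; `by_tab` and `by_whitespace` are illustrative names):

```python
row = "21-10-22  2348.84\n"  # assumed: two spaces, no tab, as in the sample

by_tab = row.split("\t")     # no tab present, so the whole line comes back as one item
by_whitespace = row.split()  # splits on any run of whitespace and drops the newline

print(len(by_tab))      # 1, so by_tab[1] raises IndexError
print(by_whitespace)    # ['21-10-22', '2348.84']
```

Because the list returned by split("\t") has only one element, the lookup temp[1] fails on the very first row.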

Tags: python

【Solution 1】:

Calling .split() with no arguments splits the line on any whitespace. It also removes the trailing newline. So you can simplify your code to:

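def readfile(file):
  with open(file, encoding="utf-8") as f:
    values = []  # avoid shadowing the built-in name "list"
    for row in f:
      temp = row.split()  # split on any run of whitespace
      values.append(temp[1])  # keep only the right-hand column
    return values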

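【Comments】:

【Solution 2】:

Use row.split() instead of row.split("\t"). Calling split() with no arguments splits on whitespace, which is exactly what you need. There appears to be no \t in the file.

From the docs:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.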
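
To illustrate the quoted behaviour, here is a small sketch contrasting split() with split(" "). The input line is hypothetical, with extra leading and trailing whitespace added for the demonstration:

```python
line = "  21-10-22  2348.84 \n"  # hypothetical line with extra whitespace

default_split = line.split()      # runs of whitespace act as one separator
explicit_split = line.split(" ")  # every single space is a separator

print(default_split)   # ['21-10-22', '2348.84']
print(explicit_split)  # ['', '', '21-10-22', '', '2348.84', '\n']
```

With an explicit separator, empty strings and the newline survive in the result, so index 1 would not be the value column.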

In addition, you need to modify (and simplify) your code:

def readfile(file):
    with open(file, encoding="utf-8") as file:
        return [line.split()[1] for line in file]

print(readfile('txttt.txt'))

* Don't use built-in names such as "list" for variables.

【Comments】:

【Solution 3】:

You can use a list comprehension.

def readfile(file):
  with open(file, encoding="utf-8") as file:
    return [row.split()[1] for row in file]

[Output]

['2348.84', '2330.13', '2344.20', '2323.17', '2313.24', '2290.85', '2302.26', '2302.67', '2317.67', '2330.07', '2324.90', '2331.84', '2327.12', '2331.42', '2346.46', '2365.45', '2374.47', '2385.63', '2384.10', '2373.04', '2373.92', '2368.71']

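【Comments】:

【Solution 4】:

Here is the change you need: split on whitespace instead of \t; see the updated code below. Also, never use a built-in name like list as a variable name.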

def readfile(file):
  with open(file, encoding="utf-8") as file:
    out_list = []
    for row in file:
      temp = row.split()    # Split on any whitespace rather than \t
      out_list.append(temp[1])    # split() has already dropped the newline
    return out_list

【Comments】:

This returns ['21-10-22', '2348.84'] rather than only the right-hand column, which is what the OP wanted.
Updated the code, thanks for pointing that out.
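
All of the answers above index [1] on every line, so a blank or incomplete line would still raise IndexError. A hedged sketch of a more defensive variant follows. The function name `read_right_column` is hypothetical, and both the choice to skip short lines and the conversion to float are assumptions that go beyond what the question asks for:

```python
def read_right_column(path):
    """Return the second whitespace-separated field of each line as a float."""
    values = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()      # split on any run of whitespace
            if len(fields) < 2:        # assumption: skip blank or incomplete lines
                continue
            values.append(float(fields[1]))  # assumption: numeric values are wanted
    return values
```

If the original strings are preferred, append fields[1] directly instead of converting it.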