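Extracting items of different layouts from lists: answers

【Question title】: Extracting items of different layouts from lists
【Posted】: 2019-01-17 13:09:49
【Question】:

I have an odd file produced by a Linux program; for example, the first lines are: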

 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000

I only need to extract two values from each line:

191340
725670
1.4378e+06
2.178e+06
.... etc

1.00000
2.00000
3.00000
4.00000
.... etc

This code:

import csv
# Python 2: csv.reader expects the file to be opened in binary mode ("rb")
with open('NGC1365GaiaPhotomLogTestTenLines.dat', "rb") as infile:
    read = csv.reader(infile)
    for row in read:
        print(row)

produces:

['         1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000']
['         2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000']
['         3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000']
['         4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000']
['         5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000']
['         6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000']
['         7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000']
['         8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000']
['         9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000']
['        10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000']

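The problem is that each resulting list is a single string rather than a set of cleanly separated items. The fields in the input file are separated by spaces, not commas, and the number of spaces can vary because the formatting of the value in the first column can also vary.

I would not have thought this was difficult, but I have consulted many threads and got nowhere.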

【Question comments】:

Just use line.split(). Note that csv.reader is overkill in this case.

Tags: python list extract items

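【Solution 1】:

Contrary to the other answers here, I think you should use the csv module. If your file turns out to contain headers or quoted fields, you will be much happier than if you try to retrofit a custom solution after the fact:

import csv

with open('filename') as infile:
    # skipinitialspace=True ignores the extra spaces that follow each delimiter
    r = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    for row in r:
        print(row)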

Your file looks like it might be tab-delimited on your computer. In that case, change delimiter=' ' above to delimiter='\t'.
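As a minimal sketch of how the csv approach could be carried through to the two columns the question asks for, the snippet below parses two of the sample lines from an in-memory string. The `sample` string, the `extract_columns` helper, and the choice of the 7th and 9th fields are illustrative assumptions, not part of the original answer. Empty fields are dropped defensively in case the leading spaces yield any.

```python
import csv
import io

# Two sample lines from the question, held in memory for illustration.
sample = (
    "         1 1011.720000 1830.340000            0            0"
    "            0           191340          ?   1.000000\n"
    "         3 1011.720000 1830.340000            0            0"
    "            0       1.4378e+06          ?   3.000000\n"
)


def extract_columns(text_stream):
    """Return (7th field, 9th field) pairs as floats (hypothetical helper)."""
    reader = csv.reader(text_stream, delimiter=' ', skipinitialspace=True)
    pairs = []
    for row in reader:
        # Defensively drop any empty fields before counting columns.
        fields = [f for f in row if f]
        if len(fields) == 9:
            pairs.append((float(fields[6]), float(fields[8])))
    return pairs


pairs = extract_columns(io.StringIO(sample))
print(pairs)
```

For a real file, the `io.StringIO(sample)` argument would be replaced by the open file object.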

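You can also use pandas, which has a more general whitespace mode:

import pandas as pd

# sep=r"\s+" splits on any run of whitespace; delim_whitespace=True is the
# older spelling of the same option and is deprecated in recent pandas releases
df = pd.read_csv("filename", header=None, sep=r"\s+")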

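【Comments】:

No, they are definitely spaces; note how the values are right-aligned.

【Solution 2】:

Thanks to @Eugen Constantin Dinca and @tobias_k for simplifying the code.

with open('csv.dat') as infile:
    for row in infile:
        # split() with no arguments splits on runs of whitespace
        print(row.split())

Output: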

['1', '1011.720000', '1830.340000', '0', '0', '0', '191340', '?', '1.000000']
['2', '1011.720000', '1830.340000', '0', '0', '0', '725670', '?', '2.000000']
['3', '1011.720000', '1830.340000', '0', '0', '0', '1.4378e+06', '?', '3.000000']
['4', '1011.720000', '1830.340000', '0', '0', '0', '2.178e+06', '?', '4.000000']
['5', '1011.720000', '1830.340000', '0', '0', '0', '2.8806e+06', '?', '5.000000']
['6', '1011.720000', '1830.340000', '0', '0', '0', '3.5353e+06', '?', '6.000000']
['7', '1011.720000', '1830.340000', '0', '0', '0', '4.1598e+06', '?', '7.000000']
['8', '1011.720000', '1830.340000', '0', '0', '0', '4.7729e+06', '?', '8.000000']
['9', '1011.720000', '1830.340000', '0', '0', '0', '5.3924e+06', '?', '9.000000']
['10', '1011.720000', '1830.340000', '0', '0', '0', '6.0281e+06', '?', '10.000000']

【Comments】:

Note that, at least in Python 2, calling row.split() is enough, because if "sep is not specified or is None" then "runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace." (from docs.python.org/2/library/stdtypes.html#str.split)
@EugenConstantinDinca Thanks, you are right. I'm not a Python person.
There is really no point in using csv.reader here. Just read the lines directly with for row in infile: and split each one with print(row.split()).
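
The str.split() behaviour quoted in the comments can be checked directly. This short sketch uses a made-up `line` string (shortened from the sample data) to contrast the default whitespace splitting with splitting on a single space:

```python
# A shortened, made-up line with leading spaces and a trailing newline.
line = "   3 1011.720000      1.4378e+06   ?   3.000000\n"

# With no separator, runs of whitespace count as one separator, and
# leading/trailing whitespace (including the newline) is ignored.
default_split = line.split()

# With an explicit single-space separator, every space delimits a field,
# so consecutive spaces produce empty strings.
space_split = line.split(" ")

print(default_split)
print(space_split[:5])
```

The second form is why splitting on " " is awkward for right-aligned columns: the empty strings would have to be filtered out afterwards.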

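【Solution 3】:

Here is code you can use.

A couple of points about your code: csv.reader is overkill. Everything here is done with simple built-ins, with no external dependencies.

Also, using a variable name like read is not a good idea.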

lines = """1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000"""

for line in lines.split("\n"):
    toks = line.split() # This should split the line into tokens separated by one or more white space characters. 

    if len(toks) == 9: # Just to make sure there are enough tokens. 
        # do whatever you want
        print (toks[6])
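
Since the question asks for two values per line, the loop above could be extended to collect both the 7th and the 9th token and convert them to numbers. This is a minimal sketch, assuming the values should become floats; the list names `values` and `indices` are hypothetical, and only three of the sample lines are repeated here:

```python
lines = """ 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000"""

values = []   # 7th column of each line
indices = []  # 9th column of each line

for line in lines.splitlines():
    toks = line.split()
    if len(toks) != 9:
        continue  # skip blank or malformed lines
    values.append(float(toks[6]))
    indices.append(float(toks[8]))

print(values)
print(indices)
```

float() accepts both the plain integers and the scientific notation (for example 1.4378e+06) that appear in the file.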

【Comments】:

Well, yes, but it is not a "built-in"; agreed that it comes from the standard library.