【Question title】: Extracting items of different layouts from lists
【Posted】: 2019-01-17 13:09:49
【Question】:

I have an odd file produced by a Linux program; for example, its first lines are:

 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000

I only need to extract two values from each line:

191340
725670
1.4378e+06
2.178e+06
.... etc

1.00000
2.00000
3.00000
4.00000
.... etc

This code:

import csv

with open('NGC1365GaiaPhotomLogTestTenLines.dat', newline='') as infile:
    read = csv.reader(infile)
    for row in read:
        print(row)

produces:

['         1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000']
['         2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000']
['         3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000']
['         4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000']
['         5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000']
['         6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000']
['         7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000']
['         8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000']
['         9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000']
['        10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000']

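The problem is that each resulting list is a single string rather than a set of nicely comma-separated items. The items in the input file are separated by spaces, and the number of spaces can vary because the width of the values in the earlier columns can vary too.

I would not have expected this to be hard, but I have consulted a lot of threads and got nowhere.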

【Comments】:

  • Just use line.split(). Note that csv.reader is overkill in this case.

Tags: python list extract items


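【Solution 1】:

Contrary to the other answers here, I think you should use the csv module. If your file ever contains a header or quoted fields, you will be happier than if you had to retrofit a custom solution after the fact: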

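import csv

with open('filename', newline='') as infile:
    # A single space is the delimiter; skipinitialspace skips the extra padding spaces.
    r = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    for row in r:
        print(row)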

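Your file looks like it might be tab-separated on your machine. In that case, change delimiter=' ' above to delimiter='\t'.

You could also use pandas, which has a more general whitespace mode: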

import pandas as pd
df = pd.read_csv("filename", header=None, sep=r"\s+")  # same effect as delim_whitespace=True, which recent pandas deprecates
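
As a rough sketch of how the csv.reader approach above could be taken one step further to pull out just the two values the question asks for: the helper name read_columns, its wanted parameter, and the use of io.StringIO in place of the real file are all assumptions made for illustration. Empty fields are dropped defensively, because trailing spaces on a line can produce an empty last field.

```python
import csv
import io

# Two sample rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    " 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000\n"
    "10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000\n"
)

def read_columns(infile, wanted=(6, 8)):
    """Yield the requested (0-based) columns of each row, converted to float."""
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    for row in reader:
        fields = [f for f in row if f]  # drop empty fields left by stray spaces
        if len(fields) > max(wanted):   # skip blank or short lines
            yield tuple(float(fields[i]) for i in wanted)

pairs = list(read_columns(sample))
print(pairs)
```

With a real file, the same function could be called as read_columns(open('filename', newline='')), ideally inside a with block.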

【Comments】:

  • No, it is definitely spaces; note how the values are right-aligned.
【Solution 2】:

Thanks to @Eugen Constantin Dinca and @tobias_k for simplifying the code:

with open('csv.dat') as infile:
    for row in infile:
        # split() with no argument splits on any run of whitespace
        print(row.split())

Output:

['1', '1011.720000', '1830.340000', '0', '0', '0', '191340', '?', '1.000000']
['2', '1011.720000', '1830.340000', '0', '0', '0', '725670', '?', '2.000000']
['3', '1011.720000', '1830.340000', '0', '0', '0', '1.4378e+06', '?', '3.000000']
['4', '1011.720000', '1830.340000', '0', '0', '0', '2.178e+06', '?', '4.000000']
['5', '1011.720000', '1830.340000', '0', '0', '0', '2.8806e+06', '?', '5.000000']
['6', '1011.720000', '1830.340000', '0', '0', '0', '3.5353e+06', '?', '6.000000']
['7', '1011.720000', '1830.340000', '0', '0', '0', '4.1598e+06', '?', '7.000000']
['8', '1011.720000', '1830.340000', '0', '0', '0', '4.7729e+06', '?', '8.000000']
['9', '1011.720000', '1830.340000', '0', '0', '0', '5.3924e+06', '?', '9.000000']
['10', '1011.720000', '1830.340000', '0', '0', '0', '6.0281e+06', '?', '10.000000']

【Comments】:

  • Note that, at least in Python 2, calling row.split() is enough: if sep is not specified or is None, then runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace (from docs.python.org/2/library/stdtypes.html#str.split).
  • @EugenConstantinDinca Thanks, you are right. I am not a Python person.
  • There is really no point in using csv.reader here. Just read the lines directly with for row in infile: and split them with print(row.split()).
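
The str.split() behaviour quoted in the comments above can be checked with a short sketch. The test string is made up for illustration; the point is only the contrast between calling split() with no argument and passing an explicit single space.

```python
line = "  a   b "

# No argument: runs of whitespace count as one separator, and
# leading/trailing whitespace produces no empty strings.
tokens_default = line.split()

# Explicit single-space separator: every space splits, so empty strings appear.
tokens_space = line.split(" ")

print(tokens_default)  # ['a', 'b']
print(tokens_space)    # ['', '', 'a', '', '', 'b', '']
```

This is why the padded, right-aligned columns in the question come out cleanly with a bare split().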
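【Solution 3】:

Here is some code you can use.

A few more points about your code: csv.reader is overkill. Everything below is done with simple built-ins, with no external dependencies.

Also, using a variable name like read is not a good idea.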

lines = """1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000"""

for line in lines.split("\n"):
    toks = line.split() # This should split the line into tokens separated by one or more white space characters. 

    if len(toks) == 9: # Just to make sure there are enough tokens. 
        # do whatever you want
        print (toks[6])
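
The loop above prints only the seventh token, while the question asks for two values per line. A minimal sketch of one way to collect both as floats follows; the function name extract_columns and its parameters wanted and expected_tokens are hypothetical, and the third sample line is deliberately malformed to show the length check at work.

```python
def extract_columns(text, wanted=(6, 8), expected_tokens=9):
    """Return one list of floats per requested (0-based) column index."""
    columns = [[] for _ in wanted]
    for line in text.splitlines():
        toks = line.split()
        if len(toks) != expected_tokens:
            continue  # skip blank or malformed lines
        for column, index in zip(columns, wanted):
            column.append(float(toks[index]))
    return columns

sample = """ 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
this line is malformed
"""

seventh, ninth = extract_columns(sample)
print(seventh)
print(ninth)
```

float() accepts both the plain and the exponent notation that appear in the file, so no special handling is needed for values such as 1.4378e+06.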

【Comments】:

  • Well, yes; csv is not a "built-in", but agreed, it does come from the standard library.