【Question title】: How to Alter the Structure of a File into a Tabular Format?
【Posted】: 2015-12-10 00:37:39
【Question】:

I have a file containing the following data:

Input:

Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0

I would like to write a program that restructures the data into a tabular (.csv) format, something like the following:

Output:

Query = A1 bird, Hit= B1 owl, Score= 1.0 4.0 2.5 #The first query, hit, score 
Query = A1 bird, Hit= B2 bluejay, Score= 10.0 6.0 7.0 #The second hit and score associated with the first query
Query = A2 shark, Hit= C1 catshark, Score= 10.0 7.0 2.0 #The second query, hit, score
Query = A3 cat, Hit= D1 dog, Score= 7.0 2.0 1.0 #The third query, hit, score

I tried the following solution suggested by Takis:

import csv

with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                               fieldnames=fieldnames)
    csvwriter.writeheader()
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        if len(data.keys()) == 3:
            csvwriter.writerow(data)
            data = {}

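Question: How can I make the program recognize the hits and scores that belong to each query, so that I can print them on one line? If a query has more than one associated hit and score, the program should print the query again together with the second hit and the second score, exactly like the output below:

"A1 bird","B1 owl","1.0 4.0 2.5" #1st Query, its 1st Hit, its 1st Score
"A1 bird","B2 bluejay","10.0 6.0 7.0" #1st Query, its 2nd Hit, its 2nd Score
"A2 shark","C1 catshark","10.0 7.0 2.0" #2nd Query, 1st and only Hit, 1st and only Score
"A3 cat","D1 dog","7.0 2.0 1.0" #3rd Query, 1st and only Hit, 1st and only Score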

Any ideas?

【Question discussion】:

    Tags: python csv formatting format tabular


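    【Solution 1】:

    I would use the DictWriter class from the csv package to write the parsed data to CSV. There is no error handling, and the program assumes that all three required data items appear for each query, although they do not need to be given in the same order for every query.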

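    import csv

    with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
        fieldnames = ['Query', 'Hit', 'Score']
        # QUOTE_ALL wraps every field in double quotes
        csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                                   fieldnames=fieldnames)
        csvwriter.writeheader()
        data = {}
        for line in f:
            # split only on the first '=' so a value may itself contain '='
            key, value = line.split('=', 1)
            data[key.strip()] = value.strip()
            # a row is complete once Query, Hit and Score have all been seen
            if len(data.keys()) == 3:
                csvwriter.writerow(data)
                # starting from scratch means a second Hit/Score pair under
                # the same Query is not paired with that Query
                data = {}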
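
For input where one Query is followed by several Hit/Score pairs, one possible variation is to remember the current query and emit a row each time a Score line completes a pair. The following is only a sketch for Python 3 under that assumption; the function name `write_rows` and the in-memory `sample` are made up for illustration and are not part of the original answer.

```python
import csv
import io


def write_rows(lines, csvfile):
    """Write one CSV row per Hit/Score pair, repeating the current Query."""
    writer = csv.DictWriter(csvfile, fieldnames=['Query', 'Hit', 'Score'],
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()
    row = {}
    for line in lines:
        key, sep, value = line.partition('=')
        if not sep:
            continue  # skip blank lines and lines without '='
        key, value = key.strip(), value.strip()
        if key == 'Query':
            row = {'Query': value}  # a new query starts a fresh row
        elif key in ('Hit', 'Score'):
            row[key] = value
        # a Score line closes a pair once Query and Hit are known
        if key == 'Score' and len(row) == 3:
            writer.writerow(row)
            row = {'Query': row['Query']}  # keep the query for the next hit


sample = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0
"""

out = io.StringIO(newline='')
write_rows(sample.splitlines(), out)
print(out.getvalue())
```

To write to a real file instead of the in-memory buffer, the same function can be given a file opened with `open('result.csv', 'w', newline='')`, which is the form the csv module documentation recommends in Python 3.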
    

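    【Discussion】:

    • Thank you so much! This is neat... I am still curious how to make the program recognize multiple "Hit" and "Score" lines under each "Query" line. Is there a way to solve this?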
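    【Solution 2】:

    Change the last line

    print line.rstrip("\n\r"), #print of the first score

    to

    print line.rstrip("\n\r") #print of the first score

    (remove the trailing comma). In Python 2, a trailing comma after print suppresses the newline, so dropping it on the score line ends the output row there.

    If you want to repeat the previous query, you need to add a few variables: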

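    query = None
    prev_query = None

    # Python 2 syntax; `file` is assumed to be the already opened input file
    for line in file:
        if line.startswith("Query="):
            query = line.rstrip("\n\r")
            print query, # print the query line; the trailing comma keeps the row open
        elif line.startswith("Hit="):
            if not query:
                print prev_query, # repeat the query for an additional hit
            print line.rstrip("\n\r"), # print the hit
        elif line.startswith("Score="):
            print line.rstrip("\n\r") # print the score and end the row
            if query:
                prev_query = query # remember the query only when one is set
            query = None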
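
The loop above relies on the Python 2 print statement. As a rough Python 3 counterpart, the same pairing can be sketched with a generator that builds each row as a string instead of printing piece by piece. The name `join_records` and the `sample` text are assumptions made for this illustration only.

```python
def join_records(lines):
    """Yield one 'Query= ... Hit= ... Score= ...' string per Hit/Score pair."""
    query = None
    hit = None
    for line in lines:
        line = line.rstrip("\n\r")
        if line.startswith("Query="):
            query = line  # stays current until the next Query line
        elif line.startswith("Hit="):
            hit = line
        elif line.startswith("Score=") and query and hit:
            yield " ".join([query, hit, line])
            hit = None  # the next Score needs its own Hit


sample = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
"""

for row in join_records(sample.splitlines()):
    print(row)
```

Because the query is kept until a new Query line arrives, any number of hits under one query repeat it, without needing a separate variable for the previous query.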
    

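    【Discussion】:

    • Awesome! Thank you! If there is another hit with a score associated with it, is it possible to make the program output the query line again on a new line? Like this: A1 bird B1 owl 1.0 4.0 2.5
      A1 bird B2 bluejay 10.0 6.0 7.0
    • Works amazingly! I really appreciate your help... If you don't mind one more question: how can we print the lines without "Query=", "Hit=" and "Score="? I tried using .split(), but could not figure out how to incorporate it into the code correctly.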
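
One way to drop the "Query=", "Hit=" and "Score=" labels is to cut each line at the first "=" and keep what follows. This is a small sketch in Python 3 rather than the answerer's own code; `strip_label` is a hypothetical helper name.

```python
def strip_label(line):
    """Return the text after the first '=', without surrounding whitespace."""
    _, sep, value = line.partition("=")
    # lines without '=' are returned unchanged apart from trimming
    return value.strip() if sep else line.strip()


print(strip_label("Query= A1 bird"))
print(strip_label("Hit= B1 owl"))
print(strip_label("Score= 1.0 4.0 2.5"))
```

The same result can be had with `line.split("=", 1)[1].strip()`, as long as every line is known to contain "="; `partition` is used here because it does not raise on lines that lack one. In the loop from the answer, the helper would be applied wherever `line.rstrip("\n\r")` is printed.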