【Question title】: Break line in the text file into several columns for CSV
【Posted】: 2019-08-21 13:37:48
【Question】:

I have a text file that looks like the one below. I want to break each line into separate columns so that I can plot the data.

node name | requested bytes | total execution time | accelerator execution time | cpu execution time
prefix/up23/conv2d_transpose     37.75MB (100.00%, 15.34%),      150.71ms (100.00%, 4.83%),             0us (0.00%, 0.00%),      150.71ms (100.00%, 4.83%)
prefix/up20/conv2d_transpose       18.87MB (84.66%, 7.67%),       115.01ms (95.17%, 3.68%),             0us (0.00%, 0.00%),       115.01ms (95.17%, 3.68%)
prefix/up17/conv2d_transpose       18.87MB (76.99%, 7.67%),        91.43ms (91.49%, 2.93%),             0us (0.00%, 0.00%),        91.43ms (91.49%, 2.93%)
prefix/fres19/conv_b_1x3/Conv2D        2.10MB (69.33%, 0.85%),        46.41ms (88.56%, 1.49%),             0us (0.00%, 0.00%),        46.41ms (88.56%, 1.49%)
prefix/fres5/conv_b_3x1/Conv2D        2.10MB (68.47%, 0.85%),        44.63ms (87.07%, 1.43%),             0us (0.00%, 0.00%),        44.63ms (87.07%, 1.43%)
prefix/fres6/conv_a_3x1/Conv2D        2.10MB (67.62%, 0.85%),        40.19ms (85.64%, 1.29%),             0us (0.00%, 0.00%),        40.19ms (85.64%, 1.29%)
prefix/fres22/conv_a_3x1/Conv2D        2.10MB (66.77%, 0.85%),        39.97ms (84.36%, 1.28%),             0us (0.00%, 0.00%),        39.97ms (84.36%, 1.28%)
prefix/fres21/conv_a_3x1/Conv2D        2.10MB (65.92%, 0.85%),        38.85ms (83.08%, 1.24%),             0us (0.00%, 0.00%),        38.85ms (83.08%, 1.24%)
pref

I tried the following snippet, but it gives the output shown below:

with open('file.txt','r') as inp:
    arr = []
    for f in inp:
        arr.append(f)
b = arr[514]
c = b.split(' ')

Output:

['prefix/up23/conv2d_transpose', '', '', '', '', '37.75MB', '(100.00%,', '15.34%),', '', '', '', '', '', '150.71ms', '(100.00%,', '4.83%),', '', '', '', '', '', '', '', '', '', '', '', '', '0us', '(0.00%,', '0.00%),', '', '', '', '', '', '150.71ms', '(100.00%,', '4.83%)\n']

Please suggest a way to get the data into separate columns of a CSV file.

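【Comments】:

  • Why not just use pandas?
  • Sorry, I don't know how to do this in pandas.
  • You can do this: c = b.split() instead of c = b.split(' ')
  • @Ruturaj Thanks. I no longer get the ' ' entries. But how should I now put the remaining strings into columns?
  • @ashutosh I just posted an answer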

Tags: python csv readlines


【Solution 1】:

You can do this:

with open('test.txt','r') as inp:
    for f in inp.readlines():
        print(f.split())

Which prints:

['node', 'name', '|', 'requested', 'bytes', '|', 'total', 'execution', 'time', '|', 'accelerator', 'execution', 'time', '|', 'cpu', 'execution', 'time']
['prefix/up23/conv2d_transpose', '37.75MB', '(100.00%,', '15.34%),', '150.71ms', '(100.00%,', '4.83%),', '0us', '(0.00%,', '0.00%),', '150.71ms', '(100.00%,', '4.83%)']
['prefix/up20/conv2d_transpose', '18.87MB', '(84.66%,', '7.67%),', '115.01ms', '(95.17%,', '3.68%),', '0us', '(0.00%,', '0.00%),', '115.01ms', '(95.17%,', '3.68%)']
['prefix/up17/conv2d_transpose', '18.87MB', '(76.99%,', '7.67%),', '91.43ms', '(91.49%,', '2.93%),', '0us', '(0.00%,', '0.00%),', '91.43ms', '(91.49%,', '2.93%)']
['prefix/fres19/conv_b_1x3/Conv2D', '2.10MB', '(69.33%,', '0.85%),', '46.41ms', '(88.56%,', '1.49%),', '0us', '(0.00%,', '0.00%),', '46.41ms', '(88.56%,', '1.49%)']
['prefix/fres5/conv_b_3x1/Conv2D', '2.10MB', '(68.47%,', '0.85%),', '44.63ms', '(87.07%,', '1.43%),', '0us', '(0.00%,', '0.00%),', '44.63ms', '(87.07%,', '1.43%)']
['prefix/fres6/conv_a_3x1/Conv2D', '2.10MB', '(67.62%,', '0.85%),', '40.19ms', '(85.64%,', '1.29%),', '0us', '(0.00%,', '0.00%),', '40.19ms', '(85.64%,', '1.29%)']
['prefix/fres22/conv_a_3x1/Conv2D', '2.10MB', '(66.77%,', '0.85%),', '39.97ms', '(84.36%,', '1.28%),', '0us', '(0.00%,', '0.00%),', '39.97ms', '(84.36%,', '1.28%)']
['prefix/fres21/conv_a_3x1/Conv2D', '2.10MB', '(65.92%,', '0.85%),', '38.85ms', '(83.08%,', '1.24%),', '0us', '(0.00%,', '0.00%),', '38.85ms', '(83.08%,', '1.24%)']

Is this what you want?
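
If each value should stay together with its percentages, one possible variation is to split on runs of two or more whitespace characters rather than on every space. The sketch below assumes that columns are always separated by at least two spaces and that values never contain more than one consecutive space, which holds for the sample lines in the question but is not guaranteed for the whole file. The helper name `split_columns` is made up for illustration.

```python
import re

def split_columns(line):
    """Split one data line into columns (assumes >= 2 spaces between columns)."""
    fields = re.split(r"\s{2,}", line.strip())
    # Every column except the last ends with a comma; drop it.
    return [field.rstrip(",") for field in fields]

# One data line copied from the question.
sample = (
    "prefix/up23/conv2d_transpose     37.75MB (100.00%, 15.34%),"
    "      150.71ms (100.00%, 4.83%),             0us (0.00%, 0.00%),"
    "      150.71ms (100.00%, 4.83%)"
)
row = split_columns(sample)
print(row)
```

The header line uses single spaces around `|`, so this pattern would not split it; it would need separate handling, for example `[h.strip() for h in header.split('|')]`.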

【Comments】:

    【Solution 2】:

    You can use:

    with open('file.txt', 'r') as inp:
        arr = []
        for f in inp:
            arr.append(f)
    b = arr[514]
    c = b.split()
    

    Alternatively, if you want to stick with your own code, you can remove the empty elements from the list using:

    Output = [x for x in c if x]
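
The difference between the two calls can be seen on a small made-up string. This is only an illustration, and the variable names are assumptions, not part of the original code: `split(' ')` treats every single space as a separator and so yields empty strings between consecutive spaces, while `split()` with no argument splits on any run of whitespace and also discards the trailing newline.

```python
text = "a  b   c\n"  # made-up line with repeated spaces and a trailing newline

by_space = text.split(" ")            # one separator per single space
by_whitespace = text.split()          # runs of whitespace collapse
cleaned = [x for x in by_space if x]  # filtering removes only the empty strings

print(by_space)       # ['a', '', 'b', '', '', 'c\n']
print(by_whitespace)  # ['a', 'b', 'c']
print(cleaned)        # ['a', 'b', 'c\n']
```

Note that the filtered list still carries the newline on its last element, so with that approach the line would need a `strip()` first.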
    

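    【Comments】:

      【Solution 3】:

      This isn't the prettiest code ever, but I believe it solves your problem. I considered using a regular expression to avoid splitting the percentages, but since the data appears to always follow the same pattern, this should work.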

      def remove_dangling_comma(content):
          if content[-1] == ',':
              return content[:-1]
          return content
      
      data_columns = []
      with open("words.txt", 'r') as f:
          for i, line in enumerate(f):
              if i == 0:
                  continue  # skip header
              parts = line.split()
              node_name = parts[0]
              # concatenate broken parts of the same data and remove dangling commas, if any
              requested_bytes = remove_dangling_comma(' '.join([parts[1], parts[2], parts[3]]))
              total_time = remove_dangling_comma(' '.join([parts[4], parts[5], parts[6]]))
              accelerator_time = remove_dangling_comma(' '.join([parts[7], parts[8], parts[9]]))
              cpu_time = remove_dangling_comma(' '.join([parts[10], parts[11], parts[12]]))
      
              # append the processed data to the list
              data_columns.append([node_name, requested_bytes, total_time, accelerator_time, cpu_time])       
      
          print(data_columns)
      
      

      Output:

      [
          ['prefix/up20/conv2d_transpose', '18.87MB (84.66%, 7.67%)', '115.01ms (95.17%, 3.68%)', '0us (0.00%, 0.00%)', '115.01ms (95.17%, 3.68%)'],
          ['prefix/up17/conv2d_transpose', '18.87MB (76.99%, 7.67%)', '91.43ms (91.49%, 2.93%)', '0us (0.00%, 0.00%)', '91.43ms (91.49%, 2.93%)'], 
          ['prefix/fres19/conv_b_1x3/Conv2D', '2.10MB (69.33%, 0.85%)', '46.41ms (88.56%, 1.49%)', '0us (0.00%, 0.00%)', '46.41ms (88.56%, 1.49%)'], 
          ['prefix/fres5/conv_b_3x1/Conv2D', '2.10MB (68.47%, 0.85%)', '44.63ms (87.07%, 1.43%)', '0us (0.00%, 0.00%)', '44.63ms (87.07%, 1.43%)'],
          ['prefix/fres6/conv_a_3x1/Conv2D', '2.10MB (67.62%, 0.85%)', '40.19ms (85.64%, 1.29%)', '0us (0.00%, 0.00%)', '40.19ms (85.64%, 1.29%)'], 
          ['prefix/fres22/conv_a_3x1/Conv2D', '2.10MB (66.77%, 0.85%)', '39.97ms (84.36%, 1.28%)', '0us (0.00%, 0.00%)', '39.97ms (84.36%, 1.28%)'], 
          ['prefix/fres21/conv_a_3x1/Conv2D', '2.10MB (65.92%, 0.85%)', '38.85ms (83.08%, 1.24%)', '0us (0.00%, 0.00%)', '38.85ms (83.08%, 1.24%)']
      ]
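
To get from a list like this to the CSV file the question asks for, the standard `csv` module can be used. The following is a minimal sketch, not part of the original answer: the helper `write_rows` and the file name `output.csv` in the comment are hypothetical, the header names are taken from the first line of the question's file, and only one row of `data_columns` is repeated here to keep the example short. It writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import csv
import io

# Column names taken from the header line of the input file.
header = ["node name", "requested bytes", "total execution time",
          "accelerator execution time", "cpu execution time"]

# One row in the same shape as data_columns above.
data_columns = [
    ["prefix/up20/conv2d_transpose", "18.87MB (84.66%, 7.67%)",
     "115.01ms (95.17%, 3.68%)", "0us (0.00%, 0.00%)",
     "115.01ms (95.17%, 3.68%)"],
]

def write_rows(out, header, rows):
    """Write a header plus rows to any file-like object (hypothetical helper)."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)

buffer = io.StringIO()
write_rows(buffer, header, data_columns)
csv_text = buffer.getvalue()
print(csv_text)

# For a real file (the name is an assumption):
# with open("output.csv", "w", newline="") as f:
#     write_rows(f, header, data_columns)
```

Because the values contain commas, `csv.writer` wraps them in double quotes, so each value still ends up in a single column when the file is opened in a spreadsheet or read back with `csv.reader`.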
      

      【Comments】: