【Question title】: Paste the same column from multiple files into one file
【Posted】: 2015-02-03 04:21:20
【Question】:

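I have about 50 tab-delimited files, and I want to print column $7 of each into a new file. All files have the same number of columns and the same number of lines. In the output, the columns from the different files should be pasted next to each other, separated by tabs.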

I was thinking of using a combination of "ls", "xargs" and "awk": ls finds all the files I want, then awk prints the 7th column and creates output.txt.

ls /folder/*_name.txt | awk '{print $7}' xargs {} > output.txt

My main problems are how to use xargs and how to print every $7 into a separate column of the output file.

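【Comments】:

  • Maybe you could use "cut -f 7 /folder/*_name.txt > output.txt"
  • With two files you could do paste <(awk '{print $7}' a1) <(awk '{print $7}' a2), but I guess for 50 you need a different approach.
  • @swstephe That prints all the $7 values in a single column, which is not what I want.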

Tags: awk


【Solution 1】:

If I understand correctly what you are trying to do, you can use awk:

awk -F '\t' 'FNR == 1 { ++file } { col[FNR, file] = $7 } END { for(i = 1; i <= FNR; ++i) { line = col[i, 1]; for(j = 2; j <= file; ++j) { line = line "\t" col[i, j] }; print line } }' file1 file2 file3 file4

Formatted and commented, the code is

FNR == 1 { ++file }                 # in the first line of a file, increase
                                    # the file counter, so file is the number
                                    # of the file we're processing
{                         
  col[FNR, file] = $7               # remember the 7th column from all lines
}                                   # by line and file number

END {                               # at the end:
  for(i = 1; i <= FNR; ++i) {       # walk through the lines,
    line = col[i, 1]                # paste together the columns in that line
    for(j = 2; j <= file; ++j) {    # from each file
      line = line "\t" col[i, j]
    }
    print line                      # and print the result.
  }
}

Edit: By assembling the lines on the fly rather than at the end, this can be shortened to

awk -F '\t' 'FNR == 1 && FNR != NR { sep = "\t" } { line[FNR] = line[FNR] sep $7 } END { for(i = 1; i <= FNR; ++i) { print line[i] } }' file1 file2 file3 file4

That is

FNR == 1 && FNR != NR {   # in the first line, but not in the first file
  sep = "\t"              # set the separator to a tab (in the first it's empty)
}
{                         # assemble the line on the fly
  line[FNR] = line[FNR] sep $7
}
END {                     # and in the end, print the lines.
  for(i = 1; i <= FNR; ++i) {
    print line[i]
  }
}
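
For readers less familiar with awk, the same on-the-fly assembly can be sketched in Python. This is only an illustration and not part of the answer: the function name paste_column and the in-memory sample data are assumptions made so the example is self-contained.

```python
def paste_column(files, column=7, sep="\t"):
    """Join one column from several tab-separated inputs side by side.

    `files` is a list of line lists (one list per file), a hypothetical
    stand-in for reading real files.
    """
    lines = []                                  # plays the role of awk's line[FNR]
    for file_no, rows in enumerate(files):
        for fnr, row in enumerate(rows):
            value = row.rstrip("\n").split("\t")[column - 1]
            if file_no == 0:
                lines.append(value)             # first file: no separator yet
            else:
                lines[fnr] += sep + value       # later files: append after a tab
    return lines


# Two tiny made-up "files" with seven columns and two lines each.
file_a = ["a1\ta2\ta3\ta4\ta5\ta6\ta7\n", "b1\tb2\tb3\tb4\tb5\tb6\tb7\n"]
file_b = ["c1\tc2\tc3\tc4\tc5\tc6\tc7\n", "d1\td2\td3\td4\td5\td6\td7\n"]

for line in paste_column([file_a, file_b]):
    print(line)
```

As in the awk version, the separator is only added from the second file onwards, so the output lines do not start with a stray tab.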

If you restrict yourself to gawk, this can be shortened further to

gawk -F '\t' '{ line[FNR] = line[FNR] sep $7 } ENDFILE { sep = "\t" } END { for(i = 1; i <= FNR; ++i) { print line[i] } }' file1 file2 file3 file4

...but ENDFILE is not known to other awk implementations (such as mawk), so you may prefer to avoid it.

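【Comments】:

  • I see "file1 file2 file3 file4" at the end of the command. I don't want to type out the paths of all the files. That is why I was trying to use ls & xargs ...
  • You can put $(ls foo) there. Or just /folder/*_name.txt, really.

【Solution 2】:

I know this isn't pretty, but you can do this easily with Python. I wrote this code in 5 minutes and tested it on three files with the same columns and rows, and it worked.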

import csv, os

def getData(fileDir, newFile, COLUMN):
    COLUMN = COLUMN - 1  #convert the 1-based column number to a 0-based index
    newFile = os.path.join(fileDir,newFile)

    #gets all filepaths for all your files in a directory (sorted, so the
    #column order is stable; the output file is skipped in case of a re-run)
    filePaths = []
    for file in sorted(os.listdir(fileDir)):
        path = os.path.join(fileDir,file)
        if path != newFile:
            filePaths.append(path)
    originalData = []
    for f in filePaths:
        file = []
        #Python 3: the csv module wants text mode with newline='' (Python 2 used 'rb')
        with open(f, newline='') as d:
            reader = csv.reader(d, delimiter='\t')
            #header = next(reader)[COLUMN] #if you have a header in your csv file uncomment this line so it skips it
            for row in reader:
                file.append(row[COLUMN])
        originalData.append(file)

    #gets a count of how many rows are in your file
    rows = len(originalData[0])

    #creates a new list from the old list and it is now structured like below
    #new list = [[File1_Col7_Row1, File2_Col7_Row1, File3_Col7_Row1],[File1_Col7_Row2, File2_Col7_Row2, File3_Col7_Row2]]
    newData = []
    for i in range(rows):
        r = []
        for item in originalData:
            row = item[i]
            r.append(row)
        newData.append(r)

    #writes the new data to a new file
    with open(newFile, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        for row in newData:
            writer.writerow(row)




if __name__ == "__main__":      
    #dir where ONLY the tab files reside
    fileDir = "C:\\TabFiles"
    #new file name, it will be dumped in the dir where the other files reside
    newFile = 'newTabFile.txt'
    # the column you want to grab
    columnNum = 7

    getData(fileDir, newFile, columnNum)
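
The nested loops that build newData amount to a transposition, which Python's built-in zip can express directly. The following is a minimal Python 3 sketch under that assumption, not the answer's own code: the function name paste_columns, the use of glob, the output name output.tsv and the two demonstration files are all illustrative choices, and the '*_name.txt' pattern is borrowed from the question.

```python
import csv
import glob
import os
import tempfile


def paste_columns(pattern, out_path, column=7):
    """Write column `column` of every file matching `pattern` side by side."""
    columns = []
    for path in sorted(glob.glob(pattern)):      # sorted for a stable column order
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            columns.append([row[column - 1] for row in reader])

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # zip(*columns) turns per-file columns into per-line rows; it stops
        # at the shortest file, which is harmless when all files are equally long.
        writer.writerows(zip(*columns))


# Small demonstration with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    for name, tag in (("a_name.txt", "a"), ("b_name.txt", "b")):
        with open(os.path.join(folder, name), "w") as handle:
            for line_no in (1, 2):
                cells = [f"{tag}{line_no}c{c}" for c in range(1, 8)]
                handle.write("\t".join(cells) + "\n")

    out_path = os.path.join(folder, "output.tsv")
    paste_columns(os.path.join(folder, "*_name.txt"), out_path)
    with open(out_path) as handle:
        result = handle.read()
    print(result)
```

Selecting the inputs with a glob pattern also means the output file is not picked up as an input, as long as its name does not match the pattern.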

【Comments】:

    【Solution 3】:

    I created 9 files with Python:

    for i in range(1,10):
        fn='file'+str(i)+'.tsv'
        with open(fn, 'w') as f:
            for line in range(1,4):
                f.write('\t'.join('{}, line: {}, col: {}'.format(fn, line, col) for col in range(1,10)))
                f.write('\n')
    

    This creates 9 files of this type:

    file1.tsv, line: 1, col: 1  file1.tsv, line: 1, col: 2  file1.tsv, line: 1, col: 3  file1.tsv, line: 1, col: 4  file1.tsv, line: 1, col: 5  file1.tsv, line: 1, col: 6  file1.tsv, line: 1, col: 7  file1.tsv, line: 1, col: 8  file1.tsv, line: 1, col: 9
    file1.tsv, line: 2, col: 1  file1.tsv, line: 2, col: 2  file1.tsv, line: 2, col: 3  file1.tsv, line: 2, col: 4  file1.tsv, line: 2, col: 5  file1.tsv, line: 2, col: 6  file1.tsv, line: 2, col: 7  file1.tsv, line: 2, col: 8  file1.tsv, line: 2, col: 9
    file1.tsv, line: 3, col: 1  file1.tsv, line: 3, col: 2  file1.tsv, line: 3, col: 3  file1.tsv, line: 3, col: 4  file1.tsv, line: 3, col: 5  file1.tsv, line: 3, col: 6  file1.tsv, line: 3, col: 7  file1.tsv, line: 3, col: 8  file1.tsv, line: 3, col: 9
    ...
    file9.tsv, line: 1, col: 1  file9.tsv, line: 1, col: 2  file9.tsv, line: 1, col: 3  file9.tsv, line: 1, col: 4  file9.tsv, line: 1, col: 5  file9.tsv, line: 1, col: 6  file9.tsv, line: 1, col: 7  file9.tsv, line: 1, col: 8  file9.tsv, line: 1, col: 9
    file9.tsv, line: 2, col: 1  file9.tsv, line: 2, col: 2  file9.tsv, line: 2, col: 3  file9.tsv, line: 2, col: 4  file9.tsv, line: 2, col: 5  file9.tsv, line: 2, col: 6  file9.tsv, line: 2, col: 7  file9.tsv, line: 2, col: 8  file9.tsv, line: 2, col: 9
    file9.tsv, line: 3, col: 1  file9.tsv, line: 3, col: 2  file9.tsv, line: 3, col: 3  file9.tsv, line: 3, col: 4  file9.tsv, line: 3, col: 5  file9.tsv, line: 3, col: 6  file9.tsv, line: 3, col: 7  file9.tsv, line: 3, col: 8  file9.tsv, line: 3, col: 9
    

    Now that you have these example files, here is the actual answer: just use cut

    $ cut -f 7 *.tsv
    file1.tsv, line: 1, col: 7
    file1.tsv, line: 2, col: 7
    file1.tsv, line: 3, col: 7
    file2.tsv, line: 1, col: 7
    file2.tsv, line: 2, col: 7
    file2.tsv, line: 3, col: 7
    file3.tsv, line: 1, col: 7
    file3.tsv, line: 2, col: 7
    file3.tsv, line: 3, col: 7
    file4.tsv, line: 1, col: 7
    file4.tsv, line: 2, col: 7
    file4.tsv, line: 3, col: 7
    file5.tsv, line: 1, col: 7
    file5.tsv, line: 2, col: 7
    file5.tsv, line: 3, col: 7
    file6.tsv, line: 1, col: 7
    file6.tsv, line: 2, col: 7
    file6.tsv, line: 3, col: 7
    file7.tsv, line: 1, col: 7
    file7.tsv, line: 2, col: 7
    file7.tsv, line: 3, col: 7
    file8.tsv, line: 1, col: 7
    file8.tsv, line: 2, col: 7
    file8.tsv, line: 3, col: 7
    file9.tsv, line: 1, col: 7
    file9.tsv, line: 2, col: 7
    file9.tsv, line: 3, col: 7
    

    Then use tr to turn the newlines into tabs:

    $ cut -f 7 *.tsv | tr '\n' '\t'
    file1.tsv, line: 1, col: 7  file1.tsv, line: 2, col: 7  file1.tsv, line: 3, col: 7  file2.tsv, line: 1, col: 7  file2.tsv, line: 2, col: 7  file2.tsv, line: 3, col: 7  file3.tsv, line: 1, col: 7  file3.tsv, line: 2, col: 7  file3.tsv, line: 3, col: 7  file4.tsv, line: 1, col: 7  file4.tsv, line: 2, col: 7  file4.tsv, line: 3, col: 7  file5.tsv, line: 1, col: 7  file5.tsv, line: 2, col: 7  file5.tsv, line: 3, col: 7  file6.tsv, line: 1, col: 7  file6.tsv, line: 2, col: 7  file6.tsv, line: 3, col: 7  file7.tsv, line: 1, col: 7  file7.tsv, line: 2, col: 7  file7.tsv, line: 3, col: 7  file8.tsv, line: 1, col: 7  file8.tsv, line: 2, col: 7  file8.tsv, line: 3, col: 7  file9.tsv, line: 1, col: 7  file9.tsv, line: 2, col: 7  file9.tsv, line: 3, col: 7  
    

    Or use paste:

    $ cut -f 7 *.tsv | paste -s -d '\t' - 
    file1.tsv, line: 1, col: 7  file1.tsv, line: 2, col: 7  file1.tsv, line: 3, col: 7  file2.tsv, line: 1, col: 7  file2.tsv, line: 2, col: 7  file2.tsv, line: 3, col: 7  file3.tsv, line: 1, col: 7  file3.tsv, line: 2, col: 7  file3.tsv, line: 3, col: 7  file4.tsv, line: 1, col: 7  file4.tsv, line: 2, col: 7  file4.tsv, line: 3, col: 7  file5.tsv, line: 1, col: 7  file5.tsv, line: 2, col: 7  file5.tsv, line: 3, col: 7  file6.tsv, line: 1, col: 7  file6.tsv, line: 2, col: 7  file6.tsv, line: 3, col: 7  file7.tsv, line: 1, col: 7  file7.tsv, line: 2, col: 7  file7.tsv, line: 3, col: 7  file8.tsv, line: 1, col: 7  file8.tsv, line: 2, col: 7  file8.tsv, line: 3, col: 7  file9.tsv, line: 1, col: 7  file9.tsv, line: 2, col: 7  file9.tsv, line: 3, col: 7
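
    Note that both commands above put every value on one single line. If the goal is one column per file, as the question asks, the values have to be grouped by line number instead. The difference can be sketched in Python; the shortened labels, the three files of three lines and the variable names are assumptions made for illustration only.

```python
# Made-up column-7 values in the order `cut -f 7 *.tsv` emits them:
# all lines of file1, then all lines of file2, and so on.
values = [f"file{f}:line{l}" for f in (1, 2, 3) for l in (1, 2, 3)]
lines_per_file = 3  # assumption: every file has the same number of lines

# What tr or paste -s produce: every value on one single line.
one_row = "\t".join(values)

# One column per file: row i collects the i-th value of each file.
table = [
    "\t".join(values[i::lines_per_file])
    for i in range(lines_per_file)
]

print(one_row)
print("\n".join(table))
```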
    

    【Comments】:
