【Question Title】: Data Formatting within a txt File
【Posted】: 2020-11-18 05:35:50
【Question Description】:

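I have the following txt file and need to format the data throughout the file using specific start and end positions. For example, column 1 is blank and will be read as the entry number. This field is numeric with a width of 9 and should occupy positions 1-9. Next comes the employee ID at positions 10-15, and so on. No delimiter is needed between values.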

,MB4858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,MD6535,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,PM7858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,RM0111,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,RY2585,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,TM0617 ,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,VE2495,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,VJ8913,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,FJ4815 ,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,OM0188,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,H00858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H08392,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H15624,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H27573,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H40249,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H44581,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H48473,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H51570,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H55768,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H64315,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H71507,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H72248,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H78527,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H90393,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H95973,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,

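【Comments】:

  • Where are you stuck? Python output formatting and string manipulation are documented online in many places. Where is the code that is giving you trouble?
  • Are you sure? This looks more like a csv file.
  • Yes, this is a csv file that I exported from Excel. I want to edit the format, removing the comma delimiters and adding a specific amount of padding (spaces) between each field.
  • Can you share what you have tried so far?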

Tags: python pandas


【Solution 1】:

You can try starting from here:

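import sys

# Path of the input txt file, passed as the first command-line argument
inFile = sys.argv[1]
outFile = "newFile.txt"

with open(inFile, 'r') as inf, open(outFile, 'w') as outf:
    for line in inf:
        # Split each row on commas into a list of field strings
        line = line.split(',')
        print(line)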

Here sys.argv[1] is the name of the txt file, given on the command line when you run the Python script.
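To make the result of the split concrete, here is a minimal sketch that applies it to one row copied from the question. The names `sample` and `fields` are only for illustration, and the `rstrip("\n")` call is an addition not present in the snippet above: it keeps the line ending out of the last field.

```python
# One row copied from the question's data, with its line ending
sample = ",MB4858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,\n"

# Drop the line ending first so it does not end up inside the last field
fields = sample.rstrip("\n").split(",")

# The leading comma yields an empty first field (the blank entry number)
print(fields[:6])  # ['', 'MB4858', '01', '1', 'CA', '07/18/20']
```

Empty columns in the csv come back as empty strings, so every field keeps its position in the list.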

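You will see that it prints a list containing the individual strings found between the comma delimiters in your txt data file. From there you can use list operations to format the data, then write the result to outf like this (example):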

# Inside the for loop: do whatever manipulations you need to build the output line
output_line = line[0] + " " + line[1]
outf.write(output_line)
outf.write('\n')
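Building on that, the fixed positions from the question could be produced by padding each field to a set width instead of joining with a single space. The following is a rough sketch, not a complete solution: `WIDTHS` and `to_fixed_width` are hypothetical names, only the first two widths (positions 1-9 and 10-15) come from the question, and left-justified padding is an assumption since the question does not say how values should be aligned.

```python
import csv
import io

# Widths of the first two fields, taken from the question (positions 1-9 and 10-15).
# The remaining widths are not given in the question, so they are left out here.
WIDTHS = [9, 6]

def to_fixed_width(row, widths):
    """Pad or truncate each field to its width and join without a delimiter."""
    parts = []
    for value, width in zip(row, widths):
        value = value.strip()  # some IDs in the data carry a trailing space
        parts.append(value.ljust(width)[:width])
    return "".join(parts)

# io.StringIO stands in for the real input file in this sketch
source = io.StringIO(",MB4858,01,1,CA\n,TM0617 ,01,1,CA\n")
lines = [to_fixed_width(row, WIDTHS) for row in csv.reader(source)]

for line in lines:
    print(repr(line))
```

To run this against the real file, the `io.StringIO` object could be replaced with the opened input file, and each formatted line written to outf followed by '\n'. Fields beyond the listed widths are dropped by `zip`, so the list needs one width per column you want in the output.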

【Comments】:
