【Question title】: How can I merge multiple csv files with Python as I want?
【Posted】: 2019-09-03 21:45:44
【Question】:

I have several csv files for an assignment. I want to combine them as in the example below, but I don't know how to do it.

Exp1.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"

Exp2.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"

I want to merge the 2 files like this. I only want to take the VOL % column from each file.


"DATE","Exp1","Exp2"
"01.09.2019","-7,04%","0,73%"
"25.08.2019","4,07%","1,98%"

Thanks everyone :) I found a solution like this and applied it.

import glob
import os
import pandas

path = r'/Users/baris/Documents/Files/'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pandas.read_csv(f) for f in all_files)
# Place the files side by side; every column name then appears once per file
concatenated_df = pandas.concat(df_from_each_file, axis=1)
# Drop every column except "VOL %" (names must match the CSV header exactly)
concatenated_df_clean = concatenated_df.drop(columns=['DATE', 'NOW', 'OPEN', 'HIGH', 'LOW', 'Hac.'])

# The first column of the concatenated frame is the DATE column of the first file
df_date_export = concatenated_df.iloc[:, 0]

final_result = pandas.concat([df_date_export, concatenated_df_clean], axis=1)
print(final_result)


【Comments】:

  • Is VOL % always the 7th column?
  • Yes, it is always in the 7th column.

Tags: python pandas csv row


【Solution 1】:
import csv

with open('Exp1.csv', 'r') as f1:
    csv_reader = csv.reader(f1, delimiter=',')
    lines1 = [row for row in csv_reader]

with open('Exp2.csv', 'r') as f2:
    csv_reader = csv.reader(f2, delimiter=',')
    lines2 = [row for row in csv_reader]

del lines1[0]
del lines2[0]
with open('output.csv', 'w+') as output_file:
    output_file.write('"DATE","Exp1","Exp2"\n')
    for index, _ in enumerate(lines1):
        date = lines1[index][0]
        vol1 = lines1[index][6]
        vol2 = lines2[index][6]
        output_file.write(f'"{date}","{vol1}","{vol2}"\n')

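This assumes the following:

  • VOL % is always in the 7th column (as in your example)
  • DATE is always in the first column (as in your example)
  • Exp1.csv and Exp2.csv always have the same number of rows
  • The "DATE" values in Exp1.csv and Exp2.csv are always the same, in the same order

Read more about the csv module: https://docs.python.org/3/library/csv.html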
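
The last two assumptions listed above can be checked before any output is written. The following is only a minimal sketch, assuming the same file layout as in the question; `read_rows` and `check_alignment` are hypothetical helper names, and the in-memory strings stand in for the real Exp1.csv and Exp2.csv.

```python
import csv
import io


def read_rows(source):
    """Return (header, data_rows) from an open CSV file object."""
    rows = list(csv.reader(source))
    return rows[0], rows[1:]


def check_alignment(rows1, rows2, date_col=0):
    """Raise ValueError if the two files do not line up row by row."""
    if len(rows1) != len(rows2):
        raise ValueError(f"row count differs: {len(rows1)} vs {len(rows2)}")
    for number, (r1, r2) in enumerate(zip(rows1, rows2), start=1):
        if r1[date_col] != r2[date_col]:
            raise ValueError(
                f"DATE differs in data row {number}: "
                f"{r1[date_col]} vs {r2[date_col]}"
            )


# In-memory stand-ins for Exp1.csv and Exp2.csv (data copied from the question)
exp1 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"\n'
    '"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"\n'
)
exp2 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"\n'
    '"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"\n'
)

header1, lines1 = read_rows(exp1)
header2, lines2 = read_rows(exp2)

# Passes silently for the sample data; raises ValueError on a mismatch
check_alignment(lines1, lines2)
```

With real files, the `io.StringIO` objects would be replaced by `open('Exp1.csv', newline='')` and `open('Exp2.csv', newline='')`.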

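【Comments】:

  • I have 100 .csv files, so this approach does not work for me, but I have saved your code for my other projects. Thank you :)
  • @BarisOZER No worries, happy to help. If you found valuable information in my answer, you can accept it by clicking the green check mark next to it.
【Solution 2】: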

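You can use the pandas package to read and save CSV files. You cannot drop columns while merging the files, but you can choose which columns to save. See my code below, and replace the CSV file names and the column name with your own.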

import pandas as pd

# create list of files you want to merge
all_filenames = ['test.csv','test1.csv']

# use pandas concat function to merge csv's
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])

# export the csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig',columns=['test1'])
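
Note that `pd.concat` without `axis=1` stacks the rows of the files on top of each other. For the side-by-side layout asked for in the question, one possible sketch reads only the two needed columns with `usecols`, uses DATE as the index, and concatenates along `axis=1`. The helper name `combine_vol` and the in-memory sample data are assumptions made for illustration.

```python
import io

import pandas as pd


def combine_vol(sources):
    """Build one frame with a VOL % column per source, aligned on DATE.

    `sources` maps the output column name to a path or file-like object.
    """
    columns = []
    for name, source in sources.items():
        # Read only the two columns that are needed and index the rows by DATE
        frame = pd.read_csv(source, usecols=["DATE", "VOL %"], index_col="DATE")
        columns.append(frame["VOL %"].rename(name))
    # axis=1 places the series side by side, matching rows by their DATE index
    return pd.concat(columns, axis=1)


# In-memory stand-ins for Exp1.csv and Exp2.csv (data copied from the question)
exp1 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"\n'
    '"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"\n'
)
exp2 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"\n'
    '"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"\n'
)

combined = combine_vol({"Exp1": exp1, "Exp2": exp2})
print(combined.reset_index())
```

For a folder with many files, the mapping could be built from the file names, for example `{os.path.splitext(os.path.basename(f))[0]: f for f in glob.glob("*.csv")}`, so that each column is named after its file.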

【Comments】:

【Solution 3】:

Try something like this:

import pandas as pd

df = pd.read_csv('Exp1.csv')

df1 = pd.read_csv('Exp2.csv')

# Dates are day-first (DD.MM.YYYY), so state the format explicitly
df['DATE'] = pd.to_datetime(df['DATE'], format='%d.%m.%Y')
df1['DATE'] = pd.to_datetime(df1['DATE'], format='%d.%m.%Y')

final_df = pd.merge(df[['DATE', 'VOL %']], df1[['DATE', 'VOL %']], on='DATE')

print(final_df)
      DATE VOL %_x VOL %_y
2019-09-01  -7,04%   0,73%
2019-08-25   4,07%   1,98%
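
The merged columns above carry the default `_x` and `_y` suffixes. To get the `"DATE","Exp1","Exp2"` header from the question, the VOL % column can be renamed before merging and the dates written back in their original format. This is a hedged sketch rather than a definitive implementation; `load_vol` is a hypothetical helper and the in-memory strings stand in for the real files.

```python
import csv
import io

import pandas as pd

# In-memory stand-ins for Exp1.csv and Exp2.csv (data copied from the question)
exp1 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"\n'
    '"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"\n'
)
exp2 = io.StringIO(
    '"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"\n'
    '"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"\n'
    '"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"\n'
)


def load_vol(source, name):
    """Read DATE and VOL % only, parse the dates, rename VOL % to `name`."""
    frame = pd.read_csv(source, usecols=["DATE", "VOL %"])
    frame["DATE"] = pd.to_datetime(frame["DATE"], format="%d.%m.%Y")
    return frame.rename(columns={"VOL %": name})


# Default inner merge: keeps only the dates that are present in both files
merged = pd.merge(load_vol(exp1, "Exp1"), load_vol(exp2, "Exp2"), on="DATE")

# Quote every field and write the dates back as DD.MM.YYYY
text = merged.to_csv(index=False, quoting=csv.QUOTE_ALL, date_format="%d.%m.%Y")
print(text)
```

Passing a file name as the first argument of `to_csv` writes the result to disk instead of returning it as a string.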
    

【Comments】:

【Solution 4】:

Use the csv module.

https://docs.python.org/3/library/csv.html

Read this tutorial:

https://realpython.com/python-csv/

Something like this will do it (educational code):

import io
import csv

# Maps each DATE to its VOL % values, e.g. {'01.09.2019': {'Exp1': ..., 'Exp2': ...}}
target = {}

file_one_string = \
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"
"""
file_two_string = \
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"
"""


# First file: create one entry per DATE
with io.StringIO(file_one_string) as file_one:
    csv_reader = csv.DictReader(file_one, delimiter=',', quotechar='"')
    for row in csv_reader:
        if 'VOL %' in row:
            target[row['DATE']] = {'Exp1': row['VOL %']}

# Second file: attach its VOL % to the matching DATE entry
with io.StringIO(file_two_string) as file_two:
    csv_reader = csv.DictReader(file_two, dialect="excel")
    for row in csv_reader:
        if row['DATE'] in target:
            target[row['DATE']]['Exp2'] = row['VOL %']
        else:
            print('DATE {} from file_two not found in file_one'.format(row['DATE']))


with io.StringIO() as output_file:
    fieldnames = ['DATE', 'Exp1', 'Exp2']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for key, value in target.items():
        csv_writer.writerow({
            'DATE': key,
            'Exp1': value['Exp1'],
            # Empty string if file_two had no row for this DATE
            'Exp2': value.get('Exp2', '')
        })

    print(output_file.getvalue())
      
      

【Comments】:
