【Posted】: 2022-01-20 10:56:09
【Question】:
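I am trying to write a loop that sets up a DataFrame with an index (df1), walks through a selected folder, finds a txt file, extracts its second column (called counts), and adds it to df1. It then continues through the folder and does the same for the next file, adding it to df1. The result should be a processed txt file containing the index and a column of counts from the first file, with the next column holding the counts from the second file, and so on.
My loop is clearly broken: I cannot stop it from overwriting the counts from the first txt file. On top of that, it keeps treating the new column header as a data cell, which throws everything out of alignment. As it stands, it just overwrites the counts and leaves a random integer in the first row of what should be the next column.
Any help would be greatly appreciated. Apologies for the number of print lines; I just wanted to be sure I understood what each step was doing.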
def changeFolder(self):
    folder = QFileDialog.getExistingDirectory(None, 'Project Data', '.csv files')
    print(folder)
    if folder == None:
        return
    else:
        print(folder)
        from glob import glob
        import pandas as pd
        import numpy as np
        import os
        # create lag variable for the time lag array from -50 to 50
        lag = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
               21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
               41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
               61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
               81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
        # generate data frame with the lag time in one column
        df1 = pd.DataFrame(index=lag)
        # print
        print('df1', df1)
        # for every file in the directory folder specified
        for file in os.listdir(folder):
            print('folder', folder)
            if file.endswith(".txt"):
                print('file', file)
                selfolder = folder
                newpath = os.path.abspath(os.path.join(selfolder, file))
                print('newpath', newpath)
                # read the file in the loop
                df2 = pd.read_csv(newpath, delimiter=" ", dtype="Int64", header=None)
                df2.to_string(index=False)
                # df2.columns = ['Lag', 'Counts']
                # take the second column of said folder and save it to the original dataframe
                print('df2', df2)
                # counts = df2.iloc[:,1]
                print('now for the counts')
                print(df2.iloc[:, 1])
                df2['count'] = df2.iloc[:, 1]
                df1['df1count'] = df2['count']
                df1.df1count = df1.df1count.astype(float)
                print(df1.df1count)
                count_df = pd.DataFrame(data={len(df2['count'].groupby(df2['count']))}, columns=['test'])
                new_df = pd.concat([df1, count_df], axis=1)
                print(new_df)
                continue
        savepath = newpath[:-4]
        # save and convert to .txt
        new_df.to_csv(savepath + ' processed.txt')
        ## Dialogue box in case of success
        mbox = QMessageBox()
        mbox.setText("Hopefully this worked!")
        mbox.setDetailedText("")
        mbox.setStandardButtons(QMessageBox.Ok)
        mbox.setWindowTitle('CSV Batch Processor')
        mbox.exec_()
【Discussion】:
Tags: python pandas dataframe loops
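Two things in the posted loop seem to explain the symptoms. Every iteration assigns to the same column name, `df1['df1count']`, so each file replaces the previous one. And `count_df` holds a single number (the count of distinct values in the file), which `pd.concat(..., axis=1)` places in the first row of a `test` column; that is probably the stray integer. Below is a minimal sketch of one possible approach, not a drop-in fix: collect one Series per file, keyed by file name, and build the wide table once after the loop. The function name `collect_counts` and the column names `lag` and `counts` are made up for illustration, and the sketch assumes each file has exactly two space-separated integer columns and no header row.

```python
import os

import pandas as pd


def collect_counts(folder):
    """Return a DataFrame with one counts column per .txt file in `folder`."""
    columns = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(folder, name)
        # Assumption: two space-separated integer columns (lag, counts), no header.
        data = pd.read_csv(path, sep=" ", header=None, names=["lag", "counts"])
        # Index by lag so the columns align on lag value rather than row position.
        # The file name (without extension) becomes the column name.
        columns[os.path.splitext(name)[0]] = data.set_index("lag")["counts"]
    if not columns:
        # pd.concat raises on an empty input, so return an empty frame instead.
        return pd.DataFrame()
    # Build the wide table once, after all files have been read.
    return pd.concat(columns, axis=1)
```

The result could then be written with something like `collect_counts(folder).to_csv(os.path.join(folder, "processed.csv"))`. The output name here is hypothetical; a name that does not end in `.txt` keeps a later run from reading the output back in as input.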