Generate Pandas DataFrames from CSV file list: Answers

【Question title】: Generate Pandas DataFrames from CSV file list
【Posted】: 2021-11-12 01:34:34
【Question description】:

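I have a question. I am searching a directory for all csv files. I save the path of each csv file, along with a description, into a DataFrame. I now want to loop over that DataFrame and read each csv file into a DataFrame whose name is generated from the original filename. I cannot figure out how to generate these DataFrames dynamically. I started coding a few days ago, so apologies if the syntax is poor.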

# Looks in a given directory and all of its subdirectories for files with the extension ".csv"
# Builds a list of the paths to all csv files found

import os
from glob import glob

import pandas as pd

PATH = r"Z:\Adam"  # raw string so the backslash is not treated as an escape
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file paths is read into a DataFrame
# The path is then split into columns on the backslash separator

df_csv_path = pd.DataFrame(all_csv_files, columns=['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n=-1, expand=True)
df_split_path = df_split_path.rename(columns={
    0: 'Drive', 1: 'Main', 2: 'Project', 3: 'Imaging Folder',
    4: 'Experimental Group', 5: 'Experimental Rep', 6: 'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Attempts to generate a DataFrame for each of the csv files found in the directory
# Intended: each DataFrame gets a name based on the csv filename
for index in df_csv_info.index:
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    # Problem: this rebinds the variable `filename` to the DataFrame;
    # it does not create a variable named after the file
    filename = pd.read_csv(filepath)

【Question comments】:

Tags: python pandas dataframe csv

【Solution 1】:

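The best approach is to create a dictionary whose keys are the filenames and whose values are the corresponding DataFrames. Rather than os.path and glob, the modern approach is to use pathlib from the standard library.

Assuming you don't actually need the DataFrame containing the filenames, and only want one DataFrame per csv file, you can simply do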

from pathlib import Path

import pandas as pd

PATH = Path(r"Z:\Adam")  # raw string so the backslash is not treated as an escape
EXT = "*.csv"

# dictionary holding one DataFrame per file, in the format {"filename": file_DataFrame}
files_dfs = {}

# recursive search for csv files in the PATH folder and its subfolders
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name     # get the filename
    df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
    files_dfs[filename] = df     # add the DataFrame to the dictionary

Then, to access the DataFrame for a specific file, you can do

filename_df = files_dfs["<filename>"]
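
One caveat with keying the dictionary on `csv_file.name`: if two subfolders contain files with the same name, the later one silently replaces the earlier one. The following is a minimal sketch of one way around that, assuming it is acceptable to use the path relative to the search root as the key. The helper name `load_csv_tree` is hypothetical and not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def load_csv_tree(root):
    """Read every CSV under `root` into a dict keyed by its path relative to `root`.

    Hypothetical helper: a variation on the dictionary approach that keeps
    same-named files in different subfolders apart.
    """
    root = Path(root)
    frames = {}
    for csv_file in sorted(root.rglob("*.csv")):
        # Relative path with forward slashes, so the key is unique
        # even when the bare file name repeats across subfolders
        key = csv_file.relative_to(root).as_posix()
        frames[key] = pd.read_csv(csv_file)
    return frames
```

With this, `frames = load_csv_tree(r"Z:\Adam")` would return a dictionary whose keys look like `"<subfolder>/<filename>"`, and a single DataFrame is retrieved with `frames["<subfolder>/<filename>"]`.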

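【Comments】:

Thank you very much for the reply. It works really well. I will have to pay more attention to the modern way of doing these things, because this is a very simple solution. Thanks again!
@Adam You're welcome! I'm glad I could teach you something new ;) Don't worry, it took me a long time to learn about pathlib. Finding simple and effective solutions is just a matter of practice! It takes time, but it comes naturally, you'll see! Happy coding!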