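【Question title】: Generate Pandas DataFrames from CSV file list
【Posted】: 2021-11-12 01:34:34
【Question description】:

Here is my question. I am searching a directory for all csv files. I save the path to each csv file, along with a description, in a DataFrame. I now want to loop through that DataFrame and read each csv file into its own DataFrame, whose name is generated from the original file name. I can't figure out how to generate these DataFrames dynamically. I started coding only a few days ago, so apologies if the syntax is poor.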

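# Looks in a given directory and all of its subdirectories for files with the extension ".csv"
# Reads the path to every csv file and builds a list
import os
from glob import glob

import pandas as pd

PATH = r"Z:\Adam"  # raw string so the backslash is not treated as an escape
EXT = "*.csv"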
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file paths is read into a DataFrame
# Each path is then split into columns on the backslash separators

df_csv_path = pd.DataFrame(all_csv_files, columns =['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n = -1, expand = True)
df_split_path = df_split_path.rename(columns = {0:'Drive',1:'Main',2:'Project',3:'Imaging Folder', 4:'Experimental Group',5:'Experimental Rep',6:'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Generates a Dataframe for each of the csv files found in directory
# Dataframe has a name based on the csv filename
for index in df_csv_info.index:
    filepath = ""
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    filename = pd.read_csv(filepath)

【Question discussion】:

    Tags: python pandas dataframe csv


    【Solution 1】:

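    The best approach is to create a dictionary whose keys are the file names and whose values are the corresponding DataFrames. Also, rather than using os.path and glob, the modern approach is to use pathlib from the standard library.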
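
To make the comparison concrete, here is a minimal sketch that runs both searches on a throwaway directory tree and checks that they find the same files. The folder and file names are made up for illustration; only the os.walk/glob pattern comes from the question.

```python
import os
import tempfile
from glob import glob
from pathlib import Path

# Build a small temporary tree so the comparison runs anywhere.
# All folder and file names below are hypothetical.
root = Path(tempfile.mkdtemp())
(root / "group_a").mkdir()
(root / "a.csv").write_text("x\n1\n")
(root / "group_a" / "b.csv").write_text("x\n2\n")
(root / "group_a" / "notes.txt").write_text("not a csv\n")

# os.walk + glob, following the pattern used in the question
walked = [file
          for path, subdirs, files in os.walk(root)
          for file in glob(os.path.join(path, "*.csv"))]

# pathlib: rglob searches root and every subfolder in one call
globbed = list(root.rglob("*.csv"))

# Compare the two results as Path objects, ignoring order
same_files = sorted(Path(p) for p in walked) == sorted(globbed)
print(same_files)
```

The pathlib version also yields Path objects rather than plain strings, so attributes such as .name and .parts are available without splitting the path by hand.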

    Assuming you don't actually need the DataFrame containing the file names, but only a DataFrame for each csv file, you can simply do

    from pathlib import Path
    
    import pandas as pd
    
    PATH = Path(r"Z:\Adam")  # raw string so the backslash is kept literally
    EXT = "*.csv"
    
    # dictionary holding all the files DataFrames with the format {"filename": file_DataFrame}
    files_dfs = {}
    
    # recursive search for csv files in PATH folder and subfolders 
    for csv_file in PATH.rglob(EXT):
        filename = csv_file.name     # get the filename 
        df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
        files_dfs[filename] = df     # add the DataFrame to the dictionary
    

    Then, to access the DataFrame of a specific file, you can do

    filename_df = files_dfs["<filename>"]
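
One caveat with keying on the bare file name: if two subfolders contain a file with the same name, the later one overwrites the earlier one in the dictionary. A possible variation, sketched below, keys each DataFrame by its path relative to the search root instead. The helper name load_csvs and the demo folder layout are hypothetical, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs(root):
    """Read every CSV under root into a dict keyed by its path relative to root."""
    root = Path(root)
    return {
        # as_posix() gives forward slashes on every platform
        csv_file.relative_to(root).as_posix(): pd.read_csv(csv_file)
        for csv_file in sorted(root.rglob("*.csv"))
    }


# Demo on a throwaway tree in which two files share the same name.
demo_root = Path(tempfile.mkdtemp())
for group in ("group_a", "group_b"):
    (demo_root / group).mkdir()
    (demo_root / group / "results.csv").write_text(f"group,value\n{group},1\n")

files_dfs = load_csvs(demo_root)
print(sorted(files_dfs))  # one key per file, even though the names collide
```

With this layout a specific file is looked up as files_dfs["group_a/results.csv"], and the key itself still records which folder the data came from.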
    

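    【Discussion】:

    • Thank you very much for your reply. It works really well. I'll have to pay more attention to the modern ways of doing these things, because this is a very simple solution. Thanks again!
    • @Adam You're welcome! I'm glad I could teach you something new ;) Don't worry, it took me a long time to learn about pathlib too. Simple and effective solutions are just a matter of practice! It takes time, but it comes naturally, you'll see! Happy coding!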