Load multiple csv files into a Dataframe: column names issue

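【Question title】: load multiple csv files into Dataframe: columns names issue
【Posted】: 2018-04-25 18:22:19
【Question description】:

I have multiple csv files with the same format (14 rows, 4 columns). I am trying to load all of them into a single dataframe and use the file name to rename the values of the first column (1-14).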

    1   500 0   0
    2   350 0   1
    3   500 1   0
    .............
    13  600 0   0
    14  800 0   0

I tried the following code, but did not get the expected result:

    filenames = os.listdir('Threshold/')
    Y = pd.DataFrame()  # empty df
    # file names are in the following format "subx_ICA_thre.csv"
    # need to get x (subject number to be used later for renaming column values)
    Sub_list = []
    for filename in filenames:
        s = int(''.join(filter(str.isdigit, filename)))
        Sub_list.append(int(s))
    S_Sub_list = sorted(Sub_list)

    for x in S_Sub_list:  # get the file according to the subject number
        temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv')
        df = pd.concat([Y, temp])  # concat the obtained frame with the empty frame
        df.columns = ['id', 'data', 'isEB', 'isEM']
        # replace the column values using subject id
        for sub in range(1, 15):
            df['id'].replace(sub, 'sub' + str(x) + '_ICA_' + str(sub), inplace=True)
        print(df)

Output:

                id  data  isEB  isEM
   0    sub1_ICA_2   200     0     0
   1    sub1_ICA_3   275     0     0
   2    sub1_ICA_4   500     1     0
   ................................
   11  sub1_ICA_13   275     0     0
   12  sub1_ICA_14   300     0     0
                id  data  isEB  isEM
   0    sub2_ICA_2   275     0     0
   1    sub2_ICA_3   500     0     0
   2    sub2_ICA_4   400     0     0
   .................................
   11  sub2_ICA_13   300     0     0
   12  sub2_ICA_14   450     0     0

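First, the code seems to produce separate dataframes rather than a single one. Second, the first row is dropped (sub1_ICA_1 is missing, probably because it was used as the column names). I cannot find the problem in the loops I am using.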

【Question discussion】:

Tags: python-2.7 pandas csv

【Solution 1】:

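I think you need to first create a list of DataFrames, then concat them with the keys parameter set to a range, which creates a MultiIndex whose first level holds the new values. Then modify the id column, and finally remove the MultiIndex with reset_index:

Also add the names parameter to read_csv to set custom column names. Without it, read_csv treats the first row of each file as the header, which is why sub1_ICA_1 went missing.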

    Y = []
    for x in S_Sub_list:
        n = ['id', 'data', 'isEB', 'isEM']
        temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n)
        Y.append(temp)

    # list comprehension alternative
    # n = ['id', 'data', 'isEB', 'isEM']
    # Y = [pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n) for x in S_Sub_list]

    df = pd.concat(Y, keys=range(1, len(S_Sub_list) + 1))

    df['id'] = 'sub' + df.index.get_level_values(0).astype(str) + '_ICA_' + df['id'].astype(str)
    df = df.reset_index(drop=True)
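
For anyone who wants to try the approach without the original files, here is a minimal self-contained sketch. It assumes Python 3, comma-separated files with no header row, and file names of the form "subx_ICA_thre.csv". The helper names subject_number and load_thresholds and the two tiny demo files are made up for illustration. It also passes the real subject numbers as keys instead of a range, which is a small variation on the code above that should keep the labels right if the subject numbers are not contiguous.

```python
import os
import re
import tempfile

import pandas as pd

COLUMNS = ['id', 'data', 'isEB', 'isEM']


def subject_number(filename):
    """Return x from a name like 'subx_ICA_thre.csv', or None if it does not match."""
    match = re.match(r'sub(\d+)_ICA_thre\.csv$', filename)
    return int(match.group(1)) if match else None


def load_thresholds(folder):
    """Read every matching file in folder into one DataFrame with labelled ids."""
    subjects = sorted(
        n for n in (subject_number(f) for f in os.listdir(folder)) if n is not None
    )
    frames = [
        pd.read_csv(os.path.join(folder, 'sub%d_ICA_thre.csv' % n),
                    header=None, names=COLUMNS)
        for n in subjects
    ]
    # keys=subjects tags each block with its subject number in index level 0
    df = pd.concat(frames, keys=subjects)
    df['id'] = ('sub' + df.index.get_level_values(0).astype(str)
                + '_ICA_' + df['id'].astype(str))
    return df.reset_index(drop=True)


# Demo with two tiny made-up files (subjects 1 and 3, two rows each)
with tempfile.TemporaryDirectory() as folder:
    for n, rows in [(1, '1,500,0,0\n2,350,0,1\n'), (3, '1,200,0,0\n2,275,1,0\n')]:
        with open(os.path.join(folder, 'sub%d_ICA_thre.csv' % n), 'w') as fh:
            fh.write(rows)
    # Without names=, the first data row is consumed as the header
    lost = pd.read_csv(os.path.join(folder, 'sub1_ICA_thre.csv'))
    result = load_thresholds(folder)

print(len(lost))  # 1: only one of the two rows is left as data
print(result)
```

Joining the folder onto each file name also avoids depending on the current working directory, which the question's code does when it lists 'Threshold/' but reads the files by bare name.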

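【Discussion】:

Thanks. It works, except for an error from df['id'] = 'sub' + df.index.get_level_values(0).astype(str) +'ICA'+ df['id'] (cannot concatenate 'str' and 'long' objects)
It seems the id column does not contain strings, so df['id'].astype(str) is needed.
Thanks jezrael! It works, but when I load all the files by removing the keys in df = pd.concat(Y, keys=range(1,15)), the values of the third and fourth columns come out as floats, 0.0 and 1.0.
Hmm, it is a bit complicated without your files, but it seems you need df = pd.concat(Y, keys=range(1,len(S_Sub_list) + 1)), because there are more than 14 files ;)
Thank you very much for your help, I used df['isEB'] = df.isEB.astype(int)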
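
The thread does not say why the flag columns turned into floats. One common cause, assumed here purely for illustration, is a missing value in one of the files: an integer column that contains NaN is stored as float64, and the concatenated column inherits that dtype. The sketch below uses two made-up frames to reproduce the effect, shows how to locate the gap, and casts to the nullable Int64 dtype, since a plain astype(int) raises an error while NaN is still present.

```python
import pandas as pd

# Two made-up frames standing in for two files; the second has a gap in isEB
a = pd.DataFrame({'isEB': [0, 1], 'isEM': [0, 0]})
b = pd.DataFrame({'isEB': [1, None], 'isEM': [0, 1]})

df = pd.concat([a, b], ignore_index=True)
print(df.dtypes)  # isEB is float64 because of the NaN, isEM stays integer

# Locate the rows with missing flags before deciding how to cast
print(df[df['isEB'].isna()])

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>
df['isEB'] = df['isEB'].astype('Int64')
print(df)
```

If the check shows no missing values, df['isEB'].astype(int) as used in the last comment is enough.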