【Posted】: 2021-09-29 08:05:03
【Question】:
My code sometimes runs fine, but at other times I get the error below and I'm not sure why:
Excel file format cannot be determined, you must specify an engine manually.
Here are my code and the steps:
1- The list of possible customer ID column names:
customer_id = ["ID","customer_id","consumer_number","cus_id","client_ID"]
2- The code that finds all xlsx files in the folder and reads them:
import glob
import pandas as pd

l = []  # use a list and concat later, faster than append in the loop
for f in glob.glob("./*.xlsx"):
    df = pd.read_excel(f).reindex(columns=customer_id).dropna(how='all', axis=1)
    df.columns = ["ID"]  # to have only one column once concat
    l.append(df)
all_data = pd.concat(l, ignore_index=True)  # concat all data
I then added the openpyxl engine:
df = pd.read_excel(f, engine="openpyxl").reindex(columns = customer_id).dropna(how='all', axis=1)
Now I get a different error:
BadZipFile: File is not a zip file
pandas version: 1.3.0, Python version: 3.9, OS: macOS
Is there a better way to read all xlsx files from a folder?
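One possible cause, stated as an assumption since the post does not show the folder contents: the glob pattern may also match files that are not real workbooks, such as the "~$name.xlsx" lock files Excel creates while a workbook is open. That would fit both the intermittent failures and the BadZipFile error, because a genuine .xlsx file is a zip archive. A minimal sketch of a filter that skips such files before reading (the helper name readable_xlsx_files is hypothetical):

```python
import zipfile
from pathlib import Path


def readable_xlsx_files(folder="."):
    """Return the .xlsx paths in `folder` that look like real workbooks."""
    good = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        # Excel writes "~$name.xlsx" lock files while a workbook is open;
        # they match *.xlsx but are not workbooks.
        if path.name.startswith("~$"):
            continue
        # A genuine .xlsx file is a zip archive; skip anything that is not.
        if not zipfile.is_zipfile(path):
            continue
        good.append(path)
    return good
```

The returned paths could then replace glob.glob("./*.xlsx") in the loop above, with each one passed to pd.read_excel(path, engine="openpyxl"). If files are still skipped unexpectedly, printing the rejected names should show which ones are not valid workbooks.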
【Discussion】:
Tags: python python-3.x pandas dataframe