【Question title】: Reading a csv file into a Pandas dataframe with more than one separator for the values
【Posted】: 2021-06-06 18:39:08
【Question description】:

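I have a csv file that uses a comma as the separator, while the values are additionally enclosed in ". The first line is text, the second line is empty, and the third line contains the column headers. If I try to import the file into a dataframe with pandas using the code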

IE00B0M62Q58 = pd.read_csv('ETF/sample.csv', sep=',')

I get an error like

ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 13

How can I read the file into a dataframe in Pandas?

I copied and pasted the sample.csv file below:

Fondsposition per,"03.Jun.2021"


Emittententicker,Name,Anlageklasse,Gewichtung (%),Kurs,Nominale,Marktwert,Nominalwert,Sektor,ISIN,Börse,Standort,Marktwährung
"AAPL","APPLE INC","Aktien","3,63","123,54","1.722.459","212.792.585","212.792.584,86","IT","US0378331005","NASDAQ","Vereinigte Staaten","USD"
"MSFT","MICROSOFT CORP","Aktien","3,08","245,71","735.512","180.722.654","180.722.653,52","IT","US5949181045","NASDAQ","Vereinigte Staaten","USD"
"AMZN","AMAZON COM INC","Aktien","2,38","3.187,01","43.863","139.791.820","139.791.819,63","Zyklische Konsumgüter ","US0231351067","NASDAQ","Vereinigte Staaten","USD"
"FB","FACEBOOK CLASS A INC","Aktien","1,37","326,04","245.671","80.098.573","80.098.572,84","Kommunikation","US30303M1027","NASDAQ","Vereinigte Staaten","USD"
"GOOG","ALPHABET INC CLASS C","Aktien","1,24","2.404,61","30.223","72.674.528","72.674.528,03","Kommunikation","US02079K1079","NASDAQ","Vereinigte Staaten","USD"

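【Comments】:

  • I tried editing the text to fix the formatting, but from the original formatting it is unclear whether Fondsposition per,"03.Jun.2021" is part of sample.csv.
  • Thanks. Yes, Fondsposition per,"03.Jun.2021" is the first line. If I delete that line afterwards I manage to read the csv file into a dataframe, but I am not ... because there are two separators, , and ".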

Tags: python pandas dataframe


【Solution 1】:

Try using the decimal parameter in the call

IE00B0M62Q58 = pd.read_csv('ETF/sample.csv', sep=',', decimal=',')

Also, if . is the thousands separator, so that 2.404,61 should be read as 2404.61, then you can use the thousands parameter:

IE00B0M62Q58 = pd.read_csv('ETF/sample.csv', sep=',', decimal=',', thousands='.')
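To see what decimal and thousands do together, here is a minimal sketch on an in-memory stand-in for the file. The three-column subset and the names csv_text and df are assumptions made for brevity; the values are taken from the sample rows in the question.

```python
import io

import pandas as pd

# In-memory stand-in for the file: a subset of the columns from the question
csv_text = (
    'Name,Kurs,Nominale\n'
    '"APPLE INC","123,54","1.722.459"\n'
    '"AMAZON COM INC","3.187,01","43.863"\n'
)

# decimal=',' reads 123,54 as 123.54; thousands='.' drops the grouping dots
df = pd.read_csv(io.StringIO(csv_text), sep=',', decimal=',', thousands='.')

print(df.dtypes)  # Kurs and Nominale should come out as numeric columns
print(df)
```

Checking df.dtypes after the call is a quick way to confirm that the numeric columns were actually converted rather than left as text.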

If you want to skip specific rows at the start of the file, add skiprows

IE00B0M62Q58 = pd.read_csv('ETF/sample.csv', sep=',',skiprows=2, thousands='.', decimal=',') 
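The effect of skiprows can be checked without the real file. The sketch below assumes the layout described in the question (one text line, one empty line, then the header) and uses a shortened column set; raw and df are illustrative names.

```python
import io

import pandas as pd

# Same layout as described in the question: text line, empty line, header, data
raw = (
    'Fondsposition per,"03.Jun.2021"\n'
    '\n'
    'Emittententicker,Name,Gewichtung (%),Kurs,Nominale\n'
    '"AAPL","APPLE INC","3,63","123,54","1.722.459"\n'
    '"AMZN","AMAZON COM INC","2,38","3.187,01","43.863"\n'
)

# skiprows=2 drops the first two physical lines, so the third line becomes the header
df = pd.read_csv(io.StringIO(raw), sep=',', skiprows=2, thousands='.', decimal=',')

print(df.columns.tolist())
print(df)
```

Without skiprows, the parser infers two columns from the first line and then fails on the 13-field header, which matches the error in the question.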

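【Comments】:

  • Thank you very much, it works well. The only thing I do not understand is why we do not need to account for the ", yet it still works. I needed to use skiprows=2 to skip the first two rows: IE00B0M62Q58 = pd.read_csv('ETF/sample.csv', sep=',', skiprows=2, thousands='.', decimal=',')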
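
On the question about the " character: as far as I know, read_csv treats " as the quote character by default (quotechar='"'), so the quotes are stripped and a comma inside a quoted value is not taken as a separator. A minimal sketch, with a made-up two-column sample and the default written out explicitly only to make it visible:

```python
import io

import pandas as pd

# Two-column stand-in; the comma inside "123,54" is protected by the quotes
text = 'Name,Kurs\n"APPLE INC","123,54"\n'

# quotechar='"' is the default and could be omitted
df = pd.read_csv(io.StringIO(text), sep=',', quotechar='"', decimal=',')

print(df.shape)
print(df)
```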