【Question title】: python pandas, unicode decode error on read_csv [duplicate]
【Posted】: 2019-12-27 03:58:34
【Question】:

I get an error when importing a CSV file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte

Traceback:

Traceback (most recent call last):

  File "<ipython-input-2-99e71d524b4b>", line 1, in <module>
    runfile('C:/AppData/FinRecon/py_code/python3/DataJoin.py', wdir='C:/AppData/FinRecon/py_code/python3')

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
    execfile(filename, namespace)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 500, in <module>
    M5()

  File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 221, in M5
    s3 = pd.read_csv(working_dir+"S3.csv", sep=",") #encode here encoding='utf-16

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
    data = parser.read(nrows)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
    ret = self._engine.read(nrows)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
    data = self._reader.read(nrows)

  File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read

  File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory

  File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows

  File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data

  File "pandas/_libs/parsers.pyx", line 1176, in pandas._libs.parsers.TextReader._convert_tokens

  File "pandas/_libs/parsers.pyx", line 1299, in pandas._libs.parsers.TextReader._convert_with_dtype

  File "pandas/_libs/parsers.pyx", line 1315, in pandas._libs.parsers.TextReader._string_convert

  File "pandas/_libs/parsers.pyx", line 1553, in pandas._libs.parsers._string_box_utf8

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte

What I tried:

`s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='utf-16')`

This raises `UnicodeError: UTF-16 stream does not start with BOM`

How can I read this file correctly?

【Comments】:

  • The supposed duplicate has nothing to do with Unicode parsing errors. Voting to reopen.

Tags: python-3.x pandas


【Solution 1】:

Try using `s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='Latin-1')`
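If it is not clear which encoding a file uses, one option is to try a few candidates on the raw bytes before calling `read_csv`. The sketch below is only an illustration: `first_working_encoding`, the candidate list, and the sample bytes are hypothetical and not part of pandas. A successful decode does not prove the encoding is the right one, since Latin-1 accepts any byte sequence.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejects the bytes, try the next one
        return name
    return None  # no candidate could decode the bytes


# Sample bytes standing in for the file contents (assumed, not the real S3.csv).
# 0xa0 is a non-breaking space in cp1252 and Latin-1 but invalid on its own in UTF-8.
sample = b"id,name\n1,Total\xa0amount\n"
chosen = first_working_encoding(sample)
print(chosen)
```

With a real file, the bytes could come from `open(path, "rb").read()`, and the returned name can then be passed as `encoding=` to `pd.read_csv`.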

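Most encoding problems come down to the bytes in the data. UTF-8 can represent every Unicode character, but it has a strict byte structure that every sequence must follow. The byte 0xa0 has the bit pattern 10100000, which is only valid as a continuation byte inside a multi-byte sequence, so it can never begin a character; that is what "invalid start byte" means. In Latin-1 (ISO-8859-1) and Windows-1252, a lone 0xa0 is a non-breaking space, so the file was most likely saved in one of those single-byte encodings. Latin-1 maps each of the 256 possible byte values to a character, so decoding with it never raises an error, although characters can come out wrong if the file actually uses a different encoding. The characters Latin small letter i with diaeresis (ï), right-pointing double angle quotation mark (») and inverted question mark (¿) are what the bytes 0xef, 0xbb and 0xbf look like when read as Latin-1; that sequence is the UTF-8 byte order mark and is not the cause of this error.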
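The behaviour can be reproduced without pandas. This is a minimal sketch on an assumed sample string (not the contents of S3.csv) that shows the same byte failing under UTF-8 and decoding under Latin-1:

```python
raw = b"Total\xa0amount"  # assumed sample; the 0xa0 byte sits at index 5

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc  # keep the exception so its details can be inspected
    print(exc)

# Latin-1 maps every byte to a code point, so this cannot fail.
latin1_text = raw.decode("latin-1")
print(repr(latin1_text))

# In valid UTF-8 the same character (U+00A0) is encoded as two bytes.
nbsp_utf8 = "\u00a0".encode("utf-8")
print(nbsp_utf8)  # b'\xc2\xa0'
```

The exception object exposes the offending offset as `utf8_error.start`, which can help locate the problem bytes in a larger file.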

【Comments】:

  • Thanks, it worked. Can you explain why?