expected str, bytes or os.PathLike object, not DataFrame (answer)

【Question title】: expected str, bytes or os.PathLike object, not DataFrame
【Posted】: 2021-12-23 16:15:46
【Question】:

I am trying to load a word-embedding file for part-of-speech analysis (NLP), but I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-33-94170a7f0621> in <module>()
      2 
      3 def get_coefs(word,*arr): return word, np.asarray(arr, dtype='float32')
----> 4 embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE))

TypeError: expected str, bytes or os.PathLike object, not DataFrame

What should I do?

import numpy as np
import pandas as pd
from google.colab import drive

drive.mount('/content/drive/')

EMBEDDING_FILE = pd.read_csv('/content/drive/MyDrive/ML/paragram_300_sl999-2.txt', encoding='unicode_escape', sep=" ", header=None)

def get_coefs(word, *arr):
    return word, np.asarray(arr, dtype='float32')

embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE))
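To see the error in isolation, here is a minimal sketch, assuming only pandas is installed. The toy file toy_embeddings.txt and its contents are made up for illustration. It shows that open() raises this TypeError when given the DataFrame returned by pd.read_csv, but works when given the path string.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy embeddings file: a word followed by its vector components
path = os.path.join(tempfile.mkdtemp(), "toy_embeddings.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")

df = pd.read_csv(path, sep=" ", header=None)  # this is a DataFrame, not a path

error = None
try:
    open(df)  # open() expects str, bytes or os.PathLike
except TypeError as exc:
    error = exc
    print(error)

# Passing the path string itself works
with open(path, encoding="utf8") as f:
    first_line = f.readline()
print(first_line.split())
```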

【Discussion】:

Could you add a sample of the text file?
Please provide enough code so that others can understand or reproduce the problem.

Tags: python nlp part-of-speech

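【Solution 1】:

The problem is that EMBEDDING_FILE holds the pandas.DataFrame returned by pd.read_csv, not a file path, while open() only accepts a path (str, bytes or os.PathLike). There is no need to convert the txt file into a DataFrame; pass the path of the text file to open() directly (replace the path below with your own, e.g. /content/drive/MyDrive/ML/paragram_300_sl999-2.txt):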

import numpy as np

# open() needs a file path (str, bytes or os.PathLike), not a DataFrame
EMBEDDING_FILE = '../input/paragram-300-sl999/paragram_300_sl999.txt'
def get_coefs(word, *arr):  # first token is the word, the rest are its vector components
    return word, np.asarray(arr, dtype='float32')
with open(EMBEDDING_FILE, encoding="utf8", errors='ignore') as f:
    # keep only lines longer than 100 characters
    embeddings_index = dict(get_coefs(*o.split(" ")) for o in f if len(o) > 100)
print(type(embeddings_index), type(embeddings_index['the']), embeddings_index['the'].shape, len(list(embeddings_index.keys())))

Output:

<class 'dict'> <class 'numpy.ndarray'> (300,) 66199
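If you would rather keep the pd.read_csv call, you can also build the dictionary from the DataFrame itself instead of passing it to open(). This is a minimal sketch, assuming each row holds the word in column 0 followed by the vector components; the toy file and its contents are hypothetical. quoting=csv.QUOTE_NONE stops pandas from treating quote characters inside words as field quoting, and keep_default_na=False keeps words such as "null" or "NA" from being turned into NaN.

```python
import csv
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy file in "word v1 v2 ... vN" format (N = 3 here, 300 for paragram)
path = os.path.join(tempfile.mkdtemp(), "toy_embeddings.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("the 0.1 0.2 0.3\nnull 0.4 0.5 0.6\n")

df = pd.read_csv(
    path,
    sep=" ",
    header=None,
    quoting=csv.QUOTE_NONE,  # quote characters in words are ordinary text
    keep_default_na=False,   # keep words like "null" or "NA" as strings
    encoding="utf8",
)

# Column 0 is the word; the remaining columns are the vector components
words = df.iloc[:, 0].astype(str)
vectors = df.iloc[:, 1:].to_numpy(dtype="float32")
embeddings_index = dict(zip(words, vectors))

print(embeddings_index["null"])
```

Compared with the line-by-line loop, this reads the whole file into memory as a table first, which is simple for small files but uses more memory for large embedding files.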

【Discussion】:

@lui carmen, did the answer help?