【Question title】: AttributeError: format not found - pyodide + joblib.dump + scikit-learn (TfidfVectorizer)
【Posted】: 2021-03-21 09:55:05
【Question description】:

I pickled an SMS spam prediction model using pickle. Now I want to load the model in the browser using Pyodide.

I have loaded the pickled file in the browser using pickle.loads:

console.log("Pyodide loaded, downloading pretrained ML model...")
const model = (await blobToBase64(await (await fetch("/model.pkl")).blob())).replace("data:application/octet-stream;base64,", "")
console.log("Loading model into Pyodide...")
await pyodide.loadPackage("scikit-learn")
await pyodide.loadPackage("joblib")
pyodide.runPython(`
    import base64
    import pickle
    from io import BytesIO
    classifier, vectorizer = pickle.loads(base64.b64decode('${model}'))
`)

This works.

However, when I try to call:

const prediction = pyodide.runPython(`
    vectorized_message = vectorizer.transform(["Call +172949 if you want to get $1000 immediately!!!!"])
    classifier.predict(vectorized_message)[0]
`)

it raises an error (in vectorizer.transform): AttributeError: format not found

The full error dump is below.

Uncaught (in promise) Error: Traceback (most recent call last):
  File "/lib/python3.8/site-packages/pyodide/_base.py", line 70, in eval_code
    eval(compile(mod, "<exec>", mode="exec"), ns, ns)
  File "<exec>", line 2, in <module>
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1899, in transform
    return self._tfidf.transform(X, copy=False)
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1513, in transform
    X = X * self._idf_diag
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 319, in __mul__
    return self._mul_sparse_matrix(other)
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 478, in _mul_sparse_matrix
    other = self.__class__(other)  # convert to this format
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 28, in __init__
    if arg1.format == self.format and copy:
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 525, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: format not found

    _hiwire_throw_error https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    __runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    _runPythonInternal https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    <anonymous> http://localhost/:41
    async* http://localhost/:46
pyodide.asm.js:8:39788

But in regular Python it works fine.

What might I be doing wrong?

【Question comments】:

    Tags: python scikit-learn pyodide


    【Solution 1】:

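    This is probably a pickle portability issue. Pickles should be portable across architectures¹ (here amd64 -> wasm32), but they are not portable across package versions. That means the package versions should be identical in the environment where you train the model and the environment where you run inference (pyodide).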
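One way to make such a mismatch fail loudly instead of surfacing later as an obscure AttributeError is to store the package versions next to the pickled model and compare them before unpickling. The following is only a minimal sketch under stated assumptions: the helper names (module_version, dump_with_versions, load_checked) are hypothetical and not part of pickle, joblib or scikit-learn, and it assumes each dependency exposes a __version__ attribute.

```python
import importlib
import pickle


def module_version(name):
    """Return a module's __version__, or None if it is missing or unversioned."""
    try:
        return getattr(importlib.import_module(name), "__version__", None)
    except ImportError:
        return None


def dump_with_versions(obj, modules):
    """Pickle obj together with the versions of the modules it depends on."""
    versions = {name: module_version(name) for name in modules}
    # The payload is pickled separately so it is not unpickled before the check.
    return pickle.dumps({"versions": versions, "payload": pickle.dumps(obj)})


def load_checked(data):
    """Unpickle the payload only if the recorded versions match this environment."""
    bundle = pickle.loads(data)
    mismatched = {
        name: (saved, module_version(name))
        for name, saved in bundle["versions"].items()
        if saved != module_version(name)
    }
    if mismatched:
        raise RuntimeError(f"version mismatch (saved, current): {mismatched}")
    return pickle.loads(bundle["payload"])
```

In the setup from the question, the training script would presumably call something like dump_with_versions((classifier, vectorizer), ["numpy", "scipy", "sklearn"]), and the code run inside Pyodide would call load_checked on the decoded bytes in place of pickle.loads.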

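    pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means you would have to build that version of scipy (and possibly numpy) from source to train the model, because no Python 3.8 binary wheels exist for such an outdated version of scipy. In the future this should be resolved by pyodide#1293.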
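Before training, it may help to check how far the local environment is from the versions listed above. This is a rough sketch, not an official Pyodide tool: PYODIDE_0_16_1 and version_mismatches are hypothetical names, the version numbers are the ones quoted in this answer, and the keys are import names (sklearn rather than scikit-learn).

```python
import importlib

# Versions this answer lists for pyodide 0.16.1 (keys are import names).
PYODIDE_0_16_1 = {"scipy": "0.17.1", "sklearn": "0.22.2"}


def version_mismatches(expected):
    """Return {module: (expected, found)} for every module whose version differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = getattr(importlib.import_module(name), "__version__", None)
        except ImportError:
            found = None  # module is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(PYODIDE_0_16_1).items():
        print(f"{name}: pyodide has {wanted}, this environment has {found}")
```

If the script prints nothing, the training environment matches the listed versions; any printed line names a package whose pickles may not load cleanly in the browser.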

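    The specific error you are getting is likely due to the scipy.sparse version mismatch; see scipy#6533.

    ¹ Currently, however, tree-based models in scikit-learn are not portable across architectures, so they cannot be unpickled in pyodide. This is a known bug that should be fixed (scikit-learn#19602).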

    【Comments】:
