[Posted]: 2020-03-08 07:59:38
[Question]:
I am running my scraping project with Python 3 in Jupyter Notebooks on my server. For some reason, calling tabula.read_pdf (from tabula-py) fails with TypeError: expected str, bytes or os.PathLike object, not builtin_function_or_method. How can I make it work? I am passing an actual PDF file.
The code that fails
import tabula
df = tabula.read_pdf("20200125-sitrep-5-2019-ncov.pdf", pages=all)
The error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-4f86b7402956> in <module>
----> 1 df = tabula.read_pdf("20200125-sitrep-5-2019-ncov.pdf", pages=all)
/usr/local/lib/python3.7/dist-packages/tabula/io.py in read_pdf(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, user_agent, **kwargs)
320
321 try:
--> 322 output = _run(java_options, kwargs, path, encoding)
323 finally:
324 if temporary:
/usr/local/lib/python3.7/dist-packages/tabula/io.py in _run(java_options, options, path, encoding)
83 stderr=subprocess.PIPE,
84 stdin=subprocess.DEVNULL,
---> 85 check=True,
86 )
87 if result.stderr:
/usr/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
470 kwargs['stderr'] = PIPE
471
--> 472 with Popen(*popenargs, **kwargs) as process:
473 try:
474 stdout, stderr = process.communicate(input, timeout=timeout)
/usr/lib/python3.7/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
773 c2pread, c2pwrite,
774 errread, errwrite,
--> 775 restore_signals, start_new_session)
776 except:
777 # Cleanup if the child failed starting.
/usr/lib/python3.7/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1451 errread, errwrite,
1452 errpipe_read, errpipe_write,
-> 1453 restore_signals, start_new_session, preexec_fn)
1454 self._child_created = True
1455 finally:
TypeError: expected str, bytes or os.PathLike object, not builtin_function_or_method
My PDF file is named 20200125-sitrep-5-2019-ncov.pdf. This is the PDF I am scraping: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200125-sitrep-5-2019-ncov.pdf?sfvrsn=429b143d_8
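A likely cause, judging from the traceback: `pages=all` passes Python's built-in function `all` rather than the string "all". That value then seems to end up in the argument list handed to subprocess, which only accepts str, bytes, or path-like values. The sketch below reproduces the same message with `os.fspath`, so it needs neither tabula-py nor Java. `path_error_message` and `normalize_pages` are hypothetical helpers written for illustration; they are not part of tabula-py.

```python
import os

# Without quotes, `all` is Python's built-in function, not the string "all".
print(type(all).__name__)


def path_error_message(value):
    """Return the TypeError text os.fspath raises for `value`, or None if it is accepted."""
    try:
        os.fspath(value)
    except TypeError as exc:
        return str(exc)
    return None


def normalize_pages(pages):
    """Hypothetical helper (not tabula-py API): turn a pages value into a string argument."""
    if callable(pages):
        # Catches the unquoted built-in early, with a clearer message.
        raise TypeError('pages must be "all", an int, or a list of ints; did you mean pages="all"?')
    if isinstance(pages, (list, tuple)):
        return ",".join(str(p) for p in pages)
    return str(pages)


# Same message as the last line of the traceback in the question.
print(path_error_message(all))

# Assumed fix, if tabula-py and Java are installed (note the quotes around "all"):
# import tabula
# df = tabula.read_pdf("20200125-sitrep-5-2019-ncov.pdf", pages="all")
```

If this is indeed the cause, quoting the value as pages="all" should be the only change the original call needs.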
[Discussion]:
Tags: python-3.x pdf web-scraping jupyter-notebook