Python script to run shell command (answers)

[Question title]: Python script to run shell command
[Posted]: 2013-08-21 18:13:41
[Question]:

I'm having some trouble. I want to run a shell command in a specific directory from Python. Based on code I found on the internet, I need the following:

import os
import subprocess
import shlex

and then the code itself:

os.chdir('/etc/test/')
cmd = 'scrapy crawl test'
subprocess.call(shlex.split(cmd))
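One way to do the same thing without os.chdir is sketched below, assuming Python 3.5+ (for subprocess.run). The cwd argument makes only the child process start in the given directory, so the script's own working directory is left unchanged, and passing an argument list directly makes shlex unnecessary. The helper name run_in_dir is hypothetical, not part of any library.

```python
import subprocess
from typing import Sequence

# Argument list equivalent to shlex.split('scrapy crawl test').
SCRAPY_CMD = ['scrapy', 'crawl', 'test']


def run_in_dir(args: Sequence[str], directory: str) -> int:
    """Run args with `directory` as the child's working directory.

    Hypothetical helper: unlike os.chdir, cwd= affects only the child
    process, so the calling script's working directory is untouched.
    Returns the child's exit status.
    """
    completed = subprocess.run(list(args), cwd=directory)
    return completed.returncode
```

Usage, assuming scrapy is on PATH and /etc/test/ holds the Scrapy project: `run_in_dir(SCRAPY_CMD, '/etc/test/')`.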

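As you can see, I'm trying to run the command "scrapy crawl test" in the /etc/test/ directory. When I run it manually in a terminal it seems to work fine, but when I run it with this Python code it gives me an error: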

INFO Exception occurred while scraping: [Errno 2] No such file or directory

Can anyone tell me whether my code is incorrect, or whether I'm going about this the wrong way?

[Question discussion]:

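Is there any more traceback information, or just that one error line?
As a side note, cmd = ['scrapy', 'crawl', 'test'] followed by subprocess.call(cmd) is simpler and probably harder to get wrong; there's no need for shlex here. But that won't affect the problem you're trying to solve.
@abarnert I'm afraid I can't find any further traceback information. In your version, would I still need the os.chdir call?
@Jimmy: scrapy is a Python library. Have you read the getting-started and tutorial material? Everything you want to do from within Python can be done in Python. Everything you want to do from the shell or a cron job can be done with the command-line tool. If you're trying to run the command-line tool from inside Python, you've probably made a mistake earlier in the process… but without more information about what you're trying to do, it's hard to say what that mistake is.
@Jimmy: Anyway, the error you're seeing comes from scrapy, not from your code. It could mean your spider has a bug, or your directory layout isn't what you expect, or a million other things. Have you tried debugging with the scrapy shell, as described in the tutorial?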
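To check where an "[Errno 2] No such file or directory" actually originates, a sketch like the following (assuming Python 3.6+) separates the two cases. If the executable or the working directory can't be found, subprocess raises FileNotFoundError in the calling script before the command runs; if the command starts and then fails, you get a nonzero exit status and the command's own message on stderr. The function name diagnose and its return strings are hypothetical.

```python
import shutil
import subprocess
from typing import Sequence


def diagnose(args: Sequence[str], directory: str) -> str:
    """Report whether a command failed to launch or failed while running."""
    # shutil.which looks a bare command name up on PATH, as the shell would.
    if shutil.which(args[0]) is None:
        return f'{args[0]!r} not found on PATH'
    try:
        completed = subprocess.run(
            list(args),
            cwd=directory,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError as exc:
        # Raised in this process: the executable or the cwd does not exist.
        return f'launch failed: {exc}'
    if completed.returncode != 0:
        # The child did run; the error text was produced by the child itself.
        return f'exited {completed.returncode}: {completed.stderr.strip()}'
    return 'ok'
```

For the question's case, `diagnose(['scrapy', 'crawl', 'test'], '/etc/test/')` returning an "exited ..." string would point at scrapy or the project, while "not found on PATH" or "launch failed" would point at the environment the script runs in.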

Tags: python shell debian subprocess scrapy

[Solution 1]:

Why use subprocess? The usual way to run Scrapy from a script is to drive it with Twisted's reactor. Taken from the docs:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here
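The snippet above targets the Scrapy API of that era; scrapy.log and Crawler.configure() are not part of current Scrapy releases, where scrapy.crawler.CrawlerProcess is the documented way to run a crawl from a script and it starts and stops the Twisted reactor itself. A minimal sketch for newer versions, assuming it runs from inside a Scrapy project (so get_project_settings() can locate scrapy.cfg) that defines a spider named test; the wrapper name run_spider is hypothetical:

```python
def run_spider(spider_name: str = 'test') -> None:
    """Run one spider of the current Scrapy project and block until it ends."""
    # Imported lazily so this module can be loaded without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Loads the settings of the project found via scrapy.cfg from the cwd.
    process = CrawlerProcess(get_project_settings())
    # With project settings, a spider can be referenced by its name.
    process.crawl(spider_name)
    # Starts the Twisted reactor; blocks until the crawl finishes.
    process.start()
```

Calling run_spider() with /etc/test/ as the working directory would then be the in-process counterpart of running scrapy crawl test there.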

There are plenty of examples:

Hope that helps.

[Discussion]: