【Posted】: 2015-03-07 10:54:12
【Question】:
import scrapy
from ex.items import ExItem


class reddit(scrapy.Spider):
    """docstring for reddit"""
    name = "dmoz"
    allowed_domains = ["reddit.com"]
    start_urls = ["http://www.reddit.com/"]

    def parse(self, response):
        item = ExItem()
        item["title"] = response.xpath('//p[contains(@class,"title")]/a/text()').extract()
        item["rank"] = response.xpath('//span[contains(@class,"rank")]/text()').extract()
        item["votes_dislike"] = response.xpath('//div[contains(@class,"score dislikes")]/text()').extract()
        item["votes_unvoted"] = response.xpath('//div[contains(@class,"score unvoted")]/text()').extract()
        item["votes_likes"] = response.xpath('//div[contains(@class,"score likes")]/text()').extract()
        item["video_reference"] = response.xpath('//a[contains(@class,"thumbnail may-blank")]/@href').extract()
        item["image"] = response.xpath('//a[contains(@class,"thumbnail may-blank")]/img/@src').extract()
        # Hand the populated item back to Scrapy so it reaches the feed exporter
        yield item
I can export this to JSON, but the JSON output contains a bullet character. How can I remove it and still keep valid JSON?
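Assuming the "bullet" is the character U+2022 (•), which reddit's old layout appears to render inside the score `div` when a post's score is hidden, it would show up in the JSON export as the escape `\u2022`, because the JSON encoder escapes non-ASCII characters by default. That identification is an assumption. A minimal sketch of stripping it from a list of extracted strings, using a hypothetical helper `clean_values`:

```python
import json

BULLET = u"\u2022"  # assumed: the "bullet" reddit shows in place of a hidden score


def clean_values(values):
    """Remove the bullet and surrounding whitespace from each extracted
    string, dropping entries that end up empty (hypothetical helper)."""
    cleaned = []
    for value in values:
        value = value.replace(BULLET, u"").strip()
        if value:
            cleaned.append(value)
    return cleaned


raw = [u"\u2022", u"1234", u" 56 "]
print(json.dumps(raw))                # ["\u2022", "1234", " 56 "]
print(json.dumps(clean_values(raw)))  # ["1234", "56"]
```

In `parse` it could wrap a single field, e.g. `item["votes_unvoted"] = clean_values(response.xpath('//div[contains(@class,"score unvoted")]/text()').extract())`. Because empty entries are dropped, the cleaned list may no longer line up index-by-index with `title` or `rank`.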
【Discussion】:
I want to remove it completely from my JSON output.
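To remove the bullet from every field at once rather than per XPath, one option is a Scrapy item pipeline: Scrapy passes each item to the pipeline's `process_item(self, item, spider)` method before the feed exporter writes it. The class name `StripBulletPipeline`, the module path `ex.pipelines`, and the U+2022 identification are all assumptions; this is a sketch, not a tested drop-in:

```python
# Hypothetical pipeline module, e.g. ex/pipelines.py (path assumed).
# Enable it in settings.py:
#   ITEM_PIPELINES = {"ex.pipelines.StripBulletPipeline": 300}

try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3

BULLET = u"\u2022"  # assumed: the "bullet" that appears in the JSON output


def _clean(value):
    """Remove every bullet and surrounding whitespace from one string."""
    return value.replace(BULLET, u"").strip()


class StripBulletPipeline(object):
    """Clean each string field, or list-of-strings field, before export."""

    def process_item(self, item, spider):
        for field in list(item.keys()):
            value = item[field]
            if isinstance(value, list):
                # .extract() returns lists of strings; drop entries that
                # were nothing but a bullet and whitespace
                item[field] = [v for v in (_clean(s) for s in value) if v]
            elif isinstance(value, string_types):
                item[field] = _clean(value)
        return item
```

Because the pipeline only touches string values, fields holding other types pass through unchanged; the same index-alignment caveat applies to the list branch.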
Tags: python json python-2.7 web-scraping scrapy